Predicting metabolons

‘Are we there yet?’

Sibbe Bakker

AG Walther

July 26, 2024

Introduction

What are metabolons and why are they important?

Metabolism and metabolons

A compartmentalisation mechanism.

Metabolons

Molecular tunnel

Surface channeling

Cluster channeling
Figure 1: Mechanisms of metabolite channelling [1].

What are these channels actually?

Direct channeling in Tryptophan synthase [4]

Surface channeling in a recombinant TCA metabolon [5]

What is this LLPS?

Formation of LLPS [6]

Various known functions of LLPS. From Alberti et al., [7]

This research

Do recent ML innovations help?

What’s the gap?

  • Protein complexes can be hard to crystallise.

  • The exact nature of protein complexes is often unknown.

  • Homology-based approaches have met some success.

  • Try to solve it with deep learning methods.

Why deep learning now?

  • New methods are published:
    • AlphaFold 3 [8]
    • NeuralPlexer [9]
    • CombFold [10]
    • RoseTTAFold All-Atom [11]
    • AF2Complex [12]
    • MoLPC [13, 14]
  • It's important to see:
    • How they work.
    • How they are interpreted.
    • How they may be applied.

The research plan

Methods

How to assess protein interaction using Deep Learning

Overview

A short note on neural networks

Example of a neural network [15]

Protein complex prediction

AF2 architecture [16]

LLPS Assessment

LLPS prediction Theory: how is it measured? (i)

Methods to study LLPS Alberti et al., [7]

LLPS prediction Theory: how is it measured? (ii)

Photo bleaching experiment [17]

LLPS prediction Theory: how could it be predicted? (ii)

Is it reasonable to expect good predictions?

  • Disorder is predictable.
  • Sequence properties and PPIs play a large role.
  • The specific cellular conditions are also important.

Properties of LLPS proteins [18]

Results

The value of predictions

Protein Complex predictions

The human trifunctional protein

AF3 prediction (TM: 0.98515)

CombFold prediction (TM: 0.9767)
Figure 2: Predictions of the human tri-functional protein (6DV2)

Glycolysis – ML prediction

CombFold prediction

Subunit map
Figure 3: CombFold prediction of the Arabidopsis thaliana glycolysis metabolon

Glycolysis — Previous homology prediction

A previous I-TASSER and docking prediction [19]

LLPS classifiers

DrLLPS (i)

PAICS

DrLLPS (ii)

Literature annotations

Condensate | Description | Tissue/Cell | PMIDs
Postsynaptic density | “…We report the first direct comparison of the proteome of triplicate isolates of mouse and human cortical postsynaptic densities. The mouse postsynaptic density comprised 1556 proteins and the human one 1461.” | Human cortex | 23071613

Performance of classifiers

ROC curve for the LLPS classifiers
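
As a reminder of how a ROC curve is built, the sketch below sweeps a decision threshold over classifier scores and integrates the resulting curve. The labels and scores are made-up illustration values, not results from the LLPS classifiers, and the function names are hypothetical.

```python
def roc_points(labels, scores):
    """Return (FPR, TPR) pairs obtained by lowering the decision threshold."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    # Each distinct score is used once as a threshold, from strict to lenient.
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
        fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under the ROC points using the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical toy data: 1 = annotated LLPS protein, 0 = background protein.
toy_labels = [1, 1, 0, 0]
toy_scores = [0.9, 0.4, 0.6, 0.1]
toy_auc = auc(roc_points(toy_labels, toy_scores))
```

An AUC of 0.5 corresponds to random guessing and 1.0 to perfect separation, which is the scale on which the classifiers are compared.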

Discussion

Takeaways

What did we learn?

In summary

  • Machine learning based structure prediction:
    • Can be useful.
    • Is often fickle.
  • ML datasets should be trustworthy

How well can we trust nature?

Me quite often

What improvements can be made?

Direct contact metabolons

  • Improvements to CombFold
    • Allow for chance
    • Inference of stoichiometry.
  • Where can we get data from:
    • Infer complexes from STRING-DB or Complex portal
    • Stoichiometry may be found by looking for similar complexes (maybe with Foldseek).

Cluster channeled metabolons (i)

  • Understanding Enzymes and LLPS
  • What is the composition of a liquid phase?
    • Maybe for our research question PPI networks are more relevant.

Cluster channeled metabolons (ii)

Example of a MobiDB entry for FUS (MobiDB entry P35637)

What still needs to happen?

Reproducibility must be budgeted

Direct contact metabolons

  • When we have a structure, what can we do?
    • Infer a biological mechanism.
    • Infer a biological channel (ChannelDB)
    • Simulate channelling.

Cluster channeled metabolons

  • Is it possible to reproduce experimental findings?

  • Are channelled metabolons, such as the purinosome, conserved?

Conclusion

  • Take home message

  • Time for questions

  • Acknowledgements

  • AG-Walther

Take home message

In some cases, structures can be predicted.

  • Whether the investment in computation time is worth it is a different question…

The end

Acknowledgements

Dirk Walther
The good discussions on how to work as a researcher, and providing the organisation and stipend for me to work at his department.
Erasmus organisation
Providing a stipend for me to travel to Brandenburg.
Hanne Zillmer
Useful discussions on structural biology and advice for data analysis methods.

Further information

  • Backup slides

  • List of products

  • Used references

Backup slides

What is a TM score?

\[ f(M, N)_{ij} = \sum^M_{m=1}\sum^N_{n=1} {1 \over {1 + \left({{d_{ij}(m,n)} \over d_0}\right)^2}}\]

where

\[d_0 = 1.24\sqrt[3]{L - 15} - 1.8\]
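
A minimal numerical sketch of the two formulas above, for a single pair of aligned chains. It assumes the residue distances come from a superposition that has already been computed elsewhere, and it additionally divides the sum by the reference length L, which is how the TM-score is commonly reported on a 0 to 1 scale; that normalisation and the function names are assumptions, not part of the slide.

```python
def d0(length):
    """Distance scale d_0 = 1.24 * cbrt(L - 15) - 1.8."""
    if length <= 15:
        raise ValueError("d_0 is only defined here for L > 15")
    scale = 1.24 * (length - 15) ** (1.0 / 3.0) - 1.8
    if scale <= 0:
        raise ValueError("d_0 is not positive for this length")
    return scale


def tm_score(distances, length):
    """Sum 1 / (1 + (d / d_0)^2) over aligned residue pairs, divided by L.

    distances: distances between aligned residue pairs (assumed to come from
               an existing superposition).
    length:    reference length L used for d_0 and for the normalisation.
    """
    scale = d0(length)
    total = sum(1.0 / (1.0 + (d / scale) ** 2) for d in distances)
    return total / length


# A perfect superposition (all distances zero) of 100 residues scores 1.0.
perfect = tm_score([0.0] * 100, 100)
```

A residue pair separated by exactly d_0 contributes 0.5 to the sum, so d_0 can be read as the distance at which a pair counts as half matched.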

Pro-Con list per tool

Tool | Pros | Cons
CombFold | Can be steered with biological information by the user; no token limit | Does not work with ligands; often produces clashes
AlphaFold3 | Often accurate; easy to use; works with ligands | Non-permissive licence; token limit of 5000
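
To make the token limit concrete, here is a rough sketch for checking whether a protein-only complex would fit. It assumes one token per amino-acid residue and ignores ligands; the chain lengths and copy numbers are hypothetical placeholders, not a real complex.

```python
TOKEN_LIMIT = 5000  # limit quoted in the table above


def protein_tokens(chains):
    """Count tokens for a complex, assuming one token per residue.

    chains: dict mapping chain name -> (sequence, number of copies).
    """
    return sum(len(sequence) * copies for sequence, copies in chains.values())


def fits_token_limit(chains, limit=TOKEN_LIMIT):
    """True if the whole assembly stays within the token limit."""
    return protein_tokens(chains) <= limit


# Hypothetical hetero-octamer: four copies each of two subunits.
example = {
    "alpha": ("M" * 700, 4),
    "beta": ("M" * 475, 4),
}
example_tokens = protein_tokens(example)  # 4700
example_fits = fits_token_limit(example)  # True
```

Large metabolons with many copies of each enzyme can exceed such a limit quickly, which is where an assembly-based tool without a token limit becomes attractive.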

How does CombFold work?

Take input subunit sequences

Predict pairwise structures

Assembly
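
The second step can be pictured as listing every pair of subunits that needs its own structure prediction. The sketch below only enumerates those pairs; it is not CombFold's code or API, and the function and subunit names are hypothetical.

```python
from itertools import combinations


def pairwise_jobs(subunits):
    """List every unordered pair of distinct subunit names.

    Each pair stands for one pairwise structure prediction whose result
    would later be handed to the assembly step.
    """
    return list(combinations(sorted(subunits), 2))


# Three hypothetical subunits give 3 pairs; n subunits give n * (n - 1) / 2.
jobs = pairwise_jobs(["A", "B", "C"])
```

The quadratic growth in the number of pairs is one reason the computation time for large assemblies adds up.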

CombFold limitation and interpretation

Two near-identical complexes with widely different CombFold scores (68 vs 90), TM score = 90.

Internship products

Datasets
A Zenodo record of my internship work.
Structural analysis toolkit
Snakemake implementation of TM scores using the TM-align, MM-align and US-align programs; PLIP for protein-ligand bonds; and the generation of subunit graphs with contact maps.
CombFold pipeline
A Snakemake pipeline to predict protein complexes using CombFold and AF2M.
This presentation
Made using the quarto clean theme and reveal-header.

Cited works

1. Pareek V, Sha Z, He J, Wingreen NS, Benkovic SJ (2021) Metabolic channeling: Predictions, deductions, and evidence. Molecular Cell 81(18):3775–3785. https://doi.org/10.1016/j.molcel.2021.08.030
2. Welch GR (1978) On the role of organized multienzyme systems in cellular metabolism: A general synthesis. Progress in Biophysics and Molecular Biology 32:103–191. https://doi.org/10.1016/0079-6107(78)90019-6
3. Srere PA (1987) Complexes of sequential metabolic enzymes. Annual Review of Biochemistry 56:89–124. https://doi.org/10.1146/annurev.bi.56.070187.000513
4. Busch F, Rajendran C, Heyn K, Schlee S, Merkl R, Sterner R (2016) Ancestral Tryptophan Synthase Reveals Functional Sophistication of Primordial Enzyme Complexes. Cell Chemical Biology 23(6):709–715. https://doi.org/10.1016/j.chembiol.2016.05.009
5. Bulutoglu B, Garcia KE, Wu F, Minteer SD, Banta S (2016) Direct Evidence for Metabolon Formation and Substrate Channeling in Recombinant TCA Cycle Enzymes. ACS Chem Biol 11(10):2847–2853. https://doi.org/10.1021/acschembio.6b00523
6. Gao Z, Zhang W, Chang R, Zhang S, Yang G, Zhao G (2021) Liquid-Liquid Phase Separation: Unraveling the Enigma of Biomolecular Condensates in Microbial Cells. Front Microbiol 12. https://doi.org/10.3389/fmicb.2021.751880
7. Alberti S, Gladfelter A, Mittag T (2019) Considerations and Challenges in Studying Liquid-Liquid Phase Separation and Biomolecular Condensates. Cell 176(3):419–434. https://doi.org/10.1016/j.cell.2018.12.035
8. Abramson J, Adler J, Dunger J, et al (2024) Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature 1–3. https://doi.org/10.1038/s41586-024-07487-w
9. Qiao Z, Nie W, Vahdat A, Miller TF, Anandkumar A (2024) State-specific protein–ligand complex structure prediction with a multiscale deep generative model. Nat Mach Intell 6(2):195–208. https://doi.org/10.1038/s42256-024-00792-z
10. Shor B, Schneidman-Duhovny D (2024) CombFold: Predicting structures of large protein assemblies using a combinatorial assembly algorithm and AlphaFold2. Nat Methods 21(3):477–487. https://doi.org/10.1038/s41592-024-02174-0
11. Krishna R, Wang J, Ahern W, et al (2024) Generalized biomolecular modeling and design with RoseTTAFold All-Atom. Science 384(6693):eadl2528. https://doi.org/10.1126/science.adl2528
12. Gao M, Nakajima An D, Parks JM, Skolnick J (2022) AF2Complex predicts direct physical interactions in multimeric proteins with deep learning. Nat Commun 13(1):1744. https://doi.org/10.1038/s41467-022-29394-2
13. Bryant P, Pozzati G, Zhu W, Shenoy A, Kundrotas P, Elofsson A (2022) Predicting the structure of large protein complexes using AlphaFold and Monte Carlo tree search. Nat Commun 13(1):6028. https://doi.org/10.1038/s41467-022-33729-4
14. Chim HY, Elofsson A (2024) MoLPC2: Improved prediction of large protein complex structures and stoichiometry using Monte Carlo Tree Search and AlphaFold2. Bioinformatics 40(6):btae329. https://doi.org/10.1093/bioinformatics/btae329
15. Pramoditha R (2022) Overview of a Neural Network’s Learning Process. https://medium.com/data-science-365/overview-of-a-neural-networks-learning-process-61690a502fa. Accessed 25 Jul 2024
16.
17. Wei S-P, Qian Z-G, Hu C-F, et al (2020) Formation and functionalization of membraneless compartments in Escherichia coli. Nat Chem Biol 16(10):1143–1148. https://doi.org/10.1038/s41589-020-0579-9
18. Yang S, Shen W, Hu J, et al (2023) Molecular mechanisms and cellular functions of liquid-liquid phase separation during antiviral immune responses. Front Immunol 14. https://doi.org/10.3389/fimmu.2023.1162211
19. Zhang Y, Sampathkumar A, Kerber SM-L, et al (2020) A moonlighting role for enzymes of glycolysis in the co-localization of mitochondria and chloroplasts. Nat Commun 11(1):4509. https://doi.org/10.1038/s41467-020-18234-w
20. Zhang C, Shine M, Pyle AM, Zhang Y (2022) US-align: Universal structure alignments of proteins, nucleic acids, and macromolecular complexes. Nat Methods 19(9):1109–1115. https://doi.org/10.1038/s41592-022-01585-1