Statistical-physics inspired modelling of protein sequences

Martin Weigt (UPMC) — May 26, 2016

In recent years, biological research has been revolutionised by experimental high-throughput techniques. Unprecedented amounts of data are accumulating, creating an urgent need for computational modelling approaches that unveil the information hidden in raw data and help increase our understanding of complex biological systems. Inference approaches based on statistical physics have played an important role across diverse systems, ranging from proteins and neural networks to the collective behaviour of animal groups.

To give a specific example, proteins show a remarkable degree of structural and functional conservation in the course of evolution, despite a large variability in amino-acid sequences. Thanks to modern sequencing techniques, this amino-acid variability is easily observable, in contrast to the time- and labour-intensive experiments needed to determine, e.g., the three-dimensional fold of a protein. I will present recent developments around the so-called Direct-Coupling Analysis [1,2], a statistical-mechanics inspired inference approach that links sequence variability to protein structure and function. I will show that this methodology can be used (i) to infer contacts between residues and thus to guide 3D-structure prediction of proteins and their complexes [3], (ii) to infer conserved protein-protein interaction networks, and (iii) to reconstruct mutational landscapes and thus to predict the effect of mutations [4]. Beyond their direct bioinformatic interest, these findings also provide insight into the underlying principles connecting protein evolution, structure and function.

[1] M. Weigt, R.A. White, H. Szurmant, J.A. Hoch, T. Hwa, "Identification of direct residue contacts in protein-protein interaction by message passing", Proc. Natl. Acad. Sci. 106, 67 (2009).
[2] F. Morcos, A. Pagnani, B. Lunt, A. Bertolino, D. Marks, C. Sander, R. Zecchina, J.N. Onuchic, T. Hwa, M. Weigt, "Direct-coupling analysis of residue co-evolution captures native contacts across many protein families", Proc. Natl. Acad. Sci. 108, E1293-E1301 (2011).
[3] J.I. Sułkowska, F. Morcos, M. Weigt, T. Hwa, J.N. Onuchic, "Genomics-Aided Structure Prediction", Proc. Natl. Acad. Sci. 109, 10340-10345 (2012).
[4] M. Figliuzzi, H. Jacquier, A. Schug, O. Tenaillon, M. Weigt, "Coevolutionary landscape inference and the context-dependence of mutations in beta-lactamase TEM-1", Mol. Biol. Evol. (2015), doi: 10.1093/molbev/msv211.

Biography:
Martin Weigt is a Professor at the Laboratoire de Biologie Computationnelle et Quantitative at UPMC, Paris, where he has built up the team "Statistical Genomics and Biological Physics". Building on his original training in theoretical statistical physics, he develops statistical inference and modelling approaches for bio-molecular data.

A video of this talk is also available on the ENS multimedia site: savoirs.ens.fr