Project (Johannes Soeding)


Modern methods for large-scale protein and RNA structure and function analysis

Our lab develops methods and software for large-scale protein and DNA sequence analyses. Popular tools from our lab are HHpred and HH-suite [1] for protein structure prediction and remote homology detection, MMseqs2 [2] for large-scale sequence searches, Linclust [3] for linear-time sequence clustering, Plass and PenguiN for sequence assembly, and Foldseek [4] for fast protein structure searches. In the coming years, we will focus on developing tools (1) for annotating protein space by analysing the conservation of genomic neighborhoods (e.g. [5]); (2) for discovering and organizing the space of structured RNAs through fast sequence and structure similarity searches; and (3) for phylogeny analysis using a novel approach based on the quasi-neutral theory of evolution.

If you love programming and machine learning or statistics and have an interest in solving challenging problems related to biology, we would like to hear from you.




Homepage Research Group
http://www.mpinat.mpg.de/de/soeding

Google Scholar profile
https://scholar.google.de/citations?user=xjYIe80AAAAJ



For more information, see for instance:

  • [1] Remmert M, Biegert A, Hauser A, Söding J. HHblits: lightning-fast iterative protein sequence searching by HMM-HMM alignment. Nature Methods. 2012. 9(2):173-5.
  • [2] Steinegger M, Söding J. MMseqs2 enables sensitive protein sequence searching for the analysis of massive data sets. Nature Biotechnol. 2017. 35(11):1026-8.
  • [3] Steinegger M, Söding J. Clustering huge protein sequence sets in linear time. Nature Commun. 2018. 9(1):2542.
  • [4] Van Kempen M, Kim SS, Tumescheit C, Mirdita M, Lee J, Gilchrist CL, Söding J, Steinegger M. Fast and accurate protein structure search with Foldseek. Nature Biotechnol. 2024. 42(2):243-6.
  • [5] Zhang R, Mirdita M, Söding J. De novo discovery of conserved gene clusters in microbial genomes with Spacedust. bioRxiv. 2024:2024-10.