Combining machine learning and polymer physics simulations to study the mechanisms of large-scale genome organisation

Supervisor: Dr. Nicolas Battich

Enrolled since 2025

In mammalian cells, chromatin forms highly dynamic and complex structures spanning different length scales, from nucleosomes through topologically associating domains to chromosome territories. These structures, and the biophysical mechanisms that drive them, play a significant role in regulating gene transcription. Despite this, large-scale organisation remains understudied in comparison to finer-scale structures. Genome-wide spatial interactions can be probed with chromosome conformation capture techniques such as Hi-C. While informative, these experiments are time-consuming and expensive to conduct, and the data can be difficult to interpret. Computer modelling offers a complementary route to understanding large-scale structural mechanisms, and this is the focus of the PhD project. The main goal is to develop an accurate and computationally efficient chromatin model for large-scale genome organisation (at the level of A/B compartments and chromosome territories) that is applicable to many mammalian cell types. Subsequently, I aim to use the model to study the mechanisms governing these structures.


Molecular dynamics (MD) simulations are used to gain mechanistic insight into changes in genome organisation. Since multi-chromosome systems are too large to be represented in atomic detail, the chromatin is simplified into a chain of monomers, each corresponding to 100 000 base pairs (for context, topologically associating domains are usually 40 kbp to 3 Mbp long). Since coarse-graining decreases model accuracy, an appropriate monomer type assignment method is introduced to counteract the loss of detail. Hi-C maps show that, at large scales, chromatin segments tend to segregate into a number of groups. My polymer model aims to replicate this segregation by assigning each monomer a group type derived from the Hi-C maps. Most type assignment algorithms rely on either inter- or intrachromosomal interactions alone, and very few use a labelling system that is consistent across cell types. Spectral clustering has been shown to overcome these limitations, so it is being validated in my work. However, epigenetic data (i.e. ChIP-seq of the most common histone modifications) are more widely available and easier to generate than Hi-C. Additionally, types derived from Hi-C have been shown to exhibit distinct chromatin modification properties, so a machine learning (ML) classifier will be developed to predict monomer labels from histone modification ChIP-seq data. This will help to make the model applicable to many cell types.
The monomer labels serve as input to the polymer model. They also define the parameter values in the potential function that governs the strength of the interactions between monomer types. Since the number of combinations of type-specific values is very large, a simple random search would take a long time to find parameters that closely reproduce the experimental structures. Hence, a learning method is being implemented and validated to obtain accurate energy parameter values. In this procedure, chromatin MD simulations are run and compared to Hi-C. Then, based on the discrepancies between the simulated and experimental contact maps, the parameter values are updated under the guidance of a Bayesian optimiser and new polymer simulations are started. These steps are repeated until the error between experiment and simulation is sufficiently small.
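
The simulate, compare and update cycle described above can be sketched in a few lines of Python. This is only an illustration of the loop structure under strong simplifying assumptions, not the implementation used in the project: the MD simulation is replaced by a hypothetical analytic function (toy_contact_map), the Hi-C reference is generated from made-up energies, and the Bayesian optimiser is replaced by a much simpler accept-if-better random perturbation. All function names, parameter values and the monomer type sequence are invented for the example.

```python
import math
import random


def toy_contact_map(types, energies):
    """Hypothetical stand-in for an MD run (assumption, not a real simulation).

    Contact probability decays with genomic separation and is scaled by a
    Boltzmann-like factor of the type-pair interaction energy.
    """
    n = len(types)
    cmap = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                cmap[i][j] = 1.0
            else:
                pair = tuple(sorted((types[i], types[j])))
                cmap[i][j] = math.exp(-energies[pair]) / abs(i - j)
    return cmap


def map_error(sim, ref):
    """Mean squared difference between two contact maps."""
    n = len(sim)
    total = sum((sim[i][j] - ref[i][j]) ** 2 for i in range(n) for j in range(n))
    return total / n**2


def fit_energies(types, ref_map, n_iter=500, step=0.2, tol=1e-6, seed=0):
    """Iteratively update type-pair energies until the map error is small.

    The update rule here is a plain random perturbation that is kept only if
    it lowers the error; a Bayesian optimiser would propose parameters instead.
    """
    rng = random.Random(seed)
    pairs = sorted({tuple(sorted((a, b))) for a in types for b in types})
    best = {p: 0.0 for p in pairs}  # start from neutral interactions
    best_err = map_error(toy_contact_map(types, best), ref_map)
    history = [best_err]
    for _ in range(n_iter):
        if best_err < tol:
            break
        trial = {p: e + rng.gauss(0.0, step) for p, e in best.items()}
        err = map_error(toy_contact_map(types, trial), ref_map)
        if err < best_err:
            best, best_err = trial, err
        history.append(best_err)
    return best, history


# Made-up monomer types and "experimental" energies, for illustration only.
types = ["A", "A", "B", "B", "A", "B", "A", "A", "B", "B"]
true_energies = {("A", "A"): -0.5, ("A", "B"): 0.8, ("B", "B"): -0.3}
reference = toy_contact_map(types, true_energies)

fitted, history = fit_energies(types, reference)
print("initial error:", history[0])
print("final error:  ", history[-1])
```

In a real setting each evaluation is a full polymer simulation, which is why a sample-efficient proposal strategy such as Bayesian optimisation is preferable to the naive perturbation used in this sketch.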


To complement the model development, I will make a research visit to the Battich Lab at Helmholtz Munich. The group is aiming to improve an existing method, SPRITE, to increase its efficiency and resolution. The improvement is made by introducing a proximity ligation method from Hi-C, and the technique will be used to probe 3D chromatin organisation during early cell development. During the visit, I will carry out the bioinformatics analysis of Hi-C data from the scientific literature and of SPRITE results generated by the lab, which will later be used as training data for the polymer model. Then, in collaboration with the Munich group, the polymer model will be applied to study large-scale chromatin changes during cell differentiation, e.g. during the exit from pluripotency of human embryonic stem cells.