NSF IIS/III Algorithms and Software Tools for Epigenetics Research #1302134
News (last update: September 2017)
Project Goals and Research Challenges
This project will develop a new computational framework to advance the understanding of epigenetic gene regulation in the human malaria parasite. Epigenetics is the study of heritable changes in gene expression or cellular phenotype caused by mechanisms other than changes in the underlying DNA sequence. At the core of the computational framework is the ability to solve a set of hard computational questions, which are the focus of the research plan. The computational challenges require the study of novel combinatorial optimization problems, the development of new time- and space-efficient algorithms, and ultimately the implementation and deployment of user-friendly web-based software tools.
Most eukaryotic genomes have a second layer of information, which is embedded in chemical marks added to DNA and to the protruding tails of special proteins that package DNA into a complex called the nucleosome. One of the most astonishing discoveries in molecular biology of the past decades is that this "covert" layer, called the epigenome, affects a variety of cellular and metabolic processes. Epigenetic marks not only control which genes are accessible in each type of cell, but also determine when the accessible genes may be activated. Molecular biologists have also confirmed that the epigenome is affected by the interactions of the organism with the environment and that changes to the epigenetic marks induced by these interactions are inherited across cell division, despite not being encoded directly in DNA.
This project will study a set of computational challenges brought about by the increasing number of epigenome projects. Specifically, the goal is to develop methods and software tools for (1) the analysis of nucleosome and methylation maps (using a modified Gaussian mixture model and expectation maximization); (2) the study of the dynamics of nucleosome positioning, histone tail modifications and DNA methylation patterns (using graph-theoretical approaches, e.g., k-partite matching); (3) the analysis of DNA motifs for stable nucleosomes and specific histone modifications (using combinatorial optimization approaches); (4) the discovery of new genes using nucleosome or methylation landscapes (using machine learning classifiers); (5) the identification of statistically significant genome-wide correlations between nucleosome positioning, histone modifications, DNA methylation patterns and gene expression (using dynamic Bayesian networks). These five computational tasks will require the study of novel combinatorial optimization and machine learning problems, the development of new time- and space-efficient algorithms, and ultimately the implementation and deployment of user-friendly web-based software tools.
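Task (1) names a Gaussian mixture model fitted by expectation maximization. The sketch below illustrates only that general technique in its textbook form, not the project's modified model. It assumes the input is a list of fragment-midpoint coordinates along a chromosome and that each mixture component stands for one nucleosome. The function name `fit_gmm_1d`, the quantile-based initialization, and the synthetic data are all hypothetical choices made for this illustration.

```python
import math
import random


def fit_gmm_1d(xs, k, iters=100):
    """Fit a k-component 1D Gaussian mixture to xs with plain EM.

    Returns (weights, means, variances), one entry per component.
    This is the standard algorithm, not the project's modified variant.
    """
    n = len(xs)
    ordered = sorted(xs)
    # Assumed initialization: means at evenly spaced quantiles of the data,
    # a shared variance equal to the overall variance, and equal weights.
    means = [ordered[(2 * j + 1) * n // (2 * k)] for j in range(k)]
    mean_all = sum(xs) / n
    var_all = sum((x - mean_all) ** 2 for x in xs) / n
    variances = [var_all] * k
    weights = [1.0 / k] * k

    for _ in range(iters):
        # E-step: posterior probability of each component for each point.
        resp = []
        for x in xs:
            dens = [
                w * math.exp(-0.5 * (x - m) ** 2 / v) / math.sqrt(2 * math.pi * v)
                for w, m, v in zip(weights, means, variances)
            ]
            total = sum(dens) or 1e-300  # guard against numerical underflow
            resp.append([d / total for d in dens])

        # M-step: re-estimate weight, mean and variance of each component.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:
                continue  # leave an empty component unchanged
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            variances[j] = max(var_j, 1e-6)  # keep the variance positive

    return weights, means, variances


# Synthetic example: midpoints scattered around two hypothetical nucleosome
# centers at positions 200 and 400 (made-up values, not real data).
rng = random.Random(0)
midpoints = [rng.gauss(200, 15) for _ in range(300)]
midpoints += [rng.gauss(400, 15) for _ in range(300)]

weights, means, variances = fit_gmm_1d(midpoints, k=2)
for w, m, v in zip(weights, means, variances):
    print(f"weight={w:.2f} center={m:.1f} spread={math.sqrt(v):.1f}")
```

In this reading, each fitted mean is a candidate nucleosome center, the standard deviation indicates how fuzzy the positioning is, and the weight reflects relative occupancy. A real nucleosome map would also need a way to choose the number of components, which this sketch takes as given.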
The "platform" on which the algorithms will be developed is P. falciparum, the parasite responsible each year for 350-500 million cases of malaria and between one and three million human deaths worldwide. There is no vaccine against malaria (one is currently in clinical trials) and the parasite is developing resistance to almost all drugs currently available. The methods and tools developed will not be malaria-specific, and will scale to a variety of other eukaryotes with much larger and more complex genomes.
The ability to analyze the epigenome of the human malaria parasite will improve our comprehension of its biology and possibly enable molecular biologists to identify new antimalarial strategies. The proposed computational framework will also enable life scientists to make novel epigenetic discoveries and ultimately improve the understanding of the complex mechanisms that drive gene expression in other eukaryotic organisms. Software tools will be placed into the public domain, which will benefit researchers and the public worldwide, and potentially lead to new international and industrial collaborations. This project will support two graduate students and one post-doc in a highly interdisciplinary environment.
- Prof. Stefano Lonardi, PI (Computer Science)
- Prof. Karine Le Roch, coPI (Cell Biology and Neuroscience)
- Evelien Bunnik, post-doc (Cell Biology and Neuroscience)
- Anton Polishko, PhD (Computer Science)
- Weihua Pan, PhD student (Computer Science)
- Md. Abid Hasan, PhD student (Computer Science)
- Abbas Roayaei, PhD student (Computer Science)
-  N. Ponts, L. Fu, E. Y. Harris, J. Zhang, D.-W. D. Chung, E. Bunnik, M. C. Cervantes, J. Prudhomme, V. Atanasova-Penichon, E. Zehraoui, E. M. Rodrigues, S. Lonardi, G. R. Hicks, Y. Wang, K. G. Le Roch, "Genome-scale discovery of DNA methylations in the human malaria parasite", Cell Host and Microbe, 14(6): 696-706, 2013.
-  E. M. Bunnik, A. Polishko, J. Prudhomme, N. Ponts, S. S. Gill, S. Lonardi, K. G. Le Roch, "DNA-encoded nucleosome occupancy is associated with transcription levels in the human malaria parasite Plasmodium falciparum", BMC Genomics, 15:347, 2014.
-  H. Chen, S. Lonardi, J. Zheng, "Deciphering Histone Code of Transcriptional Regulation in Malaria Parasites by Large-scale Data Mining", Computational Biology and Chemistry, 50: 3-10, 2014. Presented at Advances in Bioinformatics: Twelfth Asia Pacific Bioinformatics Conference (APBC2014).
-  A. Polishko, E. M. Bunnik, K. G. Le Roch, S. Lonardi, "PuFFIN: A Parameter-free Method to Build Nucleosome Maps from Paired-end Reads", BMC Bioinformatics, 15:S11, 2014.
-  X. M. Lu, E. M. Bunnik, N. Pokhriyal, S. Nasseri, S. Lonardi, K. G. Le Roch, "Analysis of nucleosome positioning landscapes enables gene discovery in the human malaria parasite Plasmodium falciparum", BMC Genomics, 16:1005, 2015.
-  A. Polishko, M. A. Hasan, W. Pan, E. Bunnik, K. Le Roch, S. Lonardi, "ThIEF: Finding Genome-wide Trajectories of Epigenetics Marks", Proceedings of the Workshop on Algorithms in Bioinformatics (WABI'17), Boston, MA, 2017.
- This material is based upon work supported by the National Science Foundation under Grant No. 1302134