NSF IIS/III Algorithms for Genome Assembly of Ultra-deep Sequencing Data #1526742
News (last updated June 2018)
Project Goals and Research Challenges
The University of California, Riverside is awarded a grant to investigate the computational challenges that will be brought about by the analysis of ultra-deep sequencing data (i.e., coverage of 1000x or higher), specifically in the context of de novo genome assembly. As sequencing costs continue to decrease, ultra-deep sequencing data will become more common, but de novo genome assembly remains computationally challenging, in particular for large, repetitive genomes. Since the sequencing of H. influenzae in 1995, the assembly problem has been characterized by limited depth of sequencing coverage, mostly due to the high cost of generating the data. This project will investigate for the first time the opposite problem, that is, the challenge of dealing with excessive sequencing depth. Deliverables will include novel software tools for genome assembly that will benefit researchers and the public worldwide, and may lead to new international and industrial collaborations.
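For reference, average sequencing depth is commonly estimated with the Lander-Waterman relation C = N * L / G, where N is the number of reads, L the read length and G the length of the sequenced target. The short sketch below shows what a coverage of 1000x means in terms of read counts; the function names and the example numbers are illustrative assumptions, not values taken from this project.

```python
import math


def coverage(num_reads: int, read_length: int, genome_length: int) -> float:
    """Average sequencing depth C = N * L / G (Lander-Waterman estimate)."""
    return num_reads * read_length / genome_length


def reads_needed(target_coverage: float, read_length: int, genome_length: int) -> int:
    """Smallest number of reads N whose expected depth reaches target_coverage."""
    return math.ceil(target_coverage * genome_length / read_length)
```

For example, under these assumptions a hypothetical 150 kb target sequenced with 100 bp reads needs reads_needed(1000, 100, 150_000), i.e. 1,500,000 reads, to reach 1000x.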
As noted, the research plan addresses the de novo assembly problem under the assumption that the input sequencing data is ultra-deep. The study will demonstrate that when the depth of sequencing increases beyond a certain threshold, sequencing errors make genome assembly progressively harder, and as a consequence the quality of the solution degrades as more data is added. The project will show that modern de novo assemblers such as SPAdes, IDBA-UD, and Velvet are unable to take advantage of ultra-deep sequencing data. The research plan will address ultra-deep sequencing data with a divide-and-conquer approach, pursued in two directions. In the proposed meta-assembler, the input data will be partitioned into optimal-sized "slices", and a standard assembly tool (e.g., Velvet, SPAdes, IDBA, Ray) will be used to assemble each slice individually. In the proposed de novo assembler, a set of de Bruijn graphs will be created, each one built from the sequencing data of one slice. In both cases, a majority voting strategy among the individual assemblies/graphs will be used to generate a high-quality consensus assembly.
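The slicing and voting idea can be illustrated with a deliberately simplified sketch. It assumes that each slice's de Bruijn graph is represented only by its set of k-mers, and that the consensus keeps k-mers present in a strict majority of slices. All names are hypothetical, and this is not the project's published algorithm: the meta-assembler variant described above would instead run an existing assembler on each slice and vote over the resulting assemblies.

```python
from collections import Counter
from typing import Iterable, List, Set


def make_slices(reads: List[str], slice_size: int) -> List[List[str]]:
    """Partition reads into consecutive, non-overlapping slices of slice_size reads.

    Hypothetical helper: the last slice may be smaller and is kept.
    """
    if slice_size <= 0:
        raise ValueError("slice_size must be positive")
    return [reads[i:i + slice_size] for i in range(0, len(reads), slice_size)]


def kmers(read: str, k: int) -> Iterable[str]:
    """Yield every k-mer (length-k substring) of a read."""
    for i in range(len(read) - k + 1):
        yield read[i:i + k]


def slice_kmer_set(slice_reads: List[str], k: int) -> Set[str]:
    """K-mers observed in one slice, standing in for that slice's de Bruijn graph."""
    return {km for read in slice_reads for km in kmers(read, k)}


def majority_kmers(reads: List[str], slice_size: int, k: int) -> Set[str]:
    """Keep k-mers that occur in more than half of the per-slice k-mer sets."""
    slices = make_slices(reads, slice_size)
    votes: Counter = Counter()
    for s in slices:
        # Each slice casts at most one vote per k-mer, however often it occurs there.
        votes.update(slice_kmer_set(s, k))
    threshold = len(slices) // 2  # strict majority means more than this many votes
    return {km for km, n in votes.items() if n > threshold}
```

For instance, majority_kmers(["ACGTAC", "ACGTAC", "ACGTTC"], 1, 4) returns {"ACGT", "CGTA", "GTAC"}: the k-mers CGTT and GTTC, which stem from the single divergent read, appear in only one of the three slices and are voted out. The intuition under this toy model is that a k-mer introduced by a random sequencing error tends to show up in few slices, while a genuine k-mer shows up in nearly all of them. Real slices would contain many reads each, and the slice size would be tuned rather than fixed at one read.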
This project will directly support two graduate students in a highly interdisciplinary environment, building on UCR's strengths in Computer Science and Agricultural Sciences. Undergraduates will have opportunities to participate in research through a Research Experiences for Undergraduates (REU) site at UCR, a collaboration with a nearby community college, and a new US Department of Education Title V Hispanic Serving Institution grant (UCR is an accredited HSI).
-  S. Lonardi, H. Mirebrahim, S. Wanamaker, M. Alpert, G. Ciardo, D. Duma, T. J. Close, "When Less is More: Slicing Sequencing Data Improves Read Decoding Accuracy and De Novo Assembly Quality", Bioinformatics, 31(18): 2972-2980, 2015.
-  H. Mirebrahim, T. J. Close, S. Lonardi, "De Novo Meta-Assembly of Ultra-deep Sequencing Data", Bioinformatics, 31(12): i9-i16, 2015. Also in Proceedings of Conference on Intelligent Systems for Molecular Biology and European Conference on Computational Biology (ISMB/ECCB'15), Dublin, Ireland, 2015.
-  M. Munoz-Amatriain*, S. Lonardi*, M-C. Luo, K. Madishetty, J. T. Svensson, M. J. Moscou, S. Wanamaker, T. Jiang, A. Kleinhofs, G. J. Muehlbauer, R. P. Wise, N. Stein, Y. Ma, E. Rodriguez, D. Kudrna, P. R. Bhat, S. Chao, P. Condamine, S. Heinen, J. Resnik, R. Wing, H. N Witt, M. Alpert, M. Beccuti, S. Bozdag, F. Cordero, H. Mirebrahim, R. Ounit, Y. Wu, F. You, J. Zheng, H. Simkova, J. Dolezel, J. Grimwood, J. Schmutz, D. Duma, L. Altschmied, T. Blake, P. Bregitzer, L. Cooper, M. Dilbirligi, A. Falk, L. Feiz, A. Graner, P. Gustafson, P. M. Hayes, P. Lemaux, J. Mammadov, T. J. Close (* equal contributors), "Sequencing of 15,622 Gene-bearing BACs Clarifies the Gene-dense Regions of the Barley Genome", The Plant Journal, 84(1): 216-227, 2015.
-  M. Munoz-Amatriain, H. Mirebrahim, P. Xu, S. Wanamaker, M.C. Luo, H. Alhakami, M. Alpert, I. Atokple, J. Batieno, O. Boukar, S. Bozdag, N. Cisse, I. Drabo, J. D. Ehlers, A. Farmer, C. Fatokun, Yong Q. Gu, Y.-N. Guo, B.-L. Huynh, S. A. Jackson, F. Kusi, M. R. Lucas, Yaqin Ma, M. P. Timko, J. Wu, F. You, P. A. Roberts, S. Lonardi and T. J. Close, "Genome resources for climate-resilient cowpea, an essential crop for food security", The Plant Journal, 89(5): 1042-1054, 2016.
-  S. Beier, A. Himmelbach, C. Colmsee, X.-Q. Zhang, R. A. Barrero, Q. Zhang, L. Li, M. Bayer, D. Bolser, S. Taudien, M. Groth, M. Felder, A. Hastie, H. Simkova, H. Stankova, J. Vrana, S. Chan, M. Munoz-Amatriain, R. Ounit, S. Wanamaker, T. Schmutzer, L. Aliyeva-Schnorr, S. Grasso, J. Tanskanen, D. Sampath, D. Heavens, S. Cao, B. Chapman, F. Dai, Y. Han, H. Li, X. Li, C. Lin, J. K. McCooke, C. Tan, S. Wang, S. Yin, G. Zhou, J. A. Poland, M. I. Bellgard, A. Houben, J. Dolezel, S. Ayling, S. Lonardi, P. Langridge, G. J. Muehlbauer, P. Kersey, M. D. Clark, M. Caccamo, A. H. Schulman, M. Platzer, T. J. Close, M. Hansson, G. Zhang, I. Braumann, C. Li, R. Waugh, U. Scholz, N. Stein, M. Mascher, "Construction of a Map-based Reference Genome Sequence for Barley, Hordeum vulgare L.", Scientific Data (Nature), 4: 170044, 2017.
-  M. Mascher, H. Gundlach, A. Himmelbach, S. Beier, S. O. Twardziok, T. Wicker, V. Radchuk, C. Dockter, P. E. Hedley, J. Russell, M. Bayer, L. Ramsay, H. Liu, G. Haberer, X.-Q. Zhang, Q. Zhang, R. A. Barrero, L. Li, S. Taudien, M. Groth, M. Felder, A. Hastie, H. Simkova, H. Stankova, J. Vrana, S. Chan, M. Munoz-Amatriain, R. Ounit, S. Wanamaker, D. Bolser, C. Colmsee, T. Schmutzer, L. Aliyeva-Schnorr, S. Grasso, J. Tanskanen, A. Chailyan, D. Sampath, D. Heavens, L. Clissold, S. Cao, B. Chapman, F. Dai, Y. Han, H. Li, X. Li, C. Lin, J. K. McCooke, C. Tan, P. Wang, S. Wang, S. Yin, G. Zhou, J. A. Poland, M. I. Bellgard, L. Borisjuk, A. Houben, J. Dolezel, S. Ayling, S. Lonardi, P. Kersey, P. Langridge, G. J. Muehlbauer, M. D. Clark, M. Caccamo, A. H. Schulman, K.F.X. Mayer, M. Platzer, T. J. Close, U. Scholz, M. Hansson, G. Zhang, I. Braumann, M. Spannagl, C. Li, R. Waugh, N. Stein, "A Chromosome Conformation Capture Ordered Sequence of the Barley Genome", Nature, 544: 427-433, 2017.
-  H. Alhakami, H. Mirebrahim and S. Lonardi, "A Comparative Evaluation of Assembly Reconciliation Tools", Genome Biology, 18: 93, 2017.
-  A. B. R. McIntyre, R. Ounit, E. Afshinnekoo, R. J. Prill, E. Henaff, N. Alexander, S. S. Minot, D. Danko, J. Foox, S. Ahsanuddin, S. Tighe, N. A. Hasan, P. Subramanian, K. Moffat, S. Levy, S. Lonardi, N. Greenfield, R. R. Colwell, G. L. Rosen, C. E. Mason, "Comprehensive Benchmarking and Ensemble Approaches for Metagenomic Classifiers", Genome Biology, 18: 182, 2017.
-  A. Polishko, Md. A. Hasan, W. Pan, E. M. Bunnik, K. Le Roch, S. Lonardi, "ThIEF: Finding Genome-wide Trajectories of Epigenetics Marks", Proceedings of WABI 2017 - Workshop on Algorithms in Bioinformatics, 19:1-19:16, Boston, MA, 2017.
-  A. R. Ardakany and S. Lonardi, "Efficient and Accurate Detection of Topologically Associating Domains from Contact Maps", Proceedings of WABI 2017 - Workshop on Algorithms in Bioinformatics, 22:1-22:11, Boston, MA, 2017.
-  B.-L. Huynh, J. D. Ehlers, B. E. Huang, M. Munoz-Amatriain, S. Lonardi, J. R. P. Santos, A. Ndeve, B. J. Batieno, O. Boukar, N. Cisse, I. Drabo, C. Fatokun, F. Kusi, R. Y. Agyare, Y.-N. Guo, I. Herniter, S. Lo, S. I. Wanamaker, S. Xu, T. J. Close, P. A. Roberts, "A Multi-parent Advanced Generation Inter-cross (MAGIC) Population for Genetic Analysis and Improvement of Cowpea (Vigna unguiculata L. Walp.)", The Plant Journal, 93(6): 1129-1142, 2018.
This material is based upon work supported by the National Science Foundation under Grant No. 1526742.