Computer Science and Engineering

NSF IIS/III Algorithms for Genome Assembly of Ultra-deep Sequencing Data #1526742

News (last updated June 2, 2016)

  • Jun 2016: Submitted paper [4]
  • May 2016: Submitted paper [5]: under review!
  • Oct 2015: Published paper [3]
  • Aug 2015: Published papers [1,2]
  • Aug 2015: Official start of the project
  • Jul 2015: Prof. Lonardi presented "De Novo Meta-Assembly of Ultra-deep Sequencing Data" at ISMB/ECCB'15 in Dublin, Ireland

    Project Goals and Research Challenges

    The University of California, Riverside has been awarded a grant to investigate the computational challenges brought about by the analysis of ultra-deep sequencing data (i.e., coverage of 1000x or higher), specifically in the context of de novo genome assembly. As sequencing costs continue to decrease, ultra-deep sequencing data will become more common, but the problem of de novo genome assembly remains computationally challenging, in particular for large, repetitive genomes. Since the sequencing of H. influenzae in 1995, the assembly problem has been characterized by limited depth of sequencing coverage, mostly due to the high cost of generating the data. This project will investigate for the first time the opposite problem, that is, the challenge of dealing with excessive depth of sequencing. Deliverables will include novel software tools for genome assembly which will benefit researchers and the public worldwide, and potentially lead to new international and industrial collaborations.

    The research plan addresses the de novo assembly problem under the assumption that the input sequencing data is ultra-deep. The study will demonstrate that when the depth of sequencing increases beyond a certain threshold, sequencing errors make the genome assembly problem progressively harder, so that the quality of the solution degrades as more data is added. The project will show that modern de novo assemblers like SPAdes, IDBA-UD, and Velvet are unable to take advantage of ultra-deep sequencing data. The research plan will deal with ultra-deep sequencing data using a divide-and-conquer approach. In our proposed meta-assembler, the input data will be partitioned into optimal-sized "slices" and a standard assembly tool (e.g., Velvet, SPAdes, IDBA, Ray) will be used to assemble each slice individually. For the de novo assembler, a set of de Bruijn graphs will be created, each one built from the sequencing data of a slice. In both cases, a majority voting strategy among the individual assemblies/graphs will be used to generate a high-quality consensus assembly.
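    The slicing and voting idea described above can be illustrated with a minimal Python sketch. It is not the project's meta-assembler or de novo assembler: the function names (slice_reads, kmer_set, majority_kmers) and the parameters (genome_size, target_coverage, k) are hypothetical, coverage is estimated as total read bases divided by genome size, and each slice's de Bruijn graph is reduced to its set of distinct k-mers. Voting on k-mers is an assumption made here to keep the example short; the project text only states that a majority vote is taken among the individual assemblies or graphs.

```python
import random
from collections import Counter


def slice_reads(reads, genome_size, target_coverage, seed=0):
    """Shuffle reads and split them into slices of roughly target_coverage each.

    Coverage is estimated as (total bases in reads) / genome_size.
    """
    total_bases = sum(len(r) for r in reads)
    total_coverage = total_bases / genome_size
    num_slices = max(1, round(total_coverage / target_coverage))
    shuffled = list(reads)
    random.Random(seed).shuffle(shuffled)
    # Round-robin assignment keeps slice sizes within one read of each other.
    return [shuffled[i::num_slices] for i in range(num_slices)]


def kmer_set(reads, k):
    """Distinct k-mers in a slice (a stand-in for de Bruijn graph nodes)."""
    return {r[i:i + k] for r in reads for i in range(len(r) - k + 1)}


def majority_kmers(slices, k):
    """Keep only the k-mers that occur in more than half of the slices."""
    votes = Counter()
    for s in slices:
        votes.update(kmer_set(s, k))  # each slice casts one vote per k-mer
    return {kmer for kmer, n in votes.items() if n > len(slices) / 2}


if __name__ == "__main__":
    # Toy data: two error-free slices and one slice with a substitution error.
    toy_slices = [["ACGTAC"], ["ACGTAC"], ["ACGAAC"]]
    print(sorted(majority_kmers(toy_slices, k=4)))
```

    In this toy setting, a k-mer created by a sequencing error in a single read tends to show up in only one slice and is voted out, while k-mers from the true sequence recur across slices. A real pipeline would run a standard assembler on each slice and reconcile contigs or graphs, which this sketch does not attempt.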

    Broader Impacts

    This project will directly support two graduate students in a highly interdisciplinary environment, building on UCR's strengths in Computer Science and Agricultural Sciences. Undergraduates will have opportunities to participate in research through a Research Experiences for Undergraduates (REU) site at UCR, a collaboration with a nearby community college, and a new US Department of Education Title V Hispanic Serving Institution grant (UCR is an accredited HSI).



    • [1] S. Lonardi, H. Mirebrahim, S. Wanamaker, M. Alpert, G. Ciardo, D. Duma, T. J. Close, "When Less is More: Slicing Sequencing Data Improves Read Decoding Accuracy and De Novo Assembly Quality", Bioinformatics, vol. 31, no. 18, pp. 2972-2980, 2015.
    • [2] H. Mirebrahim, T. J. Close, S. Lonardi, "De Novo Meta-Assembly of Ultra-deep Sequencing Data", Bioinformatics, vol. 31, no. 12, pp. i9-i16, 2015. Also in Proceedings of the Conference on Intelligent Systems for Molecular Biology and European Conference on Computational Biology (ISMB/ECCB'15), Dublin, Ireland, 2015.
    • [3] M. Munoz-Amatriain*, S. Lonardi*, M-C. Luo, K. Madishetty, J. T. Svensson, M. J. Moscou, S. Wanamaker, T. Jiang, A. Kleinhofs, G. J. Muehlbauer, R. P. Wise, N. Stein, Y. Ma, E. Rodriguez, D. Kudrna, P. R. Bhat, S. Chao, P. Condamine, S. Heinen, J. Resnik, R. Wing, H. N. Witt, M. Alpert, M. Beccuti, S. Bozdag, F. Cordero, H. Mirebrahim, R. Ounit, Y. Wu, F. You, J. Zheng, H. Simkova, J. Dolezel, J. Grimwood, J. Schmutz, D. Duma, L. Altschmied, T. Blake, P. Bregitzer, L. Cooper, M. Dilbirligi, A. Falk, L. Feiz, A. Graner, P. Gustafson, P. M. Hayes, P. Lemaux, J. Mammadov, T. J. Close (* equal contributors), "Sequencing of 15,622 Gene-bearing BACs Clarifies the Gene-dense Regions of the Barley Genome", The Plant Journal, vol. 84, no. 1, pp. 216-227, 2015.
    • [4] M. Munoz-Amatriain, H. Mirebrahim, P. Xu, S. Wanamaker, M-C. Luo, H. Alhakami, M. Alpert, I. Atokple, J. Batieno, O. Boukar, S. Bozdag, N. Cisse, I. Drabo, J. D. Ehlers, A. Farmer, C. Fatokun, Y. Q. Gu, Y.-N. Guo, B.-L. Huynh, S. A. Jackson, F. Kusi, M. R. Lucas, Y. Ma, M. P. Timko, J. Wu, F. You, P. A. Roberts, S. Lonardi and T. J. Close, "Genome resources for climate-resilient cowpea, an essential crop for food security", submitted, 2016.
    • [5] H. Alhakami, H. Mirebrahim and S. Lonardi, "A Comparative Evaluation of Assembly Reconciliation Tools", submitted, 2016.


    This material is based upon work supported by the National Science Foundation under Grant No. 1526742.

    Point of Contact


    University of California, Riverside
    Winston Chung Hall, room 325
    Riverside, CA 92521
    Tel: (951) 827-2203
    Fax: (951) 827-4643