CS 234, W15: Computational Methods for the Analysis of Biomolecular Data


  • (Mar 9) Homework 3 solution posted
  • (Feb 18) Homework 3 posted
  • (Feb 18) Homework 2 solution posted
  • (Jan 31) Homework 2 posted
  • (Jan 15) Homework 1 posted
    Overview

    An impressive wealth of data has been amassed by genome sequencing projects and other efforts to determine the structures and functions of biological systems. This advanced graduate course focuses on a selection of computational problems in automatically analyzing, clustering, and classifying biomolecular data.

    Class Meeting

    TR, 2:10 p.m. - 3:30 p.m. CHUNG 139

    Office hours

    Open door policy or by appointment (email me)

    Preliminary list of topics

  • overview of probability and statistics
  • intro to molecular and computational biology
  • analysis of 1D sequence data (DNA, RNA, proteins)
  • space-efficient data structures for sequences
  • short read mapping (suffix trees, suffix arrays, BWT)
  • sequence alignment and hidden Markov models (HMM)
  • analysis of 2D data (gene expression data and graphs)
  • clustering algorithms
  • classification algorithms
  • subspace clustering/bi-clustering
  • genetic networks, co-expression networks, metabolic networks, protein-protein interaction graphs
    Prerequisites

    CS141 (Algorithms) or CS218 (Design and Analysis of Algorithms) or equivalent knowledge. Some programming experience is expected. Students should have some familiarity with probability and statistics. No biology background is assumed.

    Course Format

    The course will include lectures by the instructor, guest lectures, and possibly discussion sessions on special problems. Students are expected to study the material covered in class. In addition to selected chapters from some of the books listed below, there may be handouts of research papers. There will be three or four assignments, mostly theoretical in nature, although some may require programming. The actual format of the course will ultimately depend on the number and background of the students enrolled.

    Relation to Other Courses

    This course is intended to complement "CS238: Algorithms in Computational Molecular Biology" and "CS235: Data Mining Concepts".

    References (books)

  • Richard Durbin, A. Krogh, G. Mitchison, and S. Eddy, Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids, Cambridge University Press, 1999.
  • Dan Gusfield, Algorithms on Strings, Trees and Sequences: Computer Science and Computational Biology, Cambridge University Press, 1997.
  • Dan E. Krane and Michael L. Raymer, Fundamental Concepts of Bioinformatics, Benjamin Cummings, 2002.
  • Warren J. Ewens and Gregory R. Grant, Statistical Methods in Bioinformatics: An Introduction, Springer, 2001.
  • Neil C. Jones and Pavel Pevzner, An Introduction to Bioinformatics Algorithms, MIT Press, 2004.
  • Marketa Zvelebil and Jeremy O. Baum, Understanding Bioinformatics, Garland Science, 2007.
    References (papers)

  • Anders Krogh, "An introduction to hidden Markov models for biological sequences" [PDF format]
  • Paolo Ferragina, Giovanni Manzini, "Opportunistic Data Structures with Applications", FOCS 2000 [PDF format]
  • Jeremy Buhler, Uri Keich, Yanni Sun, "Designing Seeds for Similarity Search in Genomic DNA", RECOMB 2003 [PDF format]
  • Avak Kahvejian, John Quackenbush, John F Thompson, "What would you do if you could sequence everything?", Nature Biotechnology, 2008 [PDF format]
  • Michael L. Metzker, "Sequencing technologies - the next generation", Nature Reviews Genetics, 2010 [PDF format]
    Slides

  • Slides [PDF Format 2slides/page] (Course Overview)
  • Slides [PDF Format 2slides/page] (Intro to Mol Biology)
  • Slides [PDF Format 2slides/page] (Mol Biology Tools)
  • Slides [PDF Format 2slides/page] (Indexing and Searching)
  • Slides [PDF Format 2slides/page] (Probability Models and Inference)
    Resources

  • RNAi animation (Nature Genetics)
  • The inner life of a Cell
  • DNA Molecular animation
  • A bioinformatics glossary
  • What's a Genome (on-line book)
  • DNA interactive
  • Experimental Genome Science (on-line course)
  • Current Topics in Genome Analysis 2014 (on-line course)
  • Fundamentals of Biology (on-line course)
  • Pevzner's bioinformatics courses (on-line)
    Projects

  • Project ideas and rules
  • Sepideh Azarnoosh's CS 234 project webpage
  • Kazi Islam's CS 234 project webpage
  • Weihua Pan's CS 234 project webpage
  • Leo Phong Vu's CS 234 project webpage
  • Sawyer Masonjones's CS 234 project webpage
  • Albert Do's CS 234 project webpage
  • Suhas Sureshchandra's CS 234 project webpage
  • Ashraful Arefeen's CS 234 project webpage
  • Xing (Vic) Zhang's CS 234 project webpage
  • Md. Abid Hasan's CS 234 project webpage
  • Chetas Manjunath's CS 234 project webpage
  • Yang Liu's CS 234 project webpage
  • Abbas Roayaei Ardakany's CS 234 project webpage
    Homework

  • Homework 1 (posted Jan 15, due Jan 29)
  • Homework 2 (posted Jan 31, due Feb 17)
  • Homework 2 solution
  • Homework 3 (posted Feb 18, due Mar 3)
  • Homework 3 solution
    Midterm

  • Mock midterm exam (posted Feb 3)
    Presentation

  • choose a paper from the Proceedings of RECOMB 2014 or ISMB 2014, then send me the title and the slot number (1-13) in which you want to present (see the calendar below)
  • send me the PowerPoint file the day before the presentation (before 5 PM)
  • give the 16-minute presentation (make sure you time it correctly; I will stop you at 16 minutes)
    Calendar of Lectures

  • Jan 6: Intro, Molecular Biology (1-21)
  • Jan 8: Molecular Biology (22-43)
  • Jan 13: Molecular Biology (44-65)
  • Jan 15: Molecular Biology (66-86) [hw1 posted]
  • Jan 20: Molecular Biology (87-end), Molecular Biology Tools
  • Jan 22: Molecular Biology Tools
  • Jan 27: Molecular Biology Tools
  • Jan 29: Indexing/Searching (1-28) [hw1 due][hw2 posted]
  • Feb  3: Indexing/Searching (29-)
  • Feb  5: Indexing/Searching (-)
  • Feb 10: Guest Lecture
  • Feb 12: Indexing/Searching (-end), Probability Models (1-15)
  • Feb 17: [hw2 due] [hw3 posted]
  • Feb 19: Probability Models (16-)
  • Feb 24: Probability Models, Biological Networks
  • Feb 26: Biological Networks
    First presentation (deadline for the PPT file is Feb 25th, 5PM)
    1: Yang Liu (dipSPAdes: Assembler for Highly Polymorphic Diploid Genomes, RECOMB)
  • Mar  3: MIDTERM (80 minutes, in class, closed book, closed notes) [hw3 due]
  • Mar  5: More Presentations (deadline for the PPT file is Mar 4th, 5PM)
    2: Suhas Sureshchandra (Methods for time series analysis of RNA-seq data with application to human Th17 cell differentiation, ISMB 2014)
    3: Xing (Vic) Zhang (Deep learning of the tissue-regulated splicing code, ISMB 2014)
    4: Abbas Roayaei (An Exact Algorithm to Compute the DCJ Distance for Genomes with Duplicate Genes)
    5: Chetas Manjunath (RNA-Skim: a rapid method for RNA-Seq quantification at transcript level, ISMB 2014)
  • Mar 10: More Presentations (deadline for the PPT file is Mar 9th, 5PM)
    6: Md. Abid Hasan (CSAX: Characterizing Systematic Anomalies in eXpression Data, RECOMB 2014)
    7: Weihua Pan (PASTA: Ultra-Large Multiple Sequence Alignment, RECOMB 2014)
    8: Leo Phong Vu (Exact Learning of RNA Energy Parameters from Structure, RECOMB 2014)
    9: Sawyer Masonjones (Cross-study validation for the assessment of prediction algorithms, ISMB 2014)
  • Mar 12: More Presentations (deadline for the PPT file is Mar 11th, 5PM)
    10: Kazi Islam (Robust clinical outcome prediction based on Bayesian analysis of transcriptional profiles and prior causal networks, ISMB 2014)
    11: Albert Do (Learning Protein-DNA Interaction Landscapes by Integrating Experimental Data through Computational Models, RECOMB 2014)
    12: Sepideh Azarnoosh (Functional association networks as priors for gene regulatory network inference, ISMB 2014)
    13: Ashraful Arefeen (Large scale analysis of signal reachability, ISMB 2014)
    Project Demo (20-25 minute demo, 5-10 minutes of questions, in my office; please bring your laptop)

  • Monday, March 16th
              10:00 NAME
              10:30 Leo
              11:00 Xing (Vic) Zhang
              11:30 Suhas Sureshchandra
  • Tuesday, March 17th
              10:00 Yang Liu
              10:30 Weihua
              11:00 Md. Abid Hasan
              11:30 Sawyer Masonjones
  • Wednesday, March 18th
              9:30 Kazi Islam
              10:00 Ashraful Arefeen
              10:30 Albert Do
              11:00 Chetas Manjunath
              11:30 Abbas Roayaei