CS 234: Computational Methods for the Analysis of Biomolecular Data

Spring 2024

Overview

An unprecedented wealth of data is being generated by large genome/metagenome/epigenetic projects and other efforts to determine the structure and function of molecular biological systems. This advanced graduate class will focus on a selection of algorithms and data structures aimed at the analysis of biomolecular data.

Catalog Description

  • A study of computational and statistical methods aimed at automatically analyzing, clustering, and classifying biomolecular data. Includes combinatorial algorithms for pattern discovery; hidden Markov models for sequence analysis; analysis of expression data; and prediction of the three-dimensional structure of RNA and proteins.
  • Note: Credit is awarded for only one of the following: CS 144, CS 234, or CS 238.

Prerequisites

  • CS 111 (or equivalent)
  • CS 141 or CS 218 (or equivalent)
  • STAT 155 or STAT 160A (or equivalent)
  • Graduate standing
  • Note: Some programming experience is expected
  • Note: No biology background is assumed

Instructors

  • Stefano Lonardi, email, office MRB 3130
  • Saleh Sereshki, email, office MRB 3rd floor (cubicles)

Class Meeting

  • MW 12:30pm-1:50pm, Student Success Center, Room 125

Office hours

  • Stefano: TBA (or by appointment), Zoom meeting: TBA
  • Saleh: TBA (or by appointment), Zoom meeting: TBA

Preliminary list of topics

  • Intro to molecular and computational biology, including biotech tools
  • String matching and approximate string matching (Z-algorithm, KMP, Boyer-Moore)
  • Space-efficient data structures for sequences (suffix tries/trees, suffix arrays, Burrows-Wheeler transform)
  • Hidden Markov models, profile HMMs, Viterbi decoding, and Baum-Welch learning
  • Motif finding and Gibbs sampling

References

  • (HMMs) Richard Durbin, A. Krogh, G. Mitchison, and S. Eddy, Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids, Cambridge University Press, 1999.
  • (Suffix Trees) Dan Gusfield, Algorithms on Strings, Trees and Sequences: Computer Science and Computational Biology, Cambridge University Press, 1997.
  • (Algorithms) Dan E. Krane and Michael L. Raymer, Fundamental Concepts of Bioinformatics, Benjamin Cummings, 2002.
  • (Algorithms) Neil C. Jones and Pavel Pevzner, An Introduction to Bioinformatics Algorithms, MIT Press, 2004.
  • (Algorithms) Marketa Zvelebil and Jeremy O. Baum, Understanding Bioinformatics, Garland Science, 2007.

Course Format

    The course will include lectures by the instructor and possibly guest lectures from senior PhD students. Students are expected to study the material covered in class. In addition to selected chapters from some of the books listed under References, there may be handouts of research papers. There will be several homework assignments, mostly of a theoretical nature, although some may require a bit of programming. There will be a midterm and a final.

    Cheating

    We will not tolerate any kind of cheating in this course. Homework is to be completed on your own. The only external sources allowed are those listed under References or mentioned by the instructor throughout the course. If you have a doubt or question, please just ASK. As per standard UCR policy, you may not submit answers (written or programming) to problem sets that contain material you did not produce yourself for the express purpose of this offering of this course. If we find that you have submitted work that is not your own, or work that you submitted in a different course, we will assign you a zero on that assignment (and possibly a zero in the entire course, depending on the severity), and we will forward the case to Student Conduct and Academic Integrity Programs for campus-level consideration.

    Slides

    Slides will be posted on Canvas.

    Grades

    Grades will be posted on Canvas.

    Homework

    Homework will be released on Canvas and must be submitted on Canvas. Each student is granted three "late days", which can be used (in whole-day units) on any of the homework assignments. If a more dire situation arises, please contact the instructor.

    Calendar

    Week 1
  • Monday, Apr 1: Intro, Molecular Biology
  • Wednesday, Apr 3: Molecular Biology
    Week 2
  • Monday, Apr 8:
  • Wednesday, Apr 10:
    Week 3
  • Monday, Apr 15:
  • Wednesday, Apr 17:
    Week 4
  • Monday, Apr 22:
  • Wednesday, Apr 24:
    Week 5
  • Monday, Apr 29:
  • Wednesday, May 1:
    Week 6
  • Monday, May 6:
  • Wednesday, May 8:
    Week 7
  • Monday, May 13:
  • Wednesday, May 15:
    Week 8
  • Monday, May 20:
  • Wednesday, May 22:
    Week 9
  • Monday, May 27:
  • Wednesday, May 29:
    Week 10
  • Monday, Jun 3:
  • Wednesday, Jun 5:
    Finals Week
  • Final

Additional resources

  • Learn how to Fold it! A great game about protein folding that can help the scientific community
  • Genomic Data Science Specialization (Coursera)
  • Bioconductor for Genomic Data Science (Coursera)
  • Genome Sequencing (Bioinformatics II) (Coursera)
  • Introduction to Genomics (NHGRI)
  • Fundamentals of Biology (on-line course)
  • Pevzner's bioinformatics courses (Coursera)