**CS 234, Winter 2009: Computational Methods for the Analysis of Biomolecular Data**

A staggering wealth of data has been generated by genome sequencing projects and other efforts to determine the structures and functions of biological systems. This advanced graduate course will focus on a selection of computational problems aimed at automatically analyzing, clustering, and classifying biomolecular data.

**Class Meeting**

12:40 p.m. - 02:00 p.m. Engineering II room 141

**Office hours**

By appointment; please email me.

**Preliminary list of topics**

- overview on probability and statistics
- intro to molecular and computational biology
- analysis of 1D sequence data (DNA, RNA, proteins)
- combinatorial algorithms and statistical methods for pattern discovery and sequence alignment
- sequence alignment and hidden Markov models (HMM)
- analysis of 2D data (gene expression data and graphs)
- clustering algorithms
- classification algorithms
- subspace clustering/bi-clustering
- genetic networks, co-expression networks, metabolic networks, protein-protein interaction graphs

**Prerequisites**

CS141 (Algorithms) or CS218 (Design and Analysis of Algorithms) or equivalent knowledge. Some programming experience is expected. Students should have some familiarity with probability and statistics. No biology background is assumed.

**Course Format**

The course will include lectures by the instructor, guest lectures, and possibly discussion sessions on special problems. Students are expected to study the material covered in class. In addition to selected chapters from some of the books listed below, there may be handouts of research papers. There will be three or four assignments, mostly theoretical in nature, although some may require programming. The actual format of the course will ultimately depend on the number and background of the students enrolled.

**Relation to Other Courses**

This course is intended to complement "CS238: Algorithms in Computational Molecular Biology" and "CS235: Data Mining Concepts".

**References** (books)

**References** (papers)

**Slides**

- Slides [PDF Format 2slides/page] (Course Overview)
- Slides [PDF Format 2slides/page] (Intro to Mol Biology)
- Slides [PDF Format 2slides/page] (Some basic probability)
- Slides [PDF Format 2slides/page] (Intro to Pattern Discovery)
- Slides [PDF Format 2slides/page] (Discovery of Rigid Patterns)
- Slides [PDF Format 2slides/page] (HMM)
- Slides [PDF Format 2slides/page] (Microarrays)
- Slides [PDF Format 2slides/page] (Biological networks)

**Resources**

- The inner life of a Cell
- DNA Molecular animation
- A bioinformatics glossary
- What's a Genome (on-line book)
- DNA interactive
- Primer on Molecular Genetics
- PMP Resources

**Projects**

**Homework**

- Homework 1 (posted Jan 13, due Jan 27)
- Homework 2 (posted Jan 29, due Feb 12)
- Homework 3 (posted Feb 12, due Feb 26)

**Presentation**

- choose a slot 1-10 below and send me your choice
- choose a paper from the RECOMB 2008 or ISMB 2008 proceedings and send the title to me
- send the PowerPoint file to me the day before the presentation (before 5pm)
- give the 15-minute presentation (make sure you time it correctly; I will stop you after 15 minutes)

**Calendar of Lectures**

- Jan 6: Intro, Molecular Biology
- Jan 8: Molecular Biology
- Jan 13: Molecular Biology [hw1 posted]
- Jan 15: Molecular Biology
- Jan 20: Statistics and Probability
- Jan 22: Pattern Discovery
- Jan 27: Pattern Discovery [hw1 due]
- Jan 29: Pattern Discovery [hw2 posted]
- Feb 3: Pattern Discovery
- Feb 5: Pattern Discovery
- Feb 10: HMM
- Feb 12: HMM [hw2 due, hw3 posted]
- Feb 17: MIDTERM (in class, closed books, closed notes)
- Feb 19: Microarrays
- Feb 24: Microarrays
- Feb 26: Networks [hw3 due]
- Mar 3: Networks
- Mar 5: Presentations (deadline for the PPT file is Mar 4, 5PM)

**Project Demo** (in my office, please bring your laptop)

Wed, Mar 18

