CS 234, Fall 2006: Computational Methods for the Analysis of Biomolecular Data
A staggering wealth of data has been generated by genome
sequencing projects and other efforts to determine the
structures and functions of biological systems. This
advanced graduate course will focus on a selection of
computational problems aimed at automatically analyzing,
clustering, and classifying biomolecular data.
Class Meeting
TR, 2:10-3:30pm, Engineering II room 139
Office hours
TF, 11:10am-12noon, Engineering II room 317
Preliminary list of topics
overview of probability and statistics
intro to molecular and computational biology
analysis of 1D sequence data (DNA, RNA, proteins)
combinatorial algorithms and statistical methods for pattern discovery and sequence alignment
sequence alignment and hidden Markov models (HMM)
analysis of 2D time series data (gene expression data)
clustering algorithms
classification algorithms
subspace clustering/bi-clustering
analysis of other sources of biological data [time permitting]
protein-protein interaction graphs
TBA
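As an illustration of the sequence-alignment topic, the following is a minimal sketch of global pairwise alignment by dynamic programming (the Needleman-Wunsch recurrence) with a linear gap penalty. The function name `global_alignment` and the scores (match +1, mismatch -1, gap -2) are illustrative assumptions, not values prescribed by the course.

```python
def global_alignment(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment with a linear gap penalty.

    Returns (optimal score, aligned a, aligned b). The scoring values are
    illustrative defaults, not course-specified parameters.
    """
    n, m = len(a), len(b)
    # F[i][j] = best score for aligning the prefixes a[:i] and b[:j]
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap
    for j in range(1, m + 1):
        F[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            F[i][j] = max(F[i - 1][j - 1] + s,  # pair a[i-1] with b[j-1]
                          F[i - 1][j] + gap,    # a[i-1] against a gap
                          F[i][j - 1] + gap)    # b[j-1] against a gap
    # Trace back from the bottom-right cell to recover one optimal alignment
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            s = match if a[i - 1] == b[j - 1] else mismatch
            if F[i][j] == F[i - 1][j - 1] + s:
                out_a.append(a[i - 1])
                out_b.append(b[j - 1])
                i -= 1
                j -= 1
                continue
        if i > 0 and F[i][j] == F[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return F[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))
```

For example, `global_alignment("AC", "A")` pairs the two A's and places a gap opposite the C, for a score of 1 - 2 = -1. Real protein alignments would typically use a substitution matrix such as BLOSUM62 and affine gap penalties; the linear scheme keeps the recurrence to three cases.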
Prerequisites
CS141 (Algorithms) or CS218 (Design and Analysis of
Algorithms) or equivalent knowledge. Some programming
experience is expected. Students should have some notions
of probability and statistics. No biology background is
assumed.
Course Format
The course will include lectures by the instructor, guest
lectures, and possibly discussion sessions on special
problems. Students are expected to study the material
covered in class. In addition to selected chapters from
some of the books listed below, there may be handouts of
research papers. There will be three or four assignments,
mostly theoretical in nature, although some may require
programming. The actual format of the course will
ultimately depend on the number and the background of the
students enrolled.
Relation to Other Courses
This course is intended to complement "CS238:
Algorithms in Computational Molecular Biology" and
"CS235: Data Mining Concepts".
References
Richard Durbin, A. Krogh, G. Mitchison, and S. Eddy,
Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids, Cambridge University Press, 1999.
Dan Gusfield,
Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology, Cambridge University Press, 1997.
Pierre Baldi and Soren Brunak, Bioinformatics: The Machine Learning
Approach, MIT Press, 1998.
João Carlos Setubal and João Meidanis,
Introduction to Computational Molecular Biology,
PWS Publishing Co., 1997.
Jason Wang, Bruce A. Shapiro, and Dennis Shasha,
Pattern Discovery in Biomolecular Data: Tools, Techniques, and Applications, Oxford University Press, 1999.
David Mount, Bioinformatics: Sequence and Genome Analysis,
Cold Spring Harbor Laboratory Press, 2002.
Dan E. Krane and Michael L. Raymer, Fundamental Concepts of
Bioinformatics, Benjamin Cummings, 2002.
Warren J. Ewens and Gregory R. Grant, Statistical Methods in
Bioinformatics: An Introduction, Springer, 2001.
Neil C. Jones and Pavel Pevzner, An Introduction to Bioinformatics Algorithms, MIT Press, 2004.
Papers
Anders Krogh, "An introduction to hidden Markov models for biological sequences" [PDF format]
Brona Brejova, Chrysanne DiMarco, Tomas Vinar, Sandra Romero Hidalgo, Gina Holguin, Cheryl Patten. "Finding Patterns in Biological Sequences". Unpublished TR. University of Waterloo, 2000 [PDF format]
Alberto Apostolico, Mary Ellen Bock, Stefano Lonardi, Xuyan Xu, "Efficient Detection of Unusual Words", Journal of Computational Biology, vol.7, no.1/2, pp.71-94, 2000 [PDF format]
Gesine Reinert, Sophie Schbath, Michael S. Waterman, "Probabilistic and Statistical Properties of Words: An Overview", Journal of Computational Biology, vol.7, no.1/2, 2000 [PDF format]
Todd K. Moon, "The Expectation-Maximization Algorithm", IEEE Signal Processing Magazine, Nov 1996 [PDF Format]
Jeff A. Bilmes, "A Gentle Tutorial of the EM Algorithm and its Applications to Parameter Estimation for Gaussian Mixture and HMM", UC Berkeley, TR-97-021 [PDF Format]
C. E. Lawrence, S. F. Altschul, M. S. Boguski, J. S. Liu, A. F. Neuwald, J. C. Wootton, "Detecting subtle sequence signals: A Gibbs sampling strategy for multiple alignment", Science 262, 1993 [PDF Format]
Jun S. Liu, Andrew F. Neuwald, Charles E. Lawrence, "Bayesian Models for Multiple Local Sequence Alignment and Gibbs Sampling Strategies", Journal of the American Statistical Association, 90(432), 1995 [PDF Format]
W. Huber, A. v. Heydebreck, M. Vingron, "Analysis of microarray gene expression data", in Martin Bishop et al. (editors), Handbook of Statistical Genetics, 2nd Edition, John Wiley & Sons, Ltd., Chichester, UK, 2003 [PDF format]
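Krogh's HMM introduction and the HMM chapters of Durbin et al. center on decoding: finding the most probable hidden state path for an observed sequence. The following is a minimal sketch of the Viterbi algorithm in log space; the two-state model (a GC-rich state `H` and an AT-rich state `L`) and all of its probabilities are made up for illustration, and the names `viterbi`, `STATES`, `START`, `TRANS`, and `EMIT` are hypothetical.

```python
import math

def _log(p):
    """Natural log that maps probability 0 to -infinity instead of raising."""
    return math.log(p) if p > 0 else float("-inf")

def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return (most probable state path, its log-probability) for an HMM."""
    # V[t][s]: best log-probability of any state path that ends in s at time t
    V = [{s: _log(start_p[s]) + _log(emit_p[s][obs[0]]) for s in states}]
    back = [{}]  # back[t][s]: predecessor of s on that best path
    for t in range(1, len(obs)):
        V.append({})
        back.append({})
        for s in states:
            prev = max(states, key=lambda r: V[t - 1][r] + _log(trans_p[r][s]))
            V[t][s] = V[t - 1][prev] + _log(trans_p[prev][s]) + _log(emit_p[s][obs[t]])
            back[t][s] = prev
    # Trace back from the best final state
    last = max(states, key=lambda s: V[-1][s])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path, V[-1][last]

# Hypothetical two-state model: H = GC-rich, L = AT-rich (made-up numbers)
STATES = ("H", "L")
START = {"H": 0.5, "L": 0.5}
TRANS = {"H": {"H": 0.9, "L": 0.1}, "L": {"H": 0.1, "L": 0.9}}
EMIT = {"H": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
        "L": {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4}}

if __name__ == "__main__":
    path, logp = viterbi("GCGCATAT", STATES, START, TRANS, EMIT)
    print("".join(path), logp)
```

Working in log space avoids numerical underflow on long sequences. With the sticky transitions above, the decoded path for `GCGCATAT` switches from `H` to `L` exactly once, at the first A.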
Slides
Slides [PDF Format 2 slides/page] (Course Overview)
Slides [PDF Format 2 slides/page] (Intro to Mol Biology)
Slides [PDF Format 2 slides/page] (Some basic probability)
Slides [PDF Format 2 slides/page] (Intro to Pattern Discovery)
Slides [PDF Format 2 slides/page] (Discovery of Rigid Patterns)
Slides [PDF Format 2 slides/page] (HMM)
Slides [PDF Format 2 slides/page] (Microarrays)
Slides [PDF Format 2 slides/page] (Biological networks)
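The pattern-discovery slides, like the Apostolico et al. and Reinert et al. papers listed under Papers, ask whether a word occurs more often than chance would predict. The following is a minimal sketch under a simple i.i.d. model, where the expected number of occurrences of a word w = w_1...w_m in a sequence of length n is (n - m + 1) p(w_1)...p(w_m). The function names `count_overlapping` and `word_zscore` are hypothetical, and the variance deliberately skips the self-overlap (autocorrelation) correction that those papers treat in detail, so the z-score is only a rough screening value.

```python
import math
from collections import Counter

def count_overlapping(seq, word):
    """Count occurrences of word in seq, allowing overlaps (e.g. AA in AAAA -> 3)."""
    return sum(1 for i in range(len(seq) - len(word) + 1) if seq.startswith(word, i))

def word_zscore(seq, word):
    """Return (observed, expected, z) for word under an i.i.d. (Bernoulli) model.

    Symbol probabilities are estimated from seq itself. The variance is a
    binomial approximation that ignores self-overlaps of the word, so z is
    only a rough screening score, not the exact statistic.
    """
    n, m = len(seq), len(word)
    freq = Counter(seq)
    p_word = 1.0
    for c in word:
        p_word *= freq[c] / n  # probability of the word at one fixed position
    positions = n - m + 1
    expected = positions * p_word
    variance = positions * p_word * (1 - p_word)
    observed = count_overlapping(seq, word)
    z = (observed - expected) / math.sqrt(variance) if variance > 0 else float("nan")
    return observed, expected, z
```

For example, in `ACGTACGTAA` the word `ACG` occurs twice, while the i.i.d. model built from that sequence's own base frequencies expects only about 0.13 occurrences, so the word stands out as over-represented.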
Resources
The Inner Life of the Cell
DNA Molecular animation
A bioinformatics glossary
What's a Genome (on-line book)
DNA interactive
Primer on Molecular Genetics
Daily news about bioinformatics
PMP Resources
Projects
Project ideas and rules
Craig Boucher's CS234 webpage
Guanqun Shi's CS234 webpage
Anna Charisi's CS234 webpage
Jin Shieh's CS234 webpage
Daniel Jordan's CS234 webpage
Theodoros Lappas's CS234 webpage
Danhua Guo's CS234 webpage
Shashwati Kasetty's CS234 webpage
Elena Harris's CS234 webpage
Jose Medina's CS234 webpage
Vincent Peng's CS234 webpage
Wei-Bung (Bob) Wang's CS234 webpage
Homework
Homework 1 (posted Oct 5, due Oct 19)
Homework 2 (posted Oct 20, due Nov 2)
Homework 3 (posted Nov 3, due Nov 16)
Homework 4 (posted Nov 17, due Dec 5)
Presentation
choose a slot 1-13 below and send me your choice
choose a paper from the RECOMB 2006 or ISMB 2006 proceedings and send me its title
send me the PowerPoint file the day before the presentation (before 5pm)
give the 15-minute presentation (make sure you time it correctly; I will stop you after 15 minutes)
Calendar of Lectures
Sep 28: Intro, Molecular Biology [slides 1-26]
Oct 3: Molecular Biology [slides 27-46]
Oct 5: Molecular Biology [slides 47-80]
Oct 10: Molecular Biology [slides 81-106]
Oct 12: Molecular Biology [slides 107-end], Intro to Probability [slides 1-17]
Oct 17: Intro to Probability [slides 18-end], Intro to Pattern Discovery [slides 1-8]
Oct 19: Intro to Pattern Discovery [9-40] (hw1 due)
Oct 24: Intro to Pattern Discovery [41-84]
Oct 26: Intro to Pattern Discovery [85-end], Discovery of Rigid Patterns [1-44]
Oct 31: Discovery of Rigid Patterns [45-78]
Nov 2: Discovery of Rigid Patterns [79-end], HMM [1-15] (hw2 due)
Nov 7: HMM [16-78, skipped 34-76]
Nov 9: MIDTERM (in class, closed books, closed notes)
Nov 14: HMM [79-end], Microarrays [1-20]
Nov 16: Microarrays [21-end] (hw3 due)
Nov 21: Networks [1-end]
Nov 23: Thanksgiving
Nov 28: Presentations. (deadline for the PPT file is Nov 27th, 5PM)
1: Craig (Assessing Significance of Connectivity..., RECOMB'06)
2: Elena (Clustering Near-Identical Sequences..., RECOMB'06)
3: Jin (Rapid knot detection and application to protein..., ISMB'06)
4: Shi (Efficient identification of DNA hybridization ..., ISMB'06)
Nov 30: Guest Lecture by C. Shelton
Dec 5: Presentations. (deadline for the PPT file is Dec 4th, 5PM)
5: Daniel (Apples to apples: improving the performance ..., ISMB'06)
6: Jose (PROXIMO -- a new docking algorithm to model ..., ISMB'06)
7: Shashwati (Protein classification using ontology .., ISMB'06)
8: Wei-Bung (Sorting by Weighted Reversal ..., RECOMB'06)
Dec 7: Presentations. (deadline for the PPT file is Dec 6th, 5PM)
9: Anna (Distance based algorithms for small biomolecule..., ISMB'06)
10: Danhua (Create and assess protein networks through..., ISMB'06)
11: Theodoros (Simple and Fast Inverse Alignment, RECOMB'06)
12: Meng-Chih (Finding the evidence for protein-protein ..., ISMB'06)
Project Demos (in my office; please bring your laptop)
Dec 11: 10:00 Bob
11:00 Daniel
11:30 Theodoros
Dec 13: 10:00 Elena
10:30 Jose
11:00 Guanqun
11:30 Anna
Dec 15: 9:30 Shashwati
10:00 Craig
10:30 Meng-Chih
11:00 Jin
11:30 Danhua