CS 260-001, Winter 2003: Pattern Discovery in Biosequences
A staggering wealth of data has been generated by genome
sequencing projects and other efforts to determine the
structures and functions of biological systems. This
seminar course will focus on a selection of computational
problems aimed at automatically analyzing, clustering, and
classifying biomolecular data.
Class Meeting
TR 11:10am-12:30pm, SURGE 349
Office hours
WF 4:30pm-5:30pm, SURGE 320
Preliminary list of topics
Overview of probability and statistics (2 lectures),
introduction to molecular and computational biology (3
lectures), pattern discovery and machine learning (2
lectures), enumerative algorithms for pattern discovery
(4 lectures), hidden Markov models and other statistical
methods (e.g., Gibbs sampler, EM) for pattern discovery (3
lectures). There may be guest lectures and discussion
sessions on special problems. The actual selection of
topics may be guided by the interests of the
participants.
Prerequisites
CS141 (Algorithms) or CS218 (Design and Analysis of
Algorithms) or equivalent knowledge. Some programming
experience is expected. Students should have some familiarity
with probability and statistics. No biology background is
assumed.
Course Format
The course will include lectures by the instructor,
class discussions, and presentations by the students. The
actual format will depend on the class size and the
background of the students enrolled. Students are
expected to study the material covered in class. In
addition to selected chapters from some of the books
listed below, there may be handouts of research
papers. There will be two to three assignments, mostly of a
theoretical nature, although some may require
programming. At the end of the course, students are
required to give a presentation on a research topic
selected from a list provided by the instructor. Original
projects or proposals will be taken into
consideration.
Relation to Other Courses
This seminar course is intended to complement the seminar
course on "Algorithms in Computational Molecular Biology"
previously taught by Prof. T. Jiang, and "CS235: Data Mining
Concepts", usually taught by Prof. D. Gunopulos.
References
Richard Durbin, A. Krogh, G. Mitchison, and S. Eddy,
Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids, Cambridge University Press, 1999.
Dan Gusfield,
Algorithms on Strings, Trees and Sequences - Computer Science and Computational Biology, Cambridge University Press, 1997.
Pavel A. Pevzner,
Computational Molecular Biology: An Algorithmic Approach,
MIT Press, 2000.
João Setubal and João Carlos Meidanis,
Introduction to Computational Molecular Biology,
PWS Publishing Co., 1997.
Jason Wang, Bruce A. Shapiro, and Dennis Shasha,
Pattern Discovery in Biomolecular Data: Tools, Techniques, and Applications, Oxford University Press, 1999.
Pierre Baldi and Søren Brunak,
Bioinformatics: The Machine Learning Approach, MIT Press, 1998.
Download
List of topics for presentation [Postscript Format] [PDF Format]
"Primer on Molecular Genetics" [Link]
Anders Krogh, "An introduction to hidden Markov models for biological sequences" [PDF format]
Brona Brejova, Chrysanne DiMarco, Tomas Vinar, Sandra Romero Hidalgo, Gina Holguin, Cheryl Patten. "Finding Patterns in Biological Sequences". Unpublished TR. University of Waterloo, 2000 [PDF format]
Alberto Apostolico, Mary Ellen Bock, Stefano Lonardi, Xuyan Xu, "Efficient Detection of Unusual Words", Journal of Computational Biology, vol.7, no.1/2, pp.71-94, 2000 [PDF format]
Gesine Reinert, Sophie Schbath, Michael S. Waterman, "Probabilistic and Statistical Properties of Words: An Overview", Journal of Computational Biology, vol.7, no.1/2, 2000 [PDF format]
Todd Moon, "The Expectation-Maximization Algorithm", IEEE Signal Processing Magazine, Nov 1996 [PDF Format]
Jeff A. Bilmes, "A Gentle Tutorial of the EM Algorithm and its Applications to Parameter Estimation for Gaussian Mixture and HMM", UC Berkeley, TR-97-021 [PDF Format]
C. E. Lawrence, S. F. Altschul, M. S. Boguski, J. S. Liu, A. F. Neuwald, J. C. Wootton, "Detecting subtle sequence signals: A Gibbs sampling strategy for multiple alignment", Science 262, 1993 [PDF Format]
Jun S. Liu, Andrew F. Neuwald, Charles E. Lawrence, "Bayesian Models for Multiple Local Sequence Alignment and Gibbs Sampling Strategies", Journal of the American Statistical Association, 90(432), 1995 [PDF Format]
Slides (not available for download)
Slides [PDF Format 3slides/page] (Course Overview)
Slides [PDF Format 2slides/page] (Course Overview)
Slides [PDF Format 3slides/page] (Intro to Mol Biology)
Slides [PDF Format 2slides/page] (Intro to Mol Biology)
Slides [PDF Format 3slides/page] (Intro to Probability)
Slides [PDF Format 2slides/page] (Intro to Probability)
Slides [PDF Format 3slides/page] (Intro to Pattern Discovery)
Slides [PDF Format 2slides/page] (Intro to Pattern Discovery)
Slides [PDF Format 3slides/page] (Discovering Deterministic Patterns)
Slides [PDF Format 2slides/page] (Discovering Deterministic Patterns)
Slides [PDF Format 3slides/page] (Discovering Rigid Patterns)
Slides [PDF Format 2slides/page] (Discovering Rigid Patterns)
Slides [PDF Format 3slides/page] (Statistical Approaches)
Slides [PDF Format 2slides/page] (Statistical Approaches)
Slides [PDF Format 3slides/page] (Discovering Profiles)
Slides [PDF Format 2slides/page] (Discovering Profiles)
Resources
PMP Resources
Homeworks (not available for download)
Homework 1, due Feb 4th [Postscript Format] [PDF Format]
Homework 2, due Mar 6th [Postscript Format] [PDF Format]
Calendar
Jan 7: Course overview, Intro to Molecular Biology (1/3)
Jan 9: Intro to Molecular Biology (2/3)
Jan 14: Intro to Molecular Biology (3/3), Probability and Statistics (1/2)
Jan 16: Probability and Statistics (2/2)
Jan 21: Intro to Pattern Discovery and Machine Learning (1/2)
Jan 23: Intro to Pattern Discovery and Machine Learning (2/2), Discovering Deterministic Patterns (1/2)
Jan 28: Discovering Deterministic Patterns (2/2)
Jan 30: Discovering Rigid Patterns (1/2)
Feb 4: Discovering Rigid Patterns (2/2)
Feb 6: Statistical Methods (1/2)
Feb 11: Statistical Methods (2/2)
Feb 13: Discovering Profiles (1/2)
Feb 18: Discovering Profiles (2/2)
Feb 20: Andres Figueroa (guest lecture)
Feb 25: student presentations
Chuhu Yang: Identification of Regulatory Binding Sites through Data Clustering
Qiaofeng Yang: Biclustering of Gene Expression Matrices
Feb 27: student presentations
Ashish Sharma: "An algorithm for finding signals of unknown length in DNA sequences"
Norton Kitagawa: "Clustering of protein structure"
Mar 4: student presentations
Terrance Hamilton: Orphan gene finding
Ya-Lee Tsai: Finding the genes in genomic DNA
Mar 6: student presentations
Hongwei Ji: title TBA
Zheng Fu: title TBA
Mar 11: Thomas Girke (guest lecture)
Mar 13: student presentations
Hong Liu: A New Challenge for Compression Algorithms: Genetic Sequences
Haifeng Li: "Computational identification of promoters and first exons in the human genome"