CS234 Computational Methods for Biomolecular Data
Project
For my project, I will implement the PROJECTION motif-finding algorithm as described by Jeremy Buhler and Martin Tompa in their RECOMB 2001 paper.
- Progress:
- 1/25/09: Have read the papers. Need to look up more background on the expectation-maximization (EM) method, which PROJECTION uses to refine its initial motif guesses.
- 2/2/09: Program can set the algorithm's parameters and hash l-mers into buckets using a randomly generated mask (a random choice of k of the l positions). It can also generate problem instances: random background sequences plus a random motif, with a randomly perturbed copy of the motif inserted at a random location in each sequence. Next: implement bucket refinement.
- 2/15/09: Implemented bucket refinement using the EM algorithm and other heuristics described in the paper. The resulting profiles seem reasonable. Next: testing and actual motif recovery.
- 2/23/09: Tests seem promising. Improved running time from several hours to about 15 minutes for the (15, 4) motif problem (motif length 15, 4 substitutions per occurrence). Next: run the program on many problem instances.
- 3/1/09: Program gives consistent and correct output on a variety of problem instances. Ready for demonstration.
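A minimal Python sketch of the instance generator described in the 2/2/09 entry. It assumes the standard planted (l, d) setup, in which each inserted copy differs from the motif in exactly d randomly chosen positions, and a uniform background over A, C, G, T. The function names `planted_instance` and `mutate` are illustrative assumptions, not the project's actual code.

```python
import random

ALPHABET = "ACGT"

def mutate(motif, d, rng):
    """Return a copy of motif with exactly d positions changed to a different base."""
    chars = list(motif)
    for pos in rng.sample(range(len(chars)), d):
        chars[pos] = rng.choice([b for b in ALPHABET if b != chars[pos]])
    return "".join(chars)

def planted_instance(t, n, l, d, seed=None):
    """Generate t uniform random sequences of length n, each containing one
    d-mutated copy of a random l-mer motif at a random offset.

    Returns (sequences, motif, planted start positions)."""
    rng = random.Random(seed)
    motif = "".join(rng.choice(ALPHABET) for _ in range(l))
    sequences, positions = [], []
    for _ in range(t):
        background = [rng.choice(ALPHABET) for _ in range(n)]
        start = rng.randrange(n - l + 1)
        # Overwrite l background characters with the perturbed motif copy.
        background[start:start + l] = mutate(motif, d, rng)
        sequences.append("".join(background))
        positions.append(start)
    return sequences, motif, positions
```

For example, `planted_instance(20, 600, 15, 4, seed=1)` builds a (15, 4) instance with 20 sequences of length 600; these sizes are chosen here only for illustration. Returning the planted positions makes it easy to check whether a run recovered the motif.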
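The bucket-hashing step from the 2/2/09 entry can be sketched as follows. This assumes the mask is a random choice of k distinct positions out of l, that each l-mer is keyed by its characters at those positions, and that a bucket is passed on to refinement when it holds at least some threshold number of l-mers. The helper names and the `threshold` parameter are hypothetical, and PROJECTION repeats this over several independent random masks rather than relying on a single one.

```python
import random
from collections import defaultdict

def random_mask(l, k, seed=None):
    """Pick k distinct positions out of 0..l-1 (the random projection), sorted."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(l), k))

def project_lmers(sequences, l, mask):
    """Hash every l-mer into a bucket keyed by its characters at the masked positions.

    Returns a dict mapping projected key -> list of (sequence index, start offset)."""
    buckets = defaultdict(list)
    for i, seq in enumerate(sequences):
        for j in range(len(seq) - l + 1):
            key = "".join(seq[j + p] for p in mask)
            buckets[key].append((i, j))
    return buckets

def enriched_buckets(buckets, threshold):
    """Keep only buckets holding at least `threshold` l-mers; these seed refinement."""
    return {key: hits for key, hits in buckets.items() if len(hits) >= threshold}
```

Because the mask ignores l - k positions, occurrences of the motif that differ from it only at unmasked positions land in the same bucket, which is why planted sites tend to pile up in a few large buckets.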
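The EM refinement from the 2/15/09 entry might look roughly like the sketch below. It assumes a one-occurrence-per-sequence model: a profile is initialized from the l-mers in one enriched bucket (with pseudocounts), then E-steps compute a posterior over motif start positions in each sequence and M-steps re-estimate the base frequencies from those posteriors. The pseudocount, the fixed iteration count, the smoothed background model, and all function names are illustrative assumptions. The other heuristics from the paper are not shown, and sequences are assumed to contain only A, C, G, T.

```python
import math

BASES = "ACGT"

def init_profile(lmers, pseudocount=0.5):
    """Build a profile (one base-probability dict per column) from a bucket's l-mers."""
    profile = []
    for pos in range(len(lmers[0])):
        counts = {b: pseudocount for b in BASES}
        for s in lmers:
            counts[s[pos]] += 1
        total = sum(counts.values())
        profile.append({b: c / total for b, c in counts.items()})
    return profile

def background_freqs(sequences):
    """Base frequencies over all input sequences, with add-one smoothing."""
    counts = {b: 1.0 for b in BASES}
    for seq in sequences:
        for ch in seq:
            counts[ch] += 1
    total = sum(counts.values())
    return {b: c / total for b, c in counts.items()}

def log_odds(seq, j, profile, bg):
    """Log-likelihood ratio of the l-mer starting at j: profile model vs. background."""
    return sum(math.log(col[seq[j + p]] / bg[seq[j + p]]) for p, col in enumerate(profile))

def em_refine(sequences, profile, iterations=20, pseudocount=0.5):
    """EM assuming exactly one motif occurrence per sequence."""
    l, bg = len(profile), background_freqs(sequences)
    for _ in range(iterations):
        counts = [{b: pseudocount for b in BASES} for _ in range(l)]
        for seq in sequences:
            # E-step: posterior over start positions, computed stably from log-odds.
            scores = [log_odds(seq, j, profile, bg) for j in range(len(seq) - l + 1)]
            top = max(scores)
            weights = [math.exp(s - top) for s in scores]
            z = sum(weights)
            # Accumulate expected base counts, weighting each l-mer by its posterior.
            for j, w in enumerate(weights):
                for p in range(l):
                    counts[p][seq[j + p]] += w / z
        # M-step: renormalize the expected counts into a new profile.
        profile = [{b: c[b] / sum(c.values()) for b in BASES} for c in counts]
    return profile

def best_sites(sequences, profile):
    """Highest log-odds start position in each sequence under the given profile."""
    l, bg = len(profile), background_freqs(sequences)
    return [max(range(len(seq) - l + 1), key=lambda j: log_odds(seq, j, profile, bg))
            for seq in sequences]
```

Given the l-mers from an enriched bucket (here a hypothetical `bucket_lmers` list), `best_sites(seqs, em_refine(seqs, init_profile(bucket_lmers)))` would return one predicted motif start per sequence. Running this from every enriched bucket and keeping the best-scoring profile is one simple way to choose a final answer.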