Name: Waleed Amjad
Contact Information: firstname.lastname@example.org
Course: CS 234 (Professor Stefano Lonardi)
Research Interests: Databases, Information Retrieval, Text/Data Mining, and Machine Learning
Languages Selected for Implementation:
Java and MATLAB
Progress (as of March 10 2016)
Finished implementing the shuffling (mixing) of the n reads that are passed on to the clustering component.
Completed the implementation of K-means clustering and a Naive Bayes classifier.
Experiments are in progress.
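For reference, the K-means step can be sketched as follows. This is a minimal illustration of Lloyd's algorithm under stated assumptions, not the project's actual implementation: the class name KMeansSketch, the method names, the random choice of initial centroids, and the Euclidean distance are all hypothetical choices made for this example.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical sketch of K-means (Lloyd's algorithm) on feature vectors.
class KMeansSketch {
    // Squared Euclidean distance between two vectors of equal length.
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    // Returns the cluster index (0..k-1) assigned to each point.
    // Assumes 1 <= k <= points.length and that all points share one dimension.
    static int[] cluster(double[][] points, int k, int maxIterations, Random rng) {
        int n = points.length;
        int dim = points[0].length;

        // Initialise centroids with k distinct randomly chosen points
        // (partial Fisher-Yates shuffle of the point indices).
        int[] order = new int[n];
        for (int i = 0; i < n; i++) order[i] = i;
        double[][] centroids = new double[k][];
        for (int c = 0; c < k; c++) {
            int j = c + rng.nextInt(n - c);
            int tmp = order[c];
            order[c] = order[j];
            order[j] = tmp;
            centroids[c] = points[order[c]].clone();
        }

        int[] assignment = new int[n];
        Arrays.fill(assignment, -1);
        for (int iter = 0; iter < maxIterations; iter++) {
            // Assignment step: attach each point to its nearest centroid.
            boolean changed = false;
            for (int i = 0; i < n; i++) {
                int best = 0;
                double bestDist = squaredDistance(points[i], centroids[0]);
                for (int c = 1; c < k; c++) {
                    double dist = squaredDistance(points[i], centroids[c]);
                    if (dist < bestDist) {
                        bestDist = dist;
                        best = c;
                    }
                }
                if (assignment[i] != best) {
                    assignment[i] = best;
                    changed = true;
                }
            }
            if (!changed) break; // converged: no point switched cluster

            // Update step: move each centroid to the mean of its points.
            double[][] sums = new double[k][dim];
            int[] sizes = new int[k];
            for (int i = 0; i < n; i++) {
                sizes[assignment[i]]++;
                for (int d = 0; d < dim; d++) sums[assignment[i]][d] += points[i][d];
            }
            for (int c = 0; c < k; c++) {
                if (sizes[c] == 0) continue; // empty cluster keeps its old centroid
                for (int d = 0; d < dim; d++) centroids[c][d] = sums[c][d] / sizes[c];
            }
        }
        return assignment;
    }
}
```

In this sketch each row of points would be the feature vector of one read, and k would be set to the number of groups the reads are expected to fall into.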
Progress (as of February 25 2016)
Finished implementing the generation of n reads of length l with a 1% sequencing error rate. Currently implementing the shuffling (mixing) of the n reads that are passed on to the clustering component.
Decided to use K-means clustering.
Collecting experimental data from GenBank to be used in the project.
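The read generation and mixing steps listed above can be sketched roughly as follows. This is only an illustration under assumptions that the log does not state: reads start at uniformly random positions, sequencing errors are modelled as substitutions only (no insertions or deletions), and the names ReadSimulator, generateReads, and mix are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical sketch: sample noisy reads from genomes and mix them.
class ReadSimulator {
    private static final char[] BASES = {'A', 'C', 'G', 'T'};

    // Samples n reads of length l from uniformly random start positions.
    // Each base is replaced by a different base with probability errorRate
    // (assumption: substitution errors only).
    static List<String> generateReads(String genome, int n, int l,
                                      double errorRate, Random rng) {
        if (l > genome.length()) {
            throw new IllegalArgumentException("read length exceeds genome length");
        }
        List<String> reads = new ArrayList<>(n);
        for (int r = 0; r < n; r++) {
            int start = rng.nextInt(genome.length() - l + 1);
            char[] read = genome.substring(start, start + l).toCharArray();
            for (int i = 0; i < l; i++) {
                if (rng.nextDouble() < errorRate) {
                    char replacement;
                    do {
                        replacement = BASES[rng.nextInt(BASES.length)];
                    } while (replacement == read[i]);
                    read[i] = replacement;
                }
            }
            reads.add(new String(read));
        }
        return reads;
    }

    // Pools the reads of all genomes into one list and shuffles it, so the
    // clustering component cannot rely on the original ordering.
    static List<String> mix(List<List<String>> readsPerGenome, Random rng) {
        List<String> pool = new ArrayList<>();
        for (List<String> reads : readsPerGenome) {
            pool.addAll(reads);
        }
        Collections.shuffle(pool, rng);
        return pool;
    }
}
```

A call such as generateReads(genome, n, l, 0.01, new Random()) would correspond to the 1% error rate mentioned above; passing a seeded Random makes an experiment repeatable.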
Progress (as of February 11 2016)
Selected the first approach described below (in the update of January 27 2016).
Started implementing the generation of n reads of length l with a 1% sequencing error rate.
Investigating different clustering algorithms for high-dimensional data, including K-means.
Also looking at dimensionality reduction using SVD.
Progress (as of January 27 2016)
Reading and evaluating approaches, including:
Suggestion provided as part of the project description: use the distribution of k-mers in each read, typically 4-mers. Represent the count of occurrences of each of the 256 possible 4-mers in the read as a 256-dimensional vector, then use a clustering algorithm on these vectors to decide where to assign the reads (e.g., k-means where k=m).
Machine learning for metagenomics: methods and tools (2015)
MBBC: an efficient approach for metagenomic binning based on clustering (2015)
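The k-mer count representation from the project description can be sketched as follows. This is a minimal illustration, not the project's code: the class name KmerCounter, the base-4 encoding A=0, C=1, G=2, T=3, and the decision to skip windows containing any other character (such as N) are assumptions made for this example. For k-mers over {A, C, G, T} the vector has 4^k entries.

```java
// Hypothetical sketch: turn a read into its vector of k-mer counts.
class KmerCounter {
    // Encodes a nucleotide as 0..3; returns -1 for any other character.
    static int code(char c) {
        switch (Character.toUpperCase(c)) {
            case 'A': return 0;
            case 'C': return 1;
            case 'G': return 2;
            case 'T': return 3;
            default:  return -1;
        }
    }

    // Returns a vector of length 4^k. Entry i holds the number of
    // occurrences of the k-mer whose base-4 encoding is i.
    static double[] countKmers(String read, int k) {
        int dim = 1 << (2 * k);          // 4^k possible k-mers
        int mask = dim - 1;              // keeps only the last k bases
        double[] counts = new double[dim];
        int index = 0;                   // rolling base-4 encoding
        int valid = 0;                   // consecutive valid bases seen so far
        for (int i = 0; i < read.length(); i++) {
            int c = code(read.charAt(i));
            if (c < 0) {                 // ambiguous base: restart the window
                valid = 0;
                index = 0;
                continue;
            }
            index = ((index << 2) | c) & mask;
            valid++;
            if (valid >= k) {
                counts[index]++;
            }
        }
        return counts;
    }
}
```

Calling countKmers(read, 4) on every read yields one fixed-length vector per read, which is the form of input a clustering algorithm such as k-means expects.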