Name: Waleed Amjad

Contact Information: wamja001@ucr.edu

Course: CS 234 (Professor Stefano Lonardi)

Research Interests: Databases, Information Retrieval, Text/Data Mining and Machine Learning

Project Selected:

Metagenomics binning

Language Selected for Implementation:

Java and MATLAB

Progress (as of March 10 2016)

Finished implementing the shuffling (mixing) of the n reads before they are passed to the clustering component.
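One way to keep the shuffled input usable for later evaluation is to carry each read's source genome along as a label that the clustering step never looks at. The Java sketch below is only an illustration under that assumption; the class names `LabeledRead` and `ReadMixer` are hypothetical and not taken from the project code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// A read paired with the index of the genome it was sampled from.
// The label is hidden from the clustering step and kept only for evaluation.
final class LabeledRead {
    final String sequence;
    final int genomeId;

    LabeledRead(String sequence, int genomeId) {
        this.sequence = sequence;
        this.genomeId = genomeId;
    }
}

class ReadMixer {
    // Pools the reads of every genome and applies a uniform random permutation,
    // so the input order reveals nothing about which genome a read came from.
    static List<LabeledRead> mix(List<List<String>> readsPerGenome, long seed) {
        List<LabeledRead> pool = new ArrayList<>();
        for (int g = 0; g < readsPerGenome.size(); g++) {
            for (String read : readsPerGenome.get(g)) {
                pool.add(new LabeledRead(read, g));
            }
        }
        Collections.shuffle(pool, new Random(seed));
        return pool;
    }
}
```

Passing an explicit seed makes a given experiment reproducible while still removing any ordering by source.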

Completed the implementation of K-means clustering and a naive Bayes classifier.
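A naive Bayes classifier over k-mer count vectors could look roughly like the multinomial sketch below, with add-one (Laplace) smoothing. The multinomial model, the smoothing choice and the class name `KmerNaiveBayes` are assumptions for illustration, not necessarily what the project implemented.

```java
import java.util.Arrays;

// Multinomial naive Bayes over k-mer count vectors with add-one smoothing.
// Assumption: each feature is the count of one possible k-mer in a read.
class KmerNaiveBayes {
    private final double[] logPrior;         // log P(class)
    private final double[][] logLikelihood;  // log P(k-mer | class)

    KmerNaiveBayes(int[][] counts, int[] labels, int numClasses) {
        int dim = counts[0].length;
        double[] classCount = new double[numClasses];
        double[][] kmerTotals = new double[numClasses][dim];
        for (int i = 0; i < counts.length; i++) {
            classCount[labels[i]]++;
            for (int j = 0; j < dim; j++) {
                kmerTotals[labels[i]][j] += counts[i][j];
            }
        }
        logPrior = new double[numClasses];
        logLikelihood = new double[numClasses][dim];
        for (int c = 0; c < numClasses; c++) {
            logPrior[c] = Math.log(classCount[c] / counts.length);
            // Add one pseudo-count per k-mer so unseen k-mers keep nonzero probability.
            double total = Arrays.stream(kmerTotals[c]).sum() + dim;
            for (int j = 0; j < dim; j++) {
                logLikelihood[c][j] = Math.log((kmerTotals[c][j] + 1) / total);
            }
        }
    }

    // Returns the class with the highest log posterior (up to a shared constant).
    int predict(int[] x) {
        int best = 0;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (int c = 0; c < logPrior.length; c++) {
            double score = logPrior[c];
            for (int j = 0; j < x.length; j++) {
                score += x[j] * logLikelihood[c][j];
            }
            if (score > bestScore) {
                bestScore = score;
                best = c;
            }
        }
        return best;
    }
}
```

Working in log space avoids underflow when a read contributes many k-mer counts to the product of probabilities.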

Experiments are in progress.

Progress (as of February 25 2016)

Finished implementing the generation of n reads of length l with a 1% sequencing error rate. Currently implementing the shuffling (mixing) of the n reads before they are passed to the clustering component.
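The read-generation step can be sketched as below, assuming reads start at uniformly random positions in a genome and that the sequencing error consists only of independent base substitutions (no insertions or deletions). `ReadSimulator` and its parameters are hypothetical names used for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Samples n reads of length l from a genome and applies substitution errors.
// Assumption: each base is independently replaced with probability errorRate.
class ReadSimulator {
    private static final char[] BASES = {'A', 'C', 'G', 'T'};

    static List<String> generateReads(String genome, int n, int l, double errorRate, Random rng) {
        if (l > genome.length()) {
            throw new IllegalArgumentException("read length exceeds genome length");
        }
        List<String> reads = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            int start = rng.nextInt(genome.length() - l + 1); // uniform start position
            char[] read = genome.substring(start, start + l).toCharArray();
            for (int j = 0; j < l; j++) {
                if (rng.nextDouble() < errorRate) {
                    read[j] = substitute(read[j], rng);
                }
            }
            reads.add(new String(read));
        }
        return reads;
    }

    // Replaces a base with one of the other bases, chosen uniformly at random.
    private static char substitute(char base, Random rng) {
        char b;
        do {
            b = BASES[rng.nextInt(BASES.length)];
        } while (b == base);
        return b;
    }
}
```

For the 1% rate described above, `errorRate` would be `0.01`.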

Decided to use K-means clustering.
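A minimal sketch of K-means (Lloyd's algorithm) on dense vectors such as k-mer counts is shown below. The random initialization from k distinct points, the stopping rule and the class name `KMeans` are assumptions for illustration.

```java
import java.util.Arrays;
import java.util.Random;

// Lloyd's K-means on dense feature vectors (e.g. k-mer count vectors).
// Centroids start at k distinct randomly chosen points; the loop stops when
// no assignment changes or after maxIter rounds.
class KMeans {
    static int[] cluster(double[][] points, int k, int maxIter, long seed) {
        int n = points.length, dim = points[0].length;
        if (k > n) throw new IllegalArgumentException("k exceeds number of points");

        // Shuffle the indices and take the first k points as initial centroids.
        int[] order = new int[n];
        for (int i = 0; i < n; i++) order[i] = i;
        Random rng = new Random(seed);
        for (int i = n - 1; i > 0; i--) {
            int j = rng.nextInt(i + 1);
            int tmp = order[i]; order[i] = order[j]; order[j] = tmp;
        }
        double[][] centroids = new double[k][];
        for (int c = 0; c < k; c++) centroids[c] = points[order[c]].clone();

        int[] assign = new int[n];
        Arrays.fill(assign, -1);
        for (int iter = 0; iter < maxIter; iter++) {
            boolean changed = false;
            // Assignment step: nearest centroid by squared Euclidean distance.
            for (int i = 0; i < n; i++) {
                int best = 0;
                double bestDist = Double.POSITIVE_INFINITY;
                for (int c = 0; c < k; c++) {
                    double d = 0;
                    for (int j = 0; j < dim; j++) {
                        double diff = points[i][j] - centroids[c][j];
                        d += diff * diff;
                    }
                    if (d < bestDist) { bestDist = d; best = c; }
                }
                if (assign[i] != best) { assign[i] = best; changed = true; }
            }
            if (!changed) break;
            // Update step: move each centroid to the mean of its assigned points.
            double[][] sums = new double[k][dim];
            int[] sizes = new int[k];
            for (int i = 0; i < n; i++) {
                sizes[assign[i]]++;
                for (int j = 0; j < dim; j++) sums[assign[i]][j] += points[i][j];
            }
            for (int c = 0; c < k; c++) {
                if (sizes[c] == 0) continue; // an empty cluster keeps its old centroid
                for (int j = 0; j < dim; j++) centroids[c][j] = sums[c][j] / sizes[c];
            }
        }
        return assign;
    }
}
```

Because the result depends on the initial centroids, a common safeguard is to run it with several seeds and keep the run with the lowest within-cluster sum of squared distances.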

Collecting experimental data from GenBank to be used in the project.

Progress (as of February 11 2016)

Selected the first approach described below (in the update of January 27 2016).

Started implementing the generation of n reads of length l with a 1% sequencing error rate.

Investigating different clustering algorithms for high-dimensional data, including K-means.

Also looking at dimensionality reduction using singular value decomposition (SVD).

Progress (as of January 27 2016)

Reading and evaluating approaches, including:

*Suggestion provided as part of the project description: use the distribution of k-mers in each read, typically 4-mers. Represent the count of occurrences of each of the 256 (4^4) possible 4-mers in the read as a 256-dimensional vector, then use a clustering algorithm on these vectors to decide where to assign the reads (e.g., k-means with k = m).*
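The k-mer representation suggested above can be sketched as follows: each 4-mer is read as a base-4 number (A=0, C=1, G=2, T=3), which gives an index from 0 to 255. Skipping windows that contain characters other than A, C, G, T is an assumption, as is the class name `KmerCounter`.

```java
// Maps a read to a vector of 4^k k-mer counts (256 dimensions for k = 4).
// Assumption: windows containing characters other than A, C, G, T are skipped.
class KmerCounter {
    static int baseIndex(char c) {
        switch (c) {
            case 'A': return 0;
            case 'C': return 1;
            case 'G': return 2;
            case 'T': return 3;
            default:  return -1;
        }
    }

    static int[] count(String read, int k) {
        int[] counts = new int[1 << (2 * k)]; // 4^k possible k-mers
        // Slide a window of length k over the read: l - k + 1 windows in total.
        for (int start = 0; start + k <= read.length(); start++) {
            int index = 0;
            boolean valid = true;
            for (int j = start; j < start + k; j++) {
                int b = baseIndex(read.charAt(j));
                if (b < 0) { valid = false; break; }
                index = index * 4 + b; // for k = 4: AAAA -> 0, TTTT -> 255
            }
            if (valid) counts[index]++;
        }
        return counts;
    }
}
```

The resulting count vectors can be fed directly to a clustering algorithm, or normalized to frequencies first if read lengths differ.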

*Machine learning for metagenomics: methods and tools (2015)*

*http://arxiv.org/pdf/1510.06621.pdf*

*MBBC: an efficient approach for metagenomic binning based on clustering (2015)*

*http://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-015-0473-8*