Write a program in your favourite language that performs the following steps.
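1. Compute the correlation coefficient for each pair of genes. To do that, compute the mean and then the standard deviation of each row of the matrix, and save the normalized data (each element minus its row mean, divided by the row's standard deviation). Then, for each pair of rows, the average product of the corresponding elements is the correlation coefficient. (This holds when the standard deviation is computed by dividing by the number of columns, not by one less.)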
2. Build a graph (a boolean adjacency matrix of 2882 by 2882) of the genes, with an edge between genes with correlation coefficient greater than 0.85. (An alternative is to add an edge when the absolute value of the correlation coefficient is greater than 0.85.)
3. Compute the degree (number of incident edges) for each vertex of the graph.
4. Find local peaks of degree higher than 5. (Enumerate the vertices whose degree is higher than 5 and also higher than the degree of every neighbor.)
5. Each of these peaks and its neighbors can be considered as a cluster of co-regulated genes. For checking your results, one of the gene clusters has the peak YOL071W and contains 23 other genes, including YOL052C-A.
6. Generate a plot for the expression of each gene cluster. The file h3cluster.doc is a plot for the gene cluster with the peak YOL071W.
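As a starting point, steps 1 to 5 might be sketched in Python as follows. This is a minimal sketch under stated assumptions, not a reference solution. The function names (normalize_rows, correlation_graph, degrees, local_peaks, clusters) are hypothetical. The expression matrix is assumed to be already loaded as a list of rows of floats, one row per gene, since the input file format is not described here. Mapping a constant row to all zeros is also an assumption. Only the standard library is used, so the pairwise loop may be slow for 2882 genes.

```python
import math


def normalize_rows(matrix):
    """Step 1a: shift each row to mean 0 and scale it to standard deviation 1."""
    normalized = []
    for row in matrix:
        n = len(row)
        mean = sum(row) / n
        # Population standard deviation (divide by n), so that the average
        # product of two normalized rows is their correlation coefficient.
        std = math.sqrt(sum((x - mean) ** 2 for x in row) / n)
        # Assumption: a constant row (std == 0) becomes all zeros and
        # therefore gets no edges.
        normalized.append([(x - mean) / std if std > 0 else 0.0 for x in row])
    return normalized


def correlation_graph(matrix, threshold=0.85, use_abs=False):
    """Steps 1b and 2: boolean adjacency matrix of the correlation graph."""
    z = normalize_rows(matrix)
    genes = len(z)
    adj = [[False] * genes for _ in range(genes)]
    for i in range(genes):
        for j in range(i + 1, genes):
            # Average product of corresponding normalized elements.
            r = sum(a * b for a, b in zip(z[i], z[j])) / len(z[i])
            if (abs(r) if use_abs else r) > threshold:
                adj[i][j] = adj[j][i] = True
    return adj


def degrees(adj):
    """Step 3: number of incident edges for each vertex."""
    return [sum(row) for row in adj]


def local_peaks(adj, min_degree=5):
    """Step 4: vertices with degree above min_degree and above every neighbor's."""
    deg = degrees(adj)
    peaks = []
    for v, d in enumerate(deg):
        neighbors = [u for u, linked in enumerate(adj[v]) if linked]
        if d > min_degree and all(d > deg[u] for u in neighbors):
            peaks.append(v)
    return peaks


def clusters(adj, min_degree=5):
    """Step 5: map each peak to the list of its neighbors."""
    return {
        p: [u for u, linked in enumerate(adj[p]) if linked]
        for p in local_peaks(adj, min_degree)
    }
```

The functions work with row indices. To compare against the YOL071W check, keep a parallel list of gene names in the same order as the rows and translate the indices back. The use_abs flag selects the alternative edge rule from step 2.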
What other analysis would you do on this network?
Courtesy of Yizong Cheng, University of Cincinnati.