Project Homepage

III-CXT: Collaborative Research: A High-Throughput Approach to the Assignment of Orthologous Genes Based on Genome Rearrangement



Introduction: Orthologous genes, or orthologs, are genes in different species that have evolved directly from a common ancestral gene. Genome-scale assignment of orthologs is a fundamental and challenging problem in computational biology, with a wide range of applications in comparative and functional genomics. This project continues the development of a parsimony approach to assigning orthologs between closely related genomes. The approach attempts to transform one genome into the other using the smallest number of genome rearrangement events (reversals, translocations, fusions, and fissions) and gene duplication events. The project addresses three key algorithmic problems: (i) signed reversal distance with duplicates, (ii) signed transposition distance with duplicates, and (iii) minimum common string partition. Efficient solutions to these problems are combined and incorporated into a software system for ortholog assignment called MSOAR. The project also includes a genome-wide analysis of orthologous (and paralogous) relationships between the human and mouse genomes, both to validate the approach and, more importantly, to address several evolutionary questions: characterizing the gains and losses of duplicated genes in the two genomes, elucidating gene movements in one genome relative to the other, and quantifying the different mechanisms of gene duplication.
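The third problem, minimum common string partition (MCSP), can be stated briefly: given two strings that contain the same symbols with the same multiplicities, cut each one into contiguous blocks so that both produce the same multiset of blocks, using as few blocks as possible. The Python sketch below is a hypothetical, brute-force illustration of that definition only, not the algorithm used in MSOAR. It assumes each character stands for one gene family, ignores gene orientation (a block is never matched to its reversed copy), and enumerates every cut pattern, so it is usable only on toy inputs; MCSP is NP-hard in general. The names `_blocks` and `min_common_string_partition` are illustrative.

```python
from collections import Counter
from itertools import combinations


def _blocks(s, k):
    """Yield every way to cut string s into k + 1 contiguous, non-empty blocks."""
    n = len(s)
    for cuts in combinations(range(1, n), k):
        bounds = (0, *cuts, n)
        yield [s[i:j] for i, j in zip(bounds, bounds[1:])]


def min_common_string_partition(a, b):
    """Exhaustive MCSP for tiny strings (hypothetical illustration).

    Returns a smallest multiset of blocks, as a sorted list, that partitions
    both a and b. Orientation is ignored: blocks must match exactly.
    """
    if Counter(a) != Counter(b):
        raise ValueError("a and b must contain the same symbols with the same counts")
    if not a:
        return []
    # Try 1 block, then 2, and so on; the first block count that works is optimal.
    for k in range(len(a)):
        b_multisets = {tuple(sorted(p)) for p in _blocks(b, k)}
        for p in _blocks(a, k):
            key = tuple(sorted(p))
            if key in b_multisets:
                return list(key)
    # Not reached: with len(a) single-symbol blocks the multisets always match.
    raise AssertionError("unreachable")


if __name__ == "__main__":
    # Hypothetical genomes: each character stands for one gene family.
    print(min_common_string_partition("abcde", "cdeab"))  # ['ab', 'cde']
```

In this toy example the two "genomes" differ by moving one segment, so two common blocks suffice; the blocks that the partition pairs up are the kind of correspondence a parsimony-based ortholog assignment builds on.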

Principal Investigators:

Tao Jiang (PI)
Liqing Zhang (co-PI). (Please see the project site at Virginia Tech.)

Graduate Students and Alumni at UCR:

Guanqun (Wilson) Shi
Jianxing Feng
Zheng Fu (alumnus)

Publications with the Grant as the Primary Source of Support:

Z. Fu and T. Jiang. Clustering of main orthologs for multiple genomes. Journal of Bioinformatics and Computational Biology (JBCB) 6(3), pp. 573-584, 2008; an extended abstract also appeared in Proc. 6th LSS Computational Systems Bioinformatics Conference (CSB), San Diego, 2007, pp. 195-202.

X. Chen, L. Guo, Z. Fan, and T. Jiang. W-AlignACE: An improved Gibbs sampling algorithm based on more accurate position weight matrices. Bioinformatics 24(9), pp. 1121-1128, 2008.

Z. Fu, X. Chen, V. Vacic, P. Nan, Y. Zhong, and T. Jiang. MSOAR: A high-throughput ortholog assignment system based on genome rearrangement. Journal of Computational Biology 14(9), pp. 1160-1175, 2007.

Z. Fu and T. Jiang. Computing the breakpoint distance between partially ordered genomes. Journal of Bioinformatics and Computational Biology (JBCB) 5(5), pp. 1087-1101, 2007.

J. Feng, R. Jiang and T. Jiang. A Max-Flow Based Approach to the Identification of Protein Complexes Using Protein Interaction and Microarray Data (extended abstract). Proc. 7th LSS Computational Systems Bioinformatics Conference (CSB), Stanford, 2008, pp. 51-62.

J. Xiao, L. Wang, X. Liu, and T. Jiang. Finding additive biclusters with random background. Proc. 19th Symposium on Combinatorial Pattern Matching (CPM), Pisa, Italy, June 18-20, 2008, pp. 263-276; full version appears in Journal of Computational Biology (JCB) 15(10), pp. 1275-1293, 2008.

X. Chen, L. Liu, Z. Liu and T. Jiang. On the minimum common integer partition problem. ACM Transactions on Algorithms (TALG) 5(1), article 12, 2008.

D. Song, Y. Yang, B. Yu, B. Zheng, Z. Deng, B. Lu, X. Chen, and T. Jiang. Computational prediction of novel non-coding RNAs in Arabidopsis thaliana. BMC Bioinformatics 10(Suppl 1):S36, 2009 (special issue for selected papers presented at the 7th Asia-Pacific Bioinformatics Conference (APBC), 2009, Beijing).

G. Shi, L. Zhang and T. Jiang. MSOAR 2.0: Incorporating tandem duplications into ortholog assignment based on genome rearrangement. BMC Bioinformatics 11:10, 2010; also presented at the 8th LSS Computational Systems Bioinformatics Conference (CSB), Stanford, 2009.

Y. Yang, J. Zhao, R. Morgan, W. Ma, and T. Jiang. Computational prediction of type III secreted proteins from gram-negative bacteria. BMC Bioinformatics 11(Suppl 1):S47, 2010 (special issue for selected papers presented at the 8th Asia-Pacific Bioinformatics Conference (APBC), Bangalore, India, 2010).

T. Jiang. Some algorithmic challenges in genome-wide ortholog assignment. Journal of Computer Science and Technology 25(1), pp. 42-52, 2010.

J. Feng, R. Jiang and T. Jiang. A max-flow based approach to the identification of protein complexes using protein interaction and microarray data. IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB) 8(3), pp. 621-634, 2011.

J. Feng, W. Li and T. Jiang. Inference of isoforms from short sequence reads (extended abstract). Proc. 14th Annual International Conference on Research in Computational Molecular Biology (RECOMB), Lisbon, Portugal, April 2010; full version appears in Journal of Computational Biology 18(3), pp. 305-321, 2011.

G. Shi, M.C. Peng and T. Jiang. MultiMSOAR 2.0: An accurate tool to identify ortholog groups among multiple genomes; preliminary version presented at the 9th LSS Computational Systems Bioinformatics Conference (CSB), Stanford, CA, 2010.

W. Li, J. Feng and T. Jiang. IsoLasso: A LASSO regression approach to RNA-Seq based transcriptome assembly (extended abstract). Proc. 15th Annual International Conference on Research in Computational Molecular Biology (RECOMB), Vancouver, BC, Canada, 2011, pp. 168-188; full version appears in Journal of Computational Biology 18(11), pp. 1693-1707, 2011.

O. Tanaseichuk, J. Borneman and T. Jiang. Separating metagenomic short reads into genomes via clustering (extended abstract). Proc. 11th Workshop on Algorithms in Bioinformatics (WABI), Saarbrücken, Germany, Sept. 5-7, 2011, pp. 298-313.

Funding Sources:

This project is funded by NSF grant IIS-0711129 for the period Sept. 15, 2007 to Aug. 31, 2011. A collaborative grant was simultaneously awarded to the co-PI, Prof. Liqing Zhang, for the same period.


The page was last updated on Sept. 1, 2011.