Computational biology

This project focuses on developing new and biologically relevant algorithms in the following areas: genome evolution, phylogeny, and identification of regulatory sequences. One approach to revealing the function of genes is to correlate phenotype evolution with genome evolution. Genome evolution also provides an opportunity to establish the correspondence between genes in different genomes (orthology analysis), which can be used to translate knowledge of gene function in model organisms into the corresponding knowledge for humans. Within a genome, genes evolve through nucleotide substitutions. The evolution of the genome is also shaped by a multitude of other evolutionary events acting at different organizational levels. Larger genome segments are affected by processes such as duplication, lateral transfer (where a segment of one organism's genome is transferred to the genome of another organism), inversion, transposition, deletion, and insertion. Being able to identify genes that have been laterally transferred, and to count the number of lateral transfer events, is crucial for resolving whether a tree of life exists. Finally, the whole genome is influenced by speciation and by hybridization of organism lineages (where a new species is created by the fusion of two organisms' genomes). The complexity of genome evolution poses a serious challenge in developing mathematical models and algorithms.

A classical problem in computational biology is that of inferring the evolutionary history of a set of species. The evolutionary history is represented by a phylogenetic tree. Due to duplications and lateral transfers, gene trees (i.e., phylogenetic trees for gene families) and the corresponding species tree may disagree. We have studied the following algorithmic problem: given a set of disagreeing gene trees, find the species tree that explains the disagreement using a minimum number of duplications.
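
To make the duplication cost concrete, the following minimal sketch counts the duplications needed to fit one gene tree into one candidate species tree. It assumes the standard least-common-ancestor (LCA) mapping for reconciliation, which is not necessarily the method used in this project. The nested-tuple tree encoding and the names `leaf_set`, `clades`, `lca_clade`, and `count_duplications` are illustrative assumptions.

```python
# Sketch only: trees are nested tuples, leaves are species names (assumption).
# Gene-tree leaves are labelled with the species the gene was sampled from.

def leaf_set(tree):
    """Return the frozenset of species labels below a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def clades(tree):
    """List the leaf sets (clades) of every node in the species tree."""
    result = [leaf_set(tree)]
    if not isinstance(tree, str):
        for child in tree:
            result.extend(clades(child))
    return result


def lca_clade(species_clades, species):
    """Smallest species-tree clade containing all the given species (the LCA)."""
    return min((c for c in species_clades if species <= c), key=len)


def count_duplications(gene_tree, species_tree):
    """Count gene-tree nodes that the LCA mapping marks as duplications."""
    species_clades = clades(species_tree)

    def visit(node):
        # Returns (clade the node maps to, duplications at or below the node).
        if isinstance(node, str):
            return lca_clade(species_clades, frozenset([node])), 0
        mapped = lca_clade(species_clades, leaf_set(node))
        duplications = 0
        is_duplication = False
        for child in node:
            child_mapped, child_duplications = visit(child)
            duplications += child_duplications
            # A node mapping to the same species node as a child is a duplication.
            if child_mapped == mapped:
                is_duplication = True
        return mapped, duplications + (1 if is_duplication else 0)

    return visit(gene_tree)[1]


species_tree = (("human", "mouse"), "chicken")
gene_trees = [
    (("human", "mouse"), "chicken"),   # agrees with the species tree
    (("human", "chicken"), "mouse"),   # disagrees with the species tree
]
total = sum(count_duplications(g, species_tree) for g in gene_trees)
```

Summing this cost over all gene trees gives the score of one candidate species tree; the problem described above asks for the species tree that minimizes it.
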
We have also given a mathematically rigorous and biologically sound model for lateral transfers, together with a fast algorithm for the following problem: given a gene tree and a species tree, find the minimum number of lateral transfers that explains the difference between the given trees.

Recently, we have started developing algorithms for identification of regulatory sequences. There are basically two algorithmic approaches to this problem. In the first, promoter regions of co-regulated genes are searched for similar substrings. In the second, promoter regions of a gene family are searched for similar substrings. The approaches yield different algorithmic questions, since in the latter case the notion of similarity can be defined relative to a species tree.
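
As an illustration of the first approach only, the sketch below searches a set of promoter sequences for substrings of length k that occur in every sequence with a bounded number of mismatches. This brute-force search and the names `hamming`, `kmers`, and `shared_motifs` are assumptions made for the example, not the algorithms developed in this project, and the sequences are made up. The second approach, where similarity is defined relative to a species tree, is not covered.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def kmers(seq, k):
    """Set of all length-k substrings of seq (empty if seq is shorter than k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_motifs(promoters, k, max_mismatches=0):
    """k-mers of the first promoter that occur, with at most max_mismatches
    mismatches, in every other promoter. Brute force, for illustration."""
    found = set()
    for candidate in kmers(promoters[0], k):
        if all(
            any(hamming(candidate, other) <= max_mismatches
                for other in kmers(promoter, k))
            for promoter in promoters[1:]
        ):
            found.add(candidate)
    return sorted(found)


# Hypothetical promoter regions of three co-regulated genes.
promoters = ["ACGTTGACA", "TTGACAGG", "CCTTGACA"]
motifs = shared_motifs(promoters, k=6)
```

With these toy sequences and exact matching, the only 6-mer shared by all three promoters is `TTGACA`.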

Responsible for this page: Jens Lagergren <jensl@nada.kth.se>
Latest change October 17, 2002