The evolution of genomes
The vast majority of eukaryotic genomes consists of junk DNA, which has no function. Yet the nucleotides of this junk DNA exhibit a non-random organization, and the question is why. A closer analysis shows that genomes possess a complex architecture, in which GC nucleotides are organized into homogeneous and nonhomogeneous domains of varying lengths and GC composition. By and large, many genic features, such as the density of genes and regulatory elements, are correlated with GC richness, yet these structures stretch beyond the coding regions and raise many evolutionary questions: Why do they exist? Why did they not disappear over the course of evolution? What maintains these domains? What benefits do they provide to genes and regulatory elements? Can they be used to address open phylogenetic and evolutionary questions?
The first question that we must ask ourselves is how to find these regions. Numerous methods for segmenting DNA sequences into contiguous, compositionally coherent domains (some of which were termed “isochores”) have been proposed in the literature. These methods differ from one another in the number and types of parameters used in the segmentation process, as well as in the level of user intervention. Unfortunately, even methods that limit user input to a few parameters yield results that are incongruent with one another. We developed IsoPlotter, a robust and unsupervised segmentation method, and validated it through benchmark simulations. IsoPlotter has since been adopted by over a dozen genome sequencing projects to resolve the genomic composition of various species, allowing us to characterize these genomes and carry out comparative analyses that address evolutionary questions.
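To make the idea of segmenting a sequence into compositionally coherent domains concrete, here is a minimal sketch of a generic recursive binary segmentation driven by Jensen-Shannon divergence on GC content, one family of approaches in the segmentation literature. It is not IsoPlotter's actual procedure: the function names are hypothetical, and the fixed `threshold` and `min_len` values are arbitrary assumptions chosen only for illustration.

```python
import math

def entropy(p):
    """Shannon entropy (bits) of a two-symbol source with GC fraction p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def best_split(gc, start, end, min_len):
    """Find the cut in gc[start:end] that maximizes Jensen-Shannon divergence.

    Returns (divergence, cut), or (0.0, None) if no valid cut exists.
    """
    n = end - start
    if n < 2 * min_len:
        return 0.0, None
    total = sum(gc[start:end])
    h_all = entropy(total / n)
    best_d, best_cut = 0.0, None
    left = sum(gc[start:start + min_len])  # GC count left of the first cut
    for cut in range(start + min_len, end - min_len + 1):
        n_l, n_r = cut - start, end - cut
        d = (h_all
             - (n_l / n) * entropy(left / n_l)
             - (n_r / n) * entropy((total - left) / n_r))
        if d > best_d:
            best_d, best_cut = d, cut
        left += gc[cut]  # advance the running GC count to the next cut
    return best_d, best_cut

def segment(seq, threshold=0.05, min_len=10):
    """Recursively split seq into (start, end) domains.

    threshold and min_len are illustrative assumptions, not published values.
    """
    gc = [1 if c in "GCgc" else 0 for c in seq]
    domains = []
    stack = [(0, len(gc))]
    while stack:
        start, end = stack.pop()
        d, cut = best_split(gc, start, end, min_len)
        if cut is not None and d > threshold:
            stack.append((cut, end))
            stack.append((start, cut))
        else:
            domains.append((start, end))
    return sorted(domains)
```

A toy sequence made of an AT-only stretch followed by a GC-only stretch, such as `segment("AT" * 100 + "GC" * 100)`, is split once at the boundary between the two stretches. The result depends strongly on the threshold, which illustrates why methods that rely on user-supplied parameters can disagree with one another.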
In the above illustration, homogeneous domains are in blue shades; nonhomogeneous domains are in green shades. Domains longer than 300 kb are in dark shades; domains shorter than 300 kb are in light shades. Compositionally homogeneous domains longer than 300 kb (i.e., isochoric domains) are in dark blue. From Elhaik and Graur (2014, at PLOS Computational Biology's website).
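The categories in the caption can be sketched as a small labelling function. This is only an illustration: the `Domain` class and `classify` function are hypothetical names, the homogeneity flag is assumed to come from an upstream test that is not described here, and the treatment of a domain of exactly 300 kb (counted as short) is an assumption, since the caption does not specify it.

```python
from dataclasses import dataclass

ISOCHORE_MIN_LENGTH = 300_000  # 300 kb cutoff taken from the caption

@dataclass
class Domain:
    start: int
    end: int
    homogeneous: bool  # assumed to be supplied by an upstream homogeneity test

    @property
    def length(self) -> int:
        return self.end - self.start

def classify(domain: Domain) -> str:
    """Label a domain by length class and homogeneity, as in the caption."""
    # Assumption: a domain of exactly 300 kb is treated as short.
    size = "long" if domain.length > ISOCHORE_MIN_LENGTH else "short"
    kind = "homogeneous" if domain.homogeneous else "nonhomogeneous"
    label = f"{size} {kind}"
    if size == "long" and domain.homogeneous:
        label += " (isochoric)"
    return label
```

For example, `classify(Domain(0, 450_000, True))` yields the isochoric label, which corresponds to the dark blue domains in the illustration.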
With IsoPlotter available, we can now investigate genome architecture in various organisms and develop methods to track evolutionary changes in genomes. The composition and organization of the compositional domains were likely shaped by different evolutionary processes that either fused or broke down the domains. For example, in Simola et al. (2012) we showed that, unlike in other hymenopterans, long domains accumulated rapidly along the ant lineage, with the leaf-cutter ants having the largest domains among all fully sequenced insect genomes. Yet the evolutionary mechanisms that shaped these transitions remain unclear. Understanding these biological mechanisms and their evolutionary implications is key to reconstructing the history of genome evolution.
- Elhaik E, Graur D, Josić K: Comparative testing of DNA segmentation algorithms using benchmark simulations. Mol Biol Evol 2010, 27(5):1015-1024.
- Elhaik E, Graur D, Josić K, Landan G: Identifying compositionally homogeneous and nonhomogeneous domains within the human genome using a novel segmentation algorithm. Nucleic Acids Res 2010, 38(15):e158.
- Elhaik E, Landan G, Graur D: Can GC Content at Third-Codon Positions Be Used as a Proxy for Isochore Composition? Mol Biol Evol 2009, 26(8):1829-1833.
- Smith CR, Smith CD, Robertson HM, Helmkampf M, Zimin A, Yandell M, Holt C, Hu H, Abouheif E, Benton R et al: Draft genome of the red harvester ant Pogonomyrmex barbatus. Proc Natl Acad Sci USA 2011, 108(14):5667-5672.
- Smith CD, Zimin A, Holt C, Abouheif E, Benton R, Cash E, Croset V, Currie CR, Elhaik E, Elsik CG et al: Draft genome of the globally widespread and invasive Argentine ant (Linepithema humile). Proc Natl Acad Sci USA 2011, 108(14):5673-5678.
- Suen G, Teiling C, Li L, Holt C, Abouheif E, Bornberg-Bauer E, Bouffard P, Caldera EJ, Cash E, Cavanaugh A et al: The genome sequence of the leaf-cutter ant Atta cephalotes reveals insights into its obligate symbiotic lifestyle. PLoS Genet 2011, 7(2):e1002007.
- Sodergren E, Weinstock GM, Davidson EH, Cameron RA, Gibbs RA, Angerer RC, Angerer LM, Arnone MI, Burgess DR, Burke R et al: Insights into social insects from the genome of the honeybee Apis mellifera. Nature 2006, 443(7114):931-949.
- Kirkness EF, Haas BJ, Sun W, Braig HR, Perotti MA, Clark JM, Lee SH, Robertson HM, Kennedy RC, Elhaik E et al: Genome sequences of the human body louse and its primary endosymbiont provide insights into the permanent parasitic lifestyle. Proc Natl Acad Sci USA 2010, 107(27):12168-12173.
- Elsik CG, Tellam RL, Worley KC, Gibbs RA, Muzny DM, Weinstock GM, Adelson DL, Eichler EE, Elnitski L, Guigo R et al: The genome sequence of taurine cattle: a window to ruminant biology and evolution. Science 2009, 324(5926):522-528.
- Werren JH, Richards S, Desjardins CA, Niehuis O, Gadau J, Colbourne JK, Beukeboom LW, Desplan C, Elsik CG, Grimmelikhuijzen CJ et al: Functional and evolutionary insights from the genomes of three parasitoid Nasonia species. Science 2010, 327(5963):343-348.
- Sodergren E, Weinstock GM, Davidson EH, Cameron RA, Gibbs RA, Angerer RC, Angerer LM, Arnone MI, Burgess DR, Burke RD et al: The genome of the sea urchin Strongylocentrotus purpuratus. Science 2006, 314(5801):941-952.
- Richards S, Gibbs RA, Weinstock GM, Brown SJ, Denell R, Beeman RW, Gibbs R, Bucher G, Friedrich M, Grimmelikhuijzen CJ et al: The genome of the model beetle and pest Tribolium castaneum. Nature 2008, 452(7190):949-955.
- Elhaik E, Graur D: IsoPlotter+: A Tool for Studying the Compositional Architecture of Genomes. ISRN Bioinformatics 2013, 2013:6.