Nature 434, 724-731 (7 April 2005) | doi:10.1038/nature03466; Received 25 October 2004; Accepted 11 February 2005
Generation and annotation of the DNA sequences of human chromosomes 2 and 4
LaDeana W. Hillier1, Tina A. Graves1, Robert S. Fulton1, Lucinda A. Fulton1, Kymberlie H. Pepin1, Patrick Minx1, Caryn Wagner-McPherson1, Dan Layman1, Kristine Wylie1, Mandeep Sekhon1, Michael C. Becker1, Ginger A. Fewell1, Kimberly D. Delehaunty1, Tracie L. Miner1, William E. Nash1, Colin Kremitzki1, Lachlan Oddy1, Hui Du1, Hui Sun1, Holland Bradshaw-Cordum1, Johar Ali1, Jason Carter1, Matt Cordes1, Anthony Harris1, Amber Isak1, Andrew van Brunt1, Christine Nguyen1, Feiyu Du1, Laura Courtney1, Joelle Kalicki1, Philip Ozersky1, Scott Abbott1, Jon Armstrong1, Edward A. Belter1, Lauren Caruso1, Maria Cedroni1, Marc Cotton1, Teresa Davidson1, Anu Desai1, Glendoria Elliott1, Thomas Erb1, Catrina Fronick1, Tony Gaige1, William Haakenson1, Krista Haglund1, Andrea Holmes1, Richard Harkins1, Kyung Kim1, Scott S. Kruchowski1, Cynthia Madsen Strong1, Neenu Grewal1, Ernest Goyea1, Shunfang Hou1, Andrew Levy1, Scott Martinka1, Kelly Mead1, Michael D. McLellan1, Rick Meyer1, Jennifer Randall-Maher1, Chad Tomlinson1, Sara Dauphin-Kohlberg1, Amy Kozlowicz-Reilly1, Neha Shah1, Sharhonda Swearengen-Shahid1, Jacqueline Snider1, Joseph T. Strong1, Johanna Thompson1, Martin Yoakum1, Shawn Leonard1, Charlene Pearman1, Lee Trani1, Maxim Radionenko1, Jason E. Waligorski1, Chunyan Wang1, Susan M. Rock1, Aye-Mon Tin-Wollam1, Rachel Maupin1, Phil Latreille1, Michael C. Wendl1, Shiaw-Pyng Yang1, Craig Pohl1, John W. Wallis1, John Spieth1, Tamberlyn A. Bieri1, Nicolas Berkowicz1, Joanne O. Nelson1, John Osborne1, Li Ding1, Rekha Meyer1, Aniko Sabo1, Yoram Shotland1, Prashant Sinha1, Patricia E. Wohldmann1, Lisa L. Cook1, Matthew T. Hickenbotham1, James Eldred1, Donald Williams1, Thomas A. Jones2, Xinwei She3, Francesca D. Ciccarelli4, Elisa Izaurralde4, James Taylor5, Jeremy Schmutz6, Richard M. Myers6, David R. Cox6,10, Xiaoqiu Huang7, John D. McPherson1,10, Elaine R. Mardis1, Sandra W. Clifton1, Wesley C. Warren1, Asif T. Chinwalla1, Sean R. Eddy2, Marco A. Marra1,10, Ivan Ovcharenko8, Terrence S. Furey9, Webb Miller5, Evan E. Eichler3, Peer Bork4, Mikita Suyama4, David Torrents4, Robert H. Waterston1,10 and Richard K. Wilson1
- Genome Sequencing Center, Washington University School of Medicine, Campus Box 8501, 4444 Forest Park Avenue, St. Louis, Missouri 63108, USA
- Howard Hughes Medical Institute and Department of Genetics, Washington University School of Medicine, St. Louis, Missouri 63110, USA
- Genome Sciences, University of Washington School of Medicine, Seattle, Washington 98195, USA
- EMBL, Meyerhofstrasse 1, Heidelberg 69117, Germany
- Center for Comparative Genomics and Bioinformatics, Departments of Biology and Computer Science, Pennsylvania State University, University Park, Pennsylvania 16802, USA
- Stanford Human Genome Center, Department of Genetics, Stanford University School of Medicine, Stanford, California 94305, USA
- Department of Computer Science, Iowa State University, Ames, Iowa 50011-1040, USA
- EEBI Division and Genome Biology Division, Lawrence Livermore National Laboratory, Livermore, California 94550, USA
- Center for Biomolecular Science and Engineering, University of California, Santa Cruz, California 95064, USA
- Present addresses: Perlegen Sciences Inc., 2021 Stierlin Court, Mountain View, California 94943, USA (D.R.C.); Baylor College of Medicine, 1 Baylor Plaza, Human Genome Sequencing Center, N1519, Houston, Texas 77030, USA (J.D.M.); Genome Sciences Centre, British Columbia Cancer Agency, 600 West 10th Avenue, Room 3427, Vancouver, British Columbia V5Z 4E6, Canada (M.A.M.); Department of Genome Sciences, Box 357730, University of Washington, 1705 NE Pacific Street, Seattle, Washington 98195-7730, USA (R.H.W.).
Correspondence and requests for materials should be addressed to Richard K. Wilson (R.K.W.; Email: email@example.com).
All reported DNA sequences have been deposited in GenBank or EMBL. Accession numbers for the chromosome sequence analysed for this paper can be found in Supplementary Table 1. The updated chromosome 2 and 4 sequences can be accessed through GenBank accession numbers NC_000002 (chromosome 2) and NC_000004 (chromosome 4). Primate resequencing data can be accessed using GenBank accession numbers CZ179368–CZ179565. The mRNA resequencing data can be accessed via GenBank/dbSNP identifiers ss35032449–ss35032461 and ss35033273–ss35033317.
Human chromosome 2 is unique to the human lineage in being the product of a head-to-head fusion of two intermediate-sized ancestral chromosomes. Chromosome 4 has received attention primarily related to the search for the Huntington's disease gene, but also for genes associated with Wolf-Hirschhorn syndrome, polycystic kidney disease and a form of muscular dystrophy. Here we present approximately 237 million base pairs of sequence for chromosome 2, and 186 million base pairs for chromosome 4, representing more than 99.6% of their euchromatic sequences. Our initial analyses have identified 1,346 protein-coding genes and 1,239 pseudogenes on chromosome 2, and 796 protein-coding genes and 778 pseudogenes on chromosome 4. Extensive analyses confirm the underlying construction of the sequence, and expand our understanding of the structure and evolution of mammalian chromosomes, including gene deserts, segmental duplications and highly variant regions.
Less than 50 years after the human diploid number was established, the reference human genome sequence was announced1. Detailed accounts of the sequences of individual chromosomes are now providing great insights into genomic structure and evolution. Here we present our analysis of the sequence of human chromosomes 2 and 4. For chromosome 2, we analyse the region containing the ancestral chromosome fusion event2 and describe possible mechanisms for the inactivation of the vestigial centromere. For chromosome 4, we discover some regions with the lowest and highest (G + C) content in the human genome, as well as the putative largest 'gene deserts'. Analyses of highly variant regions found on these chromosomes have also allowed us to investigate their origins.
13-09-2007 at 13:49
written by Tsjok45