Article
Author: Jain, Miten ; Taylor, Dylan J ; Guarracino, Andrea ; Shumate, Alaina ; Schatz, Michael C ; Murphy, Terence D ; Haggerty, Leanne ; Koren, Sergey ; Thibaud-Nissen, Françoise ; Grady, Patrick G S ; Garcia Giron, Carlos ; Rautiainen, Mikko ; Heinz, Jakob ; Shepelev, Valery A ; Storer, Jessica M ; Potapova, Tamara ; Gerton, Jennifer L ; Garrison, Erik ; Diekhans, Mark ; Watwood, Allison C ; Fungtammasan, Arkarachai ; Asri, Mobin ; Formenti, Giulio ; Harris, Robert ; Wenger, Aaron M ; Hourlier, Thibaut ; Li, Heng ; Hartley, Gabrielle A ; Tomaszkiewicz, Marta ; Vollger, Mitchell R ; Zarate, Samantha ; Paulin, Luis F ; Cechova, Monika ; Mikheenko, Alla ; Munson, Katherine M ; Markovic, Christopher ; Hunt, Sarah E ; McNulty, Brandy M ; Shafin, Kishwar ; Lucas, Julian K ; Walenz, Brian P ; Halabian, Reza ; Wilson, Melissa A ; O'Neill, Rachel J ; Zhu, Yiming ; Allen, Jamie ; Hansen, Nancy F ; Hubley, Robert M ; Ryabov, Fedor ; Chen, Nae-Chyun ; Miga, Karen H ; Taravella Oill, Angela M ; Sauria, Michael E G ; Mc Cartney, Ann M ; Eichler, Evan E ; Hoyt, Savannah J ; Medvedev, Paul ; Surapaneni, Likhitha ; Alexandrov, Ivan A ; Altemose, Nicolas ; Zook, Justin M ; Weissensteiner, Matthias H ; Salzberg, Steven L ; Makalowski, Wojciech ; McCoy, Rajiv C ; Bzikadze, Andrey V ; Hwang, Stephen ; Sedlazeck, Fritz J ; Porubsky, David ; Lewis, Alexandra P ; Timp, Winston ; Logsdon, Glennis A ; Chin, Chen-Shan ; Rhie, Arang ; Harvey, William T ; Martin, Fergal J ; Phillippy, Adam M ; Haukness, Marina ; Hook, Paul W ; Olson, Nathan D ; Makova, Kateryna D ; Nurk, Sergey ; Flicek, Paul ; McDaniel, Jennifer ; Olsen, Hugh E ; Kesharwani, Rupesh K ; Gershman, Ariel
The human Y chromosome has been notoriously difficult to sequence and assemble because of its complex repeat structure that includes long palindromes, tandem repeats and segmental duplications [1-3]. As a result, more than half of the Y chromosome is missing from the GRCh38 reference sequence and it remains the last human chromosome to be finished [4,5]. Here, the Telomere-to-Telomere (T2T) consortium presents the complete 62,460,029-base-pair sequence of a human Y chromosome from the HG002 genome (T2T-Y) that corrects multiple errors in GRCh38-Y and adds over 30 million base pairs of sequence to the reference, showing the complete ampliconic structures of gene families TSPY, DAZ and RBMY; 41 additional protein-coding genes, mostly from the TSPY family; and an alternating pattern of human satellite 1 and 3 blocks in the heterochromatic Yq12 region. We have combined T2T-Y with a previous assembly of the CHM13 genome [4] and mapped available population variation, clinical variants and functional genomics data to produce a complete and comprehensive reference sequence for all 24 human chromosomes.