A reference genome (also known as a reference assembly) is a digital nucleic acid sequence database, assembled by scientists as a representative example of the set of genes in one idealized individual organism of a species. Because they are assembled from the sequenced DNA of a number of individual donors, reference genomes do not accurately represent the set of genes of any single individual organism. Instead, a reference provides a haploid mosaic of different DNA sequences from each donor. For example, one of the most recent human reference genomes, assembly GRCh38/hg38, is derived from >60 genomic clone libraries.[1] There are reference genomes for multiple species of viruses, bacteria, fungi, plants, and animals. Reference genomes are typically used as a guide on which new genomes are built, enabling them to be assembled much more quickly and cheaply than in the initial Human Genome Project. Reference genomes can be accessed online at several locations, using dedicated browsers such as Ensembl or the UCSC Genome Browser.[2]
Properties of reference genomes
Measures of length
The length of a genome can be measured in multiple different ways.
A simple way to measure genome length is to count the number of base pairs in the assembly.[3]
The golden path is an alternative measure of length that omits redundant regions such as haplotypes and pseudoautosomal regions.[4][5] It is usually constructed by layering sequencing information over a physical map to combine scaffold information. It is a 'best estimate' of what the genome will look like and typically includes gaps, which makes it longer than the typical base pair assembly.[6]
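The simple base-pair count described above can be sketched in a few lines of Python. The sketch assumes the assembly is stored as FASTA text and that gaps are written as runs of `N`, which is a common convention but is not stated in the text; the function name `assembly_lengths` is hypothetical. It is not the golden-path construction itself. It only illustrates why a length that includes gaps is larger than a count of sequenced bases.

```python
from io import StringIO


def assembly_lengths(fasta_handle):
    """Return (total_length, ungapped_length) for a FASTA text stream.

    Assumption: gaps are represented by the character N (any case).
    """
    total = 0
    ungapped = 0
    for line in fasta_handle:
        line = line.strip()
        # Skip blank lines and sequence headers such as ">chr1".
        if not line or line.startswith(">"):
            continue
        total += len(line)
        ungapped += len(line) - line.upper().count("N")
    return total, ungapped


# Toy input: 16 positions in total, 4 of which are gap placeholders.
example = StringIO(">chrDemo\nACGTNNNNAC\nGGTTAA\n")
total, ungapped = assembly_lengths(example)
print(total, ungapped)
```

Reading line by line keeps memory use flat, which matters for chromosome-scale files.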
Contigs and scaffolds
Reference genome assembly relies on overlapping reads to build contigs, which are contiguous DNA regions of consensus sequence.[7] Gaps between contigs can be filled by scaffolding, either by PCR amplification and sequencing or by Bacterial Artificial Chromosome (BAC) cloning.[8][7] Filling these gaps is not always possible; in that case, a reference assembly contains multiple scaffolds.[9] Scaffolds are classified into three types: 1) Placed, whose chromosome, genomic coordinates and orientation are known; 2) Unlocalised, for which only the chromosome is known but not the coordinates or orientation; 3) Unplaced, whose chromosome is not known.[10]
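The three scaffold types can be expressed as a small decision rule. The following Python sketch is illustrative only: the `Scaffold` record and its field names are hypothetical and do not correspond to any particular assembly file format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Scaffold:
    """Hypothetical record; None means the property is not known."""
    name: str
    chromosome: Optional[str] = None
    start: Optional[int] = None
    orientation: Optional[str] = None  # "+" or "-"


def classify(scaffold: Scaffold) -> str:
    """Classify a scaffold as placed, unlocalised, or unplaced."""
    if scaffold.chromosome is None:
        return "unplaced"
    if scaffold.start is None or scaffold.orientation is None:
        return "unlocalised"
    return "placed"


print(classify(Scaffold("scaf1", chromosome="7", start=1200, orientation="+")))
print(classify(Scaffold("scaf2", chromosome="7")))
print(classify(Scaffold("scaf3")))
```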
The number of contigs and scaffolds, as well as their average lengths, are among the many parameters relevant to assessing the quality of a reference genome assembly, since they provide information about the continuity of the final mapping from the original genome. The fewer scaffolds per chromosome, down to a single scaffold spanning an entire chromosome, the greater the continuity of the genome assembly.[11][12][13] Other related parameters are N50 and L50. N50 is the contig/scaffold length such that 50% of the assembly is contained in fragments of this length or greater, while L50 is the smallest number of contigs/scaffolds whose combined length reaches that 50%, that is, the number of fragments of length N50 or greater. A higher N50 generally goes with a lower L50, and both indicate higher continuity in the assembly.[14][15][16]
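As an illustration of these definitions, N50 and L50 can be computed from a list of contig or scaffold lengths by sorting them from longest to shortest and accumulating until half of the total assembly length is reached. This is a minimal sketch; the function name `n50_l50` and the toy lengths are made up for the example.

```python
def n50_l50(lengths):
    """Return (N50, L50) for a non-empty list of contig/scaffold lengths."""
    if not lengths:
        raise ValueError("at least one length is required")
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        # The first fragment that brings the running total to half of the
        # assembly defines N50; the number of fragments used so far is L50.
        if running >= half:
            return length, count


# Toy assembly of 300 units: the two longest fragments (80 + 70) reach 150.
print(n50_l50([80, 70, 50, 40, 30, 20, 10]))  # (70, 2)
```

Merging the 80 and 70 fragments into a single 150-unit scaffold would raise N50 to 150 and lower L50 to 1, which matches the intuition that fewer, longer fragments mean higher continuity.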
Human reference genome
The original human reference genome was derived from thirteen anonymous volunteers from Buffalo, New York. Donors were recruited by advertisement in The Buffalo News on Sunday, March 23, 1997. The first ten male and ten female volunteers were invited to make an appointment with the project's genetic counselors and donate blood, from which DNA was extracted. As a result of how the DNA samples were processed, about 80 percent of the reference genome came from eight people, and one male, designated RP11, accounts for 66 percent of the total. The ABO blood group system differs among humans, but the human reference genome contains only an O allele, although the others are annotated.[17][18][19][20][21]
As the cost of DNA sequencing falls and new whole-genome sequencing technologies emerge, more genome sequences continue to be generated. In several cases, individuals such as James D. Watson have had their genomes assembled using massively parallel DNA sequencing.[22][23] Comparison between the reference (assembly NCBI36/hg18) and Watson's genome revealed 3.3 million single nucleotide polymorphism differences, while about 1.4 percent of his DNA could not be matched to the reference genome at all.[21][22] For regions where large-scale variation is known to exist, sets of alternate loci are assembled alongside the reference locus.
The latest human reference genome assembly released by the Genome Reference Consortium is GRCh38, first released in 2013.[25] Several patches have been added to update it, the latest being GRCh38.p14, published on 3 February 2022.[26][27] This build has only 349 gaps across the entire assembly, a great improvement over the first version, which had roughly 150,000 gaps.[18] The gaps are mostly in areas such as telomeres, centromeres, and long repetitive sequences, with the biggest gap along the long arm of the Y chromosome, a region of ~30 Mb in length (~52% of the Y chromosome's length).[28] The number of genomic clone libraries contributing to the reference has increased steadily to >60 over the years, although individual RP11 still accounts for 70% of the reference genome.[1] Genomic analysis of this anonymous male suggests that he is of African-European ancestry.[1] According to the GRC website, their next assembly release for the human genome (version GRCh39) is currently "indefinitely postponed".[29]
In 2022, the Telomere-to-Telomere (T2T) Consortium,[30] an open, community-based effort, published the first completely assembled reference genome (version T2T-CHM13), without any gaps in the assembly. It did not contain a Y-chromosome until version 2.0.[31][32] This assembly allows for the examination of centromeric and pericentromeric sequence evolution. The consortium employed rigorous methods to assemble, clean, and validate complex repeat regions which are particularly difficult to sequence.[33] It used ultra-long–read (>100 kb) sequencing to accurately sequence segmental duplications.[34]
The T2T-CHM13 is sequenced from CHM13hTERT, a cell line from an essentially haploid hydatidiform mole. "CHM" stands for "Complete Hydatidiform Mole," and "13" is its line number. "hTERT" stands for "human Telomerase Reverse Transcriptase". The cell line has been transfected with the TERT gene, which is responsible for maintaining telomere length and thus contributes to the cell line's immortality.[35] A hydatidiform mole contains two copies of the same parental genome, and thus is essentially haploid. This eliminates allelic variation and allows better sequencing accuracy.[34]
For much of a genome, the reference provides a good approximation of the DNA of any single individual. But in regions with high allelic diversity, such as the major histocompatibility complex in humans and the major urinary proteins of mice, the reference genome may differ significantly from other individuals.[37][38][39] Because the reference genome is a "single" distinct sequence, which is what makes it useful as an index or locator of genomic features, it is limited in how faithfully it can represent the human genome and its variability. Most of the initial samples used for reference genome sequencing came from people of European ancestry. In 2010, de novo assembly of genomes from African and Asian populations and their comparison with the NCBI reference genome (version NCBI36) showed that these genomes had ~5 Mb of sequence that did not align against any region of the reference genome.[40]
Projects that followed the Human Genome Project have sought a deeper and more diverse characterization of human genetic variability, which the reference genome is not able to represent. The HapMap Project, active from 2002 to 2010, aimed to create a map of haplotypes and their most common variations among different human populations. Up to 11 populations of different ancestry were studied, such as individuals of the Han ethnic group from China, Gujaratis from India, the Yoruba people from Nigeria or Japanese people, among others.[41][42][43][44] The 1000 Genomes Project, carried out between 2008 and 2015, aimed to create a database that includes more than 95% of the variations present in the human genome and whose results can be used in genome-wide association studies (GWAS) of diseases such as diabetes, cardiovascular or autoimmune diseases. A total of 26 ethnic groups were studied in this project, expanding the scope of the HapMap Project to new ethnic groups such as the Mende people of Sierra Leone, the Vietnamese people or the Bengali people.[45][46][47][48] The Human Pangenome Project, which started its initial phase in 2019 with the creation of the Human Pangenome Reference Consortium, seeks to create the largest map of human genetic variability, taking the results of previous studies as a starting point.[49][50]
Mouse reference genome
Recent mouse genome assemblies are as follows:[36]
Release name | Date of release | Equivalent UCSC version
GRCm39 | June 2020 | mm39
GRCm38 | Dec 2011 | mm10
NCBI Build 37 | Jul 2007 | mm9
NCBI Build 36 | Feb 2006 | mm8
NCBI Build 35 | Aug 2005 | mm7
NCBI Build 34 | Mar 2005 | mm6
Other genomes
Since the Human Genome Project was finished, multiple international projects have started, focused on assembling reference genomes for many organisms. Model organisms (e.g., zebrafish (Danio rerio), chicken (Gallus gallus), Escherichia coli, etc.) are of special interest to the scientific community, as are, for example, endangered species (e.g., the Asian arowana (Scleropages formosus) or the American bison (Bison bison)). As of August 2022, the NCBI database supports 71,886 partially or completely sequenced and assembled genomes from different species, including 676 mammals, 590 birds and 865 fishes. Also noteworthy are the genomes of 1,796 insects, 3,747 fungi, 1,025 plants, 33,724 bacteria, 26,004 viruses and 2,040 archaea.[51] Many of these species have annotation data associated with their reference genomes that can be publicly accessed and visualized in genome browsers such as Ensembl and the UCSC Genome Browser.[52][53]
Hurst J, Beynon RJ, Roberts SC, Wyatt TD (October 2007). "Urinary Lipocalins in Rodenta: is there a Generic Model?". Chemical Signals in Vertebrates 11. Springer New York. ISBN 978-0-387-73944-1.
Li R, Li Y, Zheng H, Luo R, Zhu H, Li Q, et al. (January 2010). "Building the sequence map of the human pan-genome". Nature Biotechnology. 28 (1): 57–63. doi:10.1038/nbt.1596. PMID 19997067. S2CID 205274447.