RefSeq

Content
Description: Curated non-redundant sequence database of genomes
Contact
Research center: National Center for Biotechnology Information
Primary citation: Pruitt KD et al. (2005)[1]
Access
Website: https://www.ncbi.nlm.nih.gov/RefSeq

The Reference Sequence (RefSeq) database[1] is an open-access, annotated and curated collection of publicly available nucleotide sequences (DNA, RNA) and their protein products. RefSeq was introduced in 2000.[2][3] The database is built by the National Center for Biotechnology Information (NCBI) and, unlike GenBank, provides only a single record for each natural biological molecule (i.e. DNA, RNA or protein) for major organisms ranging from viruses to bacteria to eukaryotes.

For each model organism, RefSeq aims to provide separate and linked records for the genomic DNA, the gene transcripts, and the proteins arising from those transcripts. RefSeq is limited to major organisms for which sufficient data are available (121,461 distinct "named" organisms as of July 2022),[4] while GenBank includes sequences for any organism submitted (approximately 504,000 formally described species).[5]

RefSeq categories

The RefSeq collection comprises different data types with different origins, so standard categories and identifiers are needed to store each data type. The most important categories are:

RefSeq accession categories and molecule types

Category  Description
NC        Complete genomic molecules
NG        Incomplete genomic region
NM        mRNA
NR        ncRNA
NP        Protein
XM        Predicted mRNA model
XR        Predicted ncRNA model
XP        Predicted protein model (eukaryotic sequences)
WP        Predicted protein model (prokaryotic sequences)

For more details and more categories, see Table 1 in Chapter 18 of the book The Reference Sequence (RefSeq) Database.
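
The category table can be turned into a small lookup that classifies an accession by its prefix. The following is a minimal sketch, not an NCBI tool: it assumes accessions are written as a two-letter prefix, an underscore, a run of digits and an optional ".version" suffix (for example NM_000546.6), and it only recognises the nine categories listed above. The names REFSEQ_CATEGORIES, RefSeqAccession and parse_refseq_accession are hypothetical.

```python
import re
from typing import NamedTuple, Optional

# Prefix-to-description mapping copied from the category table above.
REFSEQ_CATEGORIES = {
    "NC": "Complete genomic molecules",
    "NG": "Incomplete genomic region",
    "NM": "mRNA",
    "NR": "ncRNA",
    "NP": "Protein",
    "XM": "Predicted mRNA model",
    "XR": "Predicted ncRNA model",
    "XP": "Predicted protein model (eukaryotic sequences)",
    "WP": "Predicted protein model (prokaryotic sequences)",
}

# Assumed layout: two uppercase letters, "_", digits, optional ".<version>".
_ACCESSION_RE = re.compile(r"^([A-Z]{2})_(\d+)(?:\.(\d+))?$")


class RefSeqAccession(NamedTuple):
    prefix: str
    number: str
    version: Optional[int]
    description: str


def parse_refseq_accession(accession: str) -> Optional[RefSeqAccession]:
    """Parse an accession; return None if it does not match a listed category."""
    match = _ACCESSION_RE.match(accession.strip())
    if match is None:
        return None
    prefix, number, version = match.groups()
    description = REFSEQ_CATEGORIES.get(prefix)
    if description is None:
        # Prefix is well-formed but not one of the categories in the table.
        return None
    return RefSeqAccession(
        prefix, number, int(version) if version else None, description
    )
```

Calling parse_refseq_accession("XP_123456.1") would classify the record as a predicted protein model for eukaryotic sequences. Categories outside the table, and accessions with a different layout, are deliberately rejected rather than guessed.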

RefSeq Projects

The NCBI is developing several projects to improve RefSeq services, often in collaboration with research centers such as EMBL-EBI:

  • Consensus CDS (CCDS): This project aims to identify a core set of human and mouse protein-coding regions and standardize sets of genes with high and consistent levels of genomic annotation quality. This project was announced in 2009 and is still in development.[6][7]
  • RefSeq Functional Elements (RefSeqFE): This project describes non-genic functional elements, that is, gene regulatory regions such as enhancers, silencers, DNase I hypersensitive regions and DNA replication origins. Its current scope is restricted to the human and mouse genomes.[8]
  • RefSeqGene: Its main goal is to define genomic sequences to be used as reference standards for well-characterized genes. Previously described mRNA, protein and chromosome sequences have two weaknesses: they do not provide explicit genomic coordinates for gene-flanking and intronic regions, and chromosome coordinates are awkwardly large and change with every new genome assembly. The RefSeqGene project is designed to address these weaknesses.[9]
  • Targeted Loci: This project records molecular markers, especially protein-coding and ribosomal RNA loci, that are used for phylogenetic and barcoding analysis. Its scope includes sequences for Archaea, Bacteria and Fungi, accessible via Entrez and BLAST queries. It also includes GenBank sequences for Animals, Plants and Protists, accessible via BLAST queries.[10]
  • Virus Variation (ViV): This resource provides sequence data processing pipelines and analysis tools for displaying and retrieving sequences from several viral groups, such as influenza virus, ebolavirus, MERS coronavirus and Zika virus. New viruses, processing pipelines, tools and other features are added regularly.[11]
  • RefSeq Select: Many genes are represented by multiple RefSeq transcripts and proteins because of alternative splicing, and this complexity is problematic for studies such as comparative genomics or the exchange of clinical variant data. This project therefore selects one RefSeq Select transcript as the most representative for every protein-coding gene, based on multiple criteria such as prior use in clinical databases, transcript expression and evolutionary conservation of the coding region.[12]
  • MANE (Matched Annotation from the NCBI and EMBL-EBI): This is a collaborative project between NCBI and EMBL-EBI whose main goal is to define a set of transcripts and their proteins for all the protein-coding genes in the human genome, thereby reducing the differences in transcript annotation between the RefSeq and Ensembl/GENCODE annotation systems. The MANE Select transcript set is created as a universal standard for clinical reporting and for comparative or evolutionary genomics. A second set, MANE Plus Clinical, is created with additional transcripts needed to report all Pathogenic (P) or Likely Pathogenic (LP) clinical variants available in public resources.[13] This project was announced in 2018 and is expected to finish in 2022.
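
Records in RefSeq can also be retrieved programmatically through Entrez. The sketch below only builds a request URL and does not contact the network. It assumes the NCBI E-utilities efetch endpoint and its db, id, rettype and retmode parameters; the helper name build_efetch_url and the accession in the usage note are illustrative only.

```python
from urllib.parse import urlencode

# Assumed base URL of the NCBI E-utilities efetch service.
EFETCH_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_efetch_url(accession: str, db: str = "nuccore", rettype: str = "fasta") -> str:
    """Build an efetch URL that requests one record as plain text.

    db is the Entrez database to query ("nuccore" for nucleotide records,
    "protein" for protein records); rettype selects the record format.
    """
    params = {
        "db": db,
        "id": accession,
        "rettype": rettype,
        "retmode": "text",
    }
    return f"{EFETCH_BASE}?{urlencode(params)}"
```

For example, build_efetch_url("NM_000546.6") yields a URL for the FASTA form of that transcript, and passing db="protein" with an NP or XP accession would target the protein database instead. Fetching the URL, error handling and rate limiting are left out of this sketch.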

Statistics

According to RefSeq release 213 (July 2022), the number of species represented in the database, counted as distinct taxonomic IDs, is as follows:[4]

Category                Species (distinct taxonomic IDs)
Archaea                 1,443
Bacteria                69,122
Fungi                   16,869
Invertebrate            5,715
Mitochondrion           13,648
Plant                   9,177
Plasmid                 6,073
Plastid                 9,430
Protozoa                746
Vertebrate (mammalian)  1,509
Viral                   11,620
Vertebrate (other)      5,237
Other                   4
Complete                121,461

The numbers of accessions and of base pairs (or residues) per molecule type are:[4]

Molecule type  Accessions   Base pairs/residues
Genomics       40,758,769   2.923212393984×10^12
RNA            45,781,716   1.22253022047×10^11
Protein        234,520,053  9.129062394×10^10

References

  1. Pruitt KD, Tatusova T, Maglott DR (January 2005). "NCBI Reference Sequence (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins". Nucleic Acids Research. 33 (Database issue): D501–D504. doi:10.1093/nar/gki025. PMC 539979. PMID 15608248.
  2. Maglott DR, Katz KS, Sicotte H, Pruitt KD (January 2000). "NCBI's LocusLink and RefSeq". Nucleic Acids Research. 28 (1): 126–128. doi:10.1093/nar/28.1.126. PMC 102393. PMID 10592200.
  3. Pruitt KD, Katz KS, Sicotte H, Maglott DR (January 2000). "Introducing RefSeq and LocusLink: curated human genome resources at the NCBI". Trends in Genetics. 16 (1): 44–47. doi:10.1016/s0168-9525(99)01882-x. PMID 10637631.
  4. RefSeq Release 213 Statistics (Report). National Library of Medicine. 11 July 2022. Retrieved 20 July 2022.
  5. Sayers EW, Cavanaugh M, Clark K, Pruitt KD, Schoch CL, Sherry ST, Karsch-Mizrachi I (January 2022). "GenBank". Nucleic Acids Research. 50 (D1): D161–D164. doi:10.1093/nar/gkab1135. PMC 8690257. PMID 34850943.
  6. Pruitt KD, Harrow J, Harte RA, Wallin C, Diekhans M, Maglott DR, et al. (July 2009). "The consensus coding sequence (CCDS) project: Identifying a common protein-coding gene set for the human and mouse genomes". Genome Research. 19 (7): 1316–1323. doi:10.1101/gr.080531.108. PMC 2704439. PMID 19498102.
  7. Pujar S, O'Leary NA, Farrell CM, Loveland JE, Mudge JM, Wallin C, et al. (January 2018). "Consensus coding sequence (CCDS) database: a standardized set of human and mouse protein-coding regions supported by expert curation". Nucleic Acids Research. 46 (D1): D221–D228. doi:10.1093/nar/gkx1031. PMC 5753299. PMID 29126148.
  8. Farrell CM, Goldfarb T, Rangwala SH, Astashyn A, Ermolaeva OD, Hem V, et al. (January 2022). "RefSeq Functional Elements as experimentally assayed nongenic reference standards and functional interactions in human and mouse". Genome Research. 32 (1): 175–188. doi:10.1101/gr.275819.121. PMC 8744684. PMID 34876495.
  9. Gulley ML, Braziel RM, Halling KC, Hsi ED, Kant JA, Nikiforova MN, et al. (June 2007). "Clinical laboratory reports in molecular pathology". Archives of Pathology & Laboratory Medicine. 131 (6): 852–863. doi:10.5858/2007-131-852-CLRIMP. PMID 17550311.
  10. "NCBI RefSeq Targeted Loci Project". www.ncbi.nlm.nih.gov. Retrieved 2022-07-27.
  11. Hatcher EL, Zhdanov SA, Bao Y, Blinkova O, Nawrocki EP, Ostapchuck Y, et al. (January 2017). "Virus Variation Resource - improved response to emergent viral outbreaks". Nucleic Acids Research. 45 (D1): D482–D490. doi:10.1093/nar/gkw1065. PMC 5210549. PMID 27899678.
  12. "NCBI RefSeq Select". www.ncbi.nlm.nih.gov. Retrieved 2022-07-27.
  13. Morales J, Pujar S, Loveland JE, Astashyn A, Bennett R, Berry A, et al. (April 2022). "A joint NCBI and EMBL-EBI transcript set for clinical genomics and research". Nature. 604 (7905): 310–315. Bibcode:2022Natur.604..310M. doi:10.1038/s41586-022-04558-8. PMC 9007741. PMID 35388217.
