A gene whose sequence partially overlaps the reading frame of another gene
An overlapping gene (or OLG)[1][2] is a gene whose expressible nucleotide sequence partially overlaps with the expressible nucleotide sequence of another gene.[3] In this way, a nucleotide sequence may make a contribution to the function of one or more gene products. Overlapping genes are present in and a fundamental feature of both cellular and viralgenomes.[2] The current definition of an overlapping gene varies significantly between eukaryotes, prokaryotes, and viruses.[2] In prokaryotes and viruses overlap must be between coding sequences but not mRNA transcripts, and is defined when these coding sequences share a nucleotide on either the same or opposite strands. In eukaryotes, gene overlap is almost always defined as mRNA transcript overlap. Specifically, a gene overlap in eukaryotes is defined when at least one nucleotide is shared between the boundaries of the primary mRNA transcripts of two or more genes, such that a DNA base mutation at any point of the overlapping region would affect the transcripts of all genes involved. This definition includes 5′ and 3′ untranslated regions (UTRs) along with introns.
Overprinting refers to a type of overlap in which all or part of the sequence of one gene is read in an alternate reading frame from another gene at the same locus.[4] The alternative open reading frames (ORF) are thought to be created by critical nucleotide substitutions within an expressible pre-existing gene, which can be induced to express a novel protein while still preserving the function of the original gene.[5] Overprinting has been hypothesized as a mechanism for de novo emergence of new genes from existing sequences, either older genes or previously non-coding regions of the genome.[6] It is believed that most overlapping genes, or genes whose expressible nucleotide sequences partially overlap with each other, evolved in part due to this mechanism, suggesting that each overlap is composed of one ancestral gene and one novel gene.[7] Subsequently, overprinting is also believed to be a source of novel proteins, as de novo proteins coded by these novel genes usually lack remote homologs in databases.[8] Overprinted genes are particularly common features of the genomic organization of viruses, likely to greatly increase the number of potential expressible genes from a small set of viral genetic information.[9] It is likely that overprinting is responsible for the generation of numerous novel proteins by viruses over the course of their evolutionary history.
Classification
Genes may overlap in a variety of ways and can be classified by their positions relative to each other.[3][11][12][13][14]
Unidirectional or tandem overlap: the 3' end of one gene overlaps with the 5' end of another gene on the same strand. This arrangement can be symbolized with the notation → → where arrows indicate the reading frame from start to end.
Convergent or end-on overlap: the 3' ends of the two genes overlap on opposite strands. This can be written as → ←.
Divergent or tail-on overlap: the 5' ends of the two genes overlap on opposite strands. This can be written as ← →.
In-phase overlap occurs when the shared sequences use the same reading frame. This is also known as "phase 0". Unidirectional genes with phase 0 overlap are not considered distinct genes, but rather as alternative start sites of the same gene.
Out-of-phase overlaps occurs when the shared sequences use different reading frames. This can occur in "phase 1" or "phase 2", depending on whether the reading frames are offset by 1 or 2 nucleotides. Because a codon is three nucleotides long, an offset of three nucleotides is an in-phase, phase 0 frame.
Studies on overlapping genes suggest that their evolution can be summarized in two possible models.[4] In one model, the two proteins encoded by their respective overlapping genes evolve under similar selection pressures. The proteins and the overlap region are highly conserved when strong selection against amino acid change is favored. Overlapping genes are reasoned to evolve under strict constraints as a single nucleotide substitution is able to alter the structure and function of the two proteins simultaneously. A study on the hepatitis B virus (HBV), whose DNA genome contains numerous overlapping genes, showed the mean number of synonymous nucleotide substitutions per site in overlapping coding regions was significantly lower than that of non-overlapping regions.[15] The same study showed that it was possible for some of these overlapping regions and their proteins to diverge significantly from the original when there's weak selection against amino acid change. The spacer domain of the polymerase and the pre-S1 region of a surface protein of HBV, for example, had a percentage of conserved amino acids of 30% and 40%, respectively.[15] However, these overlap regions are known to be less important for replication compared to the overlap regions that were highly conserved among different HBV strains, which are absolutely essential for the process.
The second model suggests that the two proteins and their respective overlap genes evolve under opposite selection pressures: one frame experiences positive selection while the other is under purifying selection. In tombusviruses, the proteins p19 and p22 are encoded by overlapping genes that form a 549 nt coding region, and p19 is shown to be under positive selection while p22 is under purifying selection.[16] Additional examples are mentioned in studies involving overlapping genes of the Sendai virus,[17]potato leafroll virus,[18] and human parvovirus B19.[19] This phenomenon of overlapping genes experiencing different selection pressures is suggested to be a consequence of a high rate of nucleotide substitution with different effects on the two frames; the substitutions may be majorly non-synonymous for one frame while mostly being synonymous for the other frame.[4]
Evolution
Overlapping genes are particularly common in rapidly evolving genomes, such as those of viruses, bacteria, and mitochondria. They may originate in three ways:[20]
By extension of an existing open reading frame (ORF) downstream into a contiguous gene due to the loss of a stop codon;
By extension of an existing ORF upstream into a contiguous gene due to loss of an initiation codon;
By generation of a novel ORF within an existing one due to a point mutation.
The use of the same nucleotide sequence to encode multiple genes may provide evolutionary advantage due to reduction in genome size and due to the opportunity for transcriptional and translationalco-regulation of the overlapping genes.[12][21][22][23] Gene overlaps introduce novel evolutionary constraints on the sequences of the overlap regions.[14][24]
Origins of new genes
In 1977, Pierre-Paul Grassé proposed that one of the genes in the pair could have originated de novo by mutations to introduce novel ORFs in alternate reading frames; he described the mechanism as overprinting.[25]: 231 It was later substantiated by Susumu Ohno, who identified a candidate gene that may have arisen by this mechanism.[26] Some de novo genes originating in this way may not remain overlapping, but subfunctionalize following gene duplication,[6] contributing to the prevalence of orphan genes. Which member of an overlapping gene pair is younger can be identified bioinformatically either by a more restricted phylogenetic distribution, or by less optimized codon usage.[9][27][28] Younger members of the pair tend to have higher intrinsic structural disorder than older members, but the older members are also more disordered than other proteins, presumably as a way of alleviating the increased evolutionary constraints posed by overlap.[27] Overlaps are more likely to originate in proteins that already have high disorder.[27]
Taxonomic distribution
Overlapping genes occur in all domains of life, though with varying frequencies. They are especially common in viral genomes.
Viruses
The existence of overlapping genes was first identified in the virus ΦX174, whose genome was the first DNA genome ever sequenced by Frederick Sanger in 1977.[29] Previous analysis of ΦX174, a small single-stranded DNA bacteriophage that infected the bacteria Escherichia coli, suggested that the proteins produced during infection required coding sequences longer than the measured length of its genome.[31] Analysis of the fully sequenced 5386 nucleotide genome showed that the virus possessed extensive overlap between coding regions, revealing that some genes (like genes D and E) were translated from the same DNA sequences but in different reading frames.[29][31] An alternative start site within the genome replication gene A of ΦX174 was shown to express a truncated protein with an identical coding sequence to the C-terminus of the original A protein but possessing a different function[32][33] It was concluded that other undiscovered sites of polypeptide synthesis could be hidden through the genome due to overlapping genes. An identified de novo gene of another overlapping gene locus was shown to express a novel protein that induces lysis of E. coli by inhibiting biosynthesis of its cell wall[56], suggesting that de novo protein creation through the process of overprinting can be a significant factor in the evolution of pathogenicity of viruses.[4] Another example is the ORF3d gene in the SARS-CoV 2 virus.[1][34] Overlapping genes are particularly common in viral genomes.[9] Some studies attribute this observation to selective pressure toward small genome sizes mediated by the physical constraints of packaging the genome in a viral capsid, particularly one of icosahedral geometry.[35] However, other studies dispute this conclusion and argue that the distribution of overlaps in viral genomes is more likely to reflect overprinting as the evolutionary origin of overlapping viral genes.[36] Overprinting is a common source of de novo genes in viruses.[28]
The proportion of viruses with overlapping coding sequences within their genomes varies.[2] Double-stranded RNA viruses have fewer than a quarter that contains them while almost three-quarters of retroviridae and viruses with single-stranded DNA genomes contain overlapping coding sequences.[37] Segmented viruses in particular, or viruses with their genome split into separate pieces and packaged either all in the same capsid or in separate capsids, are more likely to contain an overlapping sequence than non-segmented viruses.[37] RNA viruses have fewer overlapping genes than DNA viruses which possess lower mutation rates and less restrictive genome sizes.[37][38] The lower mutation rate of DNA viruses facilitates greater genomic novelty and evolutionary exploration within a structurally constrained genome and may be the primary driver of the evolution of overlapping genes.[39][40]
Studies of overprinted viral genes suggest that their protein products tend to be accessory proteins which are not essential to viral proliferation, but contribute to pathogenicity. Overprinted proteins often have unusual amino acid distributions and high levels of intrinsic disorder.[41] In some cases overprinted proteins do have well-defined, but novel, three-dimensional structures;[42] one example is the RNA silencing suppressor p19 found in Tombusviruses, which has both a novel protein fold and a novel binding mode in recognizing siRNAs.[28][30][43]
Prokaryotes
Estimates of gene overlap in bacterial genomes typically find that around one third of bacterial genes are overlapped, though usually only by a few base pairs.[12][44][45] Most studies of overlap in bacterial genomes find evidence that overlap serves a function in gene regulation, permitting the overlapped genes to be transcriptionally and translationally co-regulated.[12][23] In prokaryotic genomes, unidirectional overlaps are most common, possibly due to the tendency of adjacent prokaryotic genes to share orientation.[12][14][11] Among unidirectional overlaps, long overlaps are more commonly read with a one-nucleotide offset in reading frame (i.e., phase 1) and short overlaps are more commonly read in phase 2.[45][46] Long overlaps of greater than 60 base pairs are more common for convergent genes; however, putative long overlaps have very high rates of misannotation.[47] Robustly validated examples of long overlaps in bacterial genomes are rare; in the well-studied model organismEscherichia coli, only four gene pairs are well validated as having long, overprinted overlaps.[48]
Eukaryotes
Compared to prokaryotic genomes, eukaryotic genomes are often poorly annotated and thus identifying genuine overlaps is relatively challenging.[28] However, examples of validated gene overlaps have been documented in a variety of eukaryotic organisms, including mammals such as mice and humans.[49][50][51][52] Eukaryotes differ from prokaryotes in distribution of overlap types: while unidirectional (i.e., same-strand) overlaps are most common in prokaryotes, opposite or antiparallel-strand overlaps are more common in eukaryotes. Among the opposite-strand overlaps, convergent orientation is most common.[50] Most studies of eukaryotic gene overlap have found that overlapping genes are extensively subject to genomic reorganization even in closely related species, and thus the presence of an overlap is not always well-conserved.[51][53] Overlap with older or less taxonomically restricted genes is also a common feature of genes likely to have originated de novo in a given eukaryotic lineage.[51][54][55]
Function
The precise functions of overlapping genes seems to vary across the domains of life but several experiments have shown that they are important for virus lifecycles through proper protein expression and stoichiometry [56] as well as playing a role in proper protein folding.[57] A version of bacteriophageΦX174 has also been created where all gene overlaps were removed [58] proving they were not necessary for replication.
The retention and evolution of overlapping genes within viruses may also be due to capsid size limitations.[59] Dramatic viability loss was observed in viruses with genomes engineered to be longer than the wild-type genome.[60] Increasing the single-stranded DNA genome length of ΦX174 by >1% results in almost complete loss of infectivity, believed to be the result of the strict physical constraints imposed by the finite capsid volume.[61] Studies on adeno-associated viruses as gene deliveryvectors showed that viral packaging is constrained by genetic cargo size limits, requiring the use of multiple vectors to deliver large human genes such as CFTR81.[62][63] Therefore, it is suggested that overlapping genes evolved as a means to overcome these physical constraints, increasing genetic diversity by utilizing only the existing sequence rather than increasing genome length.
Methods in identifying overlapping genes and ORFs
Standardized methods such as genome annotation may be inappropriate for the detection of overlapping genes as they are reliant on already curated genes while overlapping genes are generally overlooked contain atypical sequence composition.[2][64][65][66] Genome annotation standards are also often biased against feature overlaps, such as genes entirely contained within another gene.[67] Furthermore, some bioinformatics pipelines such as the RAST pipeline markedly penalizes overlaps between predicted ORFs.[68] However, rapid advancement of genome-scale protein and RNA measurement tools along with increasingly advanced prediction algorithms have revealed an avalanche of overlapping genes and ORFs within numerous genomes.[2]Proteogenomic methods have been essential in discovering numerous overlapping genes and include a combination of techniques such as bottom-up proteomics, ribosome profiling, DNA sequencing, and perturbation. RNA sequencing is also used to identify genomic regions containing overlapping transcripts. It has been utilized to identify 180,000 alternate ORFs within previously annotated coding regions found in humans.[69] Newly discovered ORFs such as these are verified using a variety of reverse genetics techniques, such as CRISPR-Cas9 and catalytically dead Cas9 (dCas9) disruption.[70][71][72] Attempts at proof-by-synthesis are also performed to show beyond doubt the absence of any undiscovered overlapping genes.[73]
^ abcdRogozin IB, Spiridonov AN, Sorokin AV, Wolf YI, Jordan I, Tatusov RL, Koonin EV (May 2002). "Purifying and directional selection in overlapping prokaryotic genes". Trends in Genetics. 18 (5): 228–232. doi:10.1016/S0168-9525(02)02649-5. PMID12047938.
American actor, comedian and writer (born 1965) For other people named Ian Roberts, see Ian Roberts (disambiguation). This biography of a living person needs additional citations for verification. Please help by adding reliable sources. Contentious material about living persons that is unsourced or poorly sourced must be removed immediately from the article and its talk page, especially if potentially libelous.Find sources: Ian Roberts American actor – news · newsp...
3 cent stamp, 1866 St Thomas was a hub of the West Indies packet service from 1851 to 1885. Initially mail was transported by a Spanish packet to and from Puerto Rico; but in July 1867 the British picked up the mail contract, and packet letters are known using British stamps as late as 1879. First stamps The first postage stamp of the Danish West Indies was issued in 1856. It had the same square coat of arms design as the contemporary stamps of Denmark, but it was denominated 3 cents and of a...
Very long-chain polyethylene with high impact strength This article needs additional citations for verification. Please help improve this article by adding citations to reliable sources. Unsourced material may be challenged and removed.Find sources: Ultra-high-molecular-weight polyethylene – news · newspapers · books · scholar · JSTOR (November 2018) (Learn how and when to remove this template message) Ultra-high-molecular-weight polyethylene (UHMWPE, ...
Region di Tanzania Region adalah wilayah administratif tingkat satu di Tanzania. No. Region Ibu kota Kode Jumlahdistrik Luas(km²) Populasi(sensus 2012) 1 Arusha Arusha TZ-01 7 37,576 1,694,310 2 Dar es Salaam Dar es Salaam TZ-02 5 1,393 4,364,541 3 Dodoma Dodoma TZ-03 7 41,311 2,083,588 4 Geita Geita TZ-27 5 20,054 1,739,530 5 Iringa Iringa TZ-04 5 35,503 941,238 6 Kagera Bukoba TZ-05 8 25,265 2,458,023 7 Katavi Mpanda TZ-28 3 45,843 564,604 8 Kigoma Kigoma TZ-08 8 37,040 2,127,930 9 Kiliman...
American politician Joseph O. Rogers Jr.Rogers in the 1950s/1960sMember of the South Carolina House of RepresentativesIn office1955–1966United States Attorney for the District of South CarolinaIn office1969–1970PresidentRichard NixonPreceded byTerrell L. Glenn Sr. Personal detailsBorn(1921-10-08)October 8, 1921Mullins, South Carolina, U.S.DiedApril 6, 1999(1999-04-06) (aged 77)Political partyDemocratic[1]Republican[2]Alma materCollege of CharlestonThe CitadelUniversit...
Château d'AnetAnet, Eure-et-Loir, Centre-Val de Loire, France Château d'AnetCoordinates48°51′31″N 1°26′18″E / 48.85861°N 1.43833°E / 48.85861; 1.43833TypechâteauSite informationWebsitehttps://www.chateaudanet.comSite historyBuilt1549 Anet in the 18th century The Château d'Anet is a château near Dreux, in the Eure-et-Loir department in northern France, built by Philibert de l'Orme from 1547 to 1552[1] for Diane de Poitiers, the mistress ...
Test signal in television broadcasting This article needs additional citations for verification. Please help improve this article by adding citations to reliable sources. Unsourced material may be challenged and removed.Find sources: Test card – news · newspapers · books · scholar · JSTOR (March 2020) (Learn how and when to remove this template message) Test pattern redirects here. For other uses, see Test Pattern (disambiguation). SMPTE color bars: co...
2016 British Cold War espionage film by Shamim Sarif Despite the Falling SnowDirected byShamim SarifScreenplay byShamim SarifBased onDespite the Falling Snowby Shamim SarifProduced byHanan KattanStarring Rebecca Ferguson Sam Reid Charles Dance Antje Traue Oliver Jackson-Cohen Thure Lindhardt Anthony Head CinematographyDavid JohnsonEdited byMasahiro HirakuboMusic byRachel PortmanProductioncompanyEnlightenment ProductionsDistributed byAltitude Film DistributionRelease date 15 April 20...
For the 2001 video game by Circus, see Suika (2001 video game). 2021 Japanese puzzle game 2021 video gameSuika GameOfficial artwork for the game, showcasing a variety of fruits available in-gameDeveloper(s)Aladdin X[a][b]Publisher(s)Aladdin XEngineUnityPlatform(s)Nintendo SwitchReleaseJP: December 9, 2021WW: October 20, 2023Genre(s)PuzzleMode(s)Single-player Suika Game,[c] also known as Watermelon Game, is a Japanese puzzle video game developed and published by Aladdin...
Reprezentacja Marianów Północnych w piłce nożnej mężczyzn Flaga Marianów Północnych Przydomek Blue Ayuyus Związek Northern Mariana Islands Football Association Trener Michiteru Mita Skrót FIFA NMI Zawodnicy Kapitan Lucas Paul Knecht Najwięcej występów Nicolas Swaim (17) Najwięcej bramek Joe Wang Miller (3) Strojedomowe Strojewyjazdowe Mecze Pierwszy mecz Mariany Północne 1–2 Guam (Koror, Palau; 30 lipca 1998) Najwyższe zwycięstwo Mariany Północne 12–1 Palau ...
American journalist In this Chinese name, the family name is Lee and Chin is a generation name. Chin Yang Lee黎錦揚Lee in c. 1957PronunciationLí JǐnyángBorn(1915-12-23)December 23, 1915Xiangtan, Hunan, Republic of ChinaDiedNovember 8, 2018(2018-11-08) (aged 102)Los Angeles, California, U.S.CitizenshipAmericanEducationMaster of Fine ArtsBachelor of ArtsAlma materYale University National Southwestern Associated UniversityKnown forThe Flower Drum SongSpouse Joyce L...
Short story by Isaac Asimov, Frederik PohlThe Little Man on the SubwayShort story by Isaac Asimov, Frederik PohlCountryUnited StatesLanguageEnglishGenre(s)FantasyPublicationPublished inFantasy BookPublication typePeriodicalPublisherFantasy Publishing Company, Inc.Media typePrint (Magazine, Hardback & Paperback)Publication date1950Chronology — The Hazing The Little Man on the Subway is a fantasy short story by Isaac Asimov and Frederik Pohl, originally published in t...
Protected area in Western AustraliaBeelu National ParkWestern AustraliaIUCN category II (national park) Helena River valley hillsideBeelu National ParkNearest town or cityMundaringCoordinates31°57′16″S 116°08′59″E / 31.95444°S 116.14972°E / -31.95444; 116.14972Established1995Area46.17 km2 (17.8 sq mi)[1]Managing authoritiesDepartment of Environment and ConservationWebsiteBeelu National ParkSee alsoList of protected areas ofWestern Aus...
Japanese visual novel franchise Diabolik Loversディアボリックラヴァーズ(Diaborikku Ravāzu)GenreDrama, mystery, romance[1] GameDiabolik Lovers: Haunted Dark BridalDeveloperRejetPublisherIdea FactoryGenreVisual novelPlatformPSPPS Vita (Limited V Edition)PlayStation 4ReleasedJP: October 11, 2012 JP: December 19, 2013 (PS Vita) GameDiabolik Lovers: More,BloodDeveloperRejetPublisherIdea FactoryGenreVisual novelPlatformPSPPS Vita / PS TV (Limited V Edition)PlayStation 4Released...
American politician (born 1953) Dave CampChair of the House Ways and Means CommitteeIn officeJanuary 3, 2011 – January 3, 2015Preceded bySander LevinSucceeded byPaul RyanMember of theU.S. House of Representativesfrom MichiganIn officeJanuary 3, 1991 – January 3, 2015Preceded byBill SchuetteSucceeded byJohn MoolenaarConstituency10th district (1991–1993)4th district (1993–2015)Member of the Michigan House of Representativesfrom the 102nd districtIn officeJa...
University in Japan Nanzan University南山大学Former nameNanzan College of Foreign Languages (1946-1949)MottoHominis Dignitati (人間の尊厳のために, Ningen no Songen no Tame ni)Motto in EnglishFor human dignityTypePrivate Roman Catholic Research Non-profit Coeducational Basic and Higher education institutionEstablished1946FounderFr. Joseph Reiners, SVDReligious affiliationRoman Catholic(Divine Word Missionaries)Academic affiliationsASEACCU ACUCAPresidentRev.Fr.Robert Kisala, S...
American journalist Denver NicksNicks in January 2012Bornc. 1985 (age 37–38)Tulsa, Oklahoma, U.S.CitizenshipUnited StatesAlma materSouthern Methodist UniversityColumbia University Graduate School of Journalism Denver Morrissey Nicks (born c. 1985) is an American journalist, photographer and a staff writer for Time magazine.[1] Nicks' work has appeared in The Nation, The Huffington Post, This Land, and The Daily Beast. He is the author of Private: Bradley Manning, WikiL...
United States historic placeGordon Tract Archeological SiteU.S. National Register of Historic PlacesU.S. Historic district Grindstone Nature AreaNearest cityColumbia, MissouriArea105 acres (42 ha)NRHP reference No.72000705[1]Added to NRHPMarch 16, 1972 The Gordon Tract is a late Woodland period archeological site located on the floodplain and bluffs of Hinkson Creek near Columbia, Missouri, United States, which contains the remains of a prehistoric village and mounds. R...