The number of non-coding RNAs within the human genome is unknown; however, recent transcriptomic and bioinformatic studies suggest that there are thousands of non-coding transcripts.[1][2][3][4][5][6][7]
Many of the newly identified ncRNAs have unknown functions, if any.[8]
There is no consensus on how much of non-coding transcription is functional: some believe most ncRNAs to be non-functional "junk RNA", spurious transcriptions,[9][10] while others expect that many non-coding transcripts have functions to be discovered.[11][12]
The first non-coding RNA to be characterised was an alanine tRNA found in baker's yeast; its structure was published in 1965.[16] To produce a purified alanine tRNA sample, Robert W. Holley et al. used 140 kg of commercial baker's yeast to give just 1 g of purified tRNA(Ala) for analysis.[17] The 80 nucleotide tRNA was sequenced by first being digested with pancreatic ribonuclease (producing fragments ending in cytidine or uridine) and then with takadiastase ribonuclease T1 (producing fragments ending in guanosine). Chromatography and identification of the 5' and 3' ends then helped arrange the fragments to establish the RNA sequence.[17] Of the three structures originally proposed for this tRNA,[16] the 'cloverleaf' structure was independently proposed in several following publications.[18][19][20][21] The cloverleaf secondary structure was finalised following X-ray crystallography analysis performed by two independent research groups in 1974.[22][23]
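The two-enzyme digestion strategy can be illustrated with a minimal Python sketch. It assumes idealised, complete cleavage: pancreatic ribonuclease cuts after every C or U, and ribonuclease T1 cuts after every G. The function name `digest` and the 12-nucleotide sequence are hypothetical and chosen only for illustration; they are not taken from Holley's work.

```python
def digest(rna: str, cut_after: str) -> list[str]:
    """Split an RNA string immediately after every residue in cut_after."""
    fragments, current = [], ""
    for base in rna:
        current += base
        if base in cut_after:
            fragments.append(current)
            current = ""
    if current:  # keep the 3'-terminal fragment, which may end in any base
        fragments.append(current)
    return fragments


# Hypothetical toy sequence (not the real alanine tRNA).
rna = "GGAUCCAGUAGA"

pancreatic = digest(rna, "CU")  # idealised pancreatic ribonuclease
t1 = digest(rna, "G")           # idealised ribonuclease T1

print(pancreatic)  # ['GGAU', 'C', 'C', 'AGU', 'AGA']
print(t1)          # ['G', 'G', 'AUCCAG', 'UAG', 'A']
```

Because the two enzymes cut at different residues, their fragment sets overlap. In this toy case, the T1 fragment AUCCAG spans the boundaries between several pancreatic fragments, which is the kind of overlap that allows the original order to be reconstructed.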
Recent discoveries of ncRNAs have been achieved through both experimental and bioinformatic methods.
Biological roles
Noncoding RNAs belong to several groups and are involved in many cellular processes.[26] These range from ncRNAs of central importance that are conserved across all or most cellular life through to more transient ncRNAs specific to one or a few closely related species. The more conserved ncRNAs are thought to be molecular fossils or relics from the last universal common ancestor and the RNA world, and their current roles remain mostly in regulation of information flow from DNA to protein.[27][28][29]
In translation
Many of the conserved, essential and abundant ncRNAs are involved in translation. Ribonucleoprotein (RNP) particles called ribosomes are the 'factories' where translation takes place in the cell. The ribosome consists of more than 60% ribosomal RNA, which is made up of 3 ncRNAs in prokaryotes and 4 ncRNAs in eukaryotes. Ribosomal RNAs catalyse the translation of nucleotide sequences to protein. Another set of ncRNAs, transfer RNAs, form an 'adaptor molecule' between mRNA and protein. The H/ACA box and C/D box snoRNAs are ncRNAs found in archaea and eukaryotes, whereas RNase MRP is restricted to eukaryotes. Both the snoRNAs and RNase MRP are involved in the maturation of rRNA. The snoRNAs guide covalent modifications of rRNA, tRNA and snRNAs; RNase MRP cleaves the internal transcribed spacer 1 between 18S and 5.8S rRNAs. The ubiquitous ncRNA, RNase P, is an evolutionary relative of RNase MRP.[31] RNase P matures tRNA sequences by generating mature 5'-ends of tRNAs through cleaving the 5'-leader elements of precursor-tRNAs. Another ubiquitous RNP called SRP recognizes and transports specific nascent proteins to the endoplasmic reticulum in eukaryotes and the plasma membrane in prokaryotes. In bacteria, transfer-messenger RNA (tmRNA) is an RNP involved in rescuing stalled ribosomes, tagging incomplete polypeptides and promoting the degradation of aberrant mRNA.[citation needed]
In RNA splicing
In eukaryotes, the spliceosome performs the splicing reactions essential for removing intron sequences; this process is required for the formation of mature mRNA. The spliceosome is another RNP often known as the snRNP or tri-snRNP. There are two different forms of the spliceosome, the major and minor forms. The ncRNA components of the major spliceosome are U1, U2, U4, U5, and U6. The ncRNA components of the minor spliceosome are U11, U12, U5, U4atac and U6atac.[citation needed]
Another group of introns can catalyse their own removal from host transcripts; these are called self-splicing RNAs. There are two main groups of self-splicing RNAs: group I catalytic introns and group II catalytic introns. These ncRNAs catalyze their own excision from mRNA, tRNA and rRNA precursors in a wide range of organisms.[citation needed]
The expression of many thousands of genes is regulated by ncRNAs. This regulation can occur in trans or in cis. There is increasing evidence that a special type of ncRNA called enhancer RNAs, transcribed from the enhancer region of a gene, acts to promote gene expression.[citation needed]
Trans-acting
In higher eukaryotes microRNAs regulate gene expression. A single miRNA can reduce the expression levels of hundreds of genes. The mechanism by which mature miRNA molecules act is through partial complementarity to one or more messenger RNA (mRNA) molecules, generally in 3' UTRs. The main function of miRNAs is to down-regulate gene expression.
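A common simplification of this partial complementarity is to look for sites in the 3' UTR that pair perfectly with the miRNA "seed" (here taken as nucleotides 2 to 8 of the mature miRNA). The following Python sketch illustrates that idea only; the seed definition is an assumption of the sketch, the function names are hypothetical, and both sequences are invented toy examples rather than real transcripts.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Return the Watson-Crick reverse complement of an RNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def seed_sites(mirna: str, utr: str) -> list[int]:
    """Return 0-based UTR positions that pair perfectly with the miRNA seed.

    Assumption: the seed is nucleotides 2-8 of the mature miRNA (7 nt).
    """
    seed = mirna[1:8]
    target = reverse_complement(seed)
    width = len(target)
    return [i for i in range(len(utr) - width + 1) if utr[i:i + width] == target]


# Hypothetical toy sequences, for illustration only.
mirna = "AUGCCGUAACGGAUUCAGCUAA"
utr = "AAA" + "UACGGCA" + "UUUGGG" + "UACGGCA" + "AA"

print(seed_sites(mirna, utr))  # [3, 16]
```

Real target prediction tools weigh additional features such as site context and conservation, so a seed match of this kind is at most a first filter.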
The ncRNA RNase P has also been shown to influence gene expression. In the human nucleus, RNase P is required for the normal and efficient transcription of various ncRNAs transcribed by RNA polymerase III. These include tRNA, 5S rRNA, SRP RNA, and U6 snRNA genes. RNase P exerts its role in transcription through association with Pol III and chromatin of active tRNA and 5S rRNA genes.[39]
The bacterial ncRNA, 6S RNA, specifically associates with RNA polymerase holoenzyme containing the sigma70 specificity factor. This interaction represses expression from a sigma70-dependent promoter during stationary phase.[citation needed]
Another bacterial ncRNA, OxyS RNA, represses translation by binding to Shine-Dalgarno sequences, thereby occluding ribosome binding. OxyS RNA is induced in response to oxidative stress in Escherichia coli.[citation needed]
The B2 RNA is a small noncoding RNA polymerase III transcript that represses mRNA transcription in response to heat shock in mouse cells. B2 RNA inhibits transcription by binding to core Pol II. Through this interaction, B2 RNA assembles into preinitiation complexes at the promoter and blocks RNA synthesis.[40]
A recent study has shown that the act of transcribing an ncRNA sequence can itself influence gene expression. RNA polymerase II transcription of ncRNAs is required for chromatin remodelling in Schizosaccharomyces pombe. Chromatin is progressively converted to an open configuration as several species of ncRNAs are transcribed.[41]
A number of ncRNAs are embedded in the 5' UTRs (Untranslated Regions) of protein coding genes and influence their expression in various ways. For example, a riboswitch can directly bind a small target molecule; the binding of the target affects the gene's activity.[citation needed]
RNA leader sequences are found upstream of the first gene of amino acid biosynthetic operons. These RNA elements form one of two possible structures in regions encoding very short peptide sequences that are rich in the end product amino acid of the operon. A terminator structure forms when there is an excess of the regulatory amino acid and ribosome movement over the leader transcript is not impeded. When there is a deficiency of the charged tRNA of the regulatory amino acid, the ribosome translating the leader peptide stalls and the antiterminator structure forms. This allows RNA polymerase to transcribe the operon. Known RNA leaders are the histidine operon leader, leucine operon leader, threonine operon leader and tryptophan operon leader.[citation needed]
Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) are repeats found in the DNA of many bacteria and archaea. The repeats are separated by spacers of similar length. It has been demonstrated that these spacers can be derived from phage and subsequently help protect the cell from infection.
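The repeat-spacer layout described above can be sketched in a few lines of Python. The sketch assumes that the repeat sequence is already known and occurs exactly, without mismatches; the function name `extract_spacers` and all sequences are hypothetical toy values, not real CRISPR or phage sequences.

```python
def extract_spacers(array: str, repeat: str) -> list[str]:
    """Return the sequences lying between consecutive copies of the repeat."""
    parts = array.split(repeat)
    # parts[0] and parts[-1] are the flanks outside the first and last repeat.
    return [part for part in parts[1:-1] if part]


# Hypothetical toy sequences, for illustration only.
repeat = "GTCGCACTC"
spacer_a = "AAATTTCCCGGG"
spacer_b = "ACGTACGTTTGA"
array = repeat + spacer_a + repeat + spacer_b + repeat

phage_genome = "CCCC" + spacer_b + "GGGG"

spacers = extract_spacers(array, repeat)
matching = [s for s in spacers if s in phage_genome]

print(spacers)   # ['AAATTTCCCGGG', 'ACGTACGTTTGA']
print(matching)  # ['ACGTACGTTTGA']
```

A spacer that also occurs in a phage genome, as `spacer_b` does here, is the kind of match that suggests the spacer was acquired from that phage.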
Chromosome structure
Telomerase is an RNP enzyme that adds specific DNA sequence repeats ("TTAGGG" in vertebrates) to telomeric regions, which are found at the ends of eukaryotic chromosomes. The telomeres contain condensed DNA material, giving stability to the chromosomes. The enzyme is a reverse transcriptase that carries Telomerase RNA, which is used as a template when it elongates telomeres, which are shortened after each replication cycle.
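The interplay between replicative shortening and telomerase extension can be modelled very roughly as string operations. The Python sketch below is a toy model under stated assumptions: the telomere is a single strand of perfect TTAGGG repeats, a fixed and arbitrary number of nucleotides is lost per replication, and telomerase adds whole repeats in register. The function names and the numbers used are hypothetical.

```python
REPEAT = "TTAGGG"  # vertebrate telomeric repeat, as given in the text


def shorten(telomere: str, loss: int) -> str:
    """Model one replication cycle by removing `loss` nucleotides from the end."""
    return telomere[: max(0, len(telomere) - loss)]


def extend(telomere: str, repeats: int, template: str = REPEAT) -> str:
    """Model telomerase by appending `repeats` repeat units, kept in register."""
    phase = len(telomere) % len(template)  # position within the current repeat
    added_len = repeats * len(template)
    added = (template * (repeats + 1))[phase : phase + added_len]
    return telomere + added


telomere = REPEAT * 5                 # 30 nt starting telomere (arbitrary)
shortened = shorten(telomere, loss=8)  # 22 nt left after one toy replication
restored = extend(shortened, repeats=2)

print(len(shortened), len(restored))  # 22 34
print(restored)
```

The `phase` calculation keeps the appended sequence continuous with the existing repeats, which mirrors the idea that the RNA template aligns with the chromosome end before new repeats are synthesised.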
Bifunctional RNAs, or dual-function RNAs, are RNAs that have two distinct functions.[43][44] The majority of the known bifunctional RNAs are mRNAs that encode both a protein and ncRNAs. However, a growing number of ncRNAs fall into two different ncRNA categories; e.g., H/ACA box snoRNA and miRNA.[45][46]
Two well-known examples of bifunctional RNAs are SgrS RNA and RNAIII. A handful of other bifunctional RNAs are also known to exist (e.g., steroid receptor activator/SRA,[47] VegT RNA,[48][49] Oskar RNA,[50] ENOD40,[51] p53 RNA,[52] SR1 RNA,[53] and Spot 42 RNA[54]). Bifunctional RNAs were the subject of a 2011 special issue of Biochimie.[55]
As a hormone
There is an important link between certain non-coding RNAs and the control of hormone-regulated pathways. In Drosophila, hormones such as ecdysone and juvenile hormone can promote the expression of certain miRNAs. Furthermore, this regulation occurs at distinct temporal points within Caenorhabditis elegans development.[56] In mammals, miR-206 is a crucial regulator of estrogen-receptor-alpha.[57]
Non-coding RNAs are crucial in the development of several endocrine organs, as well as in endocrine diseases such as diabetes mellitus.[58] Specifically in the MCF-7 cell line, addition of 17β-estradiol increased global transcription of the noncoding RNAs called lncRNAs near estrogen-activated coding genes.[59]
As with proteins, mutations or imbalances in the ncRNA repertoire within the body can cause a variety of diseases.
Cancer
Many ncRNAs show abnormal expression patterns in cancerous tissues.[6] These include miRNAs, long mRNA-like ncRNAs,[62][63] GAS5,[64] SNORD50,[65] telomerase RNA and Y RNAs.[66] The miRNAs are involved in the large scale regulation of many protein coding genes,[67][68] the Y RNAs are important for the initiation of DNA replication,[35] and telomerase RNA serves as the template for telomerase, an RNP that extends telomeric regions at chromosome ends (see telomeres and disease for more information). The direct function of the long mRNA-like ncRNAs is less clear.
A rare SNP (rs11614913) that overlaps hsa-mir-196a-2 has been reported to be associated with non-small cell lung carcinoma.[71] Likewise, a screen of 17 miRNAs that have been predicted to regulate a number of breast cancer associated genes found variations in the microRNAs miR-17 and miR-30c-1 of patients; these patients were noncarriers of BRCA1 or BRCA2 mutations, raising the possibility that familial breast cancer may be caused by variation in these miRNAs.[72]
The p53 tumor suppressor is arguably the most important agent in preventing tumor formation and progression. The p53 protein functions as a transcription factor with a crucial role in orchestrating the cellular stress response. In addition to its crucial role in cancer, p53 has been implicated in other diseases including diabetes, cell death after ischemia, and various neurodegenerative diseases such as Huntington's, Parkinson's, and Alzheimer's diseases. Studies have suggested that p53 expression is subject to regulation by non-coding RNA.[5]
Another example of non-coding RNA dysregulated in cancer cells is the long non-coding RNA Linc00707. Linc00707 is upregulated and sponges miRNAs in human bone marrow-derived mesenchymal stem cells,[73] gastric cancer[74] or breast cancer,[75][76] and thus promotes osteogenesis, contributes to hepatocellular carcinoma progression, promotes proliferation and metastasis, or indirectly regulates expression of proteins involved in cancer aggressiveness, respectively.
Prader–Willi syndrome
The deletion of the 48 copies of the C/D box snoRNA SNORD116 has been shown to be the primary cause of Prader–Willi syndrome.[77][78][79][80] Prader–Willi is a developmental disorder associated with over-eating and learning difficulties. SNORD116 has potential target sites within a number of protein-coding genes, and could have a role in regulating alternative splicing.[81]
Autism
The chromosomal locus containing the small nucleolar RNA SNORD115 gene cluster has been duplicated in approximately 5% of individuals with autistic traits.[82][83] A mouse model engineered to have a duplication of the SNORD115 cluster displays autistic-like behaviour.[84] A recent small study of post-mortem brain tissue demonstrated altered expression of long non-coding RNAs in the prefrontal cortex and cerebellum of autistic brains as compared to controls.[85]
Cartilage–hair hypoplasia
Mutations within RNase MRP have been shown to cause cartilage–hair hypoplasia (CHH), a disease associated with an array of symptoms such as short stature, sparse hair, skeletal abnormalities and a suppressed immune system, and which is frequent among the Amish and Finns.[86][87][88] The best characterised variant is an A-to-G transition at nucleotide 70, which lies in a loop region two bases 5' of a conserved pseudoknot. However, many other mutations within RNase MRP also cause CHH.
Alzheimer's disease
The antisense RNA BACE1-AS is transcribed from the opposite strand to BACE1 and is upregulated in patients with Alzheimer's disease.[89] BACE1-AS regulates the expression of BACE1 by increasing BACE1 mRNA stability and generating additional BACE1 through a post-transcriptional feed-forward mechanism. By the same mechanism it also raises concentrations of beta amyloid, the main constituent of senile plaques. BACE1-AS concentrations are elevated in subjects with Alzheimer's disease and in amyloid precursor protein transgenic mice.
miR-96 and hearing loss
Variation within the seed region of mature miR-96 has been associated with autosomal dominant, progressive hearing loss in humans and mice. The homozygous mutant mice were profoundly deaf, showing no cochlear responses. Heterozygous mice and humans progressively lose the ability to hear.[90][91][92]
Distinction between functional RNA (fRNA) and ncRNA
Scientists have started to distinguish functional RNA (fRNA) from ncRNA, to describe regions functional at the RNA level that may or may not be stand-alone RNA transcripts.[97][98][99] This implies that fRNA (such as riboswitches, SECIS elements, and other cis-regulatory regions) is not ncRNA. Yet fRNA could also include mRNA, as this is RNA coding for protein, and hence is functional. Additionally, artificially evolved RNAs also fall under the fRNA umbrella term. Some publications[24] state that ncRNA and fRNA are nearly synonymous; however, others have pointed out that a large proportion of annotated ncRNAs likely have no function.[9][10] It has also been suggested to simply use the term RNA, since the distinction from a protein coding RNA (messenger RNA) is already given by the qualifier mRNA.[100] This eliminates the ambiguity when addressing a gene "encoding a non-coding" RNA. In addition, there may be a number of ncRNAs that are misannotated in published literature and datasets.[101][102][103]
Thind AS, Monga I, Thakur PK, Kumari P, Dindhoria K, Krzak M, et al. (November 2021). "Demystifying emerging bulk RNA-Seq applications: the application and utility of bioinformatic methodology". Briefings in Bioinformatics. 22 (6). doi:10.1093/bib/bbab259. PMID 34329375.
Zachau HG, Dütting D, Feldmann H, Melchers F, Karau W (1966). "Serine specific transfer ribonucleic acids. XIV. Comparison of nucleotide sequences and secondary structure models". Cold Spring Harbor Symposia on Quantitative Biology. 31: 417–424. doi:10.1101/SQB.1966.031.01.054. PMID 5237198.
Espinoza CA, Allen TA, Hieb AR, Kugel JF, Goodrich JA (September 2004). "B2 RNA binds directly to RNA polymerase II to repress transcript synthesis". Nature Structural & Molecular Biology. 11 (9): 822–829. doi:10.1038/nsmb812. PMID 15300239. S2CID 22199826.
Zhang J, King ML (December 1996). "Xenopus VegT RNA is localized to the vegetal cortex during oogenesis and encodes a novel T-box transcription factor involved in mesodermal patterning". Development. 122 (12): 4119–4129. doi:10.1242/dev.122.12.4119. PMID 9012531. S2CID 28462527.
Pibouin L, Villaudy J, Ferbus D, Muleris M, Prospéri MT, Remvikos Y, Goubin G (February 2002). "Cloning of the mRNA of overexpression in colon carcinoma-1: a sequence overexpressed in a subset of colon carcinomas". Cancer Genetics and Cytogenetics. 133 (1): 55–60. doi:10.1016/S0165-4608(01)00634-3. PMID 11890990.
Fu X, Ravindranath L, Tran N, Petrovics G, Srivastava S (March 2006). "Regulation of apoptosis by a prostate-specific and prostate cancer-associated noncoding gene, PCGEM1". DNA and Cell Biology. 25 (3): 135–141. doi:10.1089/dna.2006.25.135. PMID 16569192.
Xie M, Ma T, Xue J, Ma H, Sun M, Zhang Z, et al. (February 2019). "The long intergenic non-protein coding RNA 707 promotes proliferation and metastasis of gastric cancer by interacting with mRNA stabilizing protein HuR". Cancer Letters. 443: 67–79. doi:10.1016/j.canlet.2018.11.032. PMID 30502359. S2CID 54611497.
Yuan RX, Bao D, Zhang Y (May 2020). "Linc00707 promotes cell proliferation, invasion, and migration via the miR-30c/CTHRC1 regulatory loop in breast cancer". European Review for Medical and Pharmacological Sciences. 24 (9): 4863–4872. doi:10.26355/eurrev_202005_21175. PMID 32432749. S2CID 218759508.
Ding F, Prints Y, Dhar MS, Johnson DK, Garnacho-Montero C, Nicholls RD, Francke U (June 2005). "Lack of Pwcr1/MBII-85 snoRNA is critical for neonatal lethality in Prader-Willi syndrome mouse models". Mammalian Genome. 16 (6): 424–431. doi:10.1007/s00335-005-2460-2. PMID 16075369. S2CID 12256515.
Bolton PF, Veltman MW, Weisblatt E, Holmes JR, Thomas NS, Youings SA, et al. (September 2004). "Chromosome 15q11-13 abnormalities and other medical conditions in individuals with autism spectrum disorders". Psychiatric Genetics. 14 (3): 131–137. doi:10.1097/00041444-200409000-00002. PMID 15318025. S2CID 37344935.
Mencía A, Modamio-Høybjør S, Redshaw N, Morín M, Mayo-Merino F, Olavarrieta L, et al. (May 2009). "Mutations in the seed region of human miR-96 are responsible for nonsyndromic progressive hearing loss". Nature Genetics. 41 (5): 609–613. doi:10.1038/ng.355. PMID 19363479. S2CID 11113852.
Housman G, Ulitsky I (January 2016). "Methods for distinguishing between protein-coding and long noncoding RNAs and the elusive biological purpose of translation of long noncoding RNAs". Biochimica et Biophysica Acta (BBA) - Gene Regulatory Mechanisms. 1859 (1): 31–40. doi:10.1016/j.bbagrm.2015.07.017. PMID 26265145.