GPT-1

Generative Pre-trained Transformer 1 (GPT-1)
Original author(s): OpenAI
Initial release: June 2018
Successor: GPT-2
License: MIT[1]
Website: openai.com/blog/language-unsupervised/

Original GPT architecture

Generative Pre-trained Transformer 1 (GPT-1) was the first of OpenAI's large language models following Google's invention of the transformer architecture in 2017.[2] In June 2018, OpenAI released a paper entitled "Improving Language Understanding by Generative Pre-Training",[3] in which they introduced that initial model along with the general concept of a generative pre-trained transformer.[4]

Up to that point, the best-performing neural NLP models primarily employed supervised learning from large amounts of manually labeled data. This reliance on supervised learning limited their applicability to datasets that lacked high-quality annotation, and made it prohibitively expensive and time-consuming to train extremely large models;[3][5] many languages (such as Swahili or Haitian Creole) are difficult to translate and interpret with such models due to a lack of available text for corpus-building.[5] In contrast, a GPT's "semi-supervised" approach involved two stages: an unsupervised generative "pre-training" stage, in which a language-modeling objective was used to set initial parameters, and a supervised discriminative "fine-tuning" stage, in which these parameters were adapted to a target task.[3]
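The two stages correspond to two training objectives in the paper: a language-modeling likelihood maximized during pre-training, and a supervised likelihood during fine-tuning, with the language-modeling term optionally retained as an auxiliary loss. Roughly in the paper's notation, for an unlabeled corpus U and a labeled dataset C:

```latex
% Pre-training: maximize the likelihood of each token u_i given its
% k preceding tokens, with model parameters \Theta.
L_1(\mathcal{U}) = \sum_i \log P(u_i \mid u_{i-k}, \ldots, u_{i-1}; \Theta)

% Fine-tuning: maximize the probability of label y given the
% input token sequence x^1, \ldots, x^m.
L_2(\mathcal{C}) = \sum_{(x, y)} \log P(y \mid x^1, \ldots, x^m)

% Combined fine-tuning objective, keeping the language-modeling
% term as an auxiliary loss with weight \lambda.
L_3(\mathcal{C}) = L_2(\mathcal{C}) + \lambda \cdot L_1(\mathcal{C})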

The use of a transformer architecture, as opposed to previous techniques involving attention-augmented RNNs, provided GPT models with a more structured memory than could be achieved through recurrent mechanisms; this resulted in "robust transfer performance across diverse tasks".[3]

Reason for choosing BookCorpus

BookCorpus was chosen as a training dataset partly because the long passages of continuous text helped the model learn to handle long-range information.[6] It contained over 7,000 unpublished fiction books from various genres. The rest of the datasets available at the time, while being larger, lacked this long-range structure (being "shuffled" at a sentence level).[3]

The BookCorpus text was cleaned with the ftfy library to standardize punctuation and whitespace, and then tokenized with spaCy.[3]
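The idea behind this preprocessing step can be sketched with the standard library alone; the functions below are illustrative stand-ins, not the actual ftfy or spaCy implementations, which handle far more cases (mojibake repair, language-aware tokenization rules):

```python
import re
import unicodedata

def clean_text(raw: str) -> str:
    """Stand-in for the cleanup pass: apply Unicode compatibility
    normalization (e.g. non-breaking space -> space) and collapse
    runs of whitespace, approximating what ftfy was used for."""
    text = unicodedata.normalize("NFKC", raw)
    return re.sub(r"\s+", " ", text).strip()

def tokenize(text: str) -> list[str]:
    """Very rough stand-in for spaCy's rule-based tokenizer:
    split punctuation marks off from adjacent words."""
    return re.findall(r"\w+|[^\w\s]", text)
```

For example, `tokenize(clean_text("Don't  stop, now."))` splits the apostrophe and punctuation into their own tokens while normalizing the doubled space.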

Architecture

The GPT-1 architecture was a twelve-layer decoder-only transformer, using twelve masked self-attention heads with 64-dimensional states each (for a total dimensionality of 768). Rather than simple stochastic gradient descent, the Adam optimization algorithm was used; the learning rate was increased linearly from zero over the first 2,000 updates to a maximum of 2.5×10⁻⁴, then annealed to 0 using a cosine schedule.[3] GPT-1 has 117 million parameters.[4]
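The warmup-then-cosine schedule described above can be sketched as follows. The total number of updates here is an assumed placeholder for illustration; it depends on the length of the training run, which the schedule itself does not fix:

```python
import math

def gpt1_lr(step: int, warmup_steps: int = 2000,
            max_lr: float = 2.5e-4, total_steps: int = 100_000) -> float:
    """Learning rate at a given update step: linear warmup from 0
    over the first `warmup_steps` updates to `max_lr`, then cosine
    annealing down to 0 by `total_steps` (an assumed value)."""
    if step < warmup_steps:
        return max_lr * step / warmup_steps            # linear warmup
    # cosine anneal from max_lr at the end of warmup down to 0
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return 0.5 * max_lr * (1 + math.cos(math.pi * progress))
```

At step 0 the rate is 0, at step 2,000 it reaches the 2.5×10⁻⁴ maximum, and it decays smoothly to 0 at the final step.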

While the fine-tuning was adapted to specific tasks, its pre-training was not; performing the various tasks required only minimal changes to its underlying task-agnostic model architecture.[3] Despite this, GPT-1 still improved on previous benchmarks in several language processing tasks, outperforming discriminatively-trained models with task-oriented architectures on several diverse tasks.[3]
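The paper keeps the architecture task-agnostic by serializing each task's structured input into a single token sequence with special start, delimiter, and extract tokens, so only the input format changes between tasks. A minimal sketch (the token strings here are illustrative placeholders, not the paper's actual vocabulary entries):

```python
def format_entailment(premise: str, hypothesis: str,
                      start: str = "<s>", delim: str = "$",
                      extract: str = "<e>") -> str:
    """Serialize a sentence pair for an entailment task: the
    transformer sees one flat sequence, so no per-task
    architectural change is needed."""
    return f"{start} {premise} {delim} {hypothesis} {extract}"

def format_classification(text: str, start: str = "<s>",
                          extract: str = "<e>") -> str:
    """Single-sentence tasks simply wrap the text in the
    start and extract tokens."""
    return f"{start} {text} {extract}"
```

A linear classification layer reads the model's activation at the extract token; everything before it is the same pre-trained transformer regardless of task.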

Performance and evaluation

GPT-1 achieved a 5.8% and 1.5% improvement over previous best results[3] on natural language inference (also known as textual entailment) tasks, which evaluate the ability to interpret pairs of sentences and classify the relationship between them as "entailment", "contradiction" or "neutral".[3] Examples of such datasets include QNLI (Wikipedia articles) and MultiNLI (transcribed speech, popular fiction, and government reports, among other sources).[7] It similarly outperformed previous models on two tasks related to question answering and commonsense reasoning: by 5.7% on RACE,[8] a dataset of written question-answer pairs from middle and high school exams, and by 8.9% on the Story Cloze Test.[9]

GPT-1 improved on previous best-performing models by 4.2% on semantic similarity (or paraphrase detection), evaluating the ability to predict whether two sentences are paraphrases of one another, using the Quora Question Pairs (QQP) dataset.[3]

GPT-1 achieved a score of 45.4, versus a previous best of 35.0,[3] in a text classification task using the Corpus of Linguistic Acceptability (CoLA). Finally, GPT-1 achieved an overall score of 72.8 (compared to a previous record of 68.9) on GLUE, a multi-task benchmark.[10]

References

  1. ^ "gpt-2". GitHub. Archived from the original on 11 March 2023. Retrieved 13 March 2023.
  2. ^ Vaswani, Ashish; Shazeer, Noam; Parmar, Niki; Uszkoreit, Jakob; Jones, Llion; Gomez, Aidan N; Kaiser, Łukasz; Polosukhin, Illia (2017). "Attention is All you Need" (PDF). Advances in Neural Information Processing Systems. 30. Curran Associates, Inc.
  3. ^ a b c d e f g h i j k l m Radford, Alec; Narasimhan, Karthik; Salimans, Tim; Sutskever, Ilya (11 June 2018). "Improving Language Understanding by Generative Pre-Training" (PDF). OpenAI. p. 12. Archived (PDF) from the original on 26 January 2021. Retrieved 23 January 2021.
  4. ^ a b "GPT-1 to GPT-4: Each of OpenAI's GPT Models Explained and Compared". 11 April 2023. Archived from the original on 2023-04-15. Retrieved 2023-04-29.
  5. ^ a b Tsvetkov, Yulia (22 June 2017). "Opportunities and Challenges in Working with Low-Resource Languages" (PDF). Carnegie Mellon University. Archived (PDF) from the original on 31 March 2020. Retrieved 23 January 2021.
  6. ^ Zhu, Yukun; Kiros, Ryan; Zemel, Richard; Salakhutdinov, Ruslan; Urtasun, Raquel; Torralba, Antonio; Fidler, Sanja (22 June 2015). "Aligning Books and Movies: Towards Story-like Visual Explanations by Watching Movies and Reading Books". arXiv:1506.06724 [cs.CV]. # of books: 11,038 / # of sentences: 74,004,228 / # of words: 984,846,357 / mean # of words per sentence: 13 / median # of words per sentence: 11
  7. ^ Williams, Adina; Nangia, Nikita; Bowman, Samuel (1 June 2018). "A Broad-Coverage Challenge Corpus for Sentence Understanding through Inference" (PDF). Association for Computational Linguistics. Archived (PDF) from the original on 11 February 2020. Retrieved 23 January 2021. At 433k examples, this resource is one of the largest corpora available for natural language inference (a.k.a. recognizing textual entailment), [...] offering data from ten distinct genres of written and spoken English [...] while supplying an explicit setting for evaluating cross-genre domain adaptation.
  8. ^ Lai, Guokun; Xie, Qizhe; Hanxiao, Liu; Yang, Yiming; Hovy, Eduard (15 April 2017). "RACE: Large-scale ReAding Comprehension Dataset From Examinations". arXiv:1704.04683 [cs.CL].
  9. ^ Mostafazadeh, Nasrin; Roth, Michael; Louis, Annie; Chambers, Nathanael; Allen, James F. (3 April 2017). "LSDSem 2017 Shared Task: The Story Cloze Test" (PDF). Association for Computational Linguistics. Archived (PDF) from the original on 22 November 2020. Retrieved 23 January 2021. The LSDSem'17 shared task is the Story Cloze Test, a new evaluation for story understanding and script learning. This test provides a system with a four-sentence story and two possible endings, and the system must choose the correct ending. Successful narrative understanding (getting closer to human performance of 100%) requires systems to link various levels of semantics to commonsense knowledge.
  10. ^ Wang, Alex; Singh, Amanpreet; Michael, Julian; Hill, Felix; Levy, Omar; Bowman, Samuel R. (20 April 2018). "GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding". arXiv:1804.07461 [cs.CL].
