Superscalar processor

Simple superscalar pipeline. By fetching and dispatching two instructions at a time, a maximum of two instructions per cycle can be completed. (IF = instruction fetch, ID = instruction decode, EX = execute, MEM = memory access, WB = register write-back, i = instruction number, t = clock cycle [i.e. time])
Processor board of a Cray T3E supercomputer with four superscalar Alpha 21164 processors

A superscalar processor (or multiple-issue processor[1]) is a CPU that implements a form of parallelism called instruction-level parallelism within a single processor.[2] In contrast to a scalar processor, which can execute at most one instruction per clock cycle, a superscalar processor can execute or start executing more than one instruction during a clock cycle by simultaneously dispatching multiple instructions to different execution units on the processor. It therefore allows more throughput (the number of instructions executed per unit of time; measured per clock cycle, this can even be less than 1) than would otherwise be possible at a given clock rate. Each execution unit is not a separate processor (or a core, if the processor is a multi-core processor) but an execution resource within a single CPU, such as an arithmetic logic unit.

While a superscalar CPU is typically also pipelined, superscalar execution and pipelining are considered different performance enhancement techniques. The former executes multiple instructions in parallel by using multiple execution units, whereas the latter executes multiple instructions in the same execution unit in parallel by dividing the execution unit into different phases. In the "Simple superscalar pipeline" figure, fetching two instructions at the same time is superscaling, and fetching the next two before the first pair has been written back is pipelining.
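The distinction can be made concrete with a small timing model. The following is a minimal sketch, assuming an ideal five-stage pipeline (the IF, ID, EX, MEM and WB stages of the figure) with no stalls or dependencies; the function names stage_cycles and total_cycles are hypothetical and chosen only for illustration. Setting width to 1 models a pipelined scalar processor, and width 2 models the two-wide pipeline of the figure.

```python
import math

# Stage names as used in the "Simple superscalar pipeline" figure.
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def stage_cycles(i, width):
    """Return {stage: cycle} for instruction i (0-based) on an ideal pipeline.

    'width' instructions enter IF together in each cycle (superscaling);
    each instruction then advances one stage per cycle (pipelining).
    """
    start = i // width
    return {stage: start + k for k, stage in enumerate(STAGES)}


def total_cycles(n, width):
    """Cycles needed to complete n instructions, assuming no hazards at all."""
    if n == 0:
        return 0
    return len(STAGES) + math.ceil(n / width) - 1


for width in (1, 2):
    print(f"width={width}: 8 instructions finish in {total_cycles(8, width)} cycles")
# width=1: 8 instructions finish in 12 cycles
# width=2: 8 instructions finish in 8 cycles
```

In this idealized model pipelining alone overlaps the stages of successive instructions, while the wider fetch halves the number of fetch cycles. Real processors fall short of these figures whenever dependencies or branches intervene.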

The superscalar technique is traditionally associated with several identifying characteristics (within a given CPU):

  • Instructions are issued from a sequential instruction stream
  • The CPU dynamically checks for data dependencies between instructions at run time (versus software checking at compile time)
  • The CPU can execute multiple instructions per clock cycle

History

Seymour Cray's CDC 6600 from 1964 is often mentioned as the first superscalar design. The 1967 IBM System/360 Model 91 was another superscalar mainframe. The Intel i960CA (1989),[3] the AMD 29000-series 29050 (1990), and the Motorola MC88110 (1991)[4] were the first commercial single-chip superscalar microprocessors. RISC microprocessors like these were the first to have superscalar execution because RISC architectures free transistors and die area that can be used to include multiple execution units, and because the traditional uniformity of the instruction set favors superscalar dispatch. This was why RISC designs were faster than CISC designs through the 1980s and into the 1990s; multiple dispatch is far more complicated when instructions have variable bit length.

Except for CPUs used in low-power applications, embedded systems, and battery-powered devices, essentially all general-purpose CPUs developed since about 1998 are superscalar.

The P5 Pentium was the first superscalar x86 processor. The Nx586, P6 Pentium Pro and AMD K5 were among the first designs to decode x86 instructions asynchronously into dynamic, microcode-like micro-op sequences prior to actual execution on a superscalar microarchitecture. This opened the way for dynamic scheduling of buffered partial instructions and enabled more parallelism to be extracted than the more rigid methods used in the simpler P5 Pentium. It also simplified speculative execution and allowed higher clock frequencies compared to designs such as the advanced Cyrix 6x86.

Scalar to superscalar

The simplest processors are scalar processors. Each instruction executed by a scalar processor typically manipulates one or two data items at a time. By contrast, each instruction executed by a vector processor operates simultaneously on many data items. An analogy is the difference between scalar and vector arithmetic. A superscalar processor is a mixture of the two. Each instruction processes one data item, but there are multiple execution units within each CPU, so multiple instructions can process separate data items concurrently.

Superscalar CPU design emphasizes improving the instruction dispatcher accuracy and allowing it to keep the multiple execution units in use at all times. This has become increasingly important as the number of units has increased. While early superscalar CPUs would have two ALUs and a single FPU, a later design such as the PowerPC 970 includes four ALUs, two FPUs, and two SIMD units. If the dispatcher is ineffective at keeping all of these units fed with instructions, the performance of the system will be no better than that of a simpler, cheaper design.

A superscalar processor usually sustains an execution rate in excess of one instruction per machine cycle. But merely processing multiple instructions concurrently does not make an architecture superscalar, since pipelined, multiprocessor or multi-core architectures also achieve that, but with different methods.

In a superscalar CPU the dispatcher reads instructions from memory and decides which ones can be run in parallel, dispatching each to one of the several execution units contained inside a single CPU. Therefore, a superscalar processor can be envisioned as having multiple parallel pipelines, each of which is processing instructions simultaneously from a single instruction thread.
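The dispatcher's decision can be sketched in a few lines. This is a deliberately simplified, in-order model and not a description of any particular processor: it assumes every instruction has one destination register and a tuple of source registers, that any execution unit can run any instruction, and that an instruction may not join an issue group if it reads or writes a register written earlier in the same group. The names Instr and issue_groups are hypothetical.

```python
from typing import NamedTuple


class Instr(NamedTuple):
    dest: str      # register written by the instruction
    srcs: tuple    # registers read by the instruction


def issue_groups(program, width=2):
    """Split a sequential instruction stream into groups issued together.

    Simplified in-order model: a group is closed when it is full or when
    the next instruction reads or writes a register that an instruction
    already in the group writes.
    """
    groups = []
    current = []
    for instr in program:
        written = {prev.dest for prev in current}
        conflict = instr.dest in written or any(s in written for s in instr.srcs)
        if current and (conflict or len(current) == width):
            groups.append(current)
            current = []
        current.append(instr)
    if current:
        groups.append(current)
    return groups


program = [
    Instr("a", ("b", "c")),   # a = b + c
    Instr("d", ("e", "f")),   # d = e + f  (independent of the first)
    Instr("g", ("a", "d")),   # g = a + d  (needs both earlier results)
    Instr("h", ("g", "b")),   # h = g + b  (needs g)
]
groups = issue_groups(program, width=2)
print([len(g) for g in groups])  # [2, 1, 1]
```

Only the first two instructions share a cycle here; the last two each wait for a result produced by the instruction before them, so the second execution unit sits idle.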

Most modern superscalar CPUs also have logic to reorder the instructions to try to avoid pipeline stalls and increase parallel execution.

Limitations

The performance improvement available from superscalar techniques is limited in three key areas:

  • The degree of intrinsic parallelism in the instruction stream (limited, for example, by instructions that require the same computational resources from the CPU)
  • The complexity and time cost of dependency checking logic and register renaming circuitry
  • Branch instruction processing

Existing binary executable programs have varying degrees of intrinsic parallelism. In some cases instructions are not dependent on each other and can be executed simultaneously. In other cases they are inter-dependent: one instruction impacts either the resources or the results of the other. The instructions a = b + c; d = e + f can be run in parallel because neither result depends on the other calculation. However, the instructions a = b + c; b = e + f might not be runnable in parallel: the second overwrites b, which the first reads, so the outcome depends on the order in which the instructions complete while they move through the units.
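The two instruction pairs above can be checked mechanically. The sketch below classifies the dependency between an earlier and a later instruction using the conventional hazard names (read-after-write, write-after-read, write-after-write); these names are standard terminology rather than terms used in this article, and the function hazards is hypothetical.

```python
def hazards(first, second):
    """Classify register dependencies between two instructions.

    Each instruction is (dest, sources); 'first' precedes 'second' in
    program order. Returns a set of hazard names.
    """
    dest1, srcs1 = first
    dest2, srcs2 = second
    found = set()
    if dest1 in srcs2:
        found.add("RAW")   # second reads what first writes
    if dest2 in srcs1:
        found.add("WAR")   # second overwrites what first reads
    if dest1 == dest2:
        found.add("WAW")   # both write the same register
    return found


# a = b + c; d = e + f  -> no dependency, safe to run in parallel
print(hazards(("a", {"b", "c"}), ("d", {"e", "f"})))  # set()

# a = b + c; b = e + f  -> the second overwrites b, which the first reads
print(hazards(("a", {"b", "c"}), ("b", {"e", "f"})))  # {'WAR'}
```

An empty result means the pair can be dispatched together. A write-after-read result, as in the second pair, is the kind of dependency that register renaming can remove, because the conflict is over the register name rather than over a value.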

Although the instruction stream may contain no inter-instruction dependencies, a superscalar CPU must nonetheless check for that possibility, since there is no assurance otherwise and failure to detect a dependency would produce incorrect results.

No matter how advanced the semiconductor process or how fast the switching speed, this places a practical limit on how many instructions can be simultaneously dispatched. While process advances will allow ever greater numbers of execution units (e.g. ALUs), the burden of checking instruction dependencies grows rapidly, as does the complexity of register renaming circuitry to mitigate some dependencies. Collectively the power consumption, complexity and gate delay costs limit the achievable superscalar speedup.
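A rough count shows why the checking burden grows quickly. The sketch below assumes a simple model that is not taken from this article: every instruction has two source registers and one destination, and each instruction's sources are compared against the destinations of all earlier instructions issued in the same cycle. Real designs differ in detail, so the numbers are illustrative only, and the function name intra_group_comparisons is hypothetical.

```python
def intra_group_comparisons(width, sources_per_instr=2):
    """Register comparisons needed to check one issue group of 'width'.

    Assumed model: each instruction's sources are compared with the
    destination of every earlier instruction in the same group.
    """
    ordered_pairs = width * (width - 1) // 2   # (earlier, later) pairs
    return ordered_pairs * sources_per_instr


for width in (2, 4, 8):
    print(width, intra_group_comparisons(width))
# 2 2
# 4 12
# 8 56
```

Under these assumptions the count is width × (width − 1), so doubling the issue width roughly quadruples the number of comparisons that must complete within a cycle.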

However, even given infinitely fast dependency checking logic on an otherwise conventional superscalar CPU, an instruction stream that itself has many dependencies would also limit the possible speedup. Thus the degree of intrinsic parallelism in the code stream forms a second limitation.
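This second limit can be estimated for a given instruction sequence by finding its longest chain of dependent instructions. The following is a minimal sketch under idealized assumptions that are not from this article: every instruction takes one cycle, there are unlimited execution units, only true (read-after-write) dependencies matter because renaming is assumed perfect, and branches are ignored. The function name dataflow_limit is hypothetical.

```python
def dataflow_limit(program):
    """Return the minimum number of cycles imposed by data dependencies.

    'program' is a list of (dest, sources) in program order. An
    instruction can start once all the values it reads are available.
    """
    ready = {}    # register -> cycle after which its newest value exists
    depth = 0
    for dest, srcs in program:
        start = max((ready.get(s, 0) for s in srcs), default=0)
        finish = start + 1
        ready[dest] = finish
        depth = max(depth, finish)
    return depth


independent = [("a", ("b", "c")), ("d", ("e", "f")),
               ("g", ("h", "i")), ("j", ("k", "l"))]
chain = [("a", ("b", "c")), ("d", ("a", "e")),
         ("f", ("d", "g")), ("h", ("f", "i"))]

for name, prog in (("independent", independent), ("chain", chain)):
    cycles = dataflow_limit(prog)
    print(name, cycles, len(prog) / cycles)
# independent 1 4.0
# chain 4 1.0
```

The four independent instructions could in principle complete in one cycle, whereas the chain needs four cycles no matter how many execution units are available.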

Alternatives

Collectively, these limits drive investigation into alternative architectural changes such as very long instruction word (VLIW), explicitly parallel instruction computing (EPIC), simultaneous multithreading (SMT), and multi-core computing.

With VLIW, the burdensome task of dependency checking by hardware logic at run time is removed and delegated to the compiler. Explicitly parallel instruction computing (EPIC) is like VLIW with extra cache prefetching instructions.

Simultaneous multithreading (SMT) is a technique for improving the overall efficiency of superscalar processors. SMT permits multiple independent threads of execution to better utilize the resources provided by modern processor architectures. Because the threads are independent, an instruction from one thread can be executed out of order or in parallel with an instruction from another. In addition, one thread will not produce a pipeline bubble in the code stream of a different one, for example due to a branch.
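The benefit can be illustrated with a toy model. The sketch below is an assumption-laden simplification, not a description of real hardware: each thread is a list of steps, each step needs a number of issue slots in one cycle (0 stands for a stall or bubble), and threads are offered the shared slots in a fixed priority order. The function names cycles_one_at_a_time and cycles_smt are hypothetical.

```python
def cycles_one_at_a_time(threads):
    """Run the threads back to back; every step, including a stall, costs a cycle."""
    return sum(len(t) for t in threads)


def cycles_smt(threads, width):
    """Share 'width' issue slots among all threads in every cycle."""
    if any(step > width for t in threads for step in t):
        raise ValueError("a step needs more slots than the machine has")
    pos = [0] * len(threads)          # index of each thread's next step
    cycles = 0
    while any(p < len(t) for p, t in zip(pos, threads)):
        free = width
        for i, t in enumerate(threads):
            # A thread advances if its next step fits in the remaining slots.
            if pos[i] < len(t) and t[pos[i]] <= free:
                free -= t[pos[i]]
                pos[i] += 1
        cycles += 1
    return cycles


thread_a = [1, 0, 1, 1]   # one stall in its second cycle
thread_b = [1, 1, 0, 1]   # one stall in its third cycle
print(cycles_one_at_a_time([thread_a, thread_b]))   # 8
print(cycles_smt([thread_a, thread_b], width=2))    # 4
```

In this example each thread's stall coincides with useful work from the other thread, so the bubble of one thread does not leave the whole machine idle.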

Superscalar processors differ from multi-core processors in that the several execution units are not entire processors. A single processor is composed of finer-grained execution units such as the ALU, integer multiplier, integer shifter, FPU, etc. There may be multiple copies of each execution unit to enable the execution of many instructions in parallel. This differs from a multi-core processor, which concurrently processes instructions from multiple threads, one thread per processing unit (called a "core"). It also differs from a pipelined processor, where multiple instructions can concurrently be in various stages of execution, assembly-line fashion.

The various alternative techniques are not mutually exclusive; they can be (and frequently are) combined in a single processor. Thus a multicore CPU is possible where each core is an independent processor containing multiple parallel pipelines, each pipeline being superscalar. Some processors also include vector capability.

References

  1. ^ P. Pacheco, Introduction to Parallel Programming, 2011, section 2.2.5, "There are two main approaches to ILP: pipelining ... and multiple issue ... A processor that supports dynamic multiple issue is sometimes said to be superscalar." A. Chien, Computer Architecture for Scientists, 2022, page 102, "multiple-issue (aka superscalar)".
  2. ^ "What is a Superscalar Processor? - Definition from Techopedia". Techopedia.com. 28 February 2019. Retrieved 2022-08-29.
  3. ^ McGeady, Steven (Spring 1990). "The i960CA SuperScalar implementation of the 80960 architecture". Digest of Papers Compcon Spring '90. Thirty-Fifth IEEE Computer Society International Conference on Intellectual Leverage. pp. 232–240. doi:10.1109/CMPCON.1990.63681. ISBN 0-8186-2028-5. S2CID 13206773.
  4. ^ Diefendorff, K.; Allen, M. (Spring 1992). "The Motorola 88110 Superscalar RISC microprocessor". Digest of Papers COMPCON Spring 1992. pp. 157–162. doi:10.1109/CMPCON.1992.186702. ISBN 0-8186-2655-0. S2CID 34913907.
  • Mike Johnson, Superscalar Microprocessor Design, Prentice-Hall, 1991, ISBN 0-13-875634-1
  • Sorin Cotofana, Stamatis Vassiliadis, "On the Design Complexity of the Issue Logic of Superscalar Machines", EUROMICRO 1998: 10277-10284
  • Steven McGeady, et al., "Performance Enhancements in the Superscalar i960MM Embedded Microprocessor," ACM Proceedings of the 1991 Conference on Computer Architecture (Compcon), 1991, pp. 4–7
