
Streaming SIMD Extensions

In computing, Streaming SIMD Extensions (SSE) is a single instruction, multiple data (SIMD) instruction set extension to the x86 architecture, designed by Intel and introduced in 1999 in its Pentium III series of central processing units (CPUs), shortly after the appearance of Advanced Micro Devices' (AMD's) 3DNow!. SSE contains 70 new instructions (65 unique mnemonics[1] using 70 encodings), most of which work on single-precision floating-point data. SIMD instructions can greatly increase performance when exactly the same operations are to be performed on multiple data objects. Typical applications are digital signal processing and graphics processing.

Intel's first IA-32 SIMD effort was the MMX instruction set. MMX had two main problems: it re-used the existing x87 floating-point registers, making the CPUs unable to work on both floating-point and SIMD data at the same time, and it only worked on integers. SSE floating-point instructions operate on a new independent register set, the XMM registers, and SSE also adds a few integer instructions that work on MMX registers.

SSE was subsequently expanded by Intel to SSE2, SSE3, SSSE3 and SSE4. Because it supports floating-point math, it had wider applications than MMX and became more popular. The addition of integer support in SSE2 made MMX largely redundant, though further performance increases can be attained in some situations by using MMX in parallel with SSE operations.

SSE was originally called Katmai New Instructions (KNI), Katmai being the code name for the first Pentium III core revision. During the Katmai project Intel sought to distinguish it from their earlier product line, particularly their flagship Pentium II. It was later renamed Internet Streaming SIMD Extensions (ISSE[2]), then SSE.

AMD added a subset of SSE, 19 instructions that it called new MMX instructions[3] and that are known under several names combining SSE and MMX, shortly afterwards with the release of the original Athlon in August 1999 (see 3DNow! extensions). AMD eventually added full support for SSE instructions, starting with its Athlon XP and Duron (Morgan core) processors.

Registers

SSE originally added eight new 128-bit registers known as XMM0 through XMM7. The AMD64 extensions from AMD (originally called x86-64) added a further eight registers, XMM8 through XMM15, and this extension is duplicated in the Intel 64 architecture. The registers XMM8 through XMM15 are accessible only in 64-bit operating mode. There is also a new 32-bit control/status register, MXCSR.
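
As a minimal sketch of how MXCSR can be inspected from C, the following uses the `_mm_getcsr` and `_mm_setcsr` intrinsics from `<xmmintrin.h>`, assuming a compiler that ships that header (GCC, Clang and MSVC do). The helper names are hypothetical and chosen only for illustration; the flush-to-zero bit is used as an example of one control flag.

```c
#include <xmmintrin.h>  /* SSE intrinsics: _mm_getcsr, _mm_setcsr */

/* Hypothetical helper: returns 1 if the flush-to-zero bit is set in MXCSR. */
int flush_to_zero_enabled(void)
{
    return (_mm_getcsr() & _MM_FLUSH_ZERO_MASK) != 0;
}

/* Hypothetical helper: sets or clears flush-to-zero, leaving other bits intact. */
void set_flush_to_zero(int enable)
{
    unsigned int csr = _mm_getcsr();      /* STMXCSR: read the register */
    if (enable)
        csr |= _MM_FLUSH_ZERO_ON;
    else
        csr &= ~(unsigned int)_MM_FLUSH_ZERO_MASK;
    _mm_setcsr(csr);                      /* LDMXCSR: write it back */
}
```

MXCSR is per-thread state, so a change made this way affects only the calling thread.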

SSE used only a single data type for XMM registers:

  • four 32-bit single-precision floating-point numbers.

SSE2 would later expand the usage of the XMM registers to include:

  • two 64-bit double-precision floating-point numbers or
  • two 64-bit integers or
  • four 32-bit integers or
  • eight 16-bit short integers or
  • sixteen 8-bit bytes or characters.

Because these 128-bit registers are additional machine state that the operating system must preserve across task switches, they are disabled by default until the operating system explicitly enables them. This means that the OS must know how to use the FXSAVE and FXRSTOR instructions, an extended pair of instructions that can save and restore all x87 and SSE register state at once. This support was quickly added to all major IA-32 operating systems.

The first CPU to support SSE, the Pentium III, shared execution resources between SSE and the floating-point unit (FPU).[2] While a compiled application can interleave FPU and SSE instructions side-by-side, the Pentium III will not issue an FPU and an SSE instruction in the same clock cycle. This limitation reduces the effectiveness of pipelining, but the separate XMM registers do allow SIMD and scalar floating-point operations to be mixed without the performance hit from explicit MMX/floating-point mode switching.

SSE instructions

SSE introduced both scalar and packed floating-point instructions.

Floating-point instructions

  • Memory-to-register/register-to-memory/register-to-register data movement
    • Scalar – MOVSS
    • Packed – MOVAPS, MOVUPS, MOVLPS, MOVHPS, MOVLHPS, MOVHLPS, MOVMSKPS
  • Arithmetic
    • Scalar – ADDSS, SUBSS, MULSS, DIVSS, RCPSS, SQRTSS, MAXSS, MINSS, RSQRTSS
    • Packed – ADDPS, SUBPS, MULPS, DIVPS, RCPPS, SQRTPS, MAXPS, MINPS, RSQRTPS
  • Compare
    • Scalar – CMPSS, COMISS, UCOMISS
    • Packed – CMPPS
  • Data shuffle and unpacking
    • Packed – SHUFPS, UNPCKHPS, UNPCKLPS
  • Data-type conversion
    • Scalar – CVTSI2SS, CVTSS2SI, CVTTSS2SI
    • Packed – CVTPI2PS, CVTPS2PI, CVTTPS2PI
  • Bitwise logical operations
    • Packed – ANDPS, ORPS, XORPS, ANDNPS
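
The difference between the scalar and packed forms listed above can be sketched with compiler intrinsics. The example assumes `<xmmintrin.h>` is available; the function names are hypothetical. `_mm_add_ps` corresponds to ADDPS and `_mm_add_ss` to ADDSS.

```c
#include <xmmintrin.h>  /* SSE intrinsics */

/* Packed add (ADDPS): all four lanes of a and b are added. */
void add_packed(const float a[4], const float b[4], float out[4])
{
    __m128 va = _mm_loadu_ps(a);   /* MOVUPS: unaligned load of four floats */
    __m128 vb = _mm_loadu_ps(b);
    _mm_storeu_ps(out, _mm_add_ps(va, vb));
}

/* Scalar add (ADDSS): only lane 0 is added; lanes 1..3 are copied from a. */
void add_scalar(const float a[4], const float b[4], float out[4])
{
    __m128 va = _mm_loadu_ps(a);
    __m128 vb = _mm_loadu_ps(b);
    _mm_storeu_ps(out, _mm_add_ss(va, vb));
}
```

With a = {1, 2, 3, 4} and b = {10, 20, 30, 40}, the packed version yields {11, 22, 33, 44}, while the scalar version yields {11, 2, 3, 4}.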

Integer instructions

  • Arithmetic
    • PMULHUW, PSADBW, PAVGB, PAVGW, PMAXUB, PMINUB, PMAXSW, PMINSW
  • Data movement
    • PEXTRW, PINSRW
  • Other
    • PMOVMSKB, PSHUFW

Other instructions

  • MXCSR management
    • LDMXCSR, STMXCSR
  • Cache and Memory management
    • MOVNTQ, MOVNTPS, MASKMOVQ, PREFETCHT0, PREFETCHT1, PREFETCHT2, PREFETCHNTA, SFENCE
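
As an illustration of the cache and memory management group, the sketch below fills a buffer with non-temporal stores (MOVNTPS via `_mm_stream_ps`) followed by SFENCE. The function name and its preconditions (16-byte alignment, length a multiple of four) are assumptions made for this example, not requirements stated in the text above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <xmmintrin.h>  /* SSE intrinsics */

/* Hypothetical helper: writes `value` to n floats starting at dst.
   Assumes dst is 16-byte aligned and n is a multiple of 4. */
void fill_streaming(float *dst, size_t n, float value)
{
    assert(((uintptr_t)dst & 15u) == 0 && n % 4 == 0);
    __m128 v = _mm_set1_ps(value);       /* broadcast value to all four lanes */
    for (size_t i = 0; i < n; i += 4)
        _mm_stream_ps(dst + i, v);       /* MOVNTPS: store with a non-temporal hint */
    _mm_sfence();                        /* SFENCE: order these stores before later ones */
}
```

Non-temporal stores are a hint that the data will not be read again soon, so they tend to suit large buffers that are written once.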

Example

The following simple example demonstrates the advantage of using SSE. Consider an operation like vector addition, which is used very often in computer graphics applications. To add two single precision, four-component vectors together using x86 requires four floating-point addition instructions.

 vec_res.x = v1.x + v2.x;
 vec_res.y = v1.y + v2.y;
 vec_res.z = v1.z + v2.z;
 vec_res.w = v1.w + v2.w;

This corresponds to four x86 FADD instructions in the object code. On the other hand, as the following pseudo-code shows, a single 128-bit 'packed-add' instruction can replace the four scalar addition instructions.

 movaps xmm0, [v1]       ; xmm0 = v1.w | v1.z | v1.y | v1.x
 addps  xmm0, [v2]       ; xmm0 = v1.w+v2.w | v1.z+v2.z | v1.y+v2.y | v1.x+v2.x
 movaps [vec_res], xmm0  ; vec_res = xmm0
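
The same packed addition can be written in C with intrinsics instead of assembly. This is a sketch under stated assumptions: the `vec4` type and `vec4_add` function are hypothetical, and the struct is given 16-byte alignment because the aligned load and store (MOVAPS) require it.

```c
#include <xmmintrin.h>  /* SSE intrinsics */

/* Hypothetical four-component vector, aligned to 16 bytes for MOVAPS. */
typedef struct {
    _Alignas(16) float x;
    float y, z, w;
} vec4;

vec4 vec4_add(const vec4 *v1, const vec4 *v2)
{
    vec4 res;
    __m128 a = _mm_load_ps((const float *)v1);     /* movaps: load v1.x..v1.w */
    __m128 b = _mm_load_ps((const float *)v2);     /* movaps: load v2.x..v2.w */
    _mm_store_ps((float *)&res, _mm_add_ps(a, b)); /* addps, then movaps to res */
    return res;
}
```

A compiler is free to fold the second load into the ADDPS memory operand, which would match the three-instruction sequence shown above.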

Later versions

  • SSE2, Willamette New Instructions (WNI), introduced with the Pentium 4, is a major enhancement to SSE. SSE2 adds two major features: double-precision (64-bit) floating-point for all SSE operations, and MMX integer operations on 128-bit XMM registers. In the original SSE instruction set, conversion to and from integers placed the integer data in the 64-bit MMX registers. SSE2 enables the programmer to perform SIMD math on any data type (from 8-bit integer to 64-bit float) entirely with the XMM vector-register file, without the need to use the legacy MMX or FPU registers. It offers an orthogonal set of instructions for dealing with common data types.
  • SSE3, also called Prescott New Instructions (PNI), is an incremental upgrade to SSE2, adding a handful of DSP-oriented mathematics instructions and some process (thread) management instructions. It also allowed addition or subtraction of two numbers that are stored in the same register with a single instruction, which wasn't possible in SSE2 and earlier. This capability, known as horizontal in Intel terminology, was the major addition to the SSE3 instruction set. AMD's 3DNow! extension could do the latter too.
  • SSSE3, Merom New Instructions (MNI), is an upgrade to SSE3, adding 16 new instructions which include permuting the bytes in a word, multiplying 16-bit fixed-point numbers with correct rounding, and within-word accumulate instructions. SSSE3 is often mistaken for SSE4 as this term was used during the development of the Core microarchitecture.
  • SSE4, Penryn New Instructions (PNI), is another major enhancement, adding a dot product instruction, additional integer instructions, a popcnt instruction (Population count: count number of bits set to 1, used extensively e.g. in cryptography), and more.
  • XOP, FMA4 and CVT16 are new iterations announced by AMD in August 2007[4][5] and revised in May 2009.[6]
  • Advanced Vector Extensions (AVX), Gesher New Instructions (GNI), is an advanced version of SSE announced by Intel featuring a widened data path from 128 bits to 256 bits and 3-operand instructions (up from 2). Intel released processors in early 2011 with AVX support.[7]
  • AVX2 is an expansion of the AVX instruction set.
  • AVX-512 (3.1 and 3.2) are 512-bit extensions to the 256-bit Advanced Vector Extensions SIMD instructions for x86 instruction set architecture.
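
To make the "horizontal" operation mentioned for SSE3 concrete, the sketch below sums the four lanes of one register using only original SSE shuffles. It is an illustration of what a horizontal add computes, not SSE3 code; with SSE3 the same total can be produced with two HADDPS instructions (`_mm_hadd_ps`). The function name is hypothetical.

```c
#include <xmmintrin.h>  /* SSE intrinsics */

/* Hypothetical helper: returns x + y + z + w for v = [x, y, z, w] (lane 0 first). */
float horizontal_sum(__m128 v)
{
    __m128 hi   = _mm_movehl_ps(v, v);    /* [z, w, z, w] */
    __m128 sum2 = _mm_add_ps(v, hi);      /* lane 0 = x+z, lane 1 = y+w */
    __m128 odd  = _mm_shuffle_ps(sum2, sum2, _MM_SHUFFLE(1, 1, 1, 1)); /* lane 1 in every lane */
    __m128 sum  = _mm_add_ss(sum2, odd);  /* lane 0 = (x+z) + (y+w) */
    return _mm_cvtss_f32(sum);            /* extract lane 0 */
}
```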

Identifying

The following programs can be used to determine which, if any, versions of SSE are supported on a system:

  • Intel Processor Identification Utility[8]
  • CPU-Z – CPU, motherboard, and memory identification utility.
  • lscpu – provided by the util-linux package in most Linux distributions.
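
Support can also be queried from within a program. The sketch below assumes GCC or Clang on an x86 target and relies on their `__builtin_cpu_supports` builtin; other compilers need a CPUID-based check instead. The function names are hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: returns 1 if the running CPU reports SSE support.
   Uses a GCC/Clang-specific builtin. */
int cpu_has_sse(void)
{
    __builtin_cpu_init();
    return __builtin_cpu_supports("sse") != 0;
}

/* Hypothetical helper: prints which SSE generations the CPU reports. */
void print_sse_support(void)
{
    __builtin_cpu_init();
    printf("SSE:    %s\n", __builtin_cpu_supports("sse")    ? "yes" : "no");
    printf("SSE2:   %s\n", __builtin_cpu_supports("sse2")   ? "yes" : "no");
    printf("SSE3:   %s\n", __builtin_cpu_supports("sse3")   ? "yes" : "no");
    printf("SSSE3:  %s\n", __builtin_cpu_supports("ssse3")  ? "yes" : "no");
    printf("SSE4.1: %s\n", __builtin_cpu_supports("sse4.1") ? "yes" : "no");
    printf("SSE4.2: %s\n", __builtin_cpu_supports("sse4.2") ? "yes" : "no");
}
```

The builtin requires a string literal argument, which is why each feature is queried on its own line rather than through a loop over names.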

References

  1. ^ "Intel® 64 and IA-32 Architectures Software Developer's Manual Volume 1: Basic Architecture". Intel. April 2022. pp. 5-16–5-19. Archived from the original on April 25, 2022. Retrieved May 16, 2022.
  2. ^ a b Diefendorff, Keith (March 8, 1999). "Pentium III = Pentium II + SSE: Internet SSE Architecture Boosts Multimedia Performance" (PDF). Microprocessor Report. 13 (3). Archived (PDF) from the original on April 17, 2018. Retrieved September 1, 2017.
  3. ^ "AMD Extensions to the 3DNow and MMX Instruction Sets Manual" (PDF). Advanced Micro Devices, Inc. March 2000. Archived from the original (PDF) on May 17, 2008. Retrieved April 18, 2024.
  4. ^ Vance, Ashlee (August 3, 2007). "AMD plots single thread boost with x86 extensions". The Register. Archived from the original on April 27, 2011. Retrieved August 24, 2017.
  5. ^ "AMD64 Technology: 128-Bit SSE5 Instruction Set" (PDF). AMD. August 2007. Archived (PDF) from the original on August 25, 2017. Retrieved August 24, 2017.
  6. ^ "AMD64 Technology AMD64 Architecture Programmer's Manual Volume 6: 128-Bit and 256-Bit XOP and FMA4 Instructions" (PDF). AMD. November 2009. Archived (PDF) from the original on January 31, 2017. Retrieved August 24, 2017.
  7. ^ Girkar, Milind (October 1, 2013). "Intel® Advanced Vector Extensions (Intel® AVX)". Intel. Archived from the original on August 25, 2017. Retrieved August 24, 2017.
  8. ^ "Download the Intel® Processor Identification Utility". Intel. July 24, 2017. Archived from the original on August 25, 2017. Retrieved August 24, 2017.
