Loop optimization

In compiler theory, loop optimization is the process of increasing execution speed and reducing the overheads associated with loops. It plays an important role in improving cache performance and making effective use of parallel processing capabilities. Most execution time of a scientific program is spent on loops; as such, many compiler optimization techniques have been developed to make them faster.

Representation of computation and transformations

Since instructions inside loops can be executed repeatedly, it is frequently not possible to give a bound on the number of instruction executions that will be impacted by a loop optimization. This presents challenges when reasoning about the correctness and benefits of a loop optimization, specifically the representations of the computation being optimized and the optimization(s) being performed.[1]

Optimization via a sequence of loop transformations

Loop optimization can be viewed as the application of a sequence of specific loop transformations (listed below or in Compiler transformations for high-performance computing[2]) to the source code or intermediate representation, with each transformation having an associated test for legality. A transformation (or sequence of transformations) generally must preserve the temporal sequence of all dependencies if it is to preserve the result of the program (i.e., be a legal transformation). Evaluating the benefit of a transformation or sequence of transformations can be quite difficult within this approach, as the application of one beneficial transformation may require the prior use of one or more other transformations that, by themselves, would result in reduced performance.

Common loop transformations include:

  • Fission or distribution – loop fission attempts to break a loop into multiple loops over the same index range, but each new loop takes only part of the original loop's body. This can improve locality of reference, both of the data being accessed in the loop and the code in the loop's body.
  • Fusion or combining – this combines the bodies of two adjacent loops that would iterate the same number of times (whether or not that number is known at compile time), as long as they make no reference to each other's data.
  • Interchange or permutation – these optimizations exchange inner loops with outer loops. When the loop variables index into an array, such a transformation can improve locality of reference, depending on the array's layout.
  • Inversion – this technique changes a standard while loop into a do/while (a.k.a. repeat/until ) loop wrapped in an if conditional, reducing the number of jumps by two for cases where the loop is executed. Doing so duplicates the condition check (increasing the size of the code) but is more efficient because jumps usually cause a pipeline stall. Additionally, if the initial condition is known at compile-time and is known to be side-effect-free, the initial if-guard can be skipped.
  • Loop-invariant code motion – this can vastly improve efficiency by moving a computation from inside the loop to outside of it, computing a value just once before the loop begins, if the resultant quantity of the calculation will be the same for every loop iteration (i.e., a loop-invariant quantity). This is particularly important with address-calculation expressions generated by loops over arrays. For correct implementation, this technique must be used with inversion, because not all code is safe to be moved outside the loop.
  • Parallelization – this is a special case of automatic parallelization focusing on loops, restructuring them to run efficiently on multiprocessor systems. It can be done automatically by compilers (automatic parallelization) or manually (inserting parallel directives like OpenMP).
  • Reversal – a subtle optimization that reverses the order in which values are assigned to the index variable. This can help eliminate dependencies and thus enable other optimizations. Certain architectures utilize looping constructs at assembly level that count in a single direction only (e.g., decrement-jump-if-not-zero [DJNZ][3]).
  • Scheduling – this divides a loop into multiple parts that may be run concurrently on multiple processors.
  • Skewing – this technique is applied to a nested loop iterating over a multidimensional array, where each iteration of the inner loop depends on previous iterations, and rearranges its array accesses so that the only dependencies are between iterations of the outer loop.
  • Software pipelining – a type of out-of-order execution of loop iterations to hide the latencies of processor function units.
  • Splitting or peeling – this attempts to simplify a loop or eliminate dependencies by breaking it into multiple loops which have the same bodies but iterate over different portions of the index range. A special case is loop peeling, which can simplify a loop with a problematic first iteration by performing that iteration separately before entering the loop.
  • Tiling or blocking – reorganizes a loop to iterate over blocks of data sized to fit in the cache.
  • Vectorization – attempts to run as many of the loop iterations as possible at the same time on a SIMD system.
  • Unrolling – duplicates the body of the loop multiple times, in order to decrease the number of times the loop condition is tested and the number of jumps, which may degrade performance by impairing the instruction pipeline. Completely unrolling a loop eliminates all overhead (except multiple instruction fetches and increased program load time), but requires that the number of iterations be known at compile time (except in the case of Just-in-time compilation). Care must also be taken to ensure that multiple re-calculation of indexed variables is not a greater overhead than advancing pointers within the original loop.
  • Unswitching – moves a conditional from inside a loop to outside of it by duplicating the loop's body, and placing a version of it inside each of the if and else clauses of the conditional.
  • Sectioning or strip-mining – introduced for vector processors, loop-sectioning is a loop-transformation technique for enabling SIMD (single instruction, multiple data)-encodings of loops and improving memory performance. This involves each vector operation being done for a size less-than or equal-to the maximum vector length on a given vector machine.[4][5]

The unimodular transformation framework

The unimodular transformation approach[6] uses a single unimodular matrix to describe the combined result of a sequence of many of the above transformations. Central to this approach is the view of the set of all executions of a statement within n loops as a set of integer points in an n-dimensional space, with the points being executed in lexicographical order. For example, the executions of a statement nested inside an outer loop with index i and an inner loop with index j can be associated with the pairs of integers . The application of a unimodular transformation corresponds to the multiplication of the points within this space by the matrix. For example, the interchange of two loops corresponds to the matrix .

A unimodular transformation is legal if it preserves the temporal sequence of all dependencies; measuring the performance impact of a unimodular transformation is more difficult. Imperfectly nested loops and some transformations (such as tiling) do not fit easily into this framework.

The polyhedral or constraint-based framework

The polyhedral model[7] handles a wider class of programs and transformations than the unimodular framework. The set of executions of a set of statements within a possibly imperfectly nested set of loops is seen as the union of a set of polytopes representing the executions of the statements. Affine transformations are applied to these polytopes, producing a description of a new execution order. The boundaries of the polytopes, the data dependencies, and the transformations are often described using systems of constraints, and this approach is often referred to as a constraint-based approach to loop optimization. For example, a single statement within an outer loop 'for i := 0 to n' and an inner loop 'for j := 0 to i+2' is executed once for each (i, j) pair such that 0 <= i <= n and 0 <= j <= i+2.

Once again, a transformation is legal if it preserves the temporal sequence of all dependencies. Estimating the benefits of a transformation, or finding the best transformation for a given code on a given computer, remain the subject of ongoing research as of the time of this writing (2010).

See also

References

  1. ^ In the book Reasoning About Program Transformations, Jean-Francois Collard discusses in depth the general question of representing executions of programs rather than program text in the context of static optimization.
  2. ^ David F. Bacon, Susan L. Graham, and Oliver J. Sharp. Compiler transformations for high-performance computing. Report No. UCB/CSD 93/781, Computer Science Division-EECS, University of California, Berkeley, Berkeley, California 94720, November 1993 (available at CiteSeer [1]). Introduces compiler analysis such as data dependence analysis and interprocedural analysis, as well as a very complete list of loop transformations
  3. ^ "8051 Instruction Set". www.win.tue.nl. Retrieved 2019-12-09.
  4. ^ "Intel Developer Zone".
  5. ^ "7.6.3.1 Strip-Mining (Sun Studio 12: Fortran Programming Guide)".
  6. ^ Steven S. Muchnick, Advanced Compiler Design and Implementation, 1997 Morgan Kaufmann. Section 20.4.2 discusses loop optimization.
  7. ^ R. Allen and K. Kennedy. Optimizing Compilers for Modern Architectures. Morgan Kaufmann, 2002.

Read other articles:

1833 Maine gubernatorial election ← 1832 September 9, 1833 1834 →   Nominee Robert P. Dunlap Daniel Goodenow Samuel E. Smith Party Democratic National Republican Independent Democrat Popular vote 25,731 18,112 3,024 Percentage 52.14% 36.70% 6.13% Governor before election Samuel E. Smith Democratic Elected Governor Robert P. Dunlap Democratic The 1833 Maine gubernatorial election took place on September 9, 1833. Incumbent Democratic Governor Samuel E. Smith was ...

 

Pour les articles homonymes, voir Zlatibor. Cet article est une ébauche concernant la Serbie et la géographie. Vous pouvez partager vos connaissances en l’améliorant (comment ?) selon les recommandations des projets correspondants. Zlatibor Administration Pays Serbie Villesou municipalités Bajina BaštaKosjerićUžicePožegaČajetinaAriljeNova VarošPrijepoljeSjenicaPriboj Démographie Population 284 929 hab. (2011) Densité 46 hab./km2 Géographie Coordonnées 43°&...

 

الهيئة العامة لمدينة الأبحاث العلمية والتطبيقات التكنولوجية مدينة الأبحاث العلمية والتطبيقات التكنولوجية (مصر)الشعار منظر عام للمدينة البلد  مصر المقر الرئيسي برج العرب الجديدة، محافظة الإسكندرية تاريخ التأسيس 2000 (منذ 23 سنة) المالك وزارة التعليم العالي والبحث العلمي ا

This article needs additional citations for verification. Please help improve this article by adding citations to reliable sources. Unsourced material may be challenged and removed.Find sources: The Party's Over and Other Great Willie Nelson Songs – news · newspapers · books · scholar · JSTOR (September 2017) (Learn how and when to remove this template message) 1967 studio album by Willie NelsonThe Party's Over and Other Great Willie Nelson SongsSt...

 

Losilla (Fonfría) asentamiento medieval abandonadoPaís  España• Com. autónoma  Aragón• Provincia  TeruelPoblación 0 hab.[editar datos en Wikidata] Losilla o La Silla es un despoblado medieval de la Comunidad de Daroca situado entre Fonfría y Bea y entre la vertiente Suroeste de la Sierra de Orich y la vertiente nordeste de la Sierra de Pelarda. Se encuentra en la provincia de Teruel. Historia La primera referencia a este pueblo es el ...

 

هذه المقالة يتيمة إذ تصل إليها مقالات أخرى قليلة جدًا. فضلًا، ساعد بإضافة وصلة إليها في مقالات متعلقة بها. (ديسمبر 2022) نيليس بيمنتل   معلومات شخصية الاسم عند الميلاد Nellys Rocio Pimentel Campusano الميلاد 27 أبريل 1997 (العمر 26 سنة)سان خوان، بورتوريكو الجنسية الولايات المتحدة  الطول 180&#...

Hau Lung-pin郝龍斌Wali kota TaipeiMasa jabatan26 Desember 2006 – 25 Desember 2014PendahuluMa Ying-JeouPenggantiKo Wen-je Informasi pribadiLahir22 Agustus 1952 (umur 71)Taipei, TaiwanPartai politikKuomintangSunting kotak info • L • B Hau Lung-pin (Hanzi: 郝龍斌, pinyin: Hao Longbin), lahir 22 Agustus 1952 adalah wali kota Taipei yang menjabat sekarang. Ia menggantikan Ma Ying-jeou pada tanggal 26 Desember 2006. Hau merupakan putra dari petinggi Kuomintang Ha...

 

YaxhaPemandangan udara Akropolis UtaraLokasi di GuatemalaLokasiFloresWilayahDepartemen Petén, GuatemalaKoordinat17°4′39″N 89°24′9″W / 17.07750°N 89.40250°W / 17.07750; -89.40250SejarahDidirikanPra-Klasik PertengahanDitinggalkanKlasik TerminalPeriodePeriode Pra-Klasik dan KlasikBudayaMayaCatatan situsTanggal ditemukan1970an dan seterusnyaArkeologBernard Hermes Proyecto Nacional TikalResponsible body: IDAEH Yaxha (atau Yaxhá dalam ortografi Spanyo...

 

This article relies largely or entirely on a single source. Relevant discussion may be found on the talk page. Please help improve this article by introducing citations to additional sources.Find sources: Punjabi drama – news · newspapers · books · scholar · JSTOR (May 2007) Punjabi Dramas are semi-improvised comedy stage plays popular in the Punjab region of Pakistan. The plays are scripted but actors and performers are given enough freedom to add hum...

بارغوزين تقسيم إداري  البلد روسيا  تعديل مصدري - تعديل   يفتقر محتوى هذه المقالة إلى الاستشهاد بمصادر. فضلاً، ساهم في تطوير هذه المقالة من خلال إضافة مصادر موثوق بها. أي معلومات غير موثقة يمكن التشكيك بها وإزالتها. (فبراير 2016) بارغوزين (بالروسية: Баргузин) هي مدينة في ...

 

Spanish basketball player In this Spanish name, the first or paternal surname is Vives and the second or maternal family name is Torrent. Guillem VivesNo. 16 – Joventut BadalonaPositionPoint guardLeagueLiga ACBPersonal informationBorn (1993-06-16) 16 June 1993 (age 30)Barcelona, SpainNationalitySpanishListed height192 cm (6 ft 4 in)Listed weight190 lb (86 kg)Career informationNBA draft2015: undraftedPlaying career2011–presentCareer history2011...

 

Taiwanese TV series or program Lucky DaysPromo posterAlso known as第二回合我愛你第二回合我愛你GenreDramaCreated bySanlih E-TelevisionWritten byMag Hsu Sharon MaoChou Ping-chiChen Chiu-ju Hsieh Ping-hsunDirected byWang Ming-taiHsu Fu-chunStarringChris Wang Tammy ChenOpening themeRed Flower (紅花樂團) 第二回合我愛你 (Lucky Days/ Ti Erh Hui Ho Wo Ai Ni)Ending themeIris Lin (林綾) 在我身邊 (By My Side/ Tsai Wo Shen Pien)Country of originTaiwanOriginal languages...

15th-century Moroccan Sufi saint-scholar al-Jazuli redirects here. For other uses, see al-Jazuli (disambiguation). ImamMuhammad ibn Sulayman al-Jazuli al-SimlaliCopy of Dala'il al-Khayrat at the Chester Beatty LibraryTitleImam, SheikhPersonalBornc. 1404Sous, MoroccoDied1465 (aged 60–61)Sidi Chiker, MoroccoResting placeMarrakeshReligionIslamNationalityMoroccoEra15th centuryDenominationSunniJurisprudenceMalikiMain interest(s)SufismNotable work(s)Dala'il al-KhayratTariqaShadhili...

 

1931 film The Virtuous HusbandTheatrical release posterDirected byVin MooreScreenplay byJerry HorwinEdward LudwigFred Niblo, Jr.Dale Van EveryProduced byCarl Laemmle, Jr.StarringBetty CompsonElliott NugentJean ArthurJ. C. NugentAlison SkipworthTully MarshallCinematographyJerome AshEdited byArthur HiltonHarry W. LiebProductioncompanyUniversal PicturesDistributed byUniversal PicturesRelease date April 12, 1931 (1931-04-12) Running time75 minutesCountryUnited StatesLanguageEnglish...

 

Untuk kegunaan lain, lihat Kilas Balik. Kilas Balik karya La LunaDirilis10 Oktober 2008Direkam25 Mei 2008GenrePopDurasi50 MenitLabelBulletin MusikProduserWirjonoKronologi La Luna Indah Pada Waktunya(2005)Indah Pada Waktunya2005 Kilas Balik (2008) Berbunga Bunga(2011)Berbunga Bunga2011 Kilas Balik merupakan sebuah album musik karya La Luna. Dirilis pada tahun 2008 yang merupakan album kumpulan lagu terbaik dari La Luna. Daftar lagu Tetaplah mencintaiku Membekas dihati Merelakanmu Penggalan...

Level 9 English Rugby Union league Midlands 4 West (North)Current season or competition: 2019–20 Midlands 4 West (North)SportRugby unionInstituted2005; 18 years ago (2005) (as Midlands 5 West (North))Number of teams12Country EnglandHoldersClee Hill (1st title) (2019–20)(promoted to Midlands 3 West (North))Most titlesEccleshall, Harborne (2 titles)WebsiteEngland RFU Midlands 4 West (North) is a level 9 English Rugby Union league and level 4 of the Midlands League, ma...

 

Card in Tarot decks The Moon (XVIII) from the Rider–Waite tarot deck The Moon (XVIII) is the eighteenth trump or Major Arcana card in most traditional tarot decks. It is used in game playing as well as in divination. An original card from the tarot deck of Jean Dodal of Lyon, a classic Tarot of Marseilles deck. The deck dates from 1701–1715. Description The card depicts a night scene, where two large pillars are shown. A wolf and a domesticated dog howl at the Moon while a crayfish emerge...

 

Historic house in Indiana, United States United States historic placeThomas R. Marshall HouseU.S. National Register of Historic Places Thomas R. Marshall House, May 2012Show map of IndianaShow map of the United StatesLocation108 W. Jefferson St., Columbia City, IndianaCoordinates41°9′33″N 85°29′20″W / 41.15917°N 85.48889°W / 41.15917; -85.48889Arealess than one acreBuilt1874 (1874)NRHP reference No.83000046[1]Added to NRHPJuly 21, 198...

This article has multiple issues. Please help improve it or discuss these issues on the talk page. (Learn how and when to remove these template messages) A major contributor to this article appears to have a close connection with its subject. It may require cleanup to comply with Wikipedia's content policies, particularly neutral point of view. Please discuss further on the talk page. (July 2018) (Learn how and when to remove this template message) The topic of this article may not meet Wikip...

 

Horse race Kyoto Himba StakesGrade 3 raceDonau Blue wins the 2012 Kyoto Himba StakesLocationKyoto RacecourseInaugurated1986Race typeThoroughbred Flat racingRace informationDistance1400 metresSurfaceTurfTrackRight-handedQualification4-y-o+ fillies and maresWeight55 kgPurse¥77,360,000 (total), ¥36,000,000 (winner) The Kyoto Himba Stakes (Japanese 京都牝馬ステークス) is a Grade 3 horse race for Thoroughbred fillies and mares aged four and over, run in February over a distance of 1400 ...

 

Strategi Solo vs Squad di Free Fire: Cara Menang Mudah!