Local regression

LOESS curve fitted to a population sampled from a sine wave with uniform noise added. The LOESS curve approximates the original sine wave.

Local regression or local polynomial regression,[1] also known as moving regression,[2] is a generalization of the moving average and polynomial regression.[3] Its most common methods, initially developed for scatterplot smoothing, are LOESS (locally estimated scatterplot smoothing) and LOWESS (locally weighted scatterplot smoothing), both pronounced /ˈlɛs/ LOH-ess. They are two strongly related non-parametric regression methods that combine multiple regression models in a k-nearest-neighbor-based meta-model. In some fields, LOESS is known and commonly referred to as Savitzky–Golay filter[4][5] (proposed 15 years before LOESS).

LOESS and LOWESS thus build on "classical" methods, such as linear and nonlinear least squares regression. They address situations in which the classical procedures do not perform well or cannot be effectively applied without undue labor. LOESS combines much of the simplicity of linear least squares regression with the flexibility of nonlinear regression. It does this by fitting simple models to localized subsets of the data to build up a function that describes the deterministic part of the variation in the data, point by point. In fact, one of the chief attractions of this method is that the data analyst is not required to specify a global function of any form to fit a model to the data, only to fit segments of the data.

The trade-off for these features is increased computation. Because it is so computationally intensive, LOESS would have been practically impossible to use in the era when least squares regression was being developed. Most other modern methods for process modeling are similar to LOESS in this respect. These methods have been consciously designed to use our current computational ability to the fullest possible advantage to achieve goals not easily achieved by traditional approaches.

A smooth curve through a set of data points obtained with this statistical technique is called a loess curve, particularly when each smoothed value is given by a weighted quadratic least squares regression over the span of values of the y-axis scattergram criterion variable. When each smoothed value is given by a weighted linear least squares regression over the span, this is known as a lowess curve; however, some authorities treat lowess and loess as synonyms.[6][7]

Model definition

In 1964, Savitsky and Golay proposed a method equivalent to LOESS, which is commonly referred to as Savitzky–Golay filter. William S. Cleveland rediscovered the method in 1979 and gave it a distinct name. The method was further developed by Cleveland and Susan J. Devlin (1988). LOWESS is also known as locally weighted polynomial regression.

At each point in the range of the data set a low-degree polynomial is fitted to a subset of the data, with explanatory variable values near the point whose response is being estimated. The polynomial is fitted using weighted least squares, giving more weight to points near the point whose response is being estimated and less weight to points further away. The value of the regression function for the point is then obtained by evaluating the local polynomial using the explanatory variable values for that data point. The LOESS fit is complete after regression function values have been computed for each of the data points. Many of the details of this method, such as the degree of the polynomial model and the weights, are flexible. The range of choices for each part of the method and typical defaults are briefly discussed next.

Localized subsets of data

The subsets of data used for each weighted least squares fit in LOESS are determined by a nearest neighbors algorithm. A user-specified input to the procedure called the "bandwidth" or "smoothing parameter" determines how much of the data is used to fit each local polynomial. The smoothing parameter, , is the fraction of the total number n of data points that are used in each local fit. The subset of data used in each weighted least squares fit thus comprises the points (rounded to the next largest integer) whose explanatory variables' values are closest to the point at which the response is being estimated.[7]

Since a polynomial of degree k requires at least k + 1 points for a fit, the smoothing parameter must be between and 1, with denoting the degree of the local polynomial.

is called the smoothing parameter because it controls the flexibility of the LOESS regression function. Large values of produce the smoothest functions that wiggle the least in response to fluctuations in the data. The smaller is, the closer the regression function will conform to the data. Using too small a value of the smoothing parameter is not desirable, however, since the regression function will eventually start to capture the random error in the data.

Degree of local polynomials

The local polynomials fit to each subset of the data are almost always of first or second degree; that is, either locally linear (in the straight line sense) or locally quadratic. Using a zero degree polynomial turns LOESS into a weighted moving average. Higher-degree polynomials would work in theory, but yield models that are not really in the spirit of LOESS. LOESS is based on the ideas that any function can be well approximated in a small neighborhood by a low-order polynomial and that simple models can be fit to data easily. High-degree polynomials would tend to overfit the data in each subset and are numerically unstable, making accurate computations difficult.

Weight function

As mentioned above, the weight function gives the most weight to the data points nearest the point of estimation and the least weight to the data points that are furthest away. The use of the weights is based on the idea that points near each other in the explanatory variable space are more likely to be related to each other in a simple way than points that are further apart. Following this logic, points that are likely to follow the local model best influence the local model parameter estimates the most. Points that are less likely to actually conform to the local model have less influence on the local model parameter estimates.

The traditional weight function used for LOESS is the tri-cube weight function,

where d is the distance of a given data point from the point on the curve being fitted, scaled to lie in the range from 0 to 1.[7]

However, any other weight function that satisfies the properties listed in Cleveland (1979) could also be used. The weight for a specific point in any localized subset of data is obtained by evaluating the weight function at the distance between that point and the point of estimation, after scaling the distance so that the maximum absolute distance over all of the points in the subset of data is exactly one.

Consider the following generalisation of the linear regression model with a metric on the target space that depends on two parameters, . Assume that the linear hypothesis is based on input parameters and that, as customary in these cases, we embed the input space into as , and consider the following loss function

Here, is an real matrix of coefficients, and the subscript i enumerates input and output vectors from a training set. Since is a metric, it is a symmetric, positive-definite matrix and, as such, there is another symmetric matrix such that . The above loss function can be rearranged into a trace by observing that

.

By arranging the vectors and into the columns of a matrix and an matrix respectively, the above loss function can then be written as

where is the square diagonal matrix whose entries are the s. Differentiating with respect to and setting the result equal to 0 one finds the extremal matrix equation

.

Assuming further that the square matrix is non-singular, the loss function attains its minimum at

.

A typical choice for is the Gaussian weight

.

Advantages

As discussed above, the biggest advantage LOESS has over many other methods is the process of fitting a model to the sample data does not begin with the specification of a function. Instead the analyst only has to provide a smoothing parameter value and the degree of the local polynomial. In addition, LOESS is very flexible, making it ideal for modeling complex processes for which no theoretical models exist. These two advantages, combined with the simplicity of the method, make LOESS one of the most attractive of the modern regression methods for applications that fit the general framework of least squares regression but which have a complex deterministic structure.

Although it is less obvious than for some of the other methods related to linear least squares regression, LOESS also accrues most of the benefits typically shared by those procedures. The most important of those is the theory for computing uncertainties for prediction and calibration. Many other tests and procedures used for validation of least squares models can also be extended to LOESS models [citation needed].

Disadvantages

LOESS makes less efficient use of data than other least squares methods. It requires fairly large, densely sampled data sets in order to produce good models. This is because LOESS relies on the local data structure when performing the local fitting. Thus, LOESS provides less complex data analysis in exchange for greater experimental costs.[7]

Another disadvantage of LOESS is the fact that it does not produce a regression function that is easily represented by a mathematical formula. This can make it difficult to transfer the results of an analysis to other people. In order to transfer the regression function to another person, they would need the data set and software for LOESS calculations. In nonlinear regression, on the other hand, it is only necessary to write down a functional form in order to provide estimates of the unknown parameters and the estimated uncertainty. Depending on the application, this could be either a major or a minor drawback to using LOESS. In particular, the simple form of LOESS can not be used for mechanistic modelling where fitted parameters specify particular physical properties of a system.

Finally, as discussed above, LOESS is a computationally intensive method (with the exception of evenly spaced data, where the regression can then be phrased as a non-causal finite impulse response filter). LOESS is also prone to the effects of outliers in the data set, like other least squares methods. There is an iterative, robust version of LOESS [Cleveland (1979)] that can be used to reduce LOESS' sensitivity to outliers, but too many extreme outliers can still overcome even the robust method.

See also

References

Citations

  1. ^ Fox & Weisberg 2018, Appendix.
  2. ^ Harrell 2015, p. 29.
  3. ^ Garimella 2017.
  4. ^ "Savitzky–Golay filtering – MATLAB sgolayfilt". Mathworks.com.
  5. ^ "scipy.signal.savgol_filter — SciPy v0.16.1 Reference Guide". Docs.scipy.org.
  6. ^ Kristen Pavlik, US Environmental Protection Agency, Loess (or Lowess), Nutrient Steps, July 2016.
  7. ^ a b c d NIST, "LOESS (aka LOWESS)", section 4.1.4.4, NIST/SEMATECH e-Handbook of Statistical Methods, (accessed 14 April 2017)

Sources

Public Domain This article incorporates public domain material from the National Institute of Standards and Technology

Read other articles:

1999 studio album by Cecilia BartoliThe Vivaldi AlbumStudio album by Cecilia BartoliReleased1999LabelDecca Classics The Vivaldi Album is a 1999 recording of Vivaldi opera arias by Cecilia Bartoli with the period instrument ensemble Il Giardino Armonico released by Decca Classics. The album's commercial success marked a major leap in revival of interest in the operas of Antonio Vivaldi.[1][2] References ^ Billboard - 6 oct. 2001 - Page 94 Vol. 113, n° 40 with The Vival...

 

المئذنة أو المنارة أو الصومعة، هي مبنى أو برج مرتفع طويل يكون ملحقا بالمساجد وغرضه إيصال صوت الأذان للمسلمين ودعوتهم للصلاة.[1][2][3] قائمة مآذن جمهورية إيران تحتوي هذه القائمة على أسماء مآذن المنتشرة في أنحاء جمهورية إيران. اسم موقع صورة ملاحظات لمئذنة والجامع نغ

 

For the French Canadian series, see BeTipul § Canada. French TV series or program In TherapyFrenchEn thérapie Created byÉric ToledanoOlivier NakacheBased onBeTipulby Hagai LeviOri SivanNir BergmanStarring Frédéric Pierrot Mélanie Thierry Reda Kateb Céleste Brunnquell Clémence Poésy Pio Marmaï Carole Bouquet Eye Haïdara Aliocha Delmotte Suzanne Lindon Jacques Weber Charlotte Gainsbourg Country of originFranceOriginal languageFrenchNo. of series2No. of episodes70ProductionR...

Islam menurut negara Afrika Aljazair Angola Benin Botswana Burkina Faso Burundi Kamerun Tanjung Verde Republik Afrika Tengah Chad Komoro Republik Demokratik Kongo Republik Kongo Djibouti Mesir Guinea Khatulistiwa Eritrea Eswatini Etiopia Gabon Gambia Ghana Guinea Guinea-Bissau Pantai Gading Kenya Lesotho Liberia Libya Madagaskar Malawi Mali Mauritania Mauritius Maroko Mozambik Namibia Niger Nigeria Rwanda Sao Tome dan Principe Senegal Seychelles Sierra Leone Somalia Somaliland Afrika Selatan ...

 

Moscow Metro line This article includes a list of general references, but it lacks sufficient corresponding inline citations. Please help to improve this article by introducing more precise citations. (August 2022) (Learn how and when to remove this template message) Arbatsko-Pokrovskaya lineKiyevskaya stationOverviewOwnerMoskovsky MetropolitenLocaleMoscowTerminiPyatnitskoye Shosse (west)Shchyolkovskaya (east)Stations22ServiceTypeRapid transitSystemMoscow MetroOperator(s)Moskovsky Metropolite...

 

هذه المقالة يتيمة إذ تصل إليها مقالات أخرى قليلة جدًا. فضلًا، ساعد بإضافة وصلة إليها في مقالات متعلقة بها. (سبتمبر 2022) آخر الآشوريينThe Last Assyrians (بالإنجليزية) معلومات عامةالصنف الفني فيلم وثائقي تاريخ الانتاج2004 (2004)تاريخ الصدور 2004 مدة العرض 53 minutesاللغة الأصلية اللغة الإنجل...

Historic site in Queensland, AustraliaCroydon Police StationCroydon Police StationLocationSamwell Street, Croydon, Shire of Croydon, Queensland, AustraliaCoordinates18°12′12″S 142°14′39″E / 18.2032°S 142.2443°E / -18.2032; 142.2443Design period1870s - 1890s (late 19th century)Built1899Architectural style(s)Classicism Queensland Heritage RegisterOfficial namePolice Reserve Complex (former), Former Police Station and ResidenceTypestate heritage (built)Designa...

 

Swimming at the1968 Summer OlympicsFreestyle100 mmenwomen200 mmenwomen400 mmenwomen800 mwomen1500 mmenBackstroke100 mmenwomen200 mmenwomenBreaststroke100 mmenwomen200 mmenwomenButterfly100 mmenwomen200 mmenwomenIndividual medley200 mmenwomen400 mmenwomenFreestyle relay4×100 mmenwomen4×200 mmenMedley relay4×100 mmenwomenvte The men's 4×100 metre medley relay event at the 1968 Olympic Games took place on October 26.[1] This swimming event uses medley swimming as a relay....

 

Russian volleyball club Lokomotiv NovosibirskFull nameVC Lokomotiv NovosibirskFounded1977GroundSKK Sever(Capacity: 3,000)ChairmanVadim GoncharovManagerPlamen KonstantinovCaptainNikolay PavlovLeagueSuper League2021/222nd placeWebsiteClub home pageUniforms Home Away Lokomotiv Novosibirsk (Russian: ВК «Локомотив» Новосибирск) is a Russian professional volleyball club, based in Novosibirsk, playing in Russian Volleyball Super League. Achievements CEV Champions League (x1) ...

Entertainment investment firm Saban Entertainment Group redirects here. For the former television production company that could be considered a predecessor, see Saban Entertainment. Saban Capital Group LLCTypePrivateIndustryPrivate equityEntertainmentMediaMerchandisingCommunicationsPredecessorSaban EntertainmentFoundedJuly 11, 2010; 13 years ago (2010-07-11)FounderHaim SabanHeadquartersLos Angeles, California, U.S.Area servedWorldwideKey peopleHaim Saban (Chairman, CEO)Adam ...

 

هذه المقالة يتيمة إذ تصل إليها مقالات أخرى قليلة جدًا. فضلًا، ساعد بإضافة وصلة إليها في مقالات متعلقة بها. (مارس 2019) حزقيال أندريس راميريز معلومات شخصية الميلاد 11 أكتوبر 1947 (76 سنة)  مونتفيدو  مواطنة الأوروغواي  الحياة العملية المدرسة الأم جامعة الجمهورية  المهنة س...

 

Countermovement to LGBTQ+ pride movements and events This article's lead section may be too short to adequately summarize the key points. Please consider expanding the lead to provide an accessible overview of all important aspects of the article. (August 2021) A straight pride sticker on the window of a pick-up truck in Sonoma, California in 2023. Straight pride is a reactionary slogan that arose in the 1980s and early 1990s and has primarily been used by social conservatives as a political ...

2003 studio album by Legendary Shack ShakersCockadoodledon'tStudio album by Legendary Shack ShakersReleased2003Recorded2002-03, The Wagon, East Nashville, TennesseeGenre Psychobilly[1] alternative country[2] LabelBloodshotProducerJ.D. Wilkes Mark RobertsonLegendary Shack Shakers chronology Hunkerdown(1998) Cockadoodledon't(2003) Believe(2004) Cockadoodledon't is the third studio album by American rock band Legendary Shack Shakers. Released on April 22, 2003, the album ...

 

Painted PeoplePoster teatrikalSutradara Clarence G. Badger Produser Associated First National Ditulis oleh Edward J. Montagne BerdasarkanThe Swamp olehRichard ConnellPemeranColleen MooreSinematograferRudolph BergquistPenyuntingGeorge McQuireDistributorAssociated First NationalTanggal rilis 28 Januari 1924 (1924-01-28) Durasi7 rolNegara Amerika Serikat BahasaFilm bisu dengan antar judul Inggris Painted People adalah sebuah film komedi drama bisu Amerika Serikat tahun 1924 garapan Cla...

 

Стиль этой статьи неэнциклопедичен или нарушает нормы литературного русского языка. Статью следует исправить согласно стилистическим правилам Википедии.Михаил Михайлович Пашинин Дата рождения 30 ноября 1902(1902-11-30) Место рождения Тула, Российская империя Дата смерти 28 ма...

British TV series or programme Probation OfficerGenreDramaWritten byJulian Bond Peter Yeldham Phillip Grenville MannDirected byPeter Sasdy Christopher Morahan Royston Morley Josephine DouglasStarringJohn Paul Jessica Spencer David Davies John ScottCountry of originUnited KingdomOriginal languageEnglishNo. of series4No. of episodes109ProductionProducersAntony Kearey Rex FirkinRunning time60 minutesProduction companyAssociated TelevisionOriginal releaseNetworkITVRelease14 September 1959...

 

2018 album by Angelo Badalamenti and David Lynch For the novel by Tibor Fischer, see The Thought Gang. Thought GangStudio album by Angelo Badalamenti and David LynchReleasedNovember 2, 2018Recorded1991, May 1992–May 1993StudioCherokee Studios in Los AngelesGenreFree jazzexperimentalspoken wordLength59:15LabelSacred BonesProducerAngelo BadalamentiDavid Lynch Thought Gang is a studio album by American composers Angelo Badalamenti and David Lynch created under the joint moniker Thought Gan...

 

Newspaper in Castlebar, Ireland The Connaught TelegraphTypeWeekly newspaperFormatCompactOwner(s)Celtic Media GroupEditorTom KellyFounded1828Political alignmentCentralHeadquartersCastlebarWebsiteConnaught Telegraph website The Connaught Telegraph is a weekly local newspaper published in Castlebar, County Mayo in Ireland. The paper is in compact format (six columns), and published every Tuesday. History Frederick Cavendish founded The Connaught Telegraph or Mayo Telegraph as it was originally n...

Japanese manga series HyperinflationFirst tankōbon volume coverハイパーインフレーション(Haipāinfurēshon) MangaWritten byKyu SumiyoshiPublished byShueishaImprintJump Comics+MagazineShōnen Jump+DemographicShōnenOriginal runNovember 27, 2020 – March 17, 2023Volumes6 Hyperinflation (Japanese: ハイパーインフレーション, Hepburn: Haipāinfurēshon) is a Japanese manga series written and illustrated by Kyu Sumiyoshi. It was serialized on Shueisha's web manga p...

 

German mathematician Wolf Barth at Dortmund in 1980(photo from MFO) Wolf Paul Barth (20 October 1942, in Wernigerode – 30 December 2016, in Nuremberg) was a German mathematician who discovered Barth surfaces and whose work on vector bundles has been important for the ADHM construction.[1][2] Until 2011 Barth was working in the Department of Mathematics at the University of Erlangen-Nuremberg in Germany. Barth received a PhD degree in 1967 from the University of Göttingen. H...

 

Strategi Solo vs Squad di Free Fire: Cara Menang Mudah!