Prediction interval

In statistical inference, specifically predictive inference, a prediction interval is an estimate of an interval in which a future observation will fall, with a certain probability, given what has already been observed. Prediction intervals are often used in regression analysis.

A simple example is given by a six-sided die with face values ranging from 1 to 6. The confidence interval for the expected face value will be centred near 3.5 and will narrow as the sample size grows. The prediction interval for the next roll, however, will range approximately from 1 to 6 no matter how many rolls have been observed.
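The die example can be checked with a small simulation. The sketch below is illustrative only: the function name `die_intervals`, the normal-approximation confidence interval, and the use of empirical 2.5% and 97.5% quantiles as a prediction interval are assumptions made for this example, not part of the text above.

```python
import random
import statistics

def die_intervals(n, seed=0):
    """Roll a fair die n times and return two intervals:
    a 95% confidence interval for the expected face value and
    an empirical 95% prediction interval for the next roll."""
    rng = random.Random(seed)
    rolls = sorted(rng.randint(1, 6) for _ in range(n))
    mean = statistics.fmean(rolls)
    sd = statistics.stdev(rolls)
    # Normal-approximation confidence interval: shrinks like 1/sqrt(n).
    half = 1.96 * sd / n ** 0.5
    ci = (mean - half, mean + half)
    # Empirical 2.5% and 97.5% quantiles of the rolls seen so far.
    pi = (rolls[int(0.025 * n)], rolls[min(n - 1, int(0.975 * n))])
    return ci, pi

for n in (100, 10_000):
    ci, pi = die_intervals(n)
    print(n, ci, pi)
```

With more rolls the confidence interval tightens around 3.5, while the prediction interval stays at the full range of the die.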

Prediction intervals are used in both frequentist statistics and Bayesian statistics. A prediction interval bears the same relationship to a future observation that a frequentist confidence interval or Bayesian credible interval bears to an unobservable population parameter: prediction intervals predict the distribution of individual future points, whereas confidence intervals and credible intervals of parameters predict the distribution of estimates of the true population mean or other quantity of interest that cannot be observed.

Introduction

If one makes the parametric assumption that the underlying distribution is a normal distribution, and has a sample set $\{X_1, \ldots, X_n\}$, then confidence intervals and credible intervals may be used to estimate the population mean μ and population standard deviation σ of the underlying population, while prediction intervals may be used to estimate the value of the next sample variable, $X_{n+1}$.

Alternatively, in Bayesian terms, a prediction interval can be described as a credible interval for the variable itself, rather than for a parameter of the distribution thereof.

The concept of prediction intervals need not be restricted to inference about a single future sample value but can be extended to more complicated cases. For example, in the context of river flooding where analyses are often based on annual values of the largest flow within the year, there may be interest in making inferences about the largest flood likely to be experienced within the next 50 years.

Since prediction intervals are only concerned with past and future observations, rather than unobservable population parameters, they are advocated as a better method than confidence intervals by some statisticians, such as Seymour Geisser,[citation needed] following the focus on observables by Bruno de Finetti.[citation needed]

Normal distribution

Given a sample from a normal distribution, whose parameters are unknown, it is possible to give prediction intervals in the frequentist sense, i.e., an interval $[a, b]$ based on statistics of the sample such that on repeated experiments, $X_{n+1}$ falls in the interval the desired percentage of the time; one may call these "predictive confidence intervals".[1]

A general technique of frequentist prediction intervals is to find and compute a pivotal quantity of the observables $X_1, \ldots, X_n, X_{n+1}$ – meaning a function of observables and parameters whose probability distribution does not depend on the parameters – that can be inverted to give a probability of the future observation $X_{n+1}$ falling in some interval computed in terms of the observed values so far. Such a pivotal quantity, depending only on observables, is called an ancillary statistic.[2] The usual method of constructing pivotal quantities is to take the difference of two variables that depend on location, so that location cancels out, and then take the ratio of two variables that depend on scale, so that scale cancels out. The most familiar pivotal quantity is the Student's t-statistic, which can be derived by this method and is used in the sequel.

Known mean, known variance

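A prediction interval $[\ell, u]$ with coverage probability $\gamma$ for a future observation $X$ in a normal distribution $N(\mu, \sigma^2)$ with known mean and variance may be calculated from

$$\gamma = P(\ell < X < u) = P\left(\frac{\ell - \mu}{\sigma} < \frac{X - \mu}{\sigma} < \frac{u - \mu}{\sigma}\right) = P\left(\frac{\ell - \mu}{\sigma} < Z < \frac{u - \mu}{\sigma}\right),$$

where $Z = \frac{X - \mu}{\sigma}$, the standard score of $X$, is distributed as standard normal.

Hence

$$\frac{\ell - \mu}{\sigma} = -z, \qquad \frac{u - \mu}{\sigma} = z,$$

or

$$\ell = \mu - z\sigma, \qquad u = \mu + z\sigma,$$

with $z$ the quantile in the standard normal distribution for which:

$$\gamma = P(-z < Z < z),$$

or equivalently:

$$\tfrac{1}{2}(1 - \gamma) = P(Z > z).$$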

Prediction interval    z
75%                    1.15[3]
90%                    1.64[3]
95%                    1.96[3]
99%                    2.58[3]

Figure: prediction interval (y-axis) as a function of z, the quantile of the standard score (x-axis). The y-axis is logarithmically compressed, but the values on it are not modified.

The prediction interval is conventionally written as:

$$[\mu - z\sigma,\ \mu + z\sigma].$$

For example, for the 95% prediction interval of a normal distribution with a mean (μ) of 5 and a standard deviation (σ) of 1, z is approximately 2. The lower limit of the prediction interval is therefore approximately 5 − (2⋅1) = 3, and the upper limit approximately 5 + (2⋅1) = 7, giving a prediction interval of approximately 3 to 7.

Figure: cumulative distribution function for the normal distribution with mean (μ) 0 and variance (σ²) 1. In addition to the quantile function, the prediction interval for any standard score can be calculated by $1 - (1 - \Phi_{\mu,\sigma^2}(\text{standard score})) \cdot 2$. For example, a standard score of x = 1.96 gives $\Phi_{\mu,\sigma^2}(1.96) = 0.9750$, corresponding to a prediction interval of 1 − (1 − 0.9750)⋅2 = 0.9500 = 95%.
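The known-parameter case can be reproduced with Python's standard library. This is a minimal sketch: the function names below are invented for illustration, and `statistics.NormalDist` is used simply as a convenient source of the standard normal quantile and distribution functions.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1

def z_for_coverage(gamma):
    """Return z such that P(-z < Z < z) = gamma for standard normal Z."""
    return STD_NORMAL.inv_cdf((1 + gamma) / 2)

def coverage_for_z(z):
    """Coverage of the interval mu +/- z*sigma: 1 - 2*(1 - Phi(z))."""
    return 1 - 2 * (1 - STD_NORMAL.cdf(z))

def normal_prediction_interval(mu, sigma, gamma):
    """Prediction interval [mu - z*sigma, mu + z*sigma] with coverage gamma."""
    z = z_for_coverage(gamma)
    return mu - z * sigma, mu + z * sigma

# Worked example from the text: mean 5, standard deviation 1, 95% coverage.
print(normal_prediction_interval(5, 1, 0.95))   # about 3.04 to 6.96
print(round(coverage_for_z(1.96), 4))           # about 0.95
```

Using the exact quantile (about 1.96) instead of the rounded value 2 gives limits slightly inside 3 and 7.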

Estimation of parameters

For a distribution with unknown parameters, a direct approach to prediction is to estimate the parameters and then use the associated quantile function – for example, one could use the sample mean $\bar{X}$ as an estimate for μ and the sample variance $s^2$ as an estimate for $\sigma^2$. There are two natural choices for $s^2$ here: dividing by $n - 1$ yields an unbiased estimate, while dividing by $n$ yields the maximum likelihood estimator, and either might be used. One then uses the quantile function with these estimated parameters to give a prediction interval.

This approach is usable, but the resulting interval will not have the repeated sampling interpretation[4] – it is not a predictive confidence interval.
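A small Monte Carlo experiment illustrates why the plug-in interval lacks the repeated sampling interpretation. The sketch below is an assumption-laden illustration, not a method from the cited source: it draws standard normal samples, forms the plug-in interval from the sample mean and the unbiased sample standard deviation, and counts how often an independent future draw lands inside. The function name and default settings are hypothetical.

```python
import random
from statistics import NormalDist, fmean, stdev

def plug_in_coverage(n, gamma=0.95, trials=10_000, seed=1):
    """Estimate how often the plug-in interval x_bar +/- z*s, built from
    n normal observations, contains one further observation."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf((1 + gamma) / 2)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        x_bar, s = fmean(sample), stdev(sample)
        future = rng.gauss(0.0, 1.0)
        if x_bar - z * s <= future <= x_bar + z * s:
            hits += 1
    return hits / trials

for n in (5, 50):
    print(n, plug_in_coverage(n))
```

For small samples the observed coverage falls noticeably below the nominal 95%, and it approaches the nominal level as n grows.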

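For the sequel, use the sample mean:

$$\bar{X} = \bar{X}_n = \frac{X_1 + \cdots + X_n}{n}$$

and the (unbiased) sample variance:

$$s^2 = s_n^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (X_i - \bar{X}_n)^2.$$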

Unknown mean, known variance

Given[5] a normal distribution with unknown mean μ but known variance $\sigma^2$, the sample mean $\bar{X}$ of the observations $X_1, \ldots, X_n$ has distribution $N(\mu, \sigma^2/n)$, while the future observation $X_{n+1}$ has distribution $N(\mu, \sigma^2)$. Taking the difference of these cancels the μ and yields a normal distribution of variance $\sigma^2 + \sigma^2/n$, thus

$$\frac{X_{n+1} - \bar{X}}{\sqrt{\sigma^2 + \sigma^2/n}} \sim N(0, 1).$$

Solving for $X_{n+1}$ gives the prediction distribution $N(\bar{X}, \sigma^2 + \sigma^2/n)$, from which one can compute intervals as before. This is a predictive confidence interval in the sense that if one uses a quantile range of 100p%, then on repeated applications of this computation, the future observation $X_{n+1}$ will fall in the predicted interval 100p% of the time.

Notice that this prediction distribution is more conservative than using the estimated mean $\bar{X}$ and known variance $\sigma^2$, as this uses the compound variance $\sigma^2 + \sigma^2/n$, hence yields slightly wider intervals. This is necessary for the desired confidence interval property to hold.
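As an illustration of the unknown-mean, known-variance case, the following sketch computes the interval centred on the sample mean with the compound variance σ² + σ²/n. The function name and the `coverage` parameter are hypothetical names chosen for this example.

```python
from math import sqrt
from statistics import NormalDist, fmean

def prediction_interval_known_variance(sample, sigma, coverage=0.95):
    """Interval for the next observation when sigma is known and the
    mean is estimated by the sample mean."""
    n = len(sample)
    x_bar = fmean(sample)
    z = NormalDist().inv_cdf((1 + coverage) / 2)
    # Compound standard deviation: sqrt(sigma^2 + sigma^2 / n).
    half_width = z * sigma * sqrt(1 + 1 / n)
    return x_bar - half_width, x_bar + half_width

print(prediction_interval_known_variance([4, 5, 5, 6], sigma=1))
```

With four observations the half-width is z·σ·√1.25, roughly 12% wider than the interval that would treat the sample mean as the true mean.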

Known mean, unknown variance

Conversely, given a normal distribution with known mean μ but unknown variance $\sigma^2$, the sample variance $s^2$ of the observations $X_1, \ldots, X_n$ has, up to scale, a $\chi^2_{n-1}$ distribution; more precisely:

$$\frac{(n - 1)\, s_n^2}{\sigma^2} \sim \chi^2_{n-1}.$$

On the other hand, the future observation $X_{n+1}$ has distribution $N(\mu, \sigma^2)$. Taking the ratio of the future observation residual $X_{n+1} - \mu$ and the sample standard deviation s cancels the σ, yielding a Student's t-distribution with n – 1 degrees of freedom (see its derivation):

$$\frac{X_{n+1} - \mu}{s} \sim T_{n-1}.$$

Solving for $X_{n+1}$ gives the prediction distribution $\mu + s\, T_{n-1}$, from which one can compute intervals as before.

Notice that this prediction distribution is more conservative than using a normal distribution with the estimated standard deviation s and known mean μ, as it uses the t-distribution instead of the normal distribution, hence yields wider intervals. This is necessary for the desired confidence interval property to hold.

Unknown mean, unknown variance

Combining the above for a normal distribution $N(\mu, \sigma^2)$ with both μ and σ² unknown yields the following ancillary statistic:[6]

$$\frac{X_{n+1} - \bar{X}_n}{s_n \sqrt{1 + 1/n}} \sim T_{n-1}.$$

This simple combination is possible because the sample mean and sample variance of the normal distribution are independent statistics; this is only true for the normal distribution, and in fact characterizes the normal distribution.

Solving for $X_{n+1}$ yields the prediction distribution

$$\bar{X}_n + s_n \sqrt{1 + 1/n} \cdot T_{n-1}.$$

The probability of $X_{n+1}$ falling in a given interval is then:

$$\Pr\left(\bar{X}_n - T_a s_n \sqrt{1 + 1/n} \le X_{n+1} \le \bar{X}_n + T_a s_n \sqrt{1 + 1/n}\right) = p,$$

where $T_a$ is the 100((1 + p)/2)th percentile of Student's t-distribution with n − 1 degrees of freedom. Therefore, the numbers

$$\bar{X}_n \pm T_a s_n \sqrt{1 + 1/n}$$

are the endpoints of a 100p% prediction interval for $X_{n+1}$.
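The interval with both parameters unknown, sample mean ± T·s·√(1 + 1/n), can be sketched in plain Python. The standard library has no Student's t quantile, so the example below approximates one by integrating the t density with Simpson's rule and inverting by bisection; that numerical approach, the helper names, and the `coverage` parameter are assumptions made for this illustration, and a statistics library would normally supply the quantile directly.

```python
import math
from statistics import fmean, stdev

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf(x, df, steps=2000):
    """CDF via Simpson's rule on [0, |x|], using symmetry about zero."""
    a = abs(x)
    if a == 0.0:
        return 0.5
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x > 0 else 0.5 - area

def t_quantile(q, df):
    """Approximate upper quantile (0.5 < q < 1) by bisection."""
    lo, hi = 0.0, 1000.0
    for _ in range(80):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < q:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def t_prediction_interval(sample, coverage=0.95):
    """x_bar +/- T * s * sqrt(1 + 1/n), with T from t_{n-1}."""
    n = len(sample)
    x_bar, s = fmean(sample), stdev(sample)  # stdev divides by n - 1
    t_a = t_quantile((1 + coverage) / 2, n - 1)
    half_width = t_a * s * math.sqrt(1 + 1 / n)
    return x_bar - half_width, x_bar + half_width

print(t_prediction_interval([1, 2, 3, 4, 5]))
```

For the five observations shown, the interval is much wider than the observed range, which reflects both the small sample and the heavy tails of the t-distribution with 4 degrees of freedom.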

Non-parametric methods

One can compute prediction intervals without any assumptions on the population, i.e. in a non-parametric way.

The residual bootstrap method can be used for constructing non-parametric prediction intervals.

Conformal prediction

Conformal prediction is a more general method. As a special case, consider using the sample minimum and maximum as the boundaries of a prediction interval. If one has a sample of independent and identically distributed random variables $\{X_1, \ldots, X_n\}$, then the probability that the next observation $X_{n+1}$ will be the largest is 1/(n + 1), since all observations have equal probability of being the maximum. In the same way, the probability that $X_{n+1}$ will be the smallest is 1/(n + 1). The other (n − 1)/(n + 1) of the time, $X_{n+1}$ falls between the sample maximum and sample minimum of the sample $\{X_1, \ldots, X_n\}$. Thus, denoting the sample maximum and minimum by M and m, this yields an (n − 1)/(n + 1) prediction interval of $[m, M]$.

Notice that while this gives the probability that a future observation will fall in a range, it does not give any estimate as to where in a segment it will fall – notably, if it falls outside the range of observed values, it may be far outside the range. See extreme value theory for further discussion. Formally, this applies not just to sampling from a population, but to any exchangeable sequence of random variables, not necessarily independent or identically distributed.
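The minimum–maximum interval and its (n − 1)/(n + 1) coverage can be checked by simulation. The sketch below is illustrative: the function names are hypothetical, and the simulation draws from a uniform distribution only for convenience, since the argument above does not depend on the shape of a continuous distribution.

```python
import random

def min_max_prediction_interval(sample):
    """Return (m, M, coverage) where coverage = (n - 1) / (n + 1)."""
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations")
    return min(sample), max(sample), (n - 1) / (n + 1)

def simulated_coverage(n, trials=20_000, seed=0):
    """Fraction of trials in which a new draw falls inside [m, M]."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.random() for _ in range(n)]
        low, high, _ = min_max_prediction_interval(sample)
        if low <= rng.random() <= high:
            hits += 1
    return hits / trials

print(min_max_prediction_interval([3.1, 2.4, 5.0, 4.2, 2.9, 3.7, 4.8, 3.3, 4.0]))
print(simulated_coverage(9))  # should be close to 8/10
```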

Contrast with other intervals

Contrast with confidence intervals

In the formula for the predictive confidence interval no mention is made of the unobservable parameters μ and σ of population mean and standard deviation – the observed sample statistics $\bar{X}_n$ and $s_n$ of sample mean and standard deviation are used, and what is estimated is the outcome of future samples.

When considering prediction intervals, rather than using sample statistics as estimators of population parameters and applying confidence intervals to these estimates, one considers "the next sample" as itself a statistic, and computes its sampling distribution.

In parameter confidence intervals, one estimates population parameters; if one wishes to interpret this as prediction of the next sample, one models "the next sample" as a draw from this estimated population, using the (estimated) population distribution. By contrast, in predictive confidence intervals, one uses the sampling distribution of (a statistic of) a sample of n or n + 1 observations from such a population, and the population distribution is not directly used, though the assumption about its form (though not the values of its parameters) is used in computing the sampling distribution.

In regression analysis

A common application of prediction intervals is to regression analysis. Suppose the data is being modeled by a straight line (simple linear regression):

$$y_i = \alpha + \beta x_i + \varepsilon_i,$$

where $y_i$ is the response variable, $x_i$ is the explanatory variable, $\varepsilon_i$ is a random error term, and $\alpha$ and $\beta$ are parameters.

Given estimates $\hat{\alpha}$ and $\hat{\beta}$ for the parameters, such as from ordinary least squares, the predicted response value $\hat{y}_d$ for a given explanatory value $x_d$ is

$$\hat{y}_d = \hat{\alpha} + \hat{\beta} x_d$$

(the point on the regression line), while the actual response would be

$$y_d = \alpha + \beta x_d + \varepsilon_d.$$

The point estimate $\hat{y}_d$ is called the mean response, and is an estimate of the expected value of $y_d$, $E(y \mid x_d)$.

A prediction interval instead gives an interval in which one expects $y_d$ to fall; this is not necessary if the actual parameters α and β are known (together with the error term $\varepsilon_i$), but if one is estimating from a sample, then one may use the standard error of the estimates for the intercept and slope ($\hat{\alpha}$ and $\hat{\beta}$), as well as their correlation, to compute a prediction interval.

In regression, Faraway (2002, p. 39) makes a distinction between intervals for predictions of the mean response vs. for predictions of observed response—affecting essentially the inclusion or not of the unity term within the square root in the expansion factors above; for details, see Faraway (2002).
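The distinction between an interval for the mean response and one for an observed response can be made concrete for simple linear regression. The sketch below uses the standard textbook formulas, which are not spelled out in this article and are stated here as an assumption: the half-width is t·s·√(1/n + (x_d − x̄)²/S_xx) for the mean response, with an extra unity term inside the square root for a new observation. The function name is hypothetical, and the critical value `t_crit` (from a t-distribution with n − 2 degrees of freedom) is supplied by the caller rather than computed.

```python
from math import sqrt
from statistics import fmean

def slr_intervals(xs, ys, x_d, t_crit):
    """Fit y = alpha + beta*x by least squares and return
    (y_hat, mean-response interval, prediction interval) at x_d."""
    n = len(xs)
    x_bar, y_bar = fmean(xs), fmean(ys)
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    beta_hat = s_xy / s_xx
    alpha_hat = y_bar - beta_hat * x_bar
    # Residual standard error with n - 2 degrees of freedom.
    rss = sum((y - alpha_hat - beta_hat * x) ** 2 for x, y in zip(xs, ys))
    s = sqrt(rss / (n - 2))
    y_hat = alpha_hat + beta_hat * x_d
    spread = 1 / n + (x_d - x_bar) ** 2 / s_xx
    half_mean = t_crit * s * sqrt(spread)        # mean response
    half_pred = t_crit * s * sqrt(1 + spread)    # observed response
    return (y_hat,
            (y_hat - half_mean, y_hat + half_mean),
            (y_hat - half_pred, y_hat + half_pred))

xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
# 3.182 is the 97.5th percentile of t with 3 degrees of freedom.
print(slr_intervals(xs, ys, x_d=3, t_crit=3.182))
```

The prediction interval always contains the mean-response interval, because it also accounts for the error term of the new observation.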

Bayesian statistics

Seymour Geisser, a proponent of predictive inference, gives predictive applications of Bayesian statistics.[7]

In Bayesian statistics, one can compute (Bayesian) prediction intervals from the posterior probability of the random variable, as a credible interval. In theoretical work, credible intervals are not often calculated for the prediction of future events, but for inference of parameters – i.e., credible intervals of a parameter, not for the outcomes of the variable itself. However, particularly where applications are concerned with possible extreme values of yet to be observed cases, credible intervals for such values can be of practical importance.

Applications

Prediction intervals are commonly used as definitions of reference ranges, such as reference ranges for blood tests to give an idea of whether a blood test is normal or not. For this purpose, the most commonly used prediction interval is the 95% prediction interval, and a reference range based on it can be called a standard reference range.

Notes

  1. Geisser (1993, p. 6): Chapter 2: Non-Bayesian predictive approaches
  2. Geisser (1993, p. 7)
  3. Table A2 in Sterne & Kirkwood (2003, p. 472)
  4. Geisser (1993, pp. 8–9)
  5. Geisser (1993, p. 7–)
  6. Geisser (1993, Example 2.2, pp. 9–10)
  7. Geisser (1993)

References

  • Faraway, Julian J. (2002), Practical Regression and Anova using R (PDF)
  • Geisser, Seymour (1993), Predictive Inference, CRC Press
  • Sterne, Jonathan; Kirkwood, Betty R. (2003), Essential Medical Statistics, Blackwell Science, ISBN 0-86542-871-9
