Empirical distribution function

The green curve, which asymptotically approaches heights of 0 and 1 without reaching them, is the true cumulative distribution function of the standard normal distribution. The grey hash marks represent the observations in a particular sample drawn from that distribution, and the horizontal steps of the blue step function (including the leftmost point in each step but not including the rightmost point) form the empirical distribution function of that sample.

In statistics, an empirical distribution function (commonly also called an empirical cumulative distribution function, eCDF) is the distribution function associated with the empirical measure of a sample.[1] This cumulative distribution function is a step function that jumps up by 1/n at each of the n data points. Its value at any specified value of the measured variable is the fraction of observations of the measured variable that are less than or equal to the specified value.

The empirical distribution function is an estimate of the cumulative distribution function that generated the points in the sample. It converges with probability 1 to that underlying distribution, according to the Glivenko–Cantelli theorem. A number of results exist to quantify the rate of convergence of the empirical distribution function to the underlying cumulative distribution function.

Definition

Let (X1, …, Xn) be independent, identically distributed real random variables with the common cumulative distribution function F(t). Then the empirical distribution function is defined as[2]

$$\hat{F}_n(t) = \frac{1}{n}\sum_{i=1}^{n} \mathbf{1}_{X_i \le t},$$

where $\mathbf{1}_A$ is the indicator of event A. For a fixed t, the indicator $\mathbf{1}_{X_i \le t}$ is a Bernoulli random variable with parameter p = F(t); hence $n\hat{F}_n(t)$ is a binomial random variable with mean nF(t) and variance nF(t)(1 − F(t)). This implies that $\hat{F}_n(t)$ is an unbiased estimator for F(t).
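The definition can be illustrated with a minimal Python sketch that uses only the standard library. The helper name `make_ecdf` is hypothetical (it is not taken from any package mentioned in this article); it sorts the sample once and then counts, by binary search, how many observations are less than or equal to t.

```python
from bisect import bisect_right


def make_ecdf(sample):
    """Return a function t -> fraction of observations <= t.

    `make_ecdf` is an illustrative helper, not a library API.
    """
    data = sorted(sample)
    n = len(data)
    if n == 0:
        raise ValueError("sample must be non-empty")

    def ecdf(t):
        # bisect_right returns the number of sorted values <= t,
        # so the step function is right-continuous, as in the definition.
        return bisect_right(data, t) / n

    return ecdf


F_hat = make_ecdf([2.1, 0.4, 3.3, 0.4, 1.7])
values = [F_hat(t) for t in (0.0, 0.4, 2.0, 5.0)]
print(values)
```

In this sample the value 0.4 occurs twice, so the function jumps by 2/n there; in general the jump at a point is 1/n times the number of observations equal to it.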

However, in some textbooks, the definition is given as[3][4]

$$\hat{F}_n(t) = \frac{1}{n+1}\sum_{i=1}^{n} \mathbf{1}_{X_i \le t}.$$

Asymptotic properties

Since the ratio (n + 1)/n approaches 1 as n goes to infinity, the asymptotic properties of the two definitions that are given above are the same.

By the strong law of large numbers, the estimator $\hat{F}_n(t)$ converges to F(t) as n → ∞ almost surely, for every value of t:[2]

$$\hat{F}_n(t) \xrightarrow{\text{a.s.}} F(t);$$

thus the estimator $\hat{F}_n(t)$ is consistent. This expression asserts the pointwise convergence of the empirical distribution function to the true cumulative distribution function. There is a stronger result, called the Glivenko–Cantelli theorem, which states that the convergence in fact happens uniformly over t:[5]

$$\|\hat{F}_n - F\|_\infty \equiv \sup_{t \in \mathbb{R}} \big|\hat{F}_n(t) - F(t)\big| \xrightarrow{\text{a.s.}} 0.$$

The sup-norm in this expression is called the Kolmogorov–Smirnov statistic for testing the goodness-of-fit between the empirical distribution $\hat{F}_n(t)$ and the assumed true cumulative distribution function F. Other norms may reasonably be used here instead of the sup-norm. For example, the L2-norm gives rise to the Cramér–von Mises statistic.
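As a rough illustration of the sup-norm distance, the following Python sketch computes the Kolmogorov–Smirnov statistic of a sample against a hypothesized continuous CDF. The function names `ks_statistic` and `normal_cdf` are hypothetical helpers written for this example. The sketch assumes F is continuous, in which case the supremum is attained at (or just to the left of) an observation, so only the sorted data points need to be checked.

```python
import math


def normal_cdf(x):
    """Standard normal CDF, written via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def ks_statistic(sample, cdf):
    """sup_t |F_hat_n(t) - F(t)| for a continuous hypothesized CDF `cdf`."""
    data = sorted(sample)
    n = len(data)
    d = 0.0
    for i, x in enumerate(data, start=1):
        f = cdf(x)
        # At x the empirical CDF rises from (i - 1)/n to i/n, so compare
        # F(x) with both the upper and the lower end of the step.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


def uniform_cdf(x):
    """CDF of the uniform distribution on [0, 1]."""
    return min(max(x, 0.0), 1.0)


d_uniform = ks_statistic([0.25, 0.75], uniform_cdf)
d_normal = ks_statistic([0.0], normal_cdf)
print(d_uniform, d_normal)
```

By the Glivenko–Cantelli theorem, this quantity tends to 0 almost surely as the sample size grows when the sample really is drawn from F.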

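The asymptotic distribution can be further characterized in several different ways. First, the central limit theorem states that, pointwise, $\hat{F}_n(t)$ has an asymptotically normal distribution with the standard $\sqrt{n}$ rate of convergence:[2]

$$\sqrt{n}\,\big(\hat{F}_n(t) - F(t)\big) \xrightarrow{d} \mathcal{N}\big(0,\; F(t)(1 - F(t))\big).$$

This result is extended by Donsker’s theorem, which asserts that the empirical process $\sqrt{n}\,(\hat{F}_n - F)$, viewed as a function indexed by $t \in \mathbb{R}$, converges in distribution in the Skorokhod space $D[-\infty, +\infty]$ to the mean-zero Gaussian process $G_F = B \circ F$, where B is the standard Brownian bridge.[5] The covariance structure of this Gaussian process is

$$\operatorname{E}\big[G_F(t_1)\,G_F(t_2)\big] = F(t_1 \wedge t_2) - F(t_1)\,F(t_2).$$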

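The uniform rate of convergence in Donsker’s theorem can be quantified by the result known as the Hungarian embedding:[6]

$$\limsup_{n \to \infty} \frac{\sqrt{n}}{\ln^2 n}\, \big\|\sqrt{n}\,(\hat{F}_n - F) - G_{F,n}\big\|_\infty < \infty \quad \text{a.s.}$$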

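Alternatively, the rate of convergence of $\sqrt{n}\,(\hat{F}_n - F)$ can also be quantified in terms of the asymptotic behavior of the sup-norm of this expression. A number of results exist in this vein; for example, the Dvoretzky–Kiefer–Wolfowitz inequality provides a bound on the tail probabilities of $\sqrt{n}\,\|\hat{F}_n - F\|_\infty$:[6]

$$\Pr\big(\sqrt{n}\,\|\hat{F}_n - F\|_\infty > z\big) \le 2e^{-2z^2}.$$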

In fact, Kolmogorov has shown that if the cumulative distribution function F is continuous, then the expression $\sqrt{n}\,\|\hat{F}_n - F\|_\infty$ converges in distribution to $\|B\|_\infty$, which has the Kolmogorov distribution and does not depend on the form of F.
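For illustration, the Kolmogorov distribution has the standard series representation K(x) = 1 − 2 Σ_{k≥1} (−1)^{k−1} e^{−2k²x²} for x > 0, which is not stated in this article and is brought in here as an outside fact. The Python sketch below evaluates a truncated version of that series; the function name `kolmogorov_cdf` and the truncation length `terms` are assumptions made for this example.

```python
import math


def kolmogorov_cdf(x, terms=100):
    """Truncated series for the CDF of the Kolmogorov distribution.

    Illustrative helper only. The alternating series converges very
    quickly for moderate x but slowly as x approaches 0, so this
    truncation is not reliable for very small positive x.
    """
    if x <= 0:
        return 0.0
    total = 0.0
    for k in range(1, terms + 1):
        total += (-1) ** (k - 1) * math.exp(-2.0 * k * k * x * x)
    return 1.0 - 2.0 * total


p = kolmogorov_cdf(1.358)
print(round(p, 3))
```

The value x ≈ 1.358 is the familiar approximate 95% point of this distribution, which is why it appears as the asymptotic critical value of the Kolmogorov–Smirnov test at the 5% level.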

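Another result, which follows from the law of the iterated logarithm, is that[6]

$$\limsup_{n \to \infty} \frac{\sqrt{n}\,\|\hat{F}_n - F\|_\infty}{\sqrt{2 \ln \ln n}} \le \frac{1}{2} \quad \text{a.s.}$$

and

$$\liminf_{n \to \infty} \sqrt{2n \ln \ln n}\; \|\hat{F}_n - F\|_\infty = \frac{\pi}{2} \quad \text{a.s.}$$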

Confidence intervals

Empirical CDF, CDF and confidence interval plots for various sample sizes of normal distribution
Empirical CDF, CDF and confidence interval plots for various sample sizes of Cauchy distribution
Empirical CDF, CDF and confidence interval plots for various sample sizes of triangle distribution

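By the Dvoretzky–Kiefer–Wolfowitz inequality, the interval that contains the true CDF, $F(x)$, with probability $1 - \alpha$ is specified as

$$\hat{F}_n(x) - \varepsilon \le F(x) \le \hat{F}_n(x) + \varepsilon \quad \text{where} \quad \varepsilon = \sqrt{\frac{\ln(2/\alpha)}{2n}}.$$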
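The half-width ε follows from setting the Dvoretzky–Kiefer–Wolfowitz tail bound 2e^{−2nε²} equal to α and solving for ε. The following Python sketch builds such a band around the empirical CDF. The name `dkw_band` is a hypothetical helper, and clipping the band to [0, 1] is an extra convenience assumed here rather than part of the inequality.

```python
import math
from bisect import bisect_right


def dkw_band(sample, alpha=0.05):
    """Return (eps, lower, upper) for a 1 - alpha DKW confidence band.

    Illustrative helper: `lower` and `upper` are functions of x that
    bracket the true CDF simultaneously for all x with probability
    at least 1 - alpha.
    """
    data = sorted(sample)
    n = len(data)
    if n == 0:
        raise ValueError("sample must be non-empty")
    eps = math.sqrt(math.log(2.0 / alpha) / (2.0 * n))

    def ecdf(x):
        return bisect_right(data, x) / n

    def lower(x):
        # Clipped at 0 because a CDF cannot be negative (assumed convenience).
        return max(ecdf(x) - eps, 0.0)

    def upper(x):
        # Clipped at 1 because a CDF cannot exceed 1 (assumed convenience).
        return min(ecdf(x) + eps, 1.0)

    return eps, lower, upper


eps, lower, upper = dkw_band(range(100), alpha=0.05)
print(round(eps, 4), lower(49), upper(49))
```

Because ε shrinks like 1/√n, halving the width of the band requires roughly four times as many observations.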

Using the above bounds, the empirical CDF, the true CDF and the confidence intervals can be plotted for different distributions with any of the statistical implementations listed below.

Statistical implementation

A non-exhaustive list of software implementations of the empirical distribution function includes:

  • R: the ecdf function computes an empirical cumulative distribution function, with several methods for plotting, printing and computing with such an “ecdf” object.
  • MATLAB: the empirical cumulative distribution function (cdf) plot.
  • JMP from SAS: the CDF plot creates a plot of the empirical cumulative distribution function.
  • Minitab: create an Empirical CDF.
  • Mathwave: fit a probability distribution to the data.
  • Dataplot: the Empirical CDF plot.
  • SciPy: scipy.stats.ecdf.
  • Statsmodels: statsmodels.distributions.empirical_distribution.ECDF.
  • Matplotlib: the matplotlib.pyplot.ecdf function (new in version 3.8.0).[7]
  • Seaborn: the seaborn.ecdfplot function.
  • Plotly: the plotly.express.ecdf function.
  • Excel: an Empirical CDF plot can be produced.
  • ArviZ: the az.plot_ecdf function.

References

  1. ^ Dekking, Michel (2005). A modern introduction to probability and statistics: Understanding why and how. London: Springer. p. 219. ISBN 978-1-85233-896-1. OCLC 262680588.
  2. ^ a b c van der Vaart, A.W. (1998). Asymptotic statistics. Cambridge University Press. p. 265. ISBN 0-521-78450-6.
  3. ^ Coles, S. (2001) An Introduction to Statistical Modeling of Extreme Values. Springer, p. 36, Definition 2.4. ISBN 978-1-4471-3675-0.
  4. ^ Madsen, H.O., Krenk, S., Lind, S.C. (2006) Methods of Structural Safety. Dover Publications. p. 148-149. ISBN 0486445976
  5. ^ a b van der Vaart, A.W. (1998). Asymptotic statistics. Cambridge University Press. p. 266. ISBN 0-521-78450-6.
  6. ^ a b c van der Vaart, A.W. (1998). Asymptotic statistics. Cambridge University Press. p. 268. ISBN 0-521-78450-6.
  7. ^ "What's new in Matplotlib 3.8.0 (Sept 13, 2023) — Matplotlib 3.8.3 documentation".
