Statistical model

A statistical model is a mathematical model that embodies a set of statistical assumptions concerning the generation of sample data (and similar data from a larger population). A statistical model represents, often in considerably idealized form, the data-generating process.[1] When referring specifically to probabilities, the corresponding term is probabilistic model. All statistical hypothesis tests and all statistical estimators are derived via statistical models. More generally, statistical models are part of the foundation of statistical inference. A statistical model is usually specified as a mathematical relationship between one or more random variables and other non-random variables. As such, a statistical model is "a formal representation of a theory" (Herman Adèr quoting Kenneth Bollen).[2]

Introduction

Informally, a statistical model can be thought of as a statistical assumption (or set of statistical assumptions) with a certain property: that the assumption allows us to calculate the probability of any event. As an example, consider a pair of ordinary six-sided dice. We will study two different statistical assumptions about the dice.

The first statistical assumption is this: for each of the dice, the probability of each face (1, 2, 3, 4, 5, and 6) coming up is 1/6. From that assumption, we can calculate the probability of both dice coming up 5: 1/6 × 1/6 = 1/36. More generally, we can calculate the probability of any event: e.g. (1 and 2) or (3 and 3) or (5 and 6). The alternative statistical assumption is this: for each of the dice, the probability of the face 5 coming up is 1/8 (because the dice are weighted). From that assumption, we can calculate the probability of both dice coming up 5: 1/8 × 1/8 = 1/64. We cannot, however, calculate the probability of any other nontrivial event, as the probabilities of the other faces are unknown.

The first statistical assumption constitutes a statistical model because, with the assumption alone, we can calculate the probability of any event. The alternative statistical assumption does not constitute a statistical model because, with the assumption alone, we cannot calculate the probability of every event. In the example above, with the first assumption, calculating the probability of an event is easy. With some other examples, though, the calculation can be difficult, or even impractical (e.g. it might require millions of years of computation). For an assumption to constitute a statistical model, such difficulty is acceptable: doing the calculation does not need to be practicable, just theoretically possible.
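The contrast between the two assumptions can be sketched in a few lines of Python. This is only an illustration: the function name event_probability and the choice to treat each outcome as an ordered pair (first die, second die) are assumptions made here, not part of the definitions above.

```python
from fractions import Fraction
from itertools import product

# First assumption: every face of each die has probability 1/6.
fair_die = {face: Fraction(1, 6) for face in range(1, 7)}

def event_probability(event, die):
    """Probability that an ordered roll (a, b) of two independent dice lies in `event`."""
    return sum(die[a] * die[b] for a, b in product(die, repeat=2) if (a, b) in event)

both_fives = {(5, 5)}
# The event "(1 and 2) or (3 and 3) or (5 and 6)", read here as ordered pairs.
mixed_event = {(1, 2), (3, 3), (5, 6)}

p_both_fives = event_probability(both_fives, fair_die)
p_mixed = event_probability(mixed_event, fair_die)

# Alternative assumption: only P(face 5) = 1/8 is specified.
weighted_die_partial = {5: Fraction(1, 8)}
p_both_fives_weighted = weighted_die_partial[5] ** 2

# Any event involving another face cannot be evaluated, because the
# assumption assigns no probability to faces 1, 2, 3, 4 and 6.
other_faces_known = all(face in weighted_die_partial for face in range(1, 7))
```

Under the first assumption every event gets a probability; under the second, other_faces_known is False, which is the sense in which that assumption falls short of being a statistical model.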

Formal definition

In mathematical terms, a statistical model is a pair $(S, \mathcal{P})$, where $S$ is the set of possible observations, i.e. the sample space, and $\mathcal{P}$ is a set of probability distributions on $S$.[3] The set $\mathcal{P}$ represents all of the models that are considered possible. This set is typically parameterized: $\mathcal{P} = \{F_\theta : \theta \in \Theta\}$. The set $\Theta$ defines the parameters of the model. If a parameterization is such that distinct parameter values give rise to distinct distributions, i.e. $F_{\theta_1} = F_{\theta_2} \Rightarrow \theta_1 = \theta_2$ (in other words, the mapping $\theta \mapsto F_\theta$ is injective), it is said to be identifiable.[3]
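Identifiability can be illustrated with a small numerical sketch. Both parameterizations below are hypothetical examples chosen for this illustration: family_a indexes Gaussians by (mean, standard deviation) with a positive standard deviation, while family_b allows any nonzero second component and uses its absolute value. Comparing densities on a finite grid is only a rough check, not a proof.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of the Gaussian distribution with mean mu and standard deviation sigma."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def family_a(theta):
    """theta = (mu, sigma) with sigma > 0: distinct theta give distinct distributions."""
    mu, sigma = theta
    return lambda x: normal_pdf(x, mu, sigma)

def family_b(theta):
    """theta = (mu, s) with s != 0 and sigma = |s|: (mu, s) and (mu, -s) coincide."""
    mu, s = theta
    return lambda x: normal_pdf(x, mu, abs(s))

grid = [i / 10 for i in range(-50, 51)]

def same_distribution(f, g):
    """Rough numerical check that two densities agree on the grid."""
    return all(math.isclose(f(x), g(x)) for x in grid)

# Distinct parameter values, same distribution: family_b is not identifiable.
b_collides = same_distribution(family_b((0.0, 2.0)), family_b((0.0, -2.0)))
# Distinct parameter values, distinct distributions, as identifiability requires.
a_collides = same_distribution(family_a((0.0, 2.0)), family_a((0.0, 1.0)))
```

Here b_collides is True and a_collides is False, matching the injectivity condition in the definition.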

In some cases, the model can be more complex.

  • In Bayesian statistics, the model is extended by adding a probability distribution over the parameter space $\Theta$.
  • A statistical model can sometimes distinguish two sets of probability distributions. The first set, $\mathcal{Q}$, is the set of models considered for inference. The second set, $\mathcal{P}$, is the set of models that could have generated the data, which is much larger than $\mathcal{Q}$. Such statistical models are key in checking that a given procedure is robust, i.e. that it does not produce catastrophic errors when its assumptions about the data are incorrect.
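The Bayesian extension in the first item can be sketched as follows. The setting is an assumption made for illustration: a coin with unknown heads probability, a coarse grid of nine candidate parameter values, a uniform prior over that grid, and a made-up data set of four tosses.

```python
import math

# Candidate parameter values theta = P(heads), restricted to a coarse grid.
thetas = [i / 10 for i in range(1, 10)]

# The added ingredient: a probability distribution over the parameter space
# (here a uniform prior, chosen only for illustration).
prior = {theta: 1 / len(thetas) for theta in thetas}

def posterior(prior, data):
    """Update the prior with i.i.d. coin tosses (1 = heads, 0 = tails) via Bayes' rule."""
    unnormalized = {
        theta: weight * math.prod(theta if toss else 1 - theta for toss in data)
        for theta, weight in prior.items()
    }
    total = sum(unnormalized.values())
    return {theta: weight / total for theta, weight in unnormalized.items()}

data = [1, 1, 0, 1]  # hypothetical observations
post = posterior(prior, data)
most_probable_theta = max(post, key=post.get)
```

With three heads in four tosses, the posterior concentrates on grid values near 0.75.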

An example

Suppose that we have a population of children, with the ages of the children distributed uniformly in the population. The height of a child will be stochastically related to the age: e.g. when we know that a child is of age 7, this influences the chance of the child being 1.5 meters tall. We could formalize that relationship in a linear regression model, like this: $\text{height}_i = b_0 + b_1 \text{age}_i + \varepsilon_i$, where $b_0$ is the intercept, $b_1$ is a parameter that age is multiplied by to obtain a prediction of height, $\varepsilon_i$ is the error term, and i identifies the child. This implies that height is predicted by age, with some error.

An admissible model must be consistent with all the data points. Thus, a straight line ($\text{height}_i = b_0 + b_1 \text{age}_i$) cannot be admissible for a model of the data, unless it exactly fits all the data points, i.e. all the data points lie perfectly on the line. The error term, $\varepsilon_i$, must be included in the equation, so that the model is consistent with all the data points. To do statistical inference, we would first need to assume some probability distributions for the $\varepsilon_i$. For instance, we might assume that the $\varepsilon_i$ distributions are i.i.d. Gaussian, with zero mean. In this instance, the model would have 3 parameters: $b_0$, $b_1$, and the variance of the Gaussian distribution. We can formally specify the model in the form $(S, \mathcal{P})$ as follows. The sample space, $S$, of our model comprises the set of all possible pairs (age, height). Each possible value of $\theta = (b_0, b_1, \sigma^2)$ determines a distribution on $S$; denote that distribution by $F_\theta$. If $\Theta$ is the set of all possible values of $\theta$, then $\mathcal{P} = \{F_\theta : \theta \in \Theta\}$. (The parameterization is identifiable, and this is easy to check.)

In this example, the model is determined by (1) specifying $S$ and (2) making some assumptions relevant to $\mathcal{P}$. There are two assumptions: that height can be approximated by a linear function of age; that errors in the approximation are distributed as i.i.d. Gaussian. The assumptions are sufficient to specify $\mathcal{P}$, as they are required to do.
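As a rough sketch of how such a model is used, the code below draws data from one member of the family and then estimates the three parameters by least squares. The "true" parameter values, the age range of 2 to 12 years, the sample size, and the helper name fit_line are all assumptions made for illustration.

```python
import random
import statistics

random.seed(0)

# Hypothetical parameter values: intercept (m), slope (m per year), error s.d. (m).
b0, b1, sigma = 0.75, 0.06, 0.05

# Ages are uniformly distributed, as in the example; the range is an assumption.
ages = [random.uniform(2, 12) for _ in range(500)]
heights = [b0 + b1 * age + random.gauss(0, sigma) for age in ages]

def fit_line(x, y):
    """Least-squares estimates of intercept and slope, plus the ML estimate of the error variance."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    variance = sum(r * r for r in residuals) / len(x)
    return intercept, slope, variance

b0_hat, b1_hat, var_hat = fit_line(ages, heights)
```

The estimate (b0_hat, b1_hat, var_hat) is a single point in the three-dimensional parameter set, and it should lie close to the values used to generate the data.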

General remarks

A statistical model is a special class of mathematical model. What distinguishes a statistical model from other mathematical models is that a statistical model is non-deterministic. Thus, in a statistical model specified via mathematical equations, some of the variables do not have specific values, but instead have probability distributions; i.e. some of the variables are stochastic. In the above example with children's heights, ε is a stochastic variable; without that stochastic variable, the model would be deterministic. Statistical models are often used even when the data-generating process being modeled is deterministic. For instance, coin tossing is, in principle, a deterministic process; yet it is commonly modeled as stochastic (via a Bernoulli process). Choosing an appropriate statistical model to represent a given data-generating process is sometimes extremely difficult, and may require knowledge of both the process and relevant statistical analyses. Relatedly, the statistician Sir David Cox has said, "How [the] translation from subject-matter problem to statistical model is done is often the most critical part of an analysis".[4]

There are three purposes for a statistical model, according to Konishi & Kitagawa:[5]

  1. Predictions
  2. Extraction of information
  3. Description of stochastic structures

Those three purposes are essentially the same as the three purposes indicated by Friendly & Meyer: prediction, estimation, description.[6]

Dimension of a model

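Suppose that we have a statistical model $(S, \mathcal{P})$ with $\mathcal{P} = \{F_\theta : \theta \in \Theta\}$. In notation, we write that $\Theta \subseteq \mathbb{R}^k$, where k is a positive integer ($\mathbb{R}$ denotes the real numbers; other sets can be used, in principle). Here, k is called the dimension of the model. The model is said to be parametric if $\Theta$ has finite dimension. As an example, if we assume that data arise from a univariate Gaussian distribution, then we are assuming that

$$\mathcal{P} = \left\{ F_{\mu,\sigma}(x) \equiv \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right) : \mu \in \mathbb{R},\ \sigma > 0 \right\}.$$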

In this example, the dimension, k, equals 2. As another example, suppose that the data consists of points (x, y) that we assume are distributed according to a straight line with i.i.d. Gaussian residuals (with zero mean): this leads to the same statistical model as was used in the example with children's heights. The dimension of the statistical model is 3: the intercept of the line, the slope of the line, and the variance of the distribution of the residuals. (Note the set of all possible lines has dimension 2, even though geometrically, a line has dimension 1.)
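For the univariate Gaussian model, the two-dimensional parameter can be made concrete with a short sketch. The simulated sample (mean 10, standard deviation 2) and the function names are assumptions for illustration; the estimator shown is the standard maximum-likelihood estimate.

```python
import math
import random

def gaussian_log_likelihood(theta, data):
    """Log-likelihood of i.i.d. data under the Gaussian with theta = (mu, sigma)."""
    mu, sigma = theta
    return sum(
        -math.log(sigma * math.sqrt(2 * math.pi)) - (x - mu) ** 2 / (2 * sigma ** 2)
        for x in data
    )

def gaussian_mle(data):
    """Maximum-likelihood estimate of theta = (mu, sigma)."""
    n = len(data)
    mu = sum(data) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in data) / n)
    return mu, sigma

random.seed(1)
data = [random.gauss(10.0, 2.0) for _ in range(1000)]  # hypothetical sample

theta_hat = gaussian_mle(data)
k = len(theta_hat)  # number of real-valued components of theta

# The estimate maximizes the likelihood, so it scores at least as well as
# the parameter value used to simulate the data.
ll_at_estimate = gaussian_log_likelihood(theta_hat, data)
ll_at_truth = gaussian_log_likelihood((10.0, 2.0), data)
```

Here k equals 2, the dimension of the model.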

Although formally $\theta \in \Theta$ is a single parameter that has dimension k, it is sometimes regarded as comprising k separate parameters. For example, with the univariate Gaussian distribution, $\theta$ is formally a single parameter with dimension 2, but it is often regarded as comprising 2 separate parameters: the mean and the standard deviation. A statistical model is nonparametric if the parameter set $\Theta$ is infinite dimensional. A statistical model is semiparametric if it has both finite-dimensional and infinite-dimensional parameters. Formally, if k is the dimension of $\Theta$ and n is the number of samples, both semiparametric and nonparametric models have $k \to \infty$ as $n \to \infty$. If $k/n \to 0$ as $n \to \infty$, then the model is semiparametric; otherwise, the model is nonparametric.

Parametric models are by far the most commonly used statistical models. Regarding semiparametric and nonparametric models, Sir David Cox has said, "These typically involve fewer assumptions of structure and distributional form but usually contain strong assumptions about independencies".[7]

Nested models

Two statistical models are nested if the first model can be transformed into the second model by imposing constraints on the parameters of the first model. As an example, the set of all Gaussian distributions has, nested within it, the set of zero-mean Gaussian distributions: we constrain the mean in the set of all Gaussian distributions to get the zero-mean distributions. As a second example, the quadratic model

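$y = b_0 + b_1 x + b_2 x^2 + \varepsilon, \quad \varepsilon \sim \mathcal{N}(0, \sigma^2)$

has, nested within it, the linear model

$y = b_0 + b_1 x + \varepsilon, \quad \varepsilon \sim \mathcal{N}(0, \sigma^2)$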

—we constrain the parameter b2 to equal 0.

In both those examples, the first model has a higher dimension than the second model (for the first example, the zero-mean model has dimension 1). Such is often, but not always, the case. As an example where they have the same dimension, the set of positive-mean Gaussian distributions is nested within the set of all Gaussian distributions; they both have dimension 2.
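The quadratic and linear models can be compared numerically with a minimal sketch. The simulated data, the coefficient values, and the helpers solve and fit_polynomial are assumptions for illustration; the fit uses ordinary least squares through the normal equations.

```python
import random

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def fit_polynomial(x, y, degree):
    """Least-squares coefficients [b0, ..., b_degree] and the residual sum of squares."""
    powers = [[xi ** p for p in range(degree + 1)] for xi in x]
    size = degree + 1
    xtx = [[sum(row[i] * row[j] for row in powers) for j in range(size)] for i in range(size)]
    xty = [sum(row[i] * yi for row, yi in zip(powers, y)) for i in range(size)]
    coeffs = solve(xtx, xty)
    rss = sum(
        (yi - sum(c * xp for c, xp in zip(coeffs, row))) ** 2
        for row, yi in zip(powers, y)
    )
    return coeffs, rss

random.seed(3)
x = [random.uniform(-2, 2) for _ in range(200)]
# Hypothetical data with a genuine quadratic term (b2 = 0.5).
y = [1.0 + 2.0 * xi + 0.5 * xi ** 2 + random.gauss(0, 0.3) for xi in x]

linear_coeffs, rss_linear = fit_polynomial(x, y, 1)      # b2 constrained to 0
quadratic_coeffs, rss_quadratic = fit_polynomial(x, y, 2)
```

Because the linear model is the quadratic model with the constraint that the quadratic coefficient equals 0, the quadratic fit can never have a larger residual sum of squares than the linear fit on the same data.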

Comparing models

Comparing statistical models is fundamental for much of statistical inference. Konishi & Kitagawa (2008, p. 75) state: "The majority of the problems in statistical inference can be considered to be problems related to statistical modeling. They are typically formulated as comparisons of several statistical models." Common criteria for comparing models include the following: $R^2$, Bayes factor, Akaike information criterion, and the likelihood-ratio test together with its generalization, the relative likelihood.
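Two of these criteria can be sketched for a pair of nested Gaussian models: the set of all Gaussians (dimension 2) and the zero-mean Gaussians (dimension 1). The simulated data, with a true mean of 0.5, are an assumption for illustration; the AIC formula used is the standard 2k − 2 log L evaluated at the maximum-likelihood estimate.

```python
import math
import random

def gaussian_log_likelihood(data, mu, var):
    """Log-likelihood of i.i.d. data under a Gaussian with mean mu and variance var."""
    n = len(data)
    return -0.5 * n * math.log(2 * math.pi * var) - sum((x - mu) ** 2 for x in data) / (2 * var)

def aic(log_likelihood, k):
    """Akaike information criterion for a model with k parameters (lower is better)."""
    return 2 * k - 2 * log_likelihood

random.seed(2)
data = [random.gauss(0.5, 1.0) for _ in range(200)]  # hypothetical sample
n = len(data)

# Model A: all Gaussian distributions (mean and variance free, k = 2).
mu_hat = sum(data) / n
var_a = sum((x - mu_hat) ** 2 for x in data) / n
ll_a = gaussian_log_likelihood(data, mu_hat, var_a)

# Model B: zero-mean Gaussian distributions (variance free, k = 1).
var_b = sum(x * x for x in data) / n
ll_b = gaussian_log_likelihood(data, 0.0, var_b)

aic_a, aic_b = aic(ll_a, 2), aic(ll_b, 1)

# Likelihood-ratio statistic; non-negative because model B is nested in model A.
lr_statistic = 2 * (ll_a - ll_b)
```

For data generated with a nonzero mean, the larger model should have the lower AIC and the likelihood-ratio statistic should be well above zero; with zero-mean data the comparison would typically favor the smaller model instead.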

Another way of comparing two statistical models is through the notion of deficiency introduced by Lucien Le Cam.[8]

Notes

  1. ^ Cox 2006, p. 178
  2. ^ Adèr 2008, p. 280
  3. ^ a b McCullagh 2002
  4. ^ Cox 2006, p. 197
  5. ^ Konishi & Kitagawa 2008, §1.1
  6. ^ Friendly & Meyer 2016, §11.6
  7. ^ Cox 2006, p. 2
  8. ^ Le Cam, Lucien (1964). "Sufficiency and Approximate Sufficiency". Annals of Mathematical Statistics. 35 (4). Institute of Mathematical Statistics: 1429. doi:10.1214/aoms/1177700372.
