In a regression setting, the goal is to establish whether a relationship exists between a response variable and a set of predictor variables and, if one does exist, to describe it as well as possible. A main assumption in linear regression is constant variance (homoscedasticity), meaning that the errors of the response have the same variance at every level of the predictors. This assumption works well when the response variable and the predictor variable are jointly normal. As we will see later, the variance function in the normal setting is constant; however, we must find a way to quantify heteroscedasticity (non-constant variance) in the absence of joint normality.
When the response is likely to follow a distribution that is a member of the exponential family, a generalized linear model may be more appropriate, and when we do not wish to force a parametric model onto the data, a non-parametric regression approach can be useful. Being able to model the variance as a function of the mean matters because it improves inference (in a parametric setting) and, in any setting, estimation of the regression function.
Variance functions play an important role in parameter estimation and inference. In general, maximum likelihood estimation requires that a likelihood function be defined, which in turn requires specifying the distribution of the observed responses. To define a quasi-likelihood, however, one need only specify a relationship between the mean and the variance of the observations.[3] Quasi-likelihood estimation is particularly useful when there is overdispersion, which occurs when the data show more variability than would be expected under the assumed distribution.
In summary, to ensure efficient inference of the regression parameters and the regression function, the heteroscedasticity must be accounted for. Variance functions quantify the relationship between the variance and the mean of the observed data and hence play a significant role in regression estimation and inference.
Types
The variance function and its applications come up in many areas of statistical analysis. Two particularly important uses are in the frameworks of generalized linear models and non-parametric regression.
Generalized linear model
When a member of the exponential family has been specified, the variance function can easily be derived.[4]: 29 The general form of the variance function is presented under the exponential family context, as well as specific forms for Normal, Bernoulli, Poisson, and Gamma. In addition, we describe the applications and use of variance functions in maximum likelihood estimation and quasi-likelihood estimation.
Derivation
The generalized linear model (GLM) is a generalization of ordinary regression analysis that extends to any member of the exponential family. It is particularly useful when the response variable is categorical, binary or subject to a constraint (e.g. only positive responses make sense). The components of a GLM are summarized briefly on this page; for more details see the page on generalized linear models.
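A GLM consists of three main ingredients:
1. Random component: a distribution of $y$ from the exponential family, with $E[y \mid X] = \mu$,
2. Linear predictor: $\eta = X\beta$,
3. Link function: $\eta = g(\mu)$, or equivalently $\mu = g^{-1}(\eta)$.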
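First it is important to derive a couple of key properties of the exponential family.
Any random variable $y$ in the exponential family has a probability density function of the form
$$f(y, \theta, \phi) = \exp\left(\frac{y\theta - b(\theta)}{\phi} - c(y, \phi)\right),$$
with log-likelihood
$$\ell(\theta, y, \phi) = \log f(y, \theta, \phi) = \frac{y\theta - b(\theta)}{\phi} - c(y, \phi).$$
Here, $\theta$ is the canonical parameter and the parameter of interest, and $\phi$ is a nuisance parameter which plays a role in the variance.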
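We use Bartlett's identities to derive a general expression for the variance function.
The first and second Bartlett results ensure that, under suitable conditions (see Leibniz integral rule), for a density function $f_\theta(\cdot)$ dependent on $\theta$,
$$E_\theta\left[\frac{\partial}{\partial\theta}\log f_\theta(y)\right] = 0,$$
$$\operatorname{Var}_\theta\left[\frac{\partial}{\partial\theta}\log f_\theta(y)\right] + E_\theta\left[\frac{\partial^2}{\partial\theta^2}\log f_\theta(y)\right] = 0.$$
These identities lead to simple calculations of the expected value and variance of any random variable $y$ in the exponential family.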
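Expected value of Y:
Taking the first derivative with respect to $\theta$ of the log of the density in the exponential family form described above, we have
$$\frac{\partial}{\partial\theta}\log f(y, \theta, \phi) = \frac{y - b'(\theta)}{\phi}.$$
Then taking the expected value and setting it equal to zero leads to
$$E_\theta\left[\frac{y - b'(\theta)}{\phi}\right] = \frac{E_\theta[y] - b'(\theta)}{\phi} = 0, \quad\text{so}\quad E_\theta[y] = b'(\theta).$$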
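Variance of Y:
To compute the variance we use the second Bartlett identity,
$$\operatorname{Var}_\theta\left[\frac{y - b'(\theta)}{\phi}\right] + E_\theta\left[-\frac{b''(\theta)}{\phi}\right] = 0, \quad\text{so}\quad \operatorname{Var}_\theta[y] = \phi\, b''(\theta).$$
We now have a relationship between $\mu$ and $\theta$, namely
$\mu = b'(\theta)$ and $\theta = b'^{-1}(\mu)$, which allows for a relationship between $\mu$ and the variance,
$$V(\theta) = b''(\theta), \qquad V(\mu) = b''\left(b'^{-1}(\mu)\right), \qquad \operatorname{Var}_\theta[y] = \phi\, V(\mu).$$
Note that because $\operatorname{Var}_\theta[y] > 0$ implies $b''(\theta) > 0$, the map $b' : \theta \to \mu$ is strictly increasing and hence invertible.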
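As a rough numerical check of this derivation, the following sketch approximates $\mu = b'(\theta)$ and $V = b''(\theta)$ by central finite differences for a user-supplied cumulant function $b$. The helper name `mean_and_variance_function` and the step size `h` are illustrative choices and do not come from the source.

```python
import math

def mean_and_variance_function(b, theta, h=1e-4):
    """Approximate mu = b'(theta) and V = b''(theta) by central differences.

    `b` is the cumulant function of an exponential-family density written as
    exp((y*theta - b(theta))/phi - c(y, phi)). The step `h` is an assumed
    default; accuracy depends on how smooth b is near theta.
    """
    mu = (b(theta + h) - b(theta - h)) / (2 * h)
    v = (b(theta + h) - 2 * b(theta) + b(theta - h)) / (h * h)
    return mu, v

# Poisson case: b(theta) = exp(theta), so mu and V should both be close to exp(theta).
mu, v = mean_and_variance_function(math.exp, math.log(3.0))
```

Multiplying the returned `v` by the dispersion $\phi$ gives an approximation to $\operatorname{Var}[y]$ at that value of $\theta$.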
We derive the variance function for a few common distributions.
Example – normal
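The normal distribution is a special case where the variance function is a constant. Let $y \sim N(\mu, \sigma^2)$; then we put the density function of $y$ in the form of the exponential family described above:
$$f(y) = \exp\left(\frac{y\mu - \frac{\mu^2}{2}}{\sigma^2} - \frac{y^2}{2\sigma^2} - \frac{1}{2}\ln(2\pi\sigma^2)\right),$$
where
$$\theta = \mu, \quad b(\theta) = \frac{\theta^2}{2}, \quad \phi = \sigma^2, \quad c(y, \phi) = \frac{y^2}{2\phi} + \frac{1}{2}\ln(2\pi\phi).$$
To calculate the variance function $V(\mu)$, we first express $\theta$ as a function of $\mu$. Then we transform $V(\theta) = b''(\theta)$ into a function of $\mu$:
$$\theta = \mu, \quad b'(\theta) = \theta = E[y] = \mu, \quad V(\theta) = b''(\theta) = 1.$$
Therefore, the variance function is constant.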
Example – Bernoulli
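Let $y \sim \text{Bernoulli}(p)$; then we express the density of the Bernoulli distribution in exponential family form,
$$f(y) = \exp\left(y\ln\frac{p}{1-p} + \ln(1-p)\right),$$
which gives us
$$\theta = \ln\frac{p}{1-p} = \operatorname{logit}(p), \quad p = \frac{e^\theta}{1+e^\theta}, \quad b(\theta) = \ln(1+e^\theta), \quad \phi = 1,$$
and
$$b'(\theta) = \frac{e^\theta}{1+e^\theta} = p = \mu, \qquad b''(\theta) = \frac{e^\theta}{1+e^\theta} - \left(\frac{e^\theta}{1+e^\theta}\right)^2.$$
This gives us
$$V(\mu) = \mu(1-\mu).$$
Example – Poisson
Let $y \sim \text{Poisson}(\lambda)$; then we express the density of the Poisson distribution in exponential family form,
$$f(y) = \exp\left(y\ln\lambda - \lambda - \ln y!\right),$$
which gives us
$$\theta = \ln\lambda, \quad \lambda = e^\theta, \quad b(\theta) = e^\theta, \quad \phi = 1,$$
and
$$b'(\theta) = e^\theta = \lambda = \mu, \qquad b''(\theta) = e^\theta = \mu.$$
This gives us
$$V(\mu) = \mu.$$
Here we see the central property of Poisson data, that the variance is equal to the mean.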
Example – Gamma
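The gamma distribution and its density function can be expressed under different parametrizations. We will use the form of the gamma with parameters $(\mu, \nu)$,
$$f_{\mu,\nu}(y) = \frac{1}{\Gamma(\nu)\,y}\left(\frac{\nu y}{\mu}\right)^{\nu} e^{-\nu y/\mu}.$$
Then in exponential family form we have
$$f_{\mu,\nu}(y) = \exp\left(\frac{-\frac{1}{\mu}y + \ln\frac{1}{\mu}}{1/\nu} + \ln\frac{\nu^\nu y^{\nu-1}}{\Gamma(\nu)}\right),$$
$$\theta = -\frac{1}{\mu}, \quad \phi = \frac{1}{\nu}, \quad b(\theta) = -\ln(-\theta), \quad b'(\theta) = -\frac{1}{\theta} = \mu, \quad b''(\theta) = \frac{1}{\theta^2} = \mu^2.$$
And we have
$$V(\mu) = \mu^2.$$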
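The closed forms for these families can be collected in a small lookup table. The sketch below is only illustrative: the dictionary keys and the helper `response_variance` are hypothetical names, and the function simply evaluates $\operatorname{Var}(Y) = \phi V(\mu)$.

```python
# Variance functions V(mu) for the normal, Bernoulli, Poisson and gamma families.
VARIANCE_FUNCTIONS = {
    "normal": lambda mu: 1.0,
    "bernoulli": lambda mu: mu * (1.0 - mu),
    "poisson": lambda mu: mu,
    "gamma": lambda mu: mu ** 2,
}

def response_variance(family, mu, phi=1.0):
    """Return Var(Y) = phi * V(mu) for a named family (names are illustrative)."""
    return phi * VARIANCE_FUNCTIONS[family](mu)

# Gamma with mean 2 and shape nu = 4, so the dispersion is phi = 1/nu = 0.25.
gamma_var = response_variance("gamma", mu=2.0, phi=0.25)
```

For the gamma case this agrees with the familiar variance $\mu^2/\nu$ of a gamma variable with mean $\mu$ and shape $\nu$.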
Application – weighted least squares
An important application of the variance function is in parameter estimation and inference when the response variable has the required exponential family form, as well as in some cases when it does not (discussed under quasi-likelihood). Weighted least squares (WLS) is a special case of generalized least squares. Each term in the WLS criterion includes a weight that determines the influence each observation has on the final parameter estimates. As in ordinary least squares, the goal is to estimate the unknown parameters in the regression function by finding the values that minimize the sum of the squared deviations between the observed responses and the functional portion of the model.
While WLS assumes independence of observations, it does not assume equal variance and is therefore a solution for parameter estimation in the presence of heteroscedasticity. The Gauss–Markov theorem and Aitken demonstrate that the best linear unbiased estimator (BLUE), the unbiased estimator with minimum variance, has each weight equal to the reciprocal of the variance of the measurement.
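To make the weighting concrete, here is a minimal sketch of WLS for a single predictor, using the closed-form solution of the weighted normal equations with weights equal to the reciprocal of each observation's variance. The function name and the data are hypothetical, and the variances are assumed to be known.

```python
def weighted_least_squares(x, y, variances):
    """Fit y ~ a + b*x by WLS with weights 1/variance (closed form, one predictor)."""
    w = [1.0 / v for v in variances]
    sw = sum(w)
    swx = sum(wi * xi for wi, xi in zip(w, x))
    swy = sum(wi * yi for wi, yi in zip(w, y))
    swxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    swxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    slope = (sw * swxy - swx * swy) / (sw * swxx - swx ** 2)
    intercept = (swy - slope * swx) / sw
    return intercept, slope

# Hypothetical data whose noise variance grows with x.
x = [1.0, 2.0, 3.0, 4.0]
y = [3.1, 4.9, 7.2, 8.7]
variances = [0.1, 0.4, 0.9, 1.6]
intercept, slope = weighted_least_squares(x, y, variances)
```

Observations with larger variance receive smaller weights, so the noisier points at large `x` pull less on the fitted line than they would under ordinary least squares.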
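In the GLM framework, our goal is to estimate the parameters $\beta$, where $Z = g(E[y \mid X]) = X\beta$. Therefore, we would like to minimize $(Z - X\beta)^T W (Z - X\beta)$, and if we define the weight matrix $W$ as
$$W = \operatorname{diag}\left(\frac{1}{\phi V(\mu_1) g'(\mu_1)^2}, \ldots, \frac{1}{\phi V(\mu_n) g'(\mu_n)^2}\right),$$
where $\phi$, $V(\mu)$ and $g(\mu)$ are as defined in the previous section, the parameters can be estimated by iteratively reweighted least squares (IRLS).
It is also important to note that when the weight matrix is of the form described here, minimizing the expression $(Z - X\beta)^T W (Z - X\beta)$ also minimizes the Pearson distance. See Distance correlation for more.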
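The matrix $W$ falls right out of the estimating equations for $\beta$. Maximum likelihood estimation for each parameter $\beta_r$, $1 \le r \le p$, requires
$$\sum_{i=1}^{n} \frac{\partial \ell_i}{\partial \beta_r} = 0,$$
where $\ell(\theta, y, \phi) = \log f(y, \theta, \phi) = \frac{y\theta - b(\theta)}{\phi} - c(y, \phi)$ is the log-likelihood.
Looking at a single observation we have
$$\frac{\partial \ell}{\partial \beta_r} = \frac{\partial \ell}{\partial \theta}\,\frac{\partial \theta}{\partial \mu}\,\frac{\partial \mu}{\partial \eta}\,\frac{\partial \eta}{\partial \beta_r},$$
$$\frac{\partial \eta}{\partial \beta_r} = x_r, \qquad \frac{\partial \ell}{\partial \theta} = \frac{y - b'(\theta)}{\phi} = \frac{y - \mu}{\phi}, \qquad \frac{\partial \theta}{\partial \mu} = \frac{1}{b''(\theta)} = \frac{1}{V(\mu)}.$$
This gives us
$$\frac{\partial \ell}{\partial \beta_r} = \frac{y - \mu}{\phi V(\mu)}\,\frac{\partial \mu}{\partial \eta}\,x_r,$$
and noting that
$$\frac{\partial \eta}{\partial \mu} = g'(\mu),$$
we have that
$$\frac{\partial \ell}{\partial \beta_r} = (y - \mu)\,W\,\frac{\partial \eta}{\partial \mu}\,x_r, \qquad W = \frac{1}{\phi V(\mu) g'(\mu)^2}.$$
The Hessian matrix $H$ is determined in a similar manner and can be shown to have entries
$$\frac{\partial^2 \ell}{\partial \beta_s \partial \beta_r} = \sum_{i=1}^{n} (y_i - \mu_i)\,\frac{\partial}{\partial \beta_s}\left[W_i\, g'(\mu_i)\right] x_{ir} - \sum_{i=1}^{n} W_i\, x_{ir} x_{is},$$
where the first sum has expectation zero and the second sum is the $(r, s)$ entry of $X^T W X$.
Noticing that the Fisher information (FI),
$$\text{FI} = -E[H] = X^T W X,$$
allows for asymptotic approximation of $\hat{\beta}$,
$$\hat{\beta} \sim N_p\left(\beta, (X^T W X)^{-1}\right),$$
and hence inference can be performed.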
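The estimation scheme above can be sketched for one concrete case. The code below assumes a Poisson response with log link, so that $V(\mu) = \mu$, $\phi = 1$ and $g'(\mu) = 1/\mu$, which makes each weight $1/(\phi V(\mu) g'(\mu)^2)$ equal to $\mu$. It handles an intercept and a single predictor so that the 2×2 system can be solved by hand; the function name, the starting value and the fixed iteration count are illustrative assumptions rather than anything prescribed by the source.

```python
import math

def irls_poisson_log(x, y, n_iter=25):
    """IRLS for a Poisson GLM with log link and linear predictor eta = b0 + b1*x.

    Returns the estimates and (X^T W X)^{-1}, evaluated with the weights used
    in the final iteration. Runs a fixed number of iterations (n_iter >= 1)
    with no convergence check; assumes the mean of y is positive.
    """
    b0, b1 = math.log(sum(y) / len(y)), 0.0      # intercept-only starting value
    for _ in range(n_iter):
        eta = [b0 + b1 * xi for xi in x]
        mu = [math.exp(e) for e in eta]
        w = mu                                    # working weights, equal to mu here
        # Working response z = eta + (y - mu) * g'(mu), with g'(mu) = 1/mu.
        z = [e + (yi - m) / m for e, yi, m in zip(eta, y, mu)]
        # Entries of X^T W X and X^T W z for the two-column design [1, x].
        s0 = sum(w)
        s1 = sum(wi * xi for wi, xi in zip(w, x))
        s2 = sum(wi * xi * xi for wi, xi in zip(w, x))
        t0 = sum(wi * zi for wi, zi in zip(w, z))
        t1 = sum(wi * xi * zi for wi, xi, zi in zip(w, x, z))
        det = s0 * s2 - s1 * s1
        b0, b1 = (s2 * t0 - s1 * t1) / det, (s0 * t1 - s1 * t0) / det
    cov = [[s2 / det, -s1 / det], [-s1 / det, s0 / det]]
    return (b0, b1), cov

# Hypothetical count data.
(b0_hat, b1_hat), cov_hat = irls_poisson_log([0.0, 1.0, 2.0, 3.0, 4.0], [1, 3, 4, 9, 14])
```

The square roots of the diagonal entries of `cov_hat` would serve as approximate standard errors under the asymptotic normal approximation for $\hat{\beta}$.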
Application – quasi-likelihood
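Because most features of GLMs depend only on the first two moments of the distribution, rather than the entire distribution, the quasi-likelihood can be developed by just specifying a link function and a variance function. That is, we need to specify the link function, $E[y] = \mu = g^{-1}(\eta)$, and the variance function, $V(\mu)$, where $\operatorname{Var}_\theta(y) = \sigma^2 V(\mu)$.
Though called a quasi-likelihood, this is in fact a quasi-log-likelihood. The QL for one observation is
$$Q_i(\mu_i, y_i) = \int_{y_i}^{\mu_i} \frac{y_i - t}{\sigma^2 V(t)}\,dt.$$
And therefore the QL for all $n$ observations is
$$Q(\mu, y) = \sum_{i=1}^{n} Q_i(\mu_i, y_i) = \sum_{i=1}^{n} \int_{y_i}^{\mu_i} \frac{y_i - t}{\sigma^2 V(t)}\,dt.$$
From the QL we have the quasi-score.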
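The quasi-likelihood integral for one observation, $\int_{y}^{\mu} (y - t)/(\sigma^2 V(t))\,dt$, can be evaluated numerically for any variance function. The sketch below uses the composite Simpson rule and compares the result with the closed form available when $V(t) = t$; the function name and the number of steps are illustrative assumptions.

```python
import math

def quasi_likelihood(y, mu, V, sigma2=1.0, n_steps=1000):
    """Approximate Q(mu, y) = integral from y to mu of (y - t) / (sigma2 * V(t)) dt.

    Uses the composite Simpson rule; `n_steps` must be even. Assumes V(t) > 0
    on the whole interval between y and mu.
    """
    def integrand(t):
        return (y - t) / (sigma2 * V(t))

    h = (mu - y) / n_steps
    total = integrand(y) + integrand(mu)
    for k in range(1, n_steps):
        total += (4 if k % 2 else 2) * integrand(y + k * h)
    return total * h / 3.0

# With V(t) = t and sigma2 = 1 the integral has the closed form
# (y*log(mu) - mu) - (y*log(y) - y).
y_obs, mu_fit = 4.0, 2.5
q_numeric = quasi_likelihood(y_obs, mu_fit, V=lambda t: t)
q_closed = (y_obs * math.log(mu_fit) - mu_fit) - (y_obs * math.log(y_obs) - y_obs)
```

The value is zero when `mu` equals `y` and negative otherwise, so maximizing the quasi-likelihood pulls the fitted mean toward the observation.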
Quasi-score (QS)
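Recall that the score function, $U$, for data with log-likelihood $\ell(\mu \mid y)$ is
$$U = \frac{\partial \ell}{\partial \mu}.$$
We obtain the quasi-score in an identical manner,
$$U = \frac{y - \mu}{\sigma^2 V(\mu)},$$
noting that, for one observation, the score is
$$\frac{\partial Q}{\partial \mu} = \frac{y - \mu}{\sigma^2 V(\mu)}.$$
The first two Bartlett equations are satisfied for the quasi-score, namely
$$E[U] = 0$$
and
$$\operatorname{Cov}(U) + E\left[\frac{\partial U}{\partial \mu}\right] = 0.$$
In addition, the quasi-score is linear in $y$.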
Ultimately the goal is to find information about the parameters of interest $\beta$. Both the QS and the QL are actually functions of $\beta$. Recall that $\mu = g^{-1}(\eta)$ and $\eta = X\beta$; therefore,
$$\mu = g^{-1}(X\beta).$$
The QL, the QS and the quasi-information (QI), the quasi-likelihood analogue of the Fisher information, provide the building blocks for inference about the parameters of interest, and it is therefore important to express all three as functions of $\beta$.
Recalling again that $\mu = g^{-1}(X\beta)$, we derive the expressions for the QL, QS and QI parametrized under $\beta$.
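Quasi-likelihood in $\beta$,
$$Q(\beta, y) = \sum_{i=1}^{n} \int_{y_i}^{\mu_i(\beta)} \frac{y_i - t}{\sigma^2 V(t)}\,dt.$$
The QS as a function of $\beta$ is therefore
$$U_j(\beta) = \frac{\partial}{\partial \beta_j} Q(\beta, y) = \sum_{i=1}^{n} \frac{\partial \mu_i}{\partial \beta_j}\,\frac{y_i - \mu_i(\beta)}{\sigma^2 V(\mu_i)},$$
$$U(\beta) = \left[U_1(\beta), \ldots, U_p(\beta)\right]^T = \frac{D^T V^{-1} (y - \mu)}{\sigma^2},$$
where
$$D_{n \times p} = \left[\frac{\partial \mu_i}{\partial \beta_j}\right], \qquad V_{n \times n} = \operatorname{diag}\left(V(\mu_1), \ldots, V(\mu_n)\right).$$
The quasi-information matrix in $\beta$ is
$$i_b = -E\left[\frac{\partial U}{\partial \beta}\right] = \operatorname{Cov}(U(\beta)) = \frac{D^T V^{-1} D}{\sigma^2}.$$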
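The matrix expressions $U(\beta) = D^T V^{-1}(y - \mu)/\sigma^2$ and $i_b = D^T V^{-1} D/\sigma^2$ translate directly into code. The sketch below uses plain lists to stay dependency-free; the function name is hypothetical, and the usage example assumes a log link with $V(\mu) = \mu$ purely for illustration.

```python
import math

def quasi_score_and_information(D, v, y, mu, sigma2=1.0):
    """Return U(beta) = D^T V^{-1} (y - mu) / sigma2 and i_b = D^T V^{-1} D / sigma2.

    D is a list of n rows holding the p partial derivatives d mu_i / d beta_j;
    v holds the variance function values V(mu_i).
    """
    n, p = len(D), len(D[0])
    score = [sum(D[i][j] * (y[i] - mu[i]) / v[i] for i in range(n)) / sigma2
             for j in range(p)]
    info = [[sum(D[i][j] * D[i][k] / v[i] for i in range(n)) / sigma2
             for k in range(p)] for j in range(p)]
    return score, info

# Illustrative case: log link and V(mu) = mu, so d mu_i / d beta_j = mu_i * x_ij.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
beta = [0.5, 0.3]
y = [2.0, 2.0, 3.0]
mu = [math.exp(sum(b * xj for b, xj in zip(beta, row))) for row in X]
D = [[m * xj for xj in row] for m, row in zip(mu, X)]
score, info = quasi_score_and_information(D, mu, y, mu, sigma2=2.0)
```

Because $\sigma^2$ only rescales the quasi-score, it does not change the value of $\beta$ at which the score is zero; it does, however, scale the quasi-information and hence any standard errors derived from it.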
Obtaining the score function and the information of allows for parameter estimation and inference in a similar manner as described in Application – weighted least squares.
Non-parametric regression analysis
Non-parametric estimation of the variance function and its importance have been discussed widely in the literature.[5][6][7]
In non-parametric regression analysis, the goal is to express the expected value of the response variable ($y$) as a function of the predictors ($X$). That is, we are looking to estimate a mean function, $g(x) = E[y \mid X = x]$, without assuming a parametric form. There are many non-parametric smoothing methods that help estimate the function $g(x)$. An interesting approach is to also look at a non-parametric variance function, $g_v(x) = \operatorname{Var}(Y \mid X = x)$. A non-parametric variance function allows one to look at the mean function as it relates to the variance function and notice patterns in the data.
An example is detailed in the accompanying pictures. The goal of the project was to determine (among other things) whether the predictor, number of years in the major leagues (baseball), had an effect on the response, the salary a player made. An initial scatter plot of the data indicates that there is heteroscedasticity in the data, as the variance is not constant at each level of the predictor. Because we can visually detect the non-constant variance, it is useful to plot $g_v(x) = \operatorname{Var}(Y \mid X = x) = E[y^2 \mid X = x] - \left(E[y \mid X = x]\right)^2$ and see whether the shape is indicative of any known distribution. One can estimate $E[y^2 \mid X = x]$ and $\left(E[y \mid X = x]\right)^2$ using a general smoothing method. The plot of the non-parametric smoothed variance function can give the researcher an idea of the relationship between the variance and the mean. The accompanying picture indicates a quadratic relationship between the mean and the variance. As we saw above, the gamma variance function is quadratic in the mean.
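One way to carry out this kind of smoothing is sketched below with a Nadaraya-Watson kernel smoother applied to $y$ and to $y^2$. This is only one possible "general smoothing method", not necessarily the one used in the example above; the function names, the Gaussian kernel, the bandwidth and the data are all illustrative assumptions.

```python
import math

def kernel_smooth(x, values, x0, bandwidth):
    """Nadaraya-Watson estimate of E[values | X = x0] with a Gaussian kernel."""
    weights = [math.exp(-0.5 * ((xi - x0) / bandwidth) ** 2) for xi in x]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)

def variance_function_estimate(x, y, x0, bandwidth):
    """Estimate the mean and Var(Y | X = x0) = E[y^2 | x0] - (E[y | x0])^2."""
    mean = kernel_smooth(x, y, x0, bandwidth)
    second_moment = kernel_smooth(x, [yi * yi for yi in y], x0, bandwidth)
    # Clip at zero to guard against tiny negative values from rounding.
    return mean, max(second_moment - mean * mean, 0.0)

# Hypothetical data: at each x the two responses are x - x/2 and x + x/2,
# so the local mean is x and the local variance is x**2 / 4.
x = [1.0, 1.0, 2.0, 2.0, 3.0, 3.0, 4.0, 4.0]
y = [0.5, 1.5, 1.0, 3.0, 1.5, 4.5, 2.0, 6.0]
mean_at_2, var_at_2 = variance_function_estimate(x, y, 2.0, bandwidth=0.1)
mean_at_4, var_at_4 = variance_function_estimate(x, y, 4.0, bandwidth=0.1)
```

Plotting the estimated variance against the estimated mean over a grid of `x0` values would then reveal the shape of the mean-variance relationship, which in this constructed data set is quadratic.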
Rice and Silverman (1991). "Estimating the Mean and Covariance structure nonparametrically when the data are curves". Journal of the Royal Statistical Society. 53 (1): 233–243. JSTOR 2345738.