In statistics and in probability theory, distance correlation or distance covariance is a measure of dependence between two paired random vectors of arbitrary, not necessarily equal, dimension. The population distance correlation coefficient is zero if and only if the random vectors are independent. Thus, distance correlation measures both linear and nonlinear association between two random variables or random vectors. This is in contrast to Pearson's correlation, which can only detect linear association between two random variables.
Distance correlation can be used to perform a statistical test of dependence with a permutation test. One first computes the distance correlation (involving the re-centering of Euclidean distance matrices) between two random vectors, and then compares this value to the distance correlations of many shuffles of the data.
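The permutation procedure described above can be sketched in plain Python. This is a minimal illustration, not a reference implementation: the function names, the default of 199 permutations, and the fixed seed are assumptions made for the example. It compares squared sample distance covariances rather than distance correlations; the two give the same ranking because shuffling one sample leaves both distance variances unchanged.

```python
import math
import random

def centered_distances(points):
    """Doubly centered matrix of pairwise Euclidean distances."""
    n = len(points)
    d = [[math.dist(p, q) for q in points] for p in points]
    row = [sum(r) / n for r in d]  # also the column means: d is symmetric
    grand = sum(row) / n
    return [[d[j][k] - row[j] - row[k] + grand for k in range(n)]
            for j in range(n)]

def permutation_test(xs, ys, n_permutations=199, seed=0):
    """Return (observed squared distance covariance, permutation p-value)."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must be paired samples of equal length")
    n = len(xs)
    a = centered_distances(xs)
    b = centered_distances(ys)

    def statistic(order):
        # Shuffling the Y sample permutes rows and columns of B together.
        return sum(a[j][k] * b[order[j]][order[k]]
                   for j in range(n) for k in range(n)) / n**2

    observed = statistic(list(range(n)))
    rng = random.Random(seed)
    order = list(range(n))
    at_least_as_large = 0
    for _ in range(n_permutations):
        rng.shuffle(order)
        if statistic(order) >= observed:
            at_least_as_large += 1
    # Add one to numerator and denominator so the p-value is never zero.
    return observed, (at_least_as_large + 1) / (n_permutations + 1)

# Hypothetical data: Y is a deterministic, nonlinear function of X.
xs = [(i / 10,) for i in range(-10, 11)]
ys = [(x[0] ** 2,) for x in xs]
observed, p_value = permutation_test(xs, ys)
print(f"dCov_n^2 = {observed:.4f}, permutation p-value = {p_value:.3f}")
```

A small p-value indicates that the observed statistic is unusually large compared with the shuffled data, which is evidence against independence.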
Background
The classical measure of dependence, the Pearson correlation coefficient,[1] is mainly sensitive to a linear relationship between two variables. Distance correlation was introduced in 2005 by Gábor J. Székely in several lectures to address this deficiency of Pearson's correlation, namely that it can easily be zero for dependent variables. Correlation = 0 (uncorrelatedness) does not imply independence while distance correlation = 0 does imply independence. The first results on distance correlation were published in 2007 and 2009.[2][3] It was proved that distance covariance is the same as the Brownian covariance.[3] These measures are examples of energy distances.
The distance correlation is derived from a number of other quantities that are used in its specification, specifically: distance variance, distance standard deviation, and distance covariance. These quantities take the same roles as the ordinary moments with corresponding names in the specification of the Pearson product-moment correlation coefficient.
Definitions
Distance covariance
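Let us start with the definition of the sample distance covariance. Let $(X_k, Y_k)$, $k = 1, 2, \ldots, n$ be a statistical sample from a pair of real-valued or vector-valued random variables $(X, Y)$. First, compute the $n \times n$ distance matrices $(a_{j,k})$ and $(b_{j,k})$ containing all pairwise distances
$$a_{j,k} = \|X_j - X_k\|, \qquad b_{j,k} = \|Y_j - Y_k\|, \qquad j, k = 1, 2, \ldots, n,$$
where $\|\cdot\|$ denotes the Euclidean norm. Then take all doubly centered distances
$$A_{j,k} := a_{j,k} - \bar{a}_{j\cdot} - \bar{a}_{\cdot k} + \bar{a}_{\cdot\cdot}, \qquad B_{j,k} := b_{j,k} - \bar{b}_{j\cdot} - \bar{b}_{\cdot k} + \bar{b}_{\cdot\cdot},$$
where $\bar{a}_{j\cdot}$ is the j-th row mean, $\bar{a}_{\cdot k}$ is the k-th column mean, and $\bar{a}_{\cdot\cdot}$ is the grand mean of the distance matrix of the X sample. The notation is similar for the b values. (In the matrices of centered distances $(A_{j,k})$ and $(B_{j,k})$ all rows and all columns sum to zero.) The squared sample distance covariance (a scalar) is simply the arithmetic average of the products $A_{j,k} B_{j,k}$:
$$\operatorname{dCov}_n^2(X, Y) := \frac{1}{n^2} \sum_{j=1}^{n} \sum_{k=1}^{n} A_{j,k}\, B_{j,k}.$$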
The statistic $T_n = n \operatorname{dCov}_n^2(X, Y)$ determines a consistent multivariate test of independence of random vectors in arbitrary dimensions. For an implementation see the dcov.test function in the energy package for R.[4]
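The sample definition translates directly into code. The following is a minimal pure-Python sketch that favors clarity over speed; the function names are illustrative and are not taken from the energy package. Each observation is assumed to be given as a tuple of coordinates, and the X and Y samples may have different dimensions.

```python
import math

def pairwise_distances(points):
    """n-by-n matrix of Euclidean distances between the sample points."""
    return [[math.dist(p, q) for q in points] for p in points]

def double_center(d):
    """Subtract row and column means, then add back the grand mean."""
    n = len(d)
    row_means = [sum(row) / n for row in d]
    col_means = [sum(d[j][k] for j in range(n)) / n for k in range(n)]
    grand_mean = sum(row_means) / n
    return [[d[j][k] - row_means[j] - col_means[k] + grand_mean
             for k in range(n)] for j in range(n)]

def dcov2(xs, ys):
    """Squared sample distance covariance: average of A[j][k] * B[j][k]."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must be paired samples of equal length")
    n = len(xs)
    a = double_center(pairwise_distances(xs))
    b = double_center(pairwise_distances(ys))
    return sum(a[j][k] * b[j][k] for j in range(n) for k in range(n)) / n**2

# Hypothetical paired sample: X is 1-dimensional, Y is 2-dimensional.
xs = [(0.0,), (1.0,), (2.0,), (3.0,)]
ys = [(0.0, 1.0), (1.0, 0.0), (4.0, 2.0), (9.0, 5.0)]
value = dcov2(xs, ys)
print(f"dCov_n^2 = {value:.4f}, T_n = {len(xs) * value:.4f}")
```

This direct approach needs memory and time on the order of n squared, which is fine for small samples but becomes costly for large n.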
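The population value of distance covariance can be defined along the same lines. Let X be a random variable that takes values in a p-dimensional Euclidean space with probability distribution μ and let Y be a random variable that takes values in a q-dimensional Euclidean space with probability distribution ν, and suppose that X and Y have finite expectations. Write
$$a_\mu(x) := \operatorname{E}[\|X - x\|], \qquad D(\mu) := \operatorname{E}[a_\mu(X)], \qquad d_\mu(x, x') := \|x - x'\| - a_\mu(x) - a_\mu(x') + D(\mu),$$
and define $a_\nu$, $D(\nu)$, and $d_\nu$ analogously for Y.
Finally, define the population value of squared distance covariance of X and Y as
$$\operatorname{dCov}^2(X, Y) := \operatorname{E}\big[d_\mu(X, X')\, d_\nu(Y, Y')\big].$$
One can show that this is equivalent to the following definition:
$$\operatorname{dCov}^2(X, Y) := \operatorname{E}[\|X - X'\|\,\|Y - Y'\|] + \operatorname{E}[\|X - X'\|]\,\operatorname{E}[\|Y - Y'\|] - 2\operatorname{E}[\|X - X'\|\,\|Y - Y''\|],$$
where E denotes expected value, and the primed pairs $(X', Y')$ and $(X'', Y'')$ denote independent and identically distributed (iid) copies of $(X, Y)$.[5] Distance covariance can be expressed in terms of the classical Pearson's covariance, cov, as follows:
$$\operatorname{dCov}^2(X, Y) = \operatorname{cov}(\|X - X'\|, \|Y - Y'\|) - 2\operatorname{cov}(\|X - X'\|, \|Y - Y''\|).$$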
This identity shows that the distance covariance is not the same as the covariance of distances, cov(‖X − X'‖, ‖Y − Y'‖). The covariance of distances can be zero even if X and Y are not independent.
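The identity relating distance covariance to Pearson's covariance can be checked by expanding both covariance terms. The short derivation below is a sketch that assumes $(X, Y)$, $(X', Y')$, and $(X'', Y'')$ are iid with finite first moments, and uses only the fact that $Y'$ and $Y''$ have the same distribution.

```latex
\begin{align*}
\operatorname{cov}(\|X-X'\|, \|Y-Y'\|)
  &= \operatorname{E}[\|X-X'\|\,\|Y-Y'\|] - \operatorname{E}[\|X-X'\|]\,\operatorname{E}[\|Y-Y'\|], \\
\operatorname{cov}(\|X-X'\|, \|Y-Y''\|)
  &= \operatorname{E}[\|X-X'\|\,\|Y-Y''\|] - \operatorname{E}[\|X-X'\|]\,\operatorname{E}[\|Y-Y'\|], \\
\operatorname{cov}(\|X-X'\|, \|Y-Y'\|) - 2\operatorname{cov}(\|X-X'\|, \|Y-Y''\|)
  &= \operatorname{E}[\|X-X'\|\,\|Y-Y'\|] + \operatorname{E}[\|X-X'\|]\,\operatorname{E}[\|Y-Y'\|] \\
  &\quad - 2\operatorname{E}[\|X-X'\|\,\|Y-Y''\|]
   = \operatorname{dCov}^2(X, Y).
\end{align*}
```

The extra term, $-2\operatorname{cov}(\|X-X'\|, \|Y-Y''\|)$, is what separates distance covariance from the plain covariance of distances.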
Alternatively, the distance covariance can be defined as the weighted L2 norm of the distance between the joint characteristic function of the random variables and the product of their marginal characteristic functions:[6]
$$\operatorname{dCov}^2(X, Y) = \frac{1}{c_p c_q} \int_{\mathbb{R}^{p+q}} \frac{\left|\varphi_{X,Y}(s, t) - \varphi_X(s)\,\varphi_Y(t)\right|^2}{|s|_p^{1+p}\,|t|_q^{1+q}}\, dt\, ds,$$
where $\varphi_{X,Y}(s, t)$, $\varphi_X(s)$, and $\varphi_Y(t)$ are the characteristic functions of (X, Y), X, and Y, respectively, p, q denote the Euclidean dimension of X and Y, and thus of s and t, and $c_p$, $c_q$ are constants. The weight function $(c_p c_q |s|_p^{1+p} |t|_q^{1+q})^{-1}$ is chosen to produce a scale equivariant and rotation invariant measure that doesn't go to zero for dependent variables.[6][7] One interpretation of the characteristic function definition is that the variables $e^{isX}$ and $e^{itY}$ are cyclic representations of X and Y with different periods given by s and t, and the expression $\varphi_{X,Y}(s, t) - \varphi_X(s)\,\varphi_Y(t)$ in the numerator of the characteristic function definition of distance covariance is simply the classical covariance of $e^{isX}$ and $e^{itY}$. The characteristic function definition clearly shows that $\operatorname{dCov}^2(X, Y) = 0$ if and only if X and Y are independent.
Distance variance and distance standard deviation
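The distance variance is a special case of distance covariance when the two variables are identical. The population value of distance variance is the square root of
$$\operatorname{dVar}^2(X) := \operatorname{E}[\|X - X'\|^2] + \operatorname{E}^2[\|X - X'\|] - 2\operatorname{E}[\|X - X'\|\,\|X - X''\|],$$
where X, X', and X'' are independent and identically distributed and $\operatorname{E}^2[\cdot] = (\operatorname{E}[\cdot])^2$.
The sample distance variance is the square root of
$$\operatorname{dVar}_n^2(X) := \operatorname{dCov}_n^2(X, X) = \frac{1}{n^2} \sum_{j,k} A_{j,k}^2,$$
which is a relative of Corrado Gini's mean difference introduced in 1912 (but Gini did not work with centered distances).[8]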
The distance standard deviation is the square root of the distance variance.
Distance correlation
The distance correlation[2][3] of two random variables is obtained by dividing their distance covariance by the product of their distance standard deviations. The distance correlation is the square root of
$$\operatorname{dCor}^2(X, Y) = \frac{\operatorname{dCov}^2(X, Y)}{\sqrt{\operatorname{dVar}^2(X)\,\operatorname{dVar}^2(Y)}},$$
and the sample distance correlation is defined by substituting the sample distance covariance and distance variances for the population coefficients above.
For easy computation of sample distance correlation see the dcor function in the energy package for R.[4]
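For readers without R, the sample distance correlation can be sketched in a few lines of Python. This is an illustrative sketch, not the energy package's algorithm; the helper names and the convention of returning 0 when a distance variance vanishes are assumptions. The example uses a symmetric sample with Y = X², for which Pearson's correlation is exactly zero while the distance correlation is clearly positive.

```python
import math

def centered_distances(points):
    """Doubly centered matrix of pairwise Euclidean distances."""
    n = len(points)
    d = [[math.dist(p, q) for q in points] for p in points]
    row = [sum(r) / n for r in d]  # also the column means: d is symmetric
    grand = sum(row) / n
    return [[d[j][k] - row[j] - row[k] + grand for k in range(n)]
            for j in range(n)]

def mean_product(a, b):
    """Average of the entrywise products of two n-by-n matrices."""
    n = len(a)
    return sum(a[j][k] * b[j][k] for j in range(n) for k in range(n)) / n**2

def dcor(xs, ys):
    """Sample distance correlation of two paired samples of tuples."""
    a, b = centered_distances(xs), centered_distances(ys)
    denom = math.sqrt(mean_product(a, a) * mean_product(b, b))
    if denom == 0.0:
        return 0.0  # assumed convention when either sample is constant
    # max() guards against a tiny negative value caused by rounding.
    return math.sqrt(max(mean_product(a, b), 0.0) / denom)

def pearson(u, v):
    """Ordinary Pearson correlation of two equal-length lists of numbers."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((p - mu) * (q - mv) for p, q in zip(u, v))
    suu = sum((p - mu) ** 2 for p in u)
    svv = sum((q - mv) ** 2 for q in v)
    return suv / math.sqrt(suu * svv)

x = [-2.0, -1.0, 0.0, 1.0, 2.0]
y = [t * t for t in x]
r = pearson(x, y)
d = dcor([(t,) for t in x], [(t,) for t in y])
print(f"Pearson r = {r:.3f}, distance correlation = {d:.3f}")
```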
Properties
Distance correlation
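(i) $0 \le \operatorname{dCor}_n(X, Y) \le 1$ and $0 \le \operatorname{dCor}(X, Y) \le 1$; this is in contrast to Pearson's correlation, which can be negative.
(ii) $\operatorname{dCor}(X, Y) = 0$ if and only if X and Y are independent.
(iii) $\operatorname{dCor}_n(X, Y) = 1$ implies that dimensions of the linear subspaces spanned by X and Y samples respectively are almost surely equal and if we assume that these subspaces are equal, then in this subspace $Y = A + b\,\mathbf{C}X$ for some vector A, scalar b, and orthonormal matrix $\mathbf{C}$.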
Distance covariance
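(i) $\operatorname{dCov}(X, Y) \ge 0$ and $\operatorname{dCov}_n(X, Y) \ge 0$;
(ii) $\operatorname{dCov}^2(a_1 + b_1 \mathbf{C}_1 X,\ a_2 + b_2 \mathbf{C}_2 Y) = |b_1 b_2| \operatorname{dCov}^2(X, Y)$ for all constant vectors $a_1, a_2$, scalars $b_1, b_2$, and orthonormal matrices $\mathbf{C}_1, \mathbf{C}_2$.
(iii) If the random vectors $(X_1, Y_1)$ and $(X_2, Y_2)$ are independent then $\operatorname{dCov}(X_1 + X_2, Y_1 + Y_2) \le \operatorname{dCov}(X_1, Y_1) + \operatorname{dCov}(X_2, Y_2)$. Equality holds if and only if $X_1$ and $Y_1$ are both constants, or $X_2$ and $Y_2$ are both constants, or $X_1, X_2, Y_1, Y_2$ are mutually independent.
(iv) $\operatorname{dCov}(X, Y) = 0$ if and only if X and Y are independent.
This last property is the most important effect of working with centered distances.
The statistic $\operatorname{dCov}_n^2(X, Y)$ is a biased estimator of $\operatorname{dCov}^2(X, Y)$. Under independence of X and Y [9]
$$\operatorname{E}[\operatorname{dCov}_n^2(X, Y)] = \frac{n-1}{n^2}\, \operatorname{E}[\|X - X'\|]\, \operatorname{E}[\|Y - Y'\|].$$
An unbiased estimator of $\operatorname{dCov}^2(X, Y)$ is given by Székely and Rizzo.[10]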
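The invariance of distance covariance under shifts and orthonormal transformations, and its scaling by |b₁b₂|, can be checked numerically on the sample statistic, which obeys the same identity because every pairwise distance is simply multiplied by |b|. The sketch below uses rotations of 2-dimensional points as the orthonormal matrices; the data, shifts, scales, and angles are arbitrary values chosen for illustration.

```python
import math

def dcov2(xs, ys):
    """Squared sample distance covariance of paired samples of tuples."""
    n = len(xs)

    def centered(points):
        d = [[math.dist(p, q) for q in points] for p in points]
        row = [sum(r) / n for r in d]  # also the column means
        grand = sum(row) / n
        return [[d[j][k] - row[j] - row[k] + grand for k in range(n)]
                for j in range(n)]

    a, b = centered(xs), centered(ys)
    return sum(a[j][k] * b[j][k] for j in range(n) for k in range(n)) / n**2

def affine(points, shift, scale, angle):
    """Map each 2-D point p to shift + scale * R(angle) p, R a rotation."""
    c, s = math.cos(angle), math.sin(angle)
    return [(shift[0] + scale * (c * x - s * y),
             shift[1] + scale * (s * x + c * y)) for x, y in points]

xs = [(0.0, 1.0), (1.0, 3.0), (2.0, 0.5), (4.0, 2.0), (5.0, 5.0)]
ys = [(1.0, 0.0), (0.5, 2.0), (3.0, 1.0), (2.0, 4.0), (6.0, 3.0)]
b1, b2 = -2.0, 0.5

lhs = dcov2(affine(xs, (1.0, -1.0), b1, 0.3), affine(ys, (0.0, 2.0), b2, 1.1))
rhs = abs(b1 * b2) * dcov2(xs, ys)
print(math.isclose(lhs, rhs, rel_tol=1e-9))
```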
Distance variance
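(i) $\operatorname{dVar}(X) = 0$ if and only if $X = \operatorname{E}[X]$ almost surely.
(ii) $\operatorname{dVar}_n(X) = 0$ if and only if every sample observation is identical.
(iii) $\operatorname{dVar}(A + b\,\mathbf{C}X) = |b| \operatorname{dVar}(X)$ for all constant vectors A, scalars b, and orthonormal matrices $\mathbf{C}$.
(iv) If X and Y are independent then $\operatorname{dVar}(X + Y) \le \operatorname{dVar}(X) + \operatorname{dVar}(Y)$.
Equality holds in (iv) if and only if one of the random variables X or Y is a constant.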
Generalization
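Distance covariance can be generalized to include powers of Euclidean distance. Define
$$\operatorname{dCov}^2(X, Y; \alpha) := \operatorname{E}[\|X - X'\|^\alpha\, \|Y - Y'\|^\alpha] + \operatorname{E}[\|X - X'\|^\alpha]\, \operatorname{E}[\|Y - Y'\|^\alpha] - 2\operatorname{E}[\|X - X'\|^\alpha\, \|Y - Y''\|^\alpha].$$
Then for every $0 < \alpha < 2$, X and Y are independent if and only if $\operatorname{dCov}^2(X, Y; \alpha) = 0$. It is important to note that this characterization does not hold for exponent $\alpha = 2$; in this case for bivariate $(X, Y)$, $\operatorname{dCor}(X, Y; \alpha = 2)$ is a deterministic function of the Pearson correlation.[2] If $a_{j,k}$ and $b_{j,k}$ are $\alpha$ powers of the corresponding distances, $0 < \alpha \le 2$, then the $\alpha$ sample distance covariance can be defined as the nonnegative number for which
$$\operatorname{dCov}_n^2(X, Y; \alpha) := \frac{1}{n^2} \sum_{j,k} A_{j,k}\, B_{j,k}.$$
One can extend $\operatorname{dCov}$ to metric-space-valued random variables X and Y: If X has law μ in a metric space with metric d, then define $a_\mu(x) := \operatorname{E}[d(X, x)]$, $D(\mu) := \operatorname{E}[a_\mu(X)]$, and (provided $a_\mu$ is finite, i.e., X has finite first moment), $d_\mu(x, x') := d(x, x') - a_\mu(x) - a_\mu(x') + D(\mu)$. Then if Y has law ν (in a possibly different metric space with finite first moment), define
$$\operatorname{dCov}^2(X, Y) := \operatorname{E}\big[d_\mu(X, X')\, d_\nu(Y, Y')\big].$$
This is non-negative for all such X, Y iff both metric spaces have negative type.[11] Here, a metric space $(M, d)$ has negative type if $(M, d^{1/2})$ is isometric to a subset of a Hilbert space.[12] If both metric spaces have strong negative type, then $\operatorname{dCov}^2(X, Y) = 0$ iff X, Y are independent.[11]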
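The role of the exponent in the power generalization can be illustrated numerically. In the sketch below (the function name and the sample are assumptions made for the example), pairwise Euclidean distances are raised to the power α before double centering. For one-dimensional samples with α = 2, the double-centered squared distances reduce to −2(x_j − x̄)(x_k − x̄), so the statistic equals four times the squared sample covariance and cannot detect the purely nonlinear dependence Y = X².

```python
import math

def dcov2_alpha(xs, ys, alpha=1.0):
    """Squared sample distance covariance with distances raised to alpha."""
    if not 0.0 < alpha <= 2.0:
        raise ValueError("alpha must lie in (0, 2]")
    n = len(xs)

    def centered(points):
        d = [[math.dist(p, q) ** alpha for q in points] for p in points]
        row = [sum(r) / n for r in d]  # also the column means
        grand = sum(row) / n
        return [[d[j][k] - row[j] - row[k] + grand for k in range(n)]
                for j in range(n)]

    a, b = centered(xs), centered(ys)
    return sum(a[j][k] * b[j][k] for j in range(n) for k in range(n)) / n**2

# Symmetric sample with Y = X^2: dependent, but with zero Pearson covariance.
xs = [(-2.0,), (-1.0,), (0.0,), (1.0,), (2.0,)]
ys = [(x[0] ** 2,) for x in xs]

for alpha in (0.5, 1.0, 1.5, 2.0):
    print(f"alpha = {alpha}: dCov_n^2 = {dcov2_alpha(xs, ys, alpha):.6f}")
```

With α strictly between 0 and 2 the printed values are positive, while at α = 2 the value vanishes up to rounding, matching the zero sample covariance of this data.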
Alternative definition of distance covariance
The original distance covariance has been defined as the square root of $\operatorname{dCov}^2(X, Y)$, rather than the squared coefficient itself. $\operatorname{dCov}^2(X, Y)$ has the property that it is the energy distance between the joint distribution of $X, Y$ and the product of its marginals. Under this definition, however, the distance variance, rather than the distance standard deviation, is measured in the same units as the $X$ distances.
Alternately, one could define distance covariance to be the square of the energy distance: $\operatorname{dCov}^2(X, Y)$.
In this case, the distance standard deviation of $X$ is measured in the same units as $X$ distance, and there exists an unbiased estimator for the population distance covariance.[10]
Under these alternate definitions, the distance correlation is also defined as the square $\operatorname{dCor}^2(X, Y)$, rather than the square root.
Alternative formulation: Brownian covariance
Brownian covariance is motivated by generalization of the notion of covariance to stochastic processes. The square of the covariance of random variables X and Y can be written in the following form:
$$\operatorname{cov}(X, Y)^2 = \operatorname{E}\big[(X - \operatorname{E}(X))\,(X' - \operatorname{E}(X'))\,(Y - \operatorname{E}(Y))\,(Y' - \operatorname{E}(Y'))\big],$$
where E denotes the expected value and the prime denotes independent and identically distributed copies. We need the following generalization of this formula. If U(s), V(t) are arbitrary random processes defined for all real s and t then define the U-centered version of X by
$$X_U := U(X) - \operatorname{E}_X\big[U(X) \mid \{U(t)\}\big]$$
whenever the subtracted conditional expected value exists and denote by $Y_V$ the V-centered version of Y.[3][13][14] The (U,V) covariance of (X,Y) is defined as the nonnegative number whose square is
$$\operatorname{cov}_{U,V}^2(X, Y) := \operatorname{E}\big[X_U\, X_U'\, Y_V\, Y_V'\big]$$
whenever the right-hand side is nonnegative and finite. The most important example is when U and V are two-sided independent Brownian motions/Wiener processes with expectation zero and covariance |s| + |t| − |s − t| = 2 min(s,t) (for nonnegative s, t only). (This is twice the covariance of the standard Wiener process; here the factor 2 simplifies the computations.) In this case the (U,V) covariance is called Brownian covariance and is denoted by
$$\operatorname{cov}_W(X, Y).$$
There is a surprising coincidence: The Brownian covariance is the same as the distance covariance:
$$\operatorname{cov}_W(X, Y) = \operatorname{dCov}(X, Y),$$
and thus Brownian correlation is the same as distance correlation.
On the other hand, if we replace the Brownian motion with the deterministic identity function id then $\operatorname{cov}_{\mathrm{id}}(X, Y)$ is simply the absolute value of the classical Pearson covariance,
$$\operatorname{cov}_{\mathrm{id}}(X, Y) = |\operatorname{cov}(X, Y)|.$$
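The claim for the identity function can be verified in a few lines. The sketch below assumes real-valued X and Y with finite second moments and takes U = V = id, so that the conditional expectation in the centering step reduces to the ordinary mean; the factorization in the second line uses the independence of $(X, Y)$ and $(X', Y')$.

```latex
\begin{align*}
X_{\mathrm{id}} &= X - \operatorname{E}[X], \qquad Y_{\mathrm{id}} = Y - \operatorname{E}[Y], \\
\operatorname{cov}_{\mathrm{id}}^2(X, Y)
  &= \operatorname{E}\big[(X - \operatorname{E}X)(X' - \operatorname{E}X')(Y - \operatorname{E}Y)(Y' - \operatorname{E}Y')\big] \\
  &= \operatorname{E}\big[(X - \operatorname{E}X)(Y - \operatorname{E}Y)\big]\,
     \operatorname{E}\big[(X' - \operatorname{E}X')(Y' - \operatorname{E}Y')\big]
   = \operatorname{cov}(X, Y)^2, \\
\operatorname{cov}_{\mathrm{id}}(X, Y) &= |\operatorname{cov}(X, Y)|.
\end{align*}
```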
Related metrics
Other correlational metrics, including kernel-based correlational metrics (such as the Hilbert-Schmidt Independence Criterion or HSIC) can also detect linear and nonlinear interactions. Both distance correlation and kernel-based metrics can be used in methods such as canonical correlation analysis and independent component analysis to yield stronger statistical power.