Kernel methods for vector output

Kernel methods are a well-established tool to analyze the relationship between input data and the corresponding output of a function. Kernels encapsulate the properties of functions in a computationally efficient way and allow algorithms to easily swap functions of varying complexity.

In typical machine learning algorithms, these functions produce a scalar output. Recent development of kernel methods for functions with vector-valued output is due, at least in part, to interest in simultaneously solving related problems. Kernels which capture the relationship between the problems allow them to borrow strength from each other. Algorithms of this type include multi-task learning (also called multi-output learning or vector-valued learning), transfer learning, and co-kriging. Multi-label classification can be interpreted as mapping inputs to (binary) coding vectors with length equal to the number of classes.

In Gaussian processes, kernels are called covariance functions. Multiple-output functions correspond to considering multiple processes. See Bayesian interpretation of regularization for the connection between the two perspectives.

History

The history of learning vector-valued functions is closely linked to transfer learning: storing knowledge gained while solving one problem and applying it to a different but related problem. The fundamental motivation for transfer learning in the field of machine learning was discussed in a NIPS-95 workshop on “Learning to Learn”, which focused on the need for lifelong machine learning methods that retain and reuse previously learned knowledge. Research on transfer learning has attracted much attention since 1995 under different names: learning to learn, lifelong learning, knowledge transfer, inductive transfer, multitask learning, knowledge consolidation, context-sensitive learning, knowledge-based inductive bias, metalearning, and incremental/cumulative learning.[1] Interest in learning vector-valued functions was particularly sparked by multitask learning, a framework which tries to learn multiple, possibly different tasks simultaneously.

Much of the initial research in multitask learning in the machine learning community was algorithmic in nature, and applied to methods such as neural networks, decision trees and k-nearest neighbors in the 1990s.[2] The use of probabilistic models and Gaussian processes was pioneered and largely developed in the context of geostatistics, where prediction over vector-valued output data is known as cokriging.[3][4][5] Geostatistical approaches to multivariate modeling are mostly formulated around the linear model of coregionalization (LMC), a generative approach for developing valid covariance functions that has been used for multivariate regression and in statistics for computer emulation of expensive multivariate computer codes. The regularization and kernel theory literature for vector-valued functions followed in the 2000s.[6][7] While the Bayesian and regularization perspectives were developed independently, they are in fact closely related.[8]

Notation

In this context, the supervised learning problem is to learn the function $f$ which best predicts vector-valued outputs $\mathbf{y}_i$ given inputs (data) $\mathbf{x}_i$:

$f(\mathbf{x}_i) = \mathbf{y}_i$ for $i = 1, \ldots, N$

where $\mathbf{x}_i \in \mathcal{X}$, an input space (e.g. $\mathcal{X} \subseteq \mathbb{R}^p$), and $\mathbf{y}_i \in \mathbb{R}^D$.

In general, each component of the output vector ($y_{d,i}$) could have different input data ($\mathbf{x}_{d,i}$) with different cardinality ($N_d$) and even different input spaces ($\mathcal{X}_d$).[8] Geostatistics literature calls this case heterotopic, and uses isotopic to indicate that each component of the output vector has the same set of inputs.[9]

Here, for simplicity in the notation, we assume the number and sample space of the data for each output are the same.

Regularization perspective[8][10][11]

From the regularization perspective, the problem is to learn $f_*$ belonging to a reproducing kernel Hilbert space of vector-valued functions ($\mathcal{H}$). This is similar to the scalar case of Tikhonov regularization, with some extra care in the notation:

  • Reproducing kernel: $\mathbf{K}\colon \mathcal{X} \times \mathcal{X} \to \mathbb{R}^{D \times D}$ (vector-valued case); $k\colon \mathcal{X} \times \mathcal{X} \to \mathbb{R}$ (scalar case)
  • Learning problem: $f_* = \operatorname{argmin}_{f \in \mathcal{H}} \sum_{j=1}^{N} \|\mathbf{y}_j - f(\mathbf{x}_j)\|_{\mathbb{R}^D}^2 + \lambda \|f\|_{\mathcal{H}}^2$ (vector-valued case); $f_* = \operatorname{argmin}_{f \in \mathcal{H}} \sum_{j=1}^{N} (y_j - f(x_j))^2 + \lambda \|f\|_{\mathcal{H}}^2$ (scalar case)
  • Solution (derived via the representer theorem): $f_*(\mathbf{x}) = \sum_{i=1}^{N} \mathbf{K}(\mathbf{x}_i, \mathbf{x})\,\mathbf{c}_i$ (vector-valued case); $f_*(x) = \sum_{i=1}^{N} k(x_i, x)\,c_i$ (scalar case)

with $\bar{\mathbf{c}}, \bar{\mathbf{y}} \in \mathbb{R}^{ND}$, where $\bar{\mathbf{c}}$ and $\bar{\mathbf{y}}$ are the coefficients and output vectors concatenated to form $ND$-vectors, and $\mathbf{K}(\mathbf{X}, \mathbf{X})$ is the $ND \times ND$ matrix of $D \times D$ blocks $\mathbf{K}(\mathbf{x}_i, \mathbf{x}_j)$.

Solve for $\bar{\mathbf{c}}$ by taking the derivative of the learning problem, setting it equal to zero, and substituting in the above expression for $f_*$:

$\bar{\mathbf{c}} = (\mathbf{K}(\mathbf{X}, \mathbf{X}) + \lambda N \mathbf{I})^{-1} \bar{\mathbf{y}}$

It is possible, though non-trivial, to show that a representer theorem also holds for Tikhonov regularization in the vector-valued setting.[8]
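
A minimal numerical sketch of this solution, assuming a separable kernel (a Gaussian input kernel times an output matrix $\mathbf{B}$, both illustrative choices; the function names are not from any particular library):

```python
import numpy as np

def rbf_kernel(X1, X2, lengthscale=1.0):
    """Scalar Gaussian (RBF) kernel matrix between two input sets."""
    sq_dists = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * sq_dists / lengthscale ** 2)

def fit_coefficients(X, Y, B, lam=0.1):
    """Solve (K(X,X) + lam * N * I) c_bar = y_bar for the stacked
    coefficient vector, with the separable kernel K = k(X,X) (x) B."""
    N, D = Y.shape
    K = np.kron(rbf_kernel(X, X), B)          # N*D x N*D block kernel matrix
    y_bar = Y.reshape(-1)                     # concatenated output vector
    return np.linalg.solve(K + lam * N * np.eye(N * D), y_bar)

def predict(X_new, X, c_bar, B):
    """Evaluate f_*(x) = sum_i K(x_i, x) c_i at the new inputs."""
    D = B.shape[0]
    K_new = np.kron(rbf_kernel(X_new, X), B)  # cross-kernel blocks
    return (K_new @ c_bar).reshape(-1, D)

# Toy usage: two correlated outputs of a 1-D input.
X = np.linspace(0, 1, 20).reshape(-1, 1)
Y = np.column_stack([np.sin(6 * X[:, 0]), np.sin(6 * X[:, 0] + 0.3)])
B = np.array([[1.0, 0.9], [0.9, 1.0]])       # output similarity matrix
c_bar = fit_coefficients(X, Y, B)
print(predict(X[:3], X, c_bar, B))
```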

Note, the matrix-valued kernel $\mathbf{K}$ can also be defined by a scalar kernel $R$ on the space $\mathcal{X} \times \{1, \ldots, D\}$, via $(\mathbf{K}(x, x'))_{d,d'} = R((x, d), (x', d'))$. An isometry exists between the Hilbert spaces associated with these two kernels.
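
This correspondence can be sketched directly: a matrix-valued kernel evaluated at input–index pairs behaves as a scalar kernel (illustrative code, assuming $\mathbf{K}$ is supplied as a function returning a $D \times D$ array):

```python
import numpy as np

def scalar_from_matrix_kernel(K):
    """Scalar kernel R on X x {1,...,D} induced by a matrix-valued kernel:
    R((x, d), (x', d')) = (K(x, x'))_{d, d'}."""
    def R(pair, pair_prime):
        (x, d), (x_prime, d_prime) = pair, pair_prime
        return K(x, x_prime)[d, d_prime]
    return R

# Example with a separable matrix-valued kernel k(x, x') * B:
B = np.array([[1.0, 0.5], [0.5, 1.0]])
K = lambda x, xp: np.exp(-0.5 * np.sum((x - xp) ** 2)) * B
R = scalar_from_matrix_kernel(K)
print(R((np.zeros(2), 0), (np.ones(2), 1)))   # one scalar-kernel evaluation
```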

Gaussian process perspective

The estimator of the vector-valued regularization framework can also be derived from a Bayesian viewpoint using Gaussian process methods in the case of a finite dimensional reproducing kernel Hilbert space. The derivation is similar to the scalar-valued case Bayesian interpretation of regularization. The vector-valued function $\mathbf{f}$, consisting of $D$ outputs $(f_1, \ldots, f_D)$, is assumed to follow a Gaussian process:

$\mathbf{f} \sim \mathcal{GP}(\mathbf{m}, \mathbf{K})$

where $\mathbf{m}$ is now a vector of the mean functions $m_d(\mathbf{x})$ for the outputs and $\mathbf{K}$ is a positive definite matrix-valued function with entry $(\mathbf{K}(\mathbf{x}, \mathbf{x}'))_{d,d'}$ corresponding to the covariance between the outputs $f_d(\mathbf{x})$ and $f_{d'}(\mathbf{x}')$.

For a set of inputs $\mathbf{X}$, the prior distribution over the vector $\mathbf{f}(\mathbf{X})$ is given by $\mathbf{f}(\mathbf{X}) \sim \mathcal{N}(\mathbf{m}(\mathbf{X}), \mathbf{K}(\mathbf{X}, \mathbf{X}))$, where $\mathbf{m}(\mathbf{X})$ is a vector that concatenates the mean vectors associated to the outputs and $\mathbf{K}(\mathbf{X}, \mathbf{X})$ is a block-partitioned matrix. The distribution of the outputs is taken to be Gaussian:

$p(\mathbf{y} \mid \mathbf{f}, \mathbf{x}, \boldsymbol{\Sigma}) = \mathcal{N}(\mathbf{f}(\mathbf{x}), \boldsymbol{\Sigma})$

where $\boldsymbol{\Sigma} \in \mathbb{R}^{D \times D}$ is a diagonal matrix with elements $\sigma_d^2$ specifying the noise for each output. Using this form for the likelihood, the predictive distribution for a new input $\mathbf{x}_*$ is:

$p(\mathbf{f}(\mathbf{x}_*) \mid \mathbf{S}, \mathbf{f}, \mathbf{x}_*, \boldsymbol{\phi}) = \mathcal{N}(\mathbf{f}_*(\mathbf{x}_*), \mathbf{K}_*(\mathbf{x}_*, \mathbf{x}_*))$

where $\mathbf{S}$ is the training data, and $\boldsymbol{\phi}$ is a set of hyperparameters for $\mathbf{K}(\mathbf{x}, \mathbf{x}')$ and $\boldsymbol{\Sigma}$.

Equations for $\mathbf{f}_*(\mathbf{x}_*)$ and $\mathbf{K}_*(\mathbf{x}_*, \mathbf{x}_*)$ can then be obtained:

$\mathbf{f}_*(\mathbf{x}_*) = \mathbf{K}_{\mathbf{x}_*}^{\top} (\mathbf{K}(\mathbf{X}, \mathbf{X}) + \boldsymbol{\Sigma})^{-1} \bar{\mathbf{y}}$

$\mathbf{K}_*(\mathbf{x}_*, \mathbf{x}_*) = \mathbf{K}(\mathbf{x}_*, \mathbf{x}_*) - \mathbf{K}_{\mathbf{x}_*}^{\top} (\mathbf{K}(\mathbf{X}, \mathbf{X}) + \boldsymbol{\Sigma})^{-1} \mathbf{K}_{\mathbf{x}_*}$

where $\mathbf{K}_{\mathbf{x}_*}$ has entries $(\mathbf{K}(\mathbf{x}_*, \mathbf{x}_j))_{d,d'}$ for $j = 1, \ldots, N$ and $d, d' = 1, \ldots, D$ (with $\boldsymbol{\Sigma}$ understood as expanded over all $N$ training points). Note that the predictor $\mathbf{f}_*$ is identical to the predictor derived in the regularization framework. For non-Gaussian likelihoods different methods such as Laplace approximation and variational methods are needed to approximate the estimators.
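
A sketch of these predictive equations, under the simplifying assumptions of a separable covariance $k(\mathbf{x}, \mathbf{x}')\,\mathbf{B}$ and a Gaussian RBF choice for $k$; a Cholesky factorization stands in for the explicit inverse:

```python
import numpy as np

def rbf(X1, X2, lengthscale=1.0):
    """Scalar RBF kernel matrix (an assumed choice for k)."""
    sq = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * sq / lengthscale ** 2)

def mogp_predict(X, Y, X_star, B, noise_vars):
    """Predictive mean and covariance of a multi-output GP with a
    separable covariance k(x, x') * B and per-output noise variances."""
    N, D = Y.shape
    Sigma = np.kron(np.eye(N), np.diag(noise_vars))   # noise on each output
    K = np.kron(rbf(X, X), B) + Sigma                 # K(X,X) + Sigma
    K_star = np.kron(rbf(X_star, X), B)               # cross-covariances
    K_ss = np.kron(rbf(X_star, X_star), B)
    L = np.linalg.cholesky(K + 1e-9 * np.eye(N * D))  # avoid explicit inverse
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, Y.reshape(-1)))
    mean = (K_star @ alpha).reshape(len(X_star), D)   # f_*(x_*)
    V = np.linalg.solve(L, K_star.T)
    cov = K_ss - V.T @ V                              # K_*(x_*, x_*)
    return mean, cov
```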

Example kernels

Separable

A simple, but broadly applicable, class of multi-output kernels can be separated into the product of a kernel on the input-space and a kernel representing the correlations among the outputs:[8]

$(\mathbf{K}(\mathbf{x}, \mathbf{x}'))_{d,d'} = k(\mathbf{x}, \mathbf{x}')\, k_T(d, d')$

  • $k$: scalar kernel on $\mathcal{X} \times \mathcal{X}$
  • $k_T$: scalar kernel on $\{1, \ldots, D\} \times \{1, \ldots, D\}$

In matrix form: $\mathbf{K}(\mathbf{x}, \mathbf{x}') = k(\mathbf{x}, \mathbf{x}')\,\mathbf{B}$, where $\mathbf{B}$ is a $D \times D$ symmetric and positive semi-definite matrix. Note, setting $\mathbf{B}$ to the identity matrix treats the outputs as unrelated and is equivalent to solving the scalar-output problems separately.

For a slightly more general form, adding several of these kernels yields a sum of separable kernels (SoS kernels).
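
An illustrative sketch of both constructions (the RBF kernels and the matrices $\mathbf{B}_q$ are arbitrary choices for the example):

```python
import numpy as np

def separable(k, B):
    """Matrix-valued kernel K(x, x') = k(x, x') * B."""
    return lambda x, xp: k(x, xp) * B

def sum_of_separable(kernels, Bs):
    """SoS kernel: K(x, x') = sum_q k_q(x, x') * B_q."""
    return lambda x, xp: sum(k(x, xp) * B for k, B in zip(kernels, Bs))

rbf = lambda ell: (lambda x, xp: np.exp(-0.5 * np.sum((x - xp) ** 2) / ell ** 2))
B1 = np.array([[1.0, 0.8], [0.8, 1.0]])   # strongly coupled outputs
B2 = np.eye(2)                            # independent component
K = sum_of_separable([rbf(0.2), rbf(2.0)], [B1, B2])
print(K(np.zeros(3), 0.1 * np.ones(3)))   # a 2 x 2 kernel block
```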

From regularization literature[8][10][12][13][14]

Derived from regularizer

One way of obtaining $k_T$ is to specify a regularizer which limits the complexity of $f$ in a desirable way, and then derive the corresponding kernel. For certain regularizers, this kernel will turn out to be separable.

Mixed-effect regularizer

$R(\mathbf{f}) = A_\omega \left( C_\omega \sum_{l=1}^{D} \|f_l\|_k^2 + \omega D \sum_{l=1}^{D} \left\| f_l - \frac{1}{D} \sum_{q=1}^{D} f_q \right\|_k^2 \right)$

where:

$A_\omega = \frac{1}{2(1-\omega)(1-\omega+\omega D)}, \qquad C_\omega = (2 - 2\omega + \omega D)$

The corresponding kernel is

$\mathbf{K}_\omega(\mathbf{x}_i, \mathbf{x}_j) = k(\mathbf{x}_i, \mathbf{x}_j) \left( \omega \mathbf{1} + (1 - \omega) \mathbf{I}_D \right)$

where $\mathbf{1}$ is the $D \times D$ matrix with all entries equal to 1 and $\mathbf{I}_D$ is the identity.

This regularizer is a combination of limiting the complexity of each component of the estimator ($f_l$) and forcing each component of the estimator to be close to the mean of all the components. Setting $\omega = 0$ treats all the components as independent and is the same as solving the scalar problems separately. Setting $\omega = 1$ assumes all the components are explained by the same function.
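
A small sketch of the output part of this kernel, showing the two limiting cases of $\omega$ (the scaling constants $A_\omega$, $C_\omega$ are omitted for clarity):

```python
import numpy as np

def mixed_effect_B(omega, D):
    """Output part of the mixed-effect kernel, omega * ones + (1 - omega) * I
    (ignoring the overall scaling constants of the regularizer)."""
    return omega * np.ones((D, D)) + (1 - omega) * np.eye(D)

print(mixed_effect_B(0.0, 3))   # identity: components treated independently
print(mixed_effect_B(1.0, 3))   # all ones: one shared underlying function
```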

Cluster-based regularizer

$R(\mathbf{f}) = \varepsilon_1 \sum_{c=1}^{r} \sum_{l \in I(c)} \left\| f_l - \bar{f}_c \right\|_k^2 + \varepsilon_2 \sum_{c=1}^{r} m_c \left\| \bar{f}_c \right\|_k^2$

where:

  • $I(c)$ is the index set of components that belong to cluster $c$
  • $m_c$ is the cardinality of cluster $c$
  • $\bar{f}_c = \frac{1}{m_c} \sum_{q \in I(c)} f_q$ is the mean of the components in cluster $c$
  • $\mathbf{M}_{l,q} = \frac{1}{m_c}$ if $l$ and $q$ both belong to cluster $c$ ($\mathbf{M}_{l,q} = 0$ otherwise)

The corresponding kernel is $\mathbf{K}(\mathbf{x}, \mathbf{x}') = k(\mathbf{x}, \mathbf{x}')\,\mathbf{G}^{\dagger}$, where $\mathbf{G}_{l,q} = \varepsilon_1 \delta_{l,q} + (\varepsilon_2 - \varepsilon_1) \mathbf{M}_{l,q}$.

This regularizer divides the components into $r$ clusters and forces the components in each cluster to be similar.
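
A sketch of the implied output matrix, assuming the form of $\mathbf{G}$ given above (the pseudo-inverse is used in case $\mathbf{G}$ is singular):

```python
import numpy as np

def cluster_output_kernel(assignments, eps1, eps2):
    """Output matrix implied by the cluster regularizer: build M from the
    cluster assignments, form G, and return its (pseudo-)inverse."""
    assignments = np.asarray(assignments)
    D = len(assignments)
    M = np.zeros((D, D))
    for c in np.unique(assignments):
        idx = np.flatnonzero(assignments == c)
        M[np.ix_(idx, idx)] = 1.0 / len(idx)   # M_lq = 1/m_c within a cluster
    G = eps1 * np.eye(D) + (eps2 - eps1) * M
    return np.linalg.pinv(G)

# Components {0, 1} form one cluster, {2, 3} another.
print(cluster_output_kernel([0, 0, 1, 1], eps1=2.0, eps2=0.5))
```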

Graph regularizer

$R(\mathbf{f}) = \frac{1}{2} \sum_{l,q=1}^{D} \left\| f_l - f_q \right\|_k^2 \mathbf{M}_{l,q} + \sum_{l=1}^{D} \left\| f_l \right\|_k^2 \mathbf{M}_{l,l}$

where $\mathbf{M}$ is a $D \times D$ matrix of weights encoding the similarities between the components. Equivalently,

$R(\mathbf{f}) = \sum_{l,q=1}^{D} \langle f_l, f_q \rangle_k \mathbf{L}_{l,q}$

where $\mathbf{L} = \mathbf{D} - \mathbf{M}$, with $\mathbf{D}_{l,q} = \delta_{l,q} \left( \sum_{h=1}^{D} \mathbf{M}_{l,h} + \mathbf{M}_{l,l} \right)$.

Note, $\mathbf{L}$ is the graph Laplacian. See also: graph kernel.
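
A minimal sketch of the graph Laplacian for a hand-picked weight matrix $\mathbf{M}$ (here with zero diagonal, so $\mathbf{L} = \mathbf{D} - \mathbf{M}$ reduces to the standard Laplacian):

```python
import numpy as np

def graph_laplacian(M):
    """Graph Laplacian L = D - M of a symmetric, zero-diagonal weight matrix."""
    return np.diag(M.sum(axis=1)) - M

# Components 0 and 1 are strongly encouraged to be similar.
M = np.array([[0.0, 1.0, 0.1],
              [1.0, 0.0, 0.1],
              [0.1, 0.1, 0.0]])
L = graph_laplacian(M)
print(L @ np.ones(3))   # rows of a graph Laplacian sum to zero
```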

Learned from data

Several approaches to learning $\mathbf{B}$ from data have been proposed.[8] These include: performing a preliminary inference step to estimate $\mathbf{B}$ from the training data,[9] a proposal to learn $\mathbf{B}$ and $f$ together based on the cluster regularizer,[15] and sparsity-based approaches which assume only a few of the features are needed.[16][17]

From Bayesian literature

Linear model of coregionalization (LMC)

In LMC, outputs are expressed as linear combinations of independent random functions such that the resulting covariance function (over all inputs and outputs) is a valid positive semidefinite function. Assuming $D$ outputs $f_d(\mathbf{x})$ with $\mathbf{x} \in \mathbb{R}^p$, each $f_d$ is expressed as:

$f_d(\mathbf{x}) = \sum_{q=1}^{Q} a_{d,q}\, u_q(\mathbf{x})$

where $a_{d,q}$ are scalar coefficients and the independent functions $u_q(\mathbf{x})$ have zero mean and covariance $\operatorname{cov}[u_q(\mathbf{x}), u_{q'}(\mathbf{x}')] = k_q(\mathbf{x}, \mathbf{x}')$ if $q = q'$ and 0 otherwise. Allowing $R_q$ independent copies $u_q^i(\mathbf{x})$ of each $u_q(\mathbf{x})$, the cross covariance between any two functions $f_d(\mathbf{x})$ and $f_{d'}(\mathbf{x}')$ can then be written as:

$\operatorname{cov}[f_d(\mathbf{x}), f_{d'}(\mathbf{x}')] = \sum_{q=1}^{Q} \sum_{i=1}^{R_q} a_{d,q}^{i}\, a_{d',q}^{i}\, k_q(\mathbf{x}, \mathbf{x}')$

where the functions $u_q^i(\mathbf{x})$, with $q = 1, \ldots, Q$ and $i = 1, \ldots, R_q$, have zero mean and covariance $\operatorname{cov}[u_q^i(\mathbf{x}), u_{q'}^{i'}(\mathbf{x}')] = k_q(\mathbf{x}, \mathbf{x}')$ if $i = i'$ and $q = q'$. The cross covariance is thus given by $\sum_{q=1}^{Q} b_{d,d'}^{q}\, k_q(\mathbf{x}, \mathbf{x}')$ with $b_{d,d'}^{q} = \sum_{i=1}^{R_q} a_{d,q}^{i} a_{d',q}^{i}$. Thus the kernel can now be expressed as

$\mathbf{K}(\mathbf{x}, \mathbf{x}') = \sum_{q=1}^{Q} \mathbf{B}_q\, k_q(\mathbf{x}, \mathbf{x}')$

where each $\mathbf{B}_q \in \mathbb{R}^{D \times D}$, with entries $(\mathbf{B}_q)_{d,d'} = b_{d,d'}^{q}$, is known as a coregionalization matrix. Therefore, the kernel derived from LMC is a sum of the products of two covariance functions, one that models the dependence between the outputs, independently of the input vector $\mathbf{x}$ (the coregionalization matrix $\mathbf{B}_q$), and one that models the input dependence, independently of the outputs (the covariance function $k_q(\mathbf{x}, \mathbf{x}')$).
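
A sketch of the LMC construction for stacked outputs, building each $\mathbf{B}_q$ as an outer product $\mathbf{A}_q \mathbf{A}_q^{\top}$ so that positive semi-definiteness holds by construction (kernel choices and sizes are arbitrary for the example):

```python
import numpy as np

def coregionalization(A):
    """B_q = A_q A_q^T for coefficients a^i_{d,q} stored as columns of A_q;
    the outer-product form guarantees positive semi-definiteness."""
    return A @ A.T

def lmc_kernel(X1, X2, base_kernels, B_list):
    """Full LMC covariance over stacked outputs: sum_q k_q(X1,X2) (x) B_q."""
    return sum(np.kron(k(X1, X2), B) for k, B in zip(base_kernels, B_list))

rng = np.random.default_rng(0)
X = np.linspace(0, 1, 5).reshape(-1, 1)
rbf = lambda ell: (lambda A, C: np.exp(
    -0.5 * ((A[:, None, :] - C[None, :, :]) ** 2).sum(-1) / ell ** 2))
Bs = [coregionalization(rng.normal(size=(3, 2))) for _ in range(2)]  # D=3, R_q=2
K = lmc_kernel(X, X, [rbf(0.1), rbf(1.0)], Bs)    # 15 x 15, Q = 2 terms
print(np.all(np.linalg.eigvalsh(K) > -1e-9))      # positive semi-definite
```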

Intrinsic coregionalization model (ICM)

The ICM is a simplified version of the LMC, with $Q = 1$. ICM assumes that the elements $b_{d,d'}^{q}$ of the coregionalization matrix can be written as $b_{d,d'}^{q} = v_{d,d'}\, b_q$, for some suitable coefficients $v_{d,d'}$. With this form for $b_{d,d'}^{q}$:

$\operatorname{cov}[f_d(\mathbf{x}), f_{d'}(\mathbf{x}')] = \sum_{q=1}^{Q} v_{d,d'}\, b_q\, k_q(\mathbf{x}, \mathbf{x}') = v_{d,d'} \sum_{q=1}^{Q} b_q\, k_q(\mathbf{x}, \mathbf{x}') = v_{d,d'}\, k(\mathbf{x}, \mathbf{x}')$

where

$k(\mathbf{x}, \mathbf{x}') = \sum_{q=1}^{Q} b_q\, k_q(\mathbf{x}, \mathbf{x}')$

In this case, the coefficients

$v_{d,d'} = \sum_{i=1}^{R_1} a_{d,1}^{i}\, a_{d',1}^{i} = b_{d,d'}^{1}$

and the kernel matrix for multiple outputs becomes $\mathbf{K}(\mathbf{x}, \mathbf{x}') = k(\mathbf{x}, \mathbf{x}')\,\mathbf{B}$. ICM is much more restrictive than the LMC since it assumes that each basic covariance $k_q(\mathbf{x}, \mathbf{x}')$ contributes equally to the construction of the autocovariances and cross covariances for the outputs. However, the computations required for the inference are greatly simplified.
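
A sketch of the ICM kernel; the rank-one choice $\mathbf{B} = \mathbf{w}\mathbf{w}^{\top}$ below corresponds to $R_1 = 1$ in the underlying LMC:

```python
import numpy as np

def icm_kernel(X1, X2, k, B):
    """ICM covariance: a single input kernel shared by all outputs,
    K(X1, X2) = k(X1, X2) (x) B."""
    return np.kron(k(X1, X2), B)

# Rank-one B = w w^T corresponds to R_1 = 1 in the underlying LMC.
w = np.array([1.0, -0.5, 2.0])
B = np.outer(w, w)
rbf = lambda A, C: np.exp(-0.5 * ((A[:, None, :] - C[None, :, :]) ** 2).sum(-1))
X = np.linspace(0, 1, 4).reshape(-1, 1)
print(icm_kernel(X, X, rbf, B).shape)   # (12, 12) for N = 4, D = 3
```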

Semiparametric latent factor model (SLFM)

Another simplified version of the LMC is the semiparametric latent factor model (SLFM), which corresponds to setting $R_q = 1$ (instead of $Q = 1$ as in ICM). Thus each latent function $u_q$ has its own covariance.

Non-separable

While simple, the structure of separable kernels can be too limiting for some problems.

Notable examples of non-separable kernels in the regularization literature include:

  • matrix-valued kernels for learning divergence-free and curl-free vector fields[18]
  • universal kernels for multi-task learning[19]

In the Bayesian perspective, LMC produces a separable kernel because the output functions evaluated at a point $\mathbf{x}$ only depend on the values of the latent functions at $\mathbf{x}$. A non-trivial way to mix the latent functions is by convolving a base process with a smoothing kernel. If the base process is a Gaussian process, the convolved process is Gaussian as well. We can therefore exploit convolutions to construct covariance functions.[20] This method of producing non-separable kernels is known as process convolution. Process convolutions were introduced for multiple outputs in the machine learning community as "dependent Gaussian processes".[21]
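
A discrete illustration of the idea (not a full dependent-GP implementation): convolving one shared white-noise base process with two different smoothing kernels yields two correlated outputs:

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 400)
base = rng.normal(size=t.size)     # discretized white-noise base process

def smooth(signal, width):
    """Convolve the base process with a Gaussian smoothing kernel."""
    g = np.exp(-0.5 * (t - t.mean()) ** 2 / width ** 2)
    return np.convolve(signal, g / g.sum(), mode="same")

y1 = smooth(base, width=0.02)      # two outputs obtained by convolving the
y2 = smooth(base, width=0.08)      # same base process with different kernels
print(np.corrcoef(y1, y2)[0, 1])   # strong cross-correlation between outputs
```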

Implementation

When implementing an algorithm using any of the kernels above, practical issues such as tuning the parameters and ensuring reasonable computation time must be addressed.

Regularization perspective

Approached from the regularization perspective, parameter tuning is similar to the scalar-valued case and can generally be accomplished with cross validation. Solving the required linear system is typically expensive in memory and time. If the kernel is separable, a coordinate transform can convert $\mathbf{K}(\mathbf{X}, \mathbf{X})$ to a block-diagonal matrix, greatly reducing the computational burden by solving $D$ independent subproblems (plus the eigendecomposition of $\mathbf{B}$). In particular, for a least squares loss function (Tikhonov regularization), there exists a closed form solution for the rotated coefficient vectors:[8][14]

$\tilde{\mathbf{c}}^{d} = \left( k(\mathbf{X}, \mathbf{X}) + \frac{\lambda N}{\sigma_d} \mathbf{I} \right)^{-1} \frac{\tilde{\mathbf{y}}^{d}}{\sigma_d}$

where $\tilde{\mathbf{c}}^{d}$ and $\tilde{\mathbf{y}}^{d}$ denote the coefficient and output vectors expressed in the eigenbasis of $\mathbf{B}$, and $\sigma_d$ is the $d$-th eigenvalue of $\mathbf{B}$.
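
A sketch of this block-diagonalization trick, checked against the direct dense solve (the equivalence follows from rotating the outputs into the eigenbasis of $\mathbf{B}$):

```python
import numpy as np

def solve_separable(k_XX, B, Y, lam):
    """Solve (k (x) B + lam * N * I) c = y using the eigendecomposition of B.

    Rotating the outputs into the eigenbasis of B decouples the system into
    D independent N x N solves instead of one (N*D) x (N*D) solve."""
    N, D = Y.shape
    sigma, U = np.linalg.eigh(B)       # B = U diag(sigma) U^T
    Y_tilde = Y @ U                    # rotated outputs
    C_tilde = np.empty_like(Y_tilde)
    for d in range(D):
        C_tilde[:, d] = np.linalg.solve(
            sigma[d] * k_XX + lam * N * np.eye(N), Y_tilde[:, d])
    return C_tilde @ U.T               # coefficients in the original basis

# Check against the direct dense solve on a small problem.
rng = np.random.default_rng(1)
N, D = 30, 3
k_XX = np.exp(-0.5 * (np.arange(N)[:, None] - np.arange(N)) ** 2 / 9.0)
B = np.array([[1.0, 0.5, 0.2], [0.5, 1.0, 0.5], [0.2, 0.5, 1.0]])
Y = rng.normal(size=(N, D))
C = solve_separable(k_XX, B, Y, lam=0.1)
c_direct = np.linalg.solve(np.kron(k_XX, B) + 0.1 * N * np.eye(N * D),
                           Y.reshape(-1))
print(np.allclose(C.reshape(-1), c_direct))   # True
```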

Bayesian perspective

There are many works related to parameter estimation for Gaussian processes. Some methods such as maximization of the marginal likelihood (also known as evidence approximation, type II maximum likelihood, or empirical Bayes) and least squares give point estimates of the parameter vector $\boldsymbol{\phi}$. There are also works employing a full Bayesian inference by assigning priors to $\boldsymbol{\phi}$ and computing the posterior distribution through a sampling procedure. For non-Gaussian likelihoods, there is no closed form solution for the posterior distribution or for the marginal likelihood. However, the marginal likelihood can be approximated under a Laplace, variational Bayes, or expectation propagation (EP) approximation framework for multiple output classification and used to find estimates for the hyperparameters.

The main computational problem in the Bayesian viewpoint is the same as the one appearing in regularization theory: inverting the matrix

$\overline{\mathbf{K}} = \mathbf{K}(\mathbf{X}, \mathbf{X}) + \boldsymbol{\Sigma}$

This step is necessary for computing the marginal likelihood and the predictive distribution. For most proposed approximation methods to reduce computation, the computational efficiency gained is independent of the particular method (e.g. LMC, process convolution) used to compute the multi-output covariance matrix. A summary of different methods for reducing computational complexity in multi-output Gaussian processes is presented in the review by Álvarez et al.[8]
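
For the Gaussian case, the objective and the inversion bottleneck can be sketched together: a minimal computation of the log marginal likelihood of the stacked outputs (the quantity maximized in type II maximum likelihood), using a Cholesky factor of $\overline{\mathbf{K}}$ rather than an explicit inverse:

```python
import numpy as np

def log_marginal_likelihood(K_bar, y_bar):
    """Gaussian log-evidence log N(y_bar | 0, K_bar), computed via a
    Cholesky factorization instead of an explicit matrix inverse."""
    L = np.linalg.cholesky(K_bar)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_bar))
    return (-0.5 * y_bar @ alpha                 # data-fit term
            - np.sum(np.log(np.diag(L)))         # 0.5 * log det(K_bar)
            - 0.5 * len(y_bar) * np.log(2.0 * np.pi))
```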

References

  1. ^ S.J. Pan and Q. Yang, "A survey on transfer learning," IEEE Transactions on Knowledge and Data Engineering, 22, 2010
  2. ^ Rich Caruana, "Multitask Learning," Machine Learning, 41–76, 1997
  3. ^ J. Ver Hoef and R. Barry, "Constructing and fitting models for cokriging and multivariable spatial prediction," Journal of Statistical Planning and Inference, 69:275–294, 1998
  4. ^ P. Goovaerts, "Geostatistics for Natural Resources Evaluation," Oxford University Press, USA, 1997
  5. ^ N. Cressie "Statistics for Spatial Data," John Wiley & Sons Inc. (Revised Edition), USA, 1993
  6. ^ C.A. Micchelli and M. Pontil, "On learning vector-valued functions," Neural Computation, 17:177–204, 2005
  7. ^ C. Carmeli et al., "Vector valued reproducing kernel Hilbert spaces of integrable functions and Mercer theorem," Anal. Appl. (Singap.), 4(4):377–408, 2006
  8. ^ a b c d e f g h i j k Mauricio A. Álvarez, Lorenzo Rosasco, and Neil D. Lawrence, "Kernels for Vector-Valued Functions: A Review," Foundations and Trends in Machine Learning 4, no. 3 (2012): 195–266. doi: 10.1561/2200000036 arXiv:1106.6251
  9. ^ a b Hans Wackernagel. Multivariate Geostatistics. Springer-Verlag, Heidelberg, New York, 2003.
  10. ^ a b C.A. Micchelli and M. Pontil. On learning vector–valued functions. Neural Computation, 17:177–204, 2005.
  11. ^ C.Carmeli, E.DeVito, and A.Toigo. Vector valued reproducing kernel Hilbert spaces of integrable functions and Mercer theorem. Anal. Appl. (Singap.), 4(4):377–408, 2006.
  12. ^ C. A. Micchelli and M. Pontil. Kernels for multi-task learning. In Advances in Neural Information Processing Systems (NIPS). MIT Press, 2004.
  13. ^ T.Evgeniou, C.A.Micchelli, and M.Pontil. Learning multiple tasks with kernel methods. Journal of Machine Learning Research, 6:615–637, 2005.
  14. ^ a b L. Baldassarre, L. Rosasco, A. Barla, and A. Verri. Multi-output learning via spectral filtering. Technical report, Massachusetts Institute of Technology, 2011. MIT-CSAIL-TR-2011-004, CBCL-296.
  15. ^ Laurent Jacob, Francis Bach, and Jean-Philippe Vert. Clustered multi-task learning: A convex formulation. In NIPS 21, pages 745–752, 2008.
  16. ^ Andreas Argyriou, Theodoros Evgeniou, and Massimiliano Pontil. Convex multi-task feature learning. Machine Learning, 73(3):243–272, 2008.
  17. ^ Andreas Argyriou, Andreas Maurer, and Massimiliano Pontil. An algorithm for transfer learning in a heterogeneous environment. In ECML/PKDD (1), pages 71–85, 2008.
  18. ^ I. Macêdo and R. Castro. Learning divergence-free and curl-free vector fields with matrix-valued kernels. Technical report, Instituto Nacional de Matemática Pura e Aplicada, 2008.
  19. ^ A. Caponnetto, C.A. Micchelli, M. Pontil, and Y. Ying. Universal kernels for multi-task learning. Journal of Machine Learning Research, 9:1615–1646, 2008.
  20. ^ D. Higdon, "Space and space-time modeling using process convolutions," Quantitative Methods for Current Environmental Issues, 37–56, 2002
  21. ^ P. Boyle and M. Frean, "Dependent Gaussian processes," Advances in Neural Information Processing Systems, 17:217–224, MIT Press, 2005
