
A latent variable model is a general term for a statistical model expressed as a joint probability distribution over observed variables and latent variables.^[1]

Overview

In a latent variable model, let the observed variable $x$ follow the true distribution $p^{*}(x)$, and let $p_{\theta}(x,z)$ be the joint probability distribution of $x$ and the latent variable $z$ under model parameters $\theta$. Marginalizing the joint distribution over $z$ gives the relationship:

$$p_{\theta}(x) = \int p_{\theta}(x,z)\,dz$$

Integrating out the latent variable in this way yields the marginal distribution $p_{\theta}(x)$, also written $p(x|\theta)$. Viewed as a function of $\theta$, it is called the marginal likelihood or the model evidence.^[2]
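As an illustration of the marginalization integral, the following minimal sketch integrates a toy joint density numerically. The model is an assumption chosen only because its marginal is known in closed form: $z \sim N(0,1)$ and $x|z \sim N(z,1)$, so that $x \sim N(0,2)$. The function names and the trapezoidal integration grid are hypothetical choices, not part of the source.

```python
import math

def normal_pdf(v, mean, std):
    """Density of a univariate normal distribution."""
    return math.exp(-0.5 * ((v - mean) / std) ** 2) / (std * math.sqrt(2.0 * math.pi))

def joint(x, z):
    """Toy joint density p(x, z), assumed for illustration:
    z ~ N(0, 1) and x | z ~ N(z, 1)."""
    return normal_pdf(x, z, 1.0) * normal_pdf(z, 0.0, 1.0)

def marginal(x, lo=-10.0, hi=10.0, steps=4000):
    """Approximate p(x) = integral of p(x, z) dz with the trapezoidal rule."""
    h = (hi - lo) / steps
    total = 0.5 * (joint(x, lo) + joint(x, hi))
    for i in range(1, steps):
        total += joint(x, lo + i * h)
    return total * h

if __name__ == "__main__":
    x = 0.5
    # For this toy model the exact marginal is N(0, variance 2).
    print(marginal(x), normal_pdf(x, 0.0, math.sqrt(2.0)))
```

A one-dimensional grid works here only because $z$ is a scalar; the cost of such a grid grows exponentially with the dimension of $z$.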

One example of such a model is a Bayesian network with latent variables. In that case the joint distribution factorizes into a product of conditional probability models, $p_{\theta}(x,z)=p_{\theta}(x|z)p_{\theta}(z)$.^[3] Here $p_{\theta}(z)$ is often called the prior distribution over $z$, since it is not conditioned on any observations.^[4]
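The factorization $p_{\theta}(x,z)=p_{\theta}(x|z)p_{\theta}(z)$ suggests a simple way to draw samples: first draw $z$ from the prior, then draw $x$ given that $z$. The sketch below does this for a hypothetical two-component Gaussian mixture; the constants `PRIOR`, `MEANS`, and `STDS` are illustrative assumptions, not values from the source.

```python
import random

# Hypothetical model parameters (theta), assumed for illustration.
PRIOR = [0.3, 0.7]    # p(z = 0), p(z = 1)
MEANS = [-2.0, 3.0]   # mean of x given z
STDS = [1.0, 0.5]     # standard deviation of x given z

def sample_joint(rng):
    """Draw (x, z) by sampling z from the prior, then x from p(x | z)."""
    z = 0 if rng.random() < PRIOR[0] else 1
    x = rng.gauss(MEANS[z], STDS[z])
    return x, z

def sample_many(n, seed=0):
    """Draw n independent (x, z) pairs with a fixed seed."""
    rng = random.Random(seed)
    return [sample_joint(rng) for _ in range(n)]

if __name__ == "__main__":
    for x, z in sample_many(5):
        print(f"z={z}  x={x:.3f}")
```

In practice only the `x` values would be observed; the `z` values are returned here purely to make the generative process visible.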

Considerable effort has gone into finding suitable methods for estimating the parameters of latent variable models. Latent variables are internal to the model and, by definition, are never observed, so the parameters are fitted by optimizing the marginal likelihood $p_{\theta}(x)$ rather than the joint distribution $p_{\theta}(x,z)$. In maximum likelihood estimation, however, the integral that defines the marginal is often too complex to compute: it has neither an analytic solution nor an efficient estimator.^[5]

When the model is specified as a Bayesian network, each conditional probability can be computed, and therefore so can the joint distribution. By Bayes' theorem, it follows that the intractability lies in the marginal likelihood $p_{\theta}(x)$ and the posterior distribution $p_{\theta}(z|x)$, which are either both tractable or both intractable:^[6]

$$p_{\theta}(z|x) = \frac{p_{\theta}(x,z)}{p_{\theta}(x)}$$
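When $z$ takes only a few discrete values, the marginal is a finite sum and the posterior can be computed directly from Bayes' theorem. The following sketch assumes a hypothetical two-component Gaussian mixture (the parameter values are illustrative); it is meant only to show how the joint, the marginal, and the posterior relate, not to represent the intractable case.

```python
import math

# Hypothetical model parameters (theta), assumed for illustration.
PRIOR = [0.3, 0.7]    # p(z)
MEANS = [-2.0, 3.0]   # mean of x given z
STDS = [1.0, 0.5]     # standard deviation of x given z

def normal_pdf(v, mean, std):
    """Density of a univariate normal distribution."""
    return math.exp(-0.5 * ((v - mean) / std) ** 2) / (std * math.sqrt(2.0 * math.pi))

def joint(x, z):
    """p(x, z) = p(x | z) p(z)."""
    return PRIOR[z] * normal_pdf(x, MEANS[z], STDS[z])

def marginal(x):
    """p(x): sum of the joint over all values of z."""
    return sum(joint(x, z) for z in range(len(PRIOR)))

def posterior(x):
    """p(z | x) = p(x, z) / p(x) for each value of z."""
    px = marginal(x)
    return [joint(x, z) / px for z in range(len(PRIOR))]

if __name__ == "__main__":
    for x in (-2.0, 0.0, 3.0):
        print(x, [round(p, 4) for p in posterior(x)])
```

Both `marginal` and `posterior` depend on the same sum over `z`, which is why a model with a tractable marginal also has a tractable posterior, and vice versa.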

Methods used to address this problem include the EM algorithm and auto-encoding variational Bayes, among others.
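As a rough illustration of the EM algorithm, the sketch below fits a two-component one-dimensional Gaussian mixture. The model, the synthetic data, the initialization, and the iteration count are all assumptions made for this example; the source does not specify any of them.

```python
import math
import random

def normal_pdf(v, mean, std):
    """Density of a univariate normal distribution."""
    return math.exp(-0.5 * ((v - mean) / std) ** 2) / (std * math.sqrt(2.0 * math.pi))

def make_data(seed=0):
    """Synthetic data (assumed): 300 points from N(-2, 1), 700 from N(3, 0.5)."""
    rng = random.Random(seed)
    data = [rng.gauss(-2.0, 1.0) for _ in range(300)]
    data += [rng.gauss(3.0, 0.5) for _ in range(700)]
    return data

def em_gmm(data, iters=50):
    """EM for a two-component 1-D Gaussian mixture."""
    weights = [0.5, 0.5]
    means = [min(data), max(data)]  # crude initialization
    stds = [1.0, 1.0]
    for _ in range(iters):
        # E-step: responsibilities p(z = k | x) under the current parameters.
        resp = []
        for x in data:
            joint = [weights[k] * normal_pdf(x, means[k], stds[k]) for k in range(2)]
            total = sum(joint)
            resp.append([j / total for j in joint])
        # M-step: re-estimate parameters from responsibility-weighted data.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / len(data)
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk
            stds[k] = math.sqrt(max(var, 1e-6))  # floor avoids a zero variance
    return weights, means, stds

if __name__ == "__main__":
    print(em_gmm(make_data()))
```

EM applies here because the posterior over the discrete latent variable is cheap to compute in the E-step; when that posterior is itself intractable, approximate methods such as variational inference are used instead.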

Deep latent variable models

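A deep latent variable model (DLVM) is a latent variable model in which the conditional distributions of the Bayesian network are parameterized by neural networks: a network takes the parents $pa(z_{i})$ of each variable as input and outputs the distribution parameters $\eta$.^[7] The joint distribution is given by: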

$$p_{\theta}(z_{0}=x, z_{1}, \ldots, z_{N}) = \prod_{i=0}^{N} p_{\theta}(z_{i} \mid pa(z_{i})) = \prod_{i=0}^{N} p_{\theta}(z_{i};\ \eta = \mathrm{NeuralNet}_{\theta}(pa(z_{i})))$$

Because a deep latent variable model transforms the latent variables with neural networks, which are universal approximators, it can represent a complex marginal distribution $p_{\theta}(x)$ even when each conditional probability model $p_{\theta}(z_{i}|pa(z_{i}))$ is only a simple distribution.^[8]
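A minimal sketch of this effect, under stated assumptions: the prior is a standard normal, the conditional $p_{\theta}(x|z)$ is a Gaussian whose mean comes from a tiny hand-written network, and the weights `W1`, `B1`, `W2`, `B2` and the noise level `X_STD` are hypothetical values picked so that the result is easy to see. Sampling from the model shows a bimodal marginal even though both factors are Gaussian.

```python
import math
import random

# Hypothetical network weights (theta), chosen by hand for illustration.
W1 = [4.0, -4.0]   # input-to-hidden weights
B1 = [0.0, 0.0]    # hidden biases
W2 = [1.5, -1.5]   # hidden-to-output weights
B2 = 0.0           # output bias
X_STD = 0.3        # fixed standard deviation of p(x | z)

def neural_net(z):
    """Map z to eta = (mean, std), the parameters of p(x | z)."""
    hidden = [math.tanh(w * z + b) for w, b in zip(W1, B1)]
    mean = sum(w * h for w, h in zip(W2, hidden)) + B2
    return mean, X_STD

def sample_x(rng):
    """Sample z ~ N(0, 1), then x ~ N(mean(z), std)."""
    z = rng.gauss(0.0, 1.0)
    mean, std = neural_net(z)
    return rng.gauss(mean, std)

def sample_marginal(n, seed=0):
    """Draw n samples of x from the marginal p(x)."""
    rng = random.Random(seed)
    return [sample_x(rng) for _ in range(n)]

if __name__ == "__main__":
    xs = sample_marginal(20000)
    near_zero = sum(1 for x in xs if abs(x) < 1.0) / len(xs)
    far = sum(1 for x in xs if abs(x) > 2.0) / len(xs)
    print(f"fraction with |x| < 1: {near_zero:.3f}")
    print(f"fraction with |x| > 2: {far:.3f}")
```

With these weights the network computes a mean of $3\tanh(4z)$, which pushes most samples toward $-3$ or $+3$ and leaves little mass near zero.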

Because a deep latent variable model is a latent variable model, its parameters cannot be estimated by straightforward maximum likelihood estimation.

The variational autoencoder (VAE) is one method that applies deep latent variable models.

References

[1] "a latent variable model $p_{\theta}(x,z)$" Kingma. (2019). An Introduction to Variational Autoencoders. Foundations and Trends in Machine Learning.
[2] "This is also called the (single datapoint) marginal likelihood or the model evidence, when taken as a function of θ." Kingma. (2019). An Introduction to Variational Autoencoders. Foundations and Trends in Machine Learning.
[3] "Perhaps the simplest, and most common, DLVM is one that is specified as factorization" Kingma. (2019). An Introduction to Variational Autoencoders. Foundations and Trends in Machine Learning.
[4] "The distribution $p(z)$ is often called the prior distribution over $z$, since it is not conditioned on any observations." Kingma. (2019). An Introduction to Variational Autoencoders. Foundations and Trends in Machine Learning.
[5] "This is due to the integral ... for computing the marginal likelihood ..., not having an analytic solution or efficient estimator." Kingma. (2019). An Introduction to Variational Autoencoders. Foundations and Trends in Machine Learning.
[6] "The intractability of pθ(x), is related to the intractability of the posterior distribution pθ(z|x). ... Since pθ(x, z) is tractable to compute, a tractable marginal likelihood pθ(x) leads to a tractable posterior pθ(z|x), and vice versa. Both are intractable in DLVMs." Kingma. (2019). An Introduction to Variational Autoencoders. Foundations and Trends in Machine Learning.
[7] "We use the term deep latent variable model (DLVM) to denote a latent variable model pθ(x, z) whose distributions are parameterized by neural networks." Kingma. (2019). An Introduction to Variational Autoencoders. Foundations and Trends in Machine Learning.
[8] "One important advantage of DLVMs, is that even when each factor (prior or conditional distribution) in the directed model is relatively simple (such as conditional Gaussian), the marginal distribution pθ(x) can be very complex" Kingma. (2019). An Introduction to Variational Autoencoders. Foundations and Trends in Machine Learning.
