I'd like to ask everyone a question about KL divergence. Wikipedia says: KL measures the expected number of extra bits required to code samples from P when using a code based on Q, rather than using a code based on P. Typically P represents the "true" distribution of data, observations, or a precisely calculated theoretical distribution. The measure Q typically represents a theory, model, description, or approximation of P.
The formula is \( D_{\mathrm{KL}}(P \| Q) = \sum_x p(x) \log \frac{p(x)}{q(x)} \).
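For concreteness, here is a minimal sketch of that formula for discrete distributions, assuming P and Q are given as probability vectors over the same support (the distributions below are made-up illustrative values):

```python
import math

def kl_divergence(p, q):
    """Discrete KL divergence D_KL(P || Q) = sum_x p(x) * log2(p(x)/q(x)).

    Log base 2 gives the result in bits, matching the "extra bits"
    interpretation quoted above. Assumes q(x) > 0 wherever p(x) > 0;
    terms with p(x) = 0 contribute nothing by convention.
    """
    return sum(px * math.log2(px / qx) for px, qx in zip(p, q) if px > 0)

# Hypothetical example: a "true" distribution P and a uniform approximation Q
p = [0.5, 0.25, 0.25]
q = [1/3, 1/3, 1/3]
print(kl_divergence(p, q))  # a small positive number of extra bits
```

Note that KL divergence is always non-negative and equals zero only when P and Q coincide on P's support.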
Usually P is unknown, and Q is an approximation of P — in other words, the probabilities under Q can be computed. But if P is unknown, how can the relative entropy be computed at all? That is, where does p(x) actually come from?
Thanks in advance for any guidance.