$$ PP(p) := 2^{H(p)} = 2^{-\sum_{x}p(x)\log_{2}p(x)} = \prod_{x}p(x)^{-p(x)} $$
where $H(p)$ is the entropy. So perplexity is simply the exponential of entropy. Recall that entropy is the expected information content of a random variable, where the information content of an outcome $x$ is defined as $-\log_2 p(x)$; entropy is also the expected code length of an optimal Huffman code.
⚠️ Note that higher-probability outcomes carry *less* information, so it is often easier to think of entropy as a measure of how uncertain (how "mixed up") a distribution is.
<aside> 💡 Perplexity and entropy both describe "how much information we can get from a distribution (the values of a random variable)", or equivalently "how uncertain a distribution is": the larger the value, the more uncertain the distribution.
</aside>
TODO: add some concrete examples to build intuition.
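As a small concrete example of the formula above, here is a sketch that computes entropy and perplexity for a fair coin and a biased coin (the helper names are mine, not from any particular library):

```python
import math

def entropy(p):
    """Shannon entropy in bits: H(p) = -sum_x p(x) * log2 p(x)."""
    return -sum(px * math.log2(px) for px in p if px > 0)

def perplexity(p):
    """Perplexity is exponentiated entropy: PP(p) = 2^H(p)."""
    return 2 ** entropy(p)

fair_coin = [0.5, 0.5]
biased_coin = [0.9, 0.1]

# A fair coin has 1 bit of entropy, so its perplexity is exactly 2.
print(entropy(fair_coin), perplexity(fair_coin))      # 1.0, 2.0
# A biased coin is more predictable: lower entropy, lower perplexity.
print(entropy(biased_coin), perplexity(biased_coin))
```

The fair coin comes out at perplexity 2 (two equally likely choices), while the biased coin lands below 2: it behaves like "fewer than two" effective choices.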
BTW: why do we need perplexity if we already have entropy?
Perplexity Intuition (and its derivation) | by Aerin Kim | Towards Data Science
If we think of perplexity as a branching factor (the weighted average number of choices a random variable has), then that number is easier to understand than the entropy. I found this surprising because I thought there will be more profound reasons. I asked Dr. Zettlemoyer if there is any other reason other than easy interpretability. His answer was “I think that is it! It is largely historical since lots of other metrics would be reasonable to use as well!”