Kullback–Leibler divergence

What is a Probability Distribution?
A mathematical function that gives the probability of occurrence of each possible outcome of a random experiment.
Example: tossing a fair coin twice

| Number of Heads | Probability |
|---|---|
| 0 | 0.25 |
| 1 | 0.5 |
| 2 | 0.25 |
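
The table above can be reproduced by enumerating the outcomes. This is a minimal sketch that assumes a fair coin and independent tosses; the variable names are illustrative only.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Enumerate the four equally likely outcomes of two fair coin tosses.
outcomes = list(product("HT", repeat=2))

# Count how many outcomes contain 0, 1, or 2 heads.
head_counts = Counter(outcome.count("H") for outcome in outcomes)

# Convert counts to probabilities (exact fractions).
distribution = {
    heads: Fraction(n, len(outcomes))
    for heads, n in sorted(head_counts.items())
}

for heads, prob in distribution.items():
    print(heads, float(prob))
```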

What is KL Divergence?
It is a non-symmetric measure of the difference between two probability distributions p(x) and q(x) over the same variable x. It quantifies the information lost when q(x) is used to approximate p(x).

KLD = ∑ p(x) * ln( p(x) / q(x) ), summed over all values of x
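
As a rough illustration of the formula and of its non-symmetry, the sketch below computes the divergence in both directions. The function name `kl_divergence` and the uniform distribution `q` are assumptions made for this example; `p` is the two-coin-toss distribution from the previous section.

```python
import math

def kl_divergence(p, q):
    """Sum of p(x) * ln(p(x) / q(x)); terms with p(x) == 0 contribute 0.

    Assumes q(x) > 0 wherever p(x) > 0.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.25, 0.5, 0.25]      # number of heads in two fair coin tosses
q = [1 / 3, 1 / 3, 1 / 3]  # hypothetical uniform approximation

print("KLD(p, q) =", kl_divergence(p, q))
print("KLD(q, p) =", kl_divergence(q, p))
```

The two printed values differ, which is what "non-symmetric" means here: swapping the roles of p(x) and q(x) changes the result.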

Example:

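| Age | Propensity to buy a Motorcycle | Segment |
|---|---|---|
| 23 | 1 | 1 |
| 42 | 1 | 4 |
| 54 | 0 | 3 |
| 32 | 1 | 2 |
| 63 | 0 | 5 |
| 56 | 0 | 1 |
| 24 | 1 | 2 |
| 65 | 0 | 3 |
| 54 | 0 | 2 |
| 63 | 0 | 1 |
| 53 | 1 | 4 |
| 57 | 0 | 3 |
| 61 | 1 | 5 |
| 54 | 1 | 2 |
| 64 | 1 | 3 |
| 24 | 0 | 4 |
| 33 | 0 | 2 |
| 45 | 0 | 1 |
| 34 | 1 | 1 |
| 43 | 0 | 2 |
| 63 | 1 | 2 |
| 23 | 1 | 3 |
| 34 | 1 | 3 |
| 42 | 0 | 3 |
| 33 | 1 | 4 |
| 45 | 1 | 4 |
| 62 | 0 | 4 |
| 23 | 1 | 5 |
| 37 | 0 | 5 |
| 46 | 0 | 5 |
| 58 | 1 | 5 |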

KLD Matrix:

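| Segment | 20-29 | 30-39 | 40-49 | 50-59 | 60-69 | Grand Total |
|---|---|---|---|---|---|---|
| 1 | 1 | 1 | 1 | 1 | 1 | 5 |
| 2 | 1 | 2 | 1 | 2 | 1 | 7 |
| 3 | 1 | 1 | 1 | 2 | 2 | 7 |
| 4 | 1 | 1 | 2 | 1 | 1 | 6 |
| 5 | 1 | 1 | 1 | 1 | 2 | 6 |
| Grand Total | 5 | 6 | 6 | 7 | 7 | 31 |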
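
The count matrix can be derived from the example records by grouping ages into ten-year bands and counting per segment. The sketch below is one way to do it; the `age_bin` helper and the variable names are hypothetical, and the propensity column is left out because the matrix does not use it.

```python
from collections import Counter

# (age, segment) pairs copied from the example table.
records = [
    (23, 1), (42, 4), (54, 3), (32, 2), (63, 5), (56, 1), (24, 2), (65, 3),
    (54, 2), (63, 1), (53, 4), (57, 3), (61, 5), (54, 2), (64, 3), (24, 4),
    (33, 2), (45, 1), (34, 1), (43, 2), (63, 2), (23, 3), (34, 3), (42, 3),
    (33, 4), (45, 4), (62, 4), (23, 5), (37, 5), (46, 5), (58, 5),
]

def age_bin(age):
    """Map an age to its ten-year band, e.g. 23 -> '20-29'."""
    low = age // 10 * 10
    return f"{low}-{low + 9}"

# Count records per (segment, age band) pair.
counts = Counter((segment, age_bin(age)) for age, segment in records)

bins = sorted({age_bin(age) for age, _ in records})
segments = sorted({segment for _, segment in records})

# One row of counts per segment, in age-band order.
matrix = {s: [counts[(s, b)] for b in bins] for s in segments}
column_totals = [sum(matrix[s][i] for s in segments) for i in range(len(bins))]

print("Segment", bins, "Grand Total")
for s in segments:
    print(s, matrix[s], sum(matrix[s]))
print("Grand Total", column_totals, sum(column_totals))
```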

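Calculation of q(i) and p(t): q(i) is the attribute count in segment i divided by that segment's total (e.g. 1/5 = 0.2 for segment 1); p(t) is the column grand total divided by the overall total (e.g. 5/31 ≈ 0.161 for ages 20-29).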

| Distribution | 20-29 | 30-39 | 40-49 | 50-59 | 60-69 |
|---|---|---|---|---|---|
| q(1) | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 |
| q(2) | 0.142857143 | 0.285714286 | 0.142857143 | 0.285714286 | 0.142857143 |
| q(3) | 0.142857143 | 0.142857143 | 0.142857143 | 0.285714286 | 0.285714286 |
| q(4) | 0.166666667 | 0.166666667 | 0.333333333 | 0.166666667 | 0.166666667 |
| q(5) | 0.166666667 | 0.166666667 | 0.166666667 | 0.166666667 | 0.333333333 |
| p(t) | 0.161290323 | 0.193548387 | 0.193548387 | 0.225806452 | 0.225806452 |
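
The normalisation step might be sketched as follows, starting from the count matrix. The dictionary layout and variable names are assumptions made for illustration.

```python
# Count matrix from the example: segment -> counts per age band
# (20-29, 30-39, 40-49, 50-59, 60-69).
counts = {
    1: [1, 1, 1, 1, 1],
    2: [1, 2, 1, 2, 1],
    3: [1, 1, 1, 2, 2],
    4: [1, 1, 2, 1, 1],
    5: [1, 1, 1, 1, 2],
}

# q(i): each segment's counts divided by that segment's total.
q = {seg: [c / sum(row) for c in row] for seg, row in counts.items()}

# p(t): column totals divided by the grand total of all records.
grand_total = sum(sum(row) for row in counts.values())
column_totals = [sum(col) for col in zip(*counts.values())]
p = [c / grand_total for c in column_totals]

for seg, row in q.items():
    print(f"q({seg})", [round(v, 9) for v in row])
print("p(t)", [round(v, 9) for v in p])
```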

Calculation of ln(p(x) / q(x)):

| Log ratio | 20-29 | 30-39 | 40-49 | 50-59 | 60-69 |
|---|---|---|---|---|---|
| ln(p(t)/q(1)) | -0.21511138 | -0.032789823 | -0.032789823 | 0.121360857 | 0.121360857 |
| ln(p(t)/q(2)) | 0.121360857 | -0.389464767 | 0.303682414 | -0.235314087 | 0.457833094 |
| ln(p(t)/q(3)) | 0.121360857 | 0.303682414 | 0.303682414 | -0.235314087 | -0.235314087 |
| ln(p(t)/q(4)) | -0.032789823 | 0.149531734 | -0.543615447 | 0.303682414 | 0.303682414 |
| ln(p(t)/q(5)) | -0.032789823 | 0.149531734 | 0.149531734 | 0.303682414 | -0.389464767 |
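
To check one row of the log-ratio table by hand, the sketch below recomputes the values for segment 1 from the raw counts. The variable names are illustrative; the same pattern would apply to the other segments.

```python
import math

column_totals = [5, 6, 6, 7, 7]  # Grand Total row of the count matrix
segment_1 = [1, 1, 1, 1, 1]      # counts for segment 1

grand_total = sum(column_totals)
p = [c / grand_total for c in column_totals]  # p(t)
q1 = [c / sum(segment_1) for c in segment_1]  # q(1)

# Element-wise natural log of p(t) / q(1), one value per age band.
log_ratios = [math.log(pi / qi) for pi, qi in zip(p, q1)]
print([round(v, 9) for v in log_ratios])
```

A negative entry means the age band is more common in the segment than in the overall population; a positive entry means it is less common.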

KL Divergence Calculation:

KLD = ∑ p(x) * ln( p(x) / q(x) ), with p(x) = p(t) and q(x) = q(i) for segment i

| Segment | KLD |
|---|---|
| 1 | 0.007419911 |
| 2 | 0.053217523 |
| 3 | 0.030857937 |
| 4 | 0.055583948 |
| 5 | 0.033224362 |
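
The per-segment values can be reproduced end to end from the count matrix. This is a minimal sketch, not a general-purpose implementation: it assumes every cell count is non-zero (true for this example), since a zero q(x) would make the log ratio undefined. The function and variable names are hypothetical.

```python
import math

# Count matrix from the example: segment -> counts per age band.
counts = {
    1: [1, 1, 1, 1, 1],
    2: [1, 2, 1, 2, 1],
    3: [1, 1, 1, 2, 2],
    4: [1, 1, 2, 1, 1],
    5: [1, 1, 1, 1, 2],
}

# p(t): overall age-band distribution across all segments.
grand_total = sum(sum(row) for row in counts.values())
p = [sum(col) / grand_total for col in zip(*counts.values())]

def kl_divergence(p, q):
    """Sum of p(x) * ln(p(x) / q(x)); assumes q(x) > 0 wherever p(x) > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# KLD of the overall distribution p(t) against each segment's q(i).
kld = {}
for seg, row in counts.items():
    q = [c / sum(row) for c in row]
    kld[seg] = kl_divergence(p, q)
    print(seg, round(kld[seg], 9))
```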

KLD Measure:

| KLD Range | Indication |
|---|---|
| < 0.1 | Attribute has a weak distribution in the Segment |
| 0.1 to 0.3 | Attribute has a good distribution in the Segment |
| > 0.3 | Attribute has a strong distribution in the Segment |
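
The thresholds could be applied in code along these lines. The function name `kld_indication` is hypothetical, and the handling of the boundary values is an assumption: a value of exactly 0.1 is treated as weak and exactly 0.3 as good, since the table does not say.

```python
def kld_indication(kld):
    """Map a KLD value to the weak / good / strong labels from the table."""
    if kld > 0.3:
        return "strong"
    if kld > 0.1:
        return "good"
    return "weak"  # assumption: exactly 0.1 falls here

# KLD values for segments 1 to 5 from the calculation above.
segment_kld = {
    1: 0.007419911,
    2: 0.053217523,
    3: 0.030857937,
    4: 0.055583948,
    5: 0.033224362,
}

labels = {seg: kld_indication(v) for seg, v in segment_kld.items()}
for seg, label in labels.items():
    print(seg, label)
```

With these thresholds, every segment in the example falls in the weak range, because all five KLD values are below 0.1.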
