
11 Communication

11.1 Information Theory

11.1.1 Entropy

Most of the examples are taken from the philentropy package in R, which can be accessed via this link

11.1.1.1 Discrete Entropy

If $p_X = (p_{x_1}, p_{x_2}, \dots, p_{x_n})$ is the probability distribution of the discrete random variable $X$ with $P(X = x_i) = p_{x_i}$, then the entropy of the distribution is (Shannon 1948)

$$H(p_X) = -\sum_{x_i} p_{x_i} \log_2 p_{x_i}$$

where

  • the entropy unit is bits when using $\log_2$; for a binary variable, $H(p) \in [0, 1]$ with the maximum at $p = 0.5$
  • the unit is nats when using $\log_e$
  • the unit is dits when using $\log_{10}$

In general, if you have different bases, you can convert between units. Given bases $a$ and $b$,

$$\log_a k = \log_a b \times \log_b k$$

where $\log_a b$ is a constant.

Generalizing to entropy,

$$H_b(X) = \log_b a \times H_a(X)$$
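
As a quick numerical check of this conversion (a minimal base R sketch; the probability vector is just an illustrative example):

p <- 1:10 / sum(1:10)        # an arbitrary discrete distribution
H_bits <- -sum(p * log2(p))  # entropy in bits (base-2 log)
H_nats <- -sum(p * log(p))   # entropy in nats (natural log)
all.equal(H_nats, log(2) * H_bits) # TRUE, since H_e(X) = log_e(2) * H_2(X)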

Equivalently, entropy can be written as an expectation:

$$H(X) = E\left[\log \frac{1}{p(X)}\right]$$

Entropy is a measure of how much information is gained from learning the value of $X$ (Zwillinger 1995, 262). Entropy depends on the distribution of $X$.


Example:

If $X$ has 2 values, then $p = (p, 1 - p)$ and

$$H(p_X) = H(p, 1 - p) = -p \log_2 p - (1 - p) \log_2 (1 - p)$$

which is known as the binary entropy function.

Plot of $p$ vs. $H(p)$. In general, the maximum $H(p_X) = \log_2 n$ is attained when $X \sim U$ (uniform).

hb <- function(gamma) {
    # binary entropy in bits; returns NaN at gamma = 0 or 1 (0 * log2(0)),
    # which plot() simply skips
    -gamma * log2(gamma) - (1 - gamma) * log2(1 - gamma)
}

xs <- seq(0, 1, len = 100)
plot(xs,
     hb(xs),
     type = 'l',
     xlab = "p",
     ylab = "H(p)")
library("philentropy")
Prob <- 1:10/sum(1:10) # probabilities P(X)
H(Prob) # Shannon's entropy, in bits
## [1] 3.103643

11.1.1.2 Mutual Information

If $X$ and $Y$ are two discrete random variables and $p_{X \times Y}$ is their joint distribution, the mutual information (i.e., how much learning the value of $X$ tells you about the value of $Y$) of $X$ and $Y$ is

$$I(X, Y) = I(Y, X) = H(p_X) + H(p_Y) - H(p_{X \times Y})$$

where $I(X, Y) \ge 0$, and $I(X, Y) = 0$ iff $X$ and $Y$ are independent (information about $X$ gives no information about $Y$).

Equivalently,

$$MI(X, Y) = \sum_{i=1}^{n} \sum_{j=1}^{m} P(x_i, y_j) \times \log_b \left( \frac{P(x_i, y_j)}{P(x_i) \times P(y_j)} \right)$$

Properties:

  • $I(X; X) = H(X)$, known as self-information
  • $I(X; Y) \ge 0$, with equality iff $X \perp Y$
P_x <- 1:10/sum(1:10) # distribution P(X)
P_y <- 20:29/sum(20:29) # distribution P(Y)
P_xy <- 1:10/sum(1:10) # joint distribution P(X,Y)
MI(P_x, P_y, P_xy) # Mutual Information
## [1] 3.311973
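
To see how the double-sum formula works on an explicit joint distribution, here is a minimal base R sketch using a made-up $2 \times 2$ joint table (the numbers are purely illustrative):

Pxy <- matrix(c(0.30, 0.20,
                0.10, 0.40), nrow = 2, byrow = TRUE) # illustrative joint P(X, Y)
Px  <- rowSums(Pxy) # marginal P(X)
Py  <- colSums(Pxy) # marginal P(Y)
# MI(X, Y) = sum_ij P(x_i, y_j) * log2( P(x_i, y_j) / (P(x_i) * P(y_j)) )
sum(Pxy * log2(Pxy / outer(Px, Py))) # about 0.12 bits of shared information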

11.1.1.3 Relative Entropy

The relative entropy, denoted by $D(p||q)$, is a measure of the distance between distributions $p$ and $q$ (it is not a true distance, since it is not symmetric in $p$ and $q$).

Equivalently, it measures the inefficiency of assuming distribution q when the true distribution is p.

The relative entropy between $p(x)$ and $q(x)$ is

$$D(p||q) = \sum_x p(x) \log \frac{p(x)}{q(x)}$$

where

  • $D(p||q) \ge 0$, with equality iff $p = q$
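
A minimal base R sketch of this formula, reusing the two probability vectors from the mutual-information example above (the linked philentropy documentation also covers a package implementation):

P <- 1:10 / sum(1:10)   # "true" distribution p
Q <- 20:29 / sum(20:29) # assumed distribution q
# D(p || q) = sum_x p(x) * log2( p(x) / q(x) ), in bits
sum(P * log2(P / Q))
# note that relative entropy is not symmetric: D(p || q) != D(q || p)
sum(Q * log2(Q / P))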

11.1.1.4 Joint-Entropy

$$H(X, Y) = -\sum_{i=1}^{n} \sum_{j=1}^{m} P(x_i, y_j) \times \log_b \left( P(x_i, y_j) \right)$$

P_xy <- 1:10/sum(1:10) # joint distribution P(X,Y)
JE(P_xy) # Joint-Entropy
## [1] 3.103643

11.1.1.5 Conditional Entropy

$$H(Y|X) = \sum_{i=1}^{n} \sum_{j=1}^{m} P(x_i, y_j) \times \log_b \left( \frac{P(x_i)}{P(x_i, y_j)} \right)$$

P_x <- 1:10/sum(1:10) # distribution P(X)
P_y <- 1:10/sum(1:10) # distribution P(Y)
CE(P_x, P_y) # Conditional Entropy
## [1] 0

Relation among entropy measures:

$$H(X, Y) = H(X) + H(Y|X) = H(Y) + H(X|Y)$$
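
A minimal base R check of this identity, using the same illustrative $2 \times 2$ joint table as in the mutual-information sketch above:

Pxy <- matrix(c(0.30, 0.20,
                0.10, 0.40), nrow = 2, byrow = TRUE) # illustrative joint P(X, Y)
Px  <- rowSums(Pxy)              # marginal P(X)
Hxy <- -sum(Pxy * log2(Pxy))     # H(X, Y)
Hx  <- -sum(Px * log2(Px))       # H(X)
Hyx <- sum(Pxy * log2(Px / Pxy)) # H(Y | X), using the double-sum formula above
all.equal(Hxy, Hx + Hyx)         # TRUE: H(X, Y) = H(X) + H(Y | X)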

Properties:

  • Chain rule: $H(X_1, \dots, X_n) = \sum_{i=1}^{n} H(X_i | X_{i-1}, \dots, X_1)$
  • $0 \le H(Y|X) \le H(Y)$: information about $X$ does not increase the entropy of $Y$
  • If $X \perp Y$, then $H(Y|X) = H(Y)$ and $H(X, Y) = H(X) + H(Y)$
  • $H(X) \le \log(n_X)$, where $n_X$ = number of elements in the support of $X$
  • $H(X) = \log(n_X)$ iff $X \sim U$ (uniform)
  • $H(X_1, \dots, X_n) \le \sum_{i=1}^{n} H(X_i)$
  • $H(X_1, \dots, X_n) = \sum_{i=1}^{n} H(X_i)$ iff the $X_i$'s are mutually independent.

11.1.1.6 Continuous Entropy

If X is a continuous variable, its entropy is

$$h(X) = -\int_{\mathbb{R}^d} p(x) \log p(x) \, dx$$

where d is the dimension of X.
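
As a one-dimensional illustration, the differential entropy of a standard normal can be approximated by numerical integration and compared against its known closed form $\frac{1}{2}\log(2\pi e) \approx 1.4189$ nats (a minimal base R sketch; the integration range is truncated to $\pm 10$ because the tail contribution is negligible):

# h(X) = -integral of p(x) log p(x) dx, here for X ~ N(0, 1), in nats
integrand <- function(x) -dnorm(x) * log(dnorm(x))
integrate(integrand, lower = -10, upper = 10)$value
# closed form for a Gaussian: 0.5 * log(2 * pi * e) = 1.418939
0.5 * log(2 * pi * exp(1))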

If $X$ and $Y$ are continuous random variables with density functions $p(x)$ and $q(x)$, the relative entropy between them is

$$D(p||q) = \int_{\mathbb{R}^d} p(x) \log \frac{p(x)}{q(x)} \, dx$$
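
Continuing the Gaussian illustration, here is a minimal base R sketch comparing $p = N(0, 1)$ against $q = N(1, 1)$, whose relative entropy has the known closed-form value $\frac{(\mu_p - \mu_q)^2}{2} = 0.5$ nats (the function names p_dens and q_dens are just for illustration):

# D(p || q) = integral of p(x) log( p(x) / q(x) ) dx, in nats
p_dens <- function(x) dnorm(x, mean = 0, sd = 1)
q_dens <- function(x) dnorm(x, mean = 1, sd = 1)
integrate(function(x) p_dens(x) * log(p_dens(x) / q_dens(x)),
          lower = -10, upper = 10)$value
# should be very close to the closed form 0.5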

More advanced analysis can be accessed here

11.1.2 Divergence

Three divergence measures can be found here:

  • Kullback-Leibler
  • Jensen-Shannon
  • Generalized Jensen-Shannon
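
For instance, the Jensen-Shannon divergence can be expressed in terms of relative entropy to the mixture $M = \frac{1}{2}(P + Q)$; a minimal base R sketch (reusing the illustrative probability vectors from above):

kl <- function(p, q) sum(p * log2(p / q)) # relative entropy in bits
P <- 1:10 / sum(1:10)
Q <- 20:29 / sum(20:29)
M <- 0.5 * (P + Q)                        # mixture distribution
0.5 * kl(P, M) + 0.5 * kl(Q, M)           # Jensen-Shannon divergence JSD(P || Q)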

11.1.3 Channel Capacity

The information channel capacity of a discrete memoryless channel is

$$C = \max_{p(x)} I(X; Y)$$

where the maximum is taken over all input distributions $p(x)$. This is the highest rate at which information can be sent over the channel with an arbitrarily low probability of error.
More information can be found here
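
As a worked special case (a standard closed-form result, not derived in the text above): for a binary symmetric channel with crossover probability $p$, the maximizing input distribution is uniform and the capacity reduces to $C = 1 - H(p, 1 - p)$ bits per channel use. A minimal base R sketch (the helper name bsc_capacity is just for illustration):

# capacity of a binary symmetric channel with crossover probability p (in bits)
bsc_capacity <- function(p) {
    hb <- -p * log2(p) - (1 - p) * log2(1 - p) # binary entropy H(p, 1 - p)
    1 - hb
}
bsc_capacity(0.11) # roughly 0.5 bits per channel use
bsc_capacity(0.5)  # 0: the output is independent of the input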