Introduction
For general information about assignments in this course, see assignments
Huffman Coding
Consider a discrete probability distribution on symbols [math]1[/math] to [math]n[/math] with probabilities [math]p_1 \ldots p_n[/math] where each [math]p_i[/math] is an integer power of [math]1/2[/math]. Suppose a long sequence [math]S[/math] of samples is drawn from the distribution such that symbol [math]i[/math] occurs [math]p_i|S|[/math] times. Prove that if a Huffman code based on the probabilities [math]p_i[/math] is used, the total number of bits to encode S is
\begin{equation} |S|\sum_{i=1}^n p_i \lg (1/p_i) \end{equation}
You may use without proof the fact that the two smallest probabilities have to be the same.
Hint: It suffices to prove that symbol $j$ is at depth $\lg (1/p_j)$ in the resulting tree.
Greed is Good
Let [math]e[/math] be the unique lightest edge in a graph [math]G[/math]. Let [math]T[/math] be a spanning tree of [math]G[/math] such that [math]e \notin T[/math] . Prove using elementary properties of spanning trees (i.e. not the cut property) that [math]T[/math] is not a minimum spanning tree of [math]G[/math].
Union by weight
Consider the following alternative implementation of the disjoint-sets data structure.
class Partition:
def __init__(P,n):
# sometimes called makeset(j)
P.parent = [j for j in range(n)]
P.weight = [1] * n
def find(P, key):
if P.parent[key] != key:
key = P.find(P.parent[key])
return key
def union(P,x,y):
rx = P.find(x)
ry = P.find(y)
if rx == ry:
return
if P.weight[rx] > P.weight[ry]:
P.parent[ry] = rx
P.weight[rx] += P.weight[ry]
else:
P.parent[rx] = ry
P.weight[ry] += P.weight[rx]
Prove that after any sequence of union operations, P.find(x)
takes
[math]O(\lg n)[/math] time.