-
Notifications
You must be signed in to change notification settings - Fork 19
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
東京大学 情報理工学系研究科 コンピュータ科学専攻 2021年2月実施 問題1 ~ 4
- Loading branch information
Showing
6 changed files
with
1,115 additions
and
0 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,306 @@ | ||
--- | ||
comments: false | ||
title: 東京大学 情報理工学系研究科 コンピュータ科学専攻 2021年2月実施 問題1 | ||
tags: | ||
- Tokyo-University | ||
--- | ||
# 東京大学 情報理工学系研究科 コンピュータ科学専攻 2021年2月実施 問題1 | ||
|
||
## **Author** | ||
[zephyr](https://inshi-notes.zephyr-zdz.space/) | ||
|
||
## **Description** | ||
Ordered binary trees are trees in which each node has at most two ordered children. Below, $\mathbf{T_n}$ denotes the set of all the ordered binary trees with $n$ leaves. Let $\mathrm{d_T}(v)$ denote the depth of node $v$ in an ordered binary tree $T$, i.e., the number of edges on the path from the root to $v$. | ||
|
||
For a sequence $\mathbf{P} = (c_1, c_2, \ldots, c_n)$ of $n$ positive real numbers, we define $\mathbf{S_P}$ and $\mathbf{H_P}$ by: | ||
|
||
$$ | ||
\mathbf{S_P} = \sum_{i=1}^{n} c_i, \quad \mathbf{H_P} = - \sum_{i=1}^{n} (c_i \cdot \log_2(c_i/\mathbf{S_P})). | ||
$$ | ||
|
||
For an ordered binary tree $T \in \mathbf{T_n}$, we define $\mathbf{W_P}(T)$ by: | ||
|
||
$$ | ||
\mathbf{W_P}(T) = \sum_{i=1}^{n} (c_i \cdot \mathrm{d_T}(v_i)), | ||
$$ | ||
|
||
where $v_i$ is the $i$-th leaf from left in $T$. | ||
|
||
Answer the following questions. | ||
|
||
(1) Give the tree $T \in \mathbf{T_4}$ that has the smallest value of $\mathbf{W_P}(T)$ in case $\mathbf{P} = (4, 2, 1, 1)$. | ||
|
||
(2) Show that $\sum_{i=1}^{n} 2^{-\mathrm{d_T}(v_i)} \leq 1$ holds for any ordered binary tree $T \in \mathbf{T_n}$ with leaves $v_1, v_2, \ldots, v_n$. | ||
|
||
(3) Assume that $x_1, x_2, \ldots, x_n$ range over the set of positive real numbers so that $\sum_{i=1}^{n} x_i = 1$. Show that $\sum_{i=1}^{n} (c_i \cdot \log_2 x_i)$ is maximized when $x_i = c_i/\mathbf{S_P}$ for any sequence $\mathbf{P} = (c_1, c_2, \ldots, c_n)$ of $n$ positive real numbers. | ||
|
||
(4) Show that any ordered binary tree $T \in \mathbf{T_n}$ satisfies $\mathbf{W_P}(T) \geq \mathbf{H_P}$ for any sequence $\mathbf{P} = (c_1, c_2, \ldots, c_n)$ of $n$ positive real numbers. | ||
|
||
--- | ||
|
||
有序二叉树是指每个节点最多有两个有序子节点的树。下面,$\mathbf{T_n}$ 表示具有 $n$ 叶节点的所有有序二叉树的集合。设 $\mathrm{d_T}(v)$ 表示有序二叉树 $T$ 中节点 $v$ 的深度,即从根到 $v$ 的路径上的边数。 | ||
|
||
对于一个由 $n$ 个正实数组成的序列 $\mathbf{P} = (c_1, c_2, \ldots, c_n)$,我们定义 $\mathbf{S_P}$ 和 $\mathbf{H_P}$ 如下: | ||
|
||
$$ | ||
\mathbf{S_P} = \sum_{i=1}^{n} c_i, \quad \mathbf{H_P} = - \sum_{i=1}^{n} (c_i \cdot \log_2(c_i/\mathbf{S_P})). | ||
$$ | ||
|
||
对于一个有序二叉树 $T \in \mathbf{T_n}$,我们定义 $\mathbf{W_P}(T)$ 如下: | ||
|
||
$$ | ||
\mathbf{W_P}(T) = \sum_{i=1}^{n} (c_i \cdot \mathrm{d_T}(v_i)), | ||
$$ | ||
|
||
其中 $v_i$ 是 $T$ 中从左数的第 $i$ 个叶节点。 | ||
|
||
回答以下问题。 | ||
|
||
(1) 给出在 $\mathbf{P} = (4, 2, 1, 1)$ 情况下 $\mathbf{T_4}$ 中 $\mathbf{W_P}(T)$ 最小的树 $T$。 | ||
|
||
(2) 证明对于任何有叶节点 $v_1, v_2, \ldots, v_n$ 的有序二叉树 $T \in \mathbf{T_n}$,$\sum_{i=1}^{n} 2^{-\mathrm{d_T}(v_i)} \leq 1$ 成立。 | ||
|
||
(3) 假设 $x_1, x_2, \ldots, x_n$ 在正实数范围内变化,并且满足 $\sum_{i=1}^{n} x_i = 1$。证明对于任何由 $n$ 个正实数组成的序列 $\mathbf{P} = (c_1, c_2, \ldots, c_n)$,当 $x_i = c_i/\mathbf{S_P}$ 时,$\sum_{i=1}^{n} (c_i \cdot \log_2 x_i)$ 取得最大值。 | ||
|
||
(4) 证明对于任何有序二叉树 $T \in \mathbf{T_n}$,对于任何由 $n$ 个正实数组成的序列 $\mathbf{P} = (c_1, c_2, \ldots, c_n)$,$\mathbf{W_P}(T) \geq \mathbf{H_P}$。 | ||
|
||
## **Kai** | ||
### (1) | ||
|
||
To minimize $\mathbf{W_P}(T)$, we should construct a tree that resembles a Huffman tree, where the most frequent items (with the highest weights) are located at shallower depths. Here, the weights are $(4, 2, 1, 1)$. | ||
|
||
**Steps to construct the tree**: | ||
1. Start by pairing the two smallest weights, which are both $1$. | ||
2. Combine these to form a subtree with a combined weight of $2$. | ||
3. Now, we have weights $(4, 2, 2)$. | ||
4. Next, combine the two smallest remaining weights, which are both $2$. | ||
5. Combine these to form a subtree with a combined weight of $4$. | ||
6. Finally, combine the two subtrees $(4, 4)$ to form the complete tree. | ||
|
||
The resulting tree structure is: | ||
|
||
``` | ||
O | ||
/ \ | ||
4 O | ||
/ \ | ||
2 O | ||
/ \ | ||
1 1 | ||
``` | ||
|
||
Thus, the depth of each leaf in the tree is: | ||
|
||
- $c_1 = 4$, depth $\mathrm{d_T}(v_1) = 1$ | ||
- $c_2 = 2$, depth $\mathrm{d_T}(v_2) = 2$ | ||
- $c_3 = 1$, depth $\mathrm{d_T}(v_3) = 3$ | ||
- $c_4 = 1$, depth $\mathrm{d_T}(v_4) = 3$ | ||
|
||
Now, we calculate $\mathbf{W_P}(T)$: | ||
|
||
$$ | ||
\mathbf{W_P}(T) = 4 \cdot 1 + 2 \cdot 2 + 1 \cdot 3 + 1 \cdot 3 = 4 + 4 + 3 + 3 = 14 | ||
$$ | ||
|
||
Thus, the tree $T$ that minimizes $\mathbf{W_P}(T)$ has the above structure. | ||
|
||
### (2) | ||
|
||
To prove the inequality $\sum_{i=1}^{n} 2^{-\mathrm{d_T}(v_i)} \leq 1$ using mathematical induction, we need to follow these steps: | ||
|
||
1. **Base Case** | ||
2. **Inductive Step** | ||
|
||
**Base Case**: For $n = 1$ (a tree with only one leaf), the depth of the only leaf $v_1$ is 0. | ||
|
||
$$ | ||
\sum_{i=1}^{1} 2^{-\mathrm{d_T}(v_i)} = 2^{-\mathrm{d_T}(v_1)} = 2^0 = 1 | ||
$$ | ||
|
||
Thus, the base case holds. | ||
|
||
**Inductive Step**: Assume that for any ordered binary tree with $k$ leaves, the inequality holds: | ||
|
||
$$ | ||
\sum_{i=1}^{k} 2^{-\mathrm{d_T}(v_i)} \leq 1 | ||
$$ | ||
|
||
Now, we need to prove that the inequality holds for an ordered binary tree with $k+1$ leaves. | ||
|
||
1. Consider an ordered binary tree with $k+1$ leaves. | ||
2. Let's denote the depth of the leaves in this tree by $\mathrm{d_T}(v_1), \mathrm{d_T}(v_2), \ldots, \mathrm{d_T}(v_{k+1})$. | ||
|
||
When we add an additional leaf to a tree with $k$ leaves to form a tree with $k+1$ leaves, we must split one of the existing leaves into two children. This operation increases the depth of the affected leaf by 1 and adds a new leaf with the same depth. | ||
|
||
Suppose we split the leaf $v_j$ (where $\mathrm{d_T}(v_j) = d$) into two new leaves $v_j'$ and $v_{k+1}$, both at depth $d+1$. | ||
|
||
Thus, we need to show: | ||
|
||
$$ | ||
\sum_{i=1}^{k} 2^{-\mathrm{d_T}(v_i)} + 2^{-(d+1)} + 2^{-(d+1)} \leq 1 | ||
$$ | ||
|
||
By the inductive hypothesis, for the original tree with $k$ leaves: | ||
|
||
$$ | ||
\sum_{i=1}^{k} 2^{-\mathrm{d_T}(v_i)} \leq 1 | ||
$$ | ||
|
||
In the new tree: | ||
|
||
$$ | ||
\sum_{i=1}^{k} 2^{-\mathrm{d_T}(v_i)} - 2^{-d} + 2^{-(d+1)} + 2^{-(d+1)} | ||
$$ | ||
|
||
Since $2^{-(d+1)} + 2^{-(d+1)} = 2^{-d}$: | ||
|
||
$$ | ||
\sum_{i=1}^{k+1} 2^{-\mathrm{d_T}(v_i)} = \sum_{i=1}^{k} 2^{-\mathrm{d_T}(v_i)} \leq 1 | ||
$$ | ||
|
||
Thus, the inequality holds after adding a new leaf and increasing the depth of the original leaf. | ||
|
||
By induction, the inequality $\sum_{i=1}^{n} 2^{-\mathrm{d_T}(v_i)} \leq 1$ holds for all $n$. | ||
|
||
### (3) | ||
|
||
To maximize $\sum_{i=1}^{n} (c_i \cdot \log_2 x_i)$ under the constraint $\sum_{i=1}^{n} x_i = 1$, we use the method of Lagrange multipliers. | ||
|
||
Define the Lagrangian: | ||
|
||
$$ | ||
\mathcal{L}(x_1, \ldots, x_n, \lambda) = \sum_{i=1}^{n} (c_i \cdot \log_2 x_i) + \lambda \left( 1 - \sum_{i=1}^{n} x_i \right) | ||
$$ | ||
|
||
Take the partial derivatives and set them to zero: | ||
|
||
$$ | ||
\frac{\partial \mathcal{L}}{\partial x_i} = \frac{c_i}{x_i \ln 2} - \lambda = 0 \quad \Rightarrow \quad x_i = \frac{c_i}{\lambda \ln 2} | ||
$$ | ||
|
||
Using the constraint $\sum_{i=1}^{n} x_i = 1$: | ||
|
||
$$ | ||
\sum_{i=1}^{n} \frac{c_i}{\lambda \ln 2} = 1 \quad \Rightarrow \quad \lambda \ln 2 = \mathbf{S_P} \quad \Rightarrow \quad \lambda = \frac{\mathbf{S_P}}{\ln 2} | ||
$$ | ||
|
||
Thus, the maximizing $x_i$ is: | ||
|
||
$$ | ||
x_i = \frac{c_i}{\mathbf{S_P}} | ||
$$ | ||
|
||
### (4) | ||
|
||
To show that $\mathbf{W_P}(T) \geq \mathbf{H_P}$, we need to use the definitions of $\mathbf{W_P}(T)$ and $\mathbf{H_P}$ and employ some fundamental principles of information theory and entropy. | ||
|
||
Recall: | ||
|
||
$$ | ||
\mathbf{W_P}(T) = \sum_{i=1}^{n} (c_i \cdot \mathrm{d_T}(v_i)) | ||
$$ | ||
|
||
$$ | ||
\mathbf{H_P} = - \sum_{i=1}^{n} \left( c_i \cdot \log_2 \left(\frac{c_i}{\mathbf{S_P}}\right) \right) | ||
$$ | ||
|
||
We start by rewriting $\mathbf{H_P}$ in a more convenient form: | ||
|
||
$$ | ||
\mathbf{H_P} = \sum_{i=1}^{n} c_i \cdot \left( \log_2(\mathbf{S_P}) - \log_2(c_i) \right) = \log_2(\mathbf{S_P}) \cdot \sum_{i=1}^{n} c_i - \sum_{i=1}^{n} (c_i \cdot \log_2(c_i)) | ||
$$ | ||
|
||
Since $\sum_{i=1}^{n} c_i = \mathbf{S_P}$, we get: | ||
|
||
$$ | ||
\mathbf{H_P} = \mathbf{S_P} \cdot \log_2(\mathbf{S_P}) - \sum_{i=1}^{n} (c_i \cdot \log_2(c_i)) | ||
$$ | ||
|
||
Next, consider the following inequality derived from Jensen's inequality for the concave function $f(x) = -x \log_2(x)$: | ||
|
||
$$ | ||
-\sum_{i=1}^{n} \frac{c_i}{\mathbf{S_P}} \cdot \log_2\left( \frac{c_i}{\mathbf{S_P}} \right) \leq - \log_2\left( \sum_{i=1}^{n} \frac{c_i}{\mathbf{S_P}} \cdot \frac{c_i}{\mathbf{S_P}} \right) | ||
$$ | ||
|
||
This simplifies to: | ||
|
||
$$ | ||
-\sum_{i=1}^{n} \frac{c_i}{\mathbf{S_P}} \cdot \log_2\left( \frac{c_i}{\mathbf{S_P}} \right) \leq - \log_2\left( \sum_{i=1}^{n} \frac{c_i^2}{\mathbf{S_P}^2} \right) | ||
$$ | ||
|
||
Since $\sum_{i=1}^{n} c_i = \mathbf{S_P}$, we get: | ||
|
||
$$ | ||
-\sum_{i=1}^{n} \frac{c_i}{\mathbf{S_P}} \cdot \log_2\left( \frac{c_i}{\mathbf{S_P}} \right) \leq - \log_2\left( \frac{1}{\mathbf{S_P}^2} \sum_{i=1}^{n} c_i^2 \right) | ||
$$ | ||
|
||
So: | ||
|
||
$$ | ||
-\sum_{i=1}^{n} \frac{c_i}{\mathbf{S_P}} \cdot \log_2\left( \frac{c_i}{\mathbf{S_P}} \right) \leq - \log_2\left( \frac{1}{\mathbf{S_P}^2} \cdot \mathbf{S_P} \cdot \frac{1}{n} \sum_{i=1}^{n} c_i \right) = \log_2(\mathbf{S_P}) | ||
$$ | ||
|
||
The weighted path length $\mathbf{W_P}(T)$ can be understood using Kraft's inequality, which relates the depths of leaves in a binary tree to probabilities that sum up to 1. Let $p_i = \frac{c_i}{\mathbf{S_P}}$ be the probability associated with the $i$-th leaf. According to Kraft's inequality: | ||
|
||
$$ | ||
\sum_{i=1}^{n} 2^{-\mathrm{d_T}(v_i)} \leq 1 | ||
$$ | ||
|
||
We multiply both sides by $\mathbf{S_P}$: | ||
|
||
$$ | ||
\mathbf{S_P} \sum_{i=1}^{n} p_i \cdot 2^{-\mathrm{d_T}(v_i)} \leq \mathbf{S_P} | ||
$$ | ||
|
||
Using the fact that $p_i = \frac{c_i}{\mathbf{S_P}}$: | ||
|
||
$$ | ||
\sum_{i=1}^{n} c_i \cdot 2^{-\mathrm{d_T}(v_i)} \leq \mathbf{S_P} | ||
$$ | ||
|
||
Now, we apply the definition of entropy: | ||
|
||
$$ | ||
\mathbf{H_P} = -\mathbf{S_P} \sum_{i=1}^{n} p_i \log_2(p_i) | ||
$$ | ||
|
||
Using Gibbs' inequality, we know that: | ||
|
||
$$ | ||
-\sum_{i=1}^{n} p_i \log_2(p_i) \leq \sum_{i=1}^{n} p_i \mathrm{d_T}(v_i) | ||
$$ | ||
|
||
Multiplying both sides by $\mathbf{S_P}$: | ||
|
||
$$ | ||
\mathbf{S_P} \cdot \mathbf{H_P} \leq \mathbf{S_P} \sum_{i=1}^{n} p_i \cdot \mathrm{d_T}(v_i) | ||
$$ | ||
|
||
Substituting $p_i = \frac{c_i}{\mathbf{S_P}}$ into the inequality: | ||
|
||
$$ | ||
\mathbf{H_P} \leq \sum_{i=1}^{n} c_i \cdot \mathrm{d_T}(v_i) | ||
$$ | ||
|
||
Thus, we have shown that $\mathbf{W_P}(T) \geq \mathbf{H_P}$ for any ordered binary tree $T \in \mathbf{T_n}$ and any sequence $\mathbf{P} = (c_1, c_2, \ldots, c_n)$ of $n$ positive real numbers. | ||
|
||
## **Knowledge** | ||
|
||
有序二叉树 哈夫曼树 信息论 拉格朗日乘数法 数学归纳法 | ||
|
||
### 重点词汇 | ||
|
||
- Ordered binary tree 有序二叉树 | ||
- Huffman tree 哈夫曼树 | ||
- Depth 深度 | ||
- Lagrange multipliers 拉格朗日乘子 | ||
- Entropy 熵 | ||
- Lagrange multiplier 拉格朗日乘数 | ||
- Jensen's inequality 詹森不等式 | ||
- Gibbs' inequality 吉布斯不等式 | ||
- Kraft's inequality 克拉夫特不等式 | ||
|
||
### 参考资料 | ||
|
||
1. Introduction to Algorithms, Chapter 16: Greedy Algorithms | ||
2. Information Theory, Inference, and Learning Algorithms, David J.C. MacKay |
Oops, something went wrong.