東京大学新領域創成科学研究科メディカル情報生命専攻 2018年8月実施問題11

Myyura · Jan 30, 2025 · 3449722
1 parent 5038f62
commit 3449722

Showing 3 changed files with 229 additions and 0 deletions.
diff --git a/docs/kakomonn/tokyo_university/frontier_sciences/cbms_201808_11.md b/docs/kakomonn/tokyo_university/frontier_sciences/cbms_201808_11.md
@@ -0,0 +1,227 @@
+---
+comments: false
+title: 東京大学 新領域創成科学研究科 メディカル情報生命専攻 2018年8月実施 問題11
+tags:
+  - Tokyo-University
+---
+
+# 東京大学 新領域創成科学研究科 メディカル情報生命専攻 2018年8月実施 問題11
+
+## **Author**
+[zephyr](https://inshi-notes.zephyr-zdz.space/)
+
+## **Description**
+Suppose that a sequence $\mathbf{x} = x_1 x_2 \cdots x_n$ is generated from a first-order stationary Markov model that has initial probabilities $\{\pi_k\}$ and transition probabilities $\{a_{ij}\}$ as follows.
+
+Initial probabilities:
+
+$$
+P(x_1 = k) = \pi_k, \quad \text{for } k \in \{0, 1\}
+$$
+
+Transition probabilities:
+
+$$
+P(x_t = j \mid x_{t-1} = i) = a_{ij}, \quad \text{for } i, j \in \{0, 1\}, \quad t = 2, \ldots, n
+$$
+
+Note that $P(X)$ is the probability that $X$ is true, and $P(X \mid Y)$ is the conditional probability that $X$ is true when $Y$ is true.
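+
+Combining the chain rule with the first-order Markov property, the model assigns each sequence the probability $P(\mathbf{x}) = \pi_{x_1} \prod_{t=2}^{n} a_{x_{t-1} x_t}$. The Python sketch below evaluates that product. The helper names `sequence_prob` and `all_sequences` are hypothetical, the uniform initial probabilities are an arbitrary assumption for illustration, and the transition values are the ones given later for parts (3) and (4). As a sanity check, the script sums $P(\mathbf{x})$ over all $2^n$ binary sequences.
+
+```python
+from itertools import product
+
+# Illustrative parameters (assumptions): uniform initial probabilities,
+# transition probabilities taken from parts (3) and (4).
+PI = {0: 0.5, 1: 0.5}
+A = {(0, 0): 0.8, (0, 1): 0.2, (1, 0): 0.3, (1, 1): 0.7}
+
+
+def sequence_prob(x, pi=PI, a=A):
+    """P(x) = pi[x_1] * prod_{t=2..n} a[(x_{t-1}, x_t)] for a first-order Markov chain."""
+    p = pi[x[0]]
+    for prev, cur in zip(x, x[1:]):
+        p *= a[(prev, cur)]
+    return p
+
+
+def all_sequences(n):
+    """All 2**n binary sequences of length n, as tuples."""
+    return list(product((0, 1), repeat=n))
+
+
+if __name__ == "__main__":
+    n = 4
+    total = sum(sequence_prob(x) for x in all_sequences(n))
+    print(round(total, 12))  # probabilities over all sequences sum to 1
+```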
+
+Answer the following questions:
+
+(1) Assume $n = 4$. Show the probability that $101$ is included as a continuous substring in $\mathbf{x}$. Show also the probability that $111$ is included as a continuous substring in $\mathbf{x}$.
+
+(2) Assume $n = 4$. Show the expected number of 1s in $\mathbf{x}$ when $101$ is included as a continuous substring in $\mathbf{x}$.
+
+In the following questions, use the following transition probabilities.
+
+$$
+a_{00} = 0.8, \quad a_{01} = 0.2, \quad a_{10} = 0.3, \quad a_{11} = 0.7
+$$
+
+(3) Calculate the expected proportion of 1s in $\mathbf{x}$ when $n \to \infty$.
+
+(4) Suppose that $n = 2m$ ($m$ is a positive integer). When every two letters of $\mathbf{x} = x_1 x_2 \cdots x_{2m}$ is converted to a letter of $\mathbf{y} = y_1 \cdots y_m$ using the following rule, calculate the expected proportions of $a, c, g, t$ in $\mathbf{y}$ when $m \to \infty$.
+
+$$
+y_i = 
+\begin{cases} 
+a, & \text{if } x_{2i-1} = 0 \text{ and } x_{2i} = 0, \\
+c, & \text{if } x_{2i-1} = 0 \text{ and } x_{2i} = 1, \\
+g, & \text{if } x_{2i-1} = 1 \text{ and } x_{2i} = 0, \\
+t, & \text{if } x_{2i-1} = 1 \text{ and } x_{2i} = 1, \\
+\end{cases}
+\quad \text{for } i = 1, \ldots, m
+$$
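+
+The conversion reads $\mathbf{x}$ in consecutive, non-overlapping pairs $(x_{2i-1}, x_{2i})$. A minimal sketch of the rule in Python, with the hypothetical function name `to_acgt`:
+
+```python
+# Rule from the problem: pair (x_{2i-1}, x_{2i}) -> letter y_i
+PAIR_TO_LETTER = {(0, 0): "a", (0, 1): "c", (1, 0): "g", (1, 1): "t"}
+
+
+def to_acgt(x):
+    """Convert x_1 ... x_{2m} into y_1 ... y_m using non-overlapping pairs."""
+    if len(x) % 2 != 0:
+        raise ValueError("x must have even length n = 2m")
+    return "".join(PAIR_TO_LETTER[(x[i], x[i + 1])] for i in range(0, len(x), 2))
+
+
+if __name__ == "__main__":
+    print(to_acgt([0, 0, 0, 1, 1, 0, 1, 1]))  # prints "acgt"
+```
+
+Because the pairs do not overlap, a pair such as $(x_2, x_3)$ is never converted; only $(x_1, x_2), (x_3, x_4), \ldots$ are.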
+
+## **Kai**
+### (1)
+
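+Let $\mathbf{X_1}, \ldots, \mathbf{X_n}$ be independent and identically distributed random variables with distribution function $F(x)$. To find the distribution function of the smallest order statistic $\mathbf{X_{(1)}}$, we consider: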
+
+$$
+F_{\mathbf{X_{(1)}}}(x) = P(\mathbf{X_{(1)}} \leq x).
+$$
+
+Taking the complement gives:
+
+$$
+F_{\mathbf{X_{(1)}}}(x) = 1 - P(\mathbf{X_{(1)}} > x).
+$$
+
+Since $\mathbf{X_{(1)}}$ is the smallest of $\mathbf{X_1}, \ldots, \mathbf{X_n}$, the event $\mathbf{X_{(1)}} > x$ occurs if and only if every $\mathbf{X_i} > x$. Because the $\mathbf{X_i}$ are independent and identically distributed, this probability factors:
+
+$$
+P(\mathbf{X_{(1)}} > x) = P(\mathbf{X_1} > x, \mathbf{X_2} > x, \ldots, \mathbf{X_n} > x) = \left( P(\mathbf{X_1} > x) \right)^n = (1 - F(x))^n.
+$$
+
+Thus, the distribution function of $\mathbf{X_{(1)}}$ is:
+
+$$
+F_{\mathbf{X_{(1)}}}(x) = 1 - (1 - F(x))^n.
+$$
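+
+A Monte Carlo sketch can sanity-check this formula. It assumes, purely for illustration, that each $\mathbf{X_i}$ is Exponential(1), so $F(x) = 1 - e^{-x}$ and, with $n = 5$, $F_{\mathbf{X_{(1)}}}(0.2) = 1 - e^{-1}$. The function names are hypothetical.
+
+```python
+import math
+import random
+
+
+def min_cdf(F, x, n):
+    """F_{X_(1)}(x) = 1 - (1 - F(x))**n for n i.i.d. draws with CDF F."""
+    return 1.0 - (1.0 - F(x)) ** n
+
+
+def empirical_min_cdf(sample, x, n, trials=100_000, seed=0):
+    """Fraction of simulated size-n samples whose minimum is <= x."""
+    rng = random.Random(seed)
+    hits = sum(min(sample(rng) for _ in range(n)) <= x for _ in range(trials))
+    return hits / trials
+
+
+# Example distribution (an assumption for illustration): Exponential(1)
+def exp1_cdf(x):
+    return 1.0 - math.exp(-x) if x > 0 else 0.0
+
+
+def exp1_sample(rng):
+    return rng.expovariate(1.0)
+
+
+if __name__ == "__main__":
+    n, x = 5, 0.2
+    print(min_cdf(exp1_cdf, x, n), empirical_min_cdf(exp1_sample, x, n))
+```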
+
+### (2)
+
+To find the distribution function of the largest order statistic $\mathbf{X_{(n)}}$, we consider:
+
+$$
+F_{\mathbf{X_{(n)}}}(x) = P(\mathbf{X_{(n)}} \leq x).
+$$
+
+Since $\mathbf{X_{(n)}}$ is the largest of $\mathbf{X_1}, \ldots, \mathbf{X_n}$, the event $\mathbf{X_{(n)}} \leq x$ occurs if and only if every $\mathbf{X_i} \leq x$. By independence and identical distribution:
+
+$$
+F_{\mathbf{X_{(n)}}}(x) = P(\mathbf{X_1} \leq x, \mathbf{X_2} \leq x, \ldots, \mathbf{X_n} \leq x) = \left( P(\mathbf{X_1} \leq x) \right)^n = (F(x))^n.
+$$
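+
+The same kind of simulation checks $F_{\mathbf{X_{(n)}}}(x) = (F(x))^n$. This sketch assumes Uniform(0, 1) samples for illustration, so with $n = 3$ and $x = 0.8$ the formula gives $0.8^3 = 0.512$. The function names are hypothetical.
+
+```python
+import random
+
+
+def max_cdf(F, x, n):
+    """F_{X_(n)}(x) = F(x)**n for n i.i.d. draws with CDF F."""
+    return F(x) ** n
+
+
+def empirical_max_cdf(sample, x, n, trials=100_000, seed=0):
+    """Fraction of simulated size-n samples whose maximum is <= x."""
+    rng = random.Random(seed)
+    hits = sum(max(sample(rng) for _ in range(n)) <= x for _ in range(trials))
+    return hits / trials
+
+
+# Example distribution (an assumption for illustration): Uniform(0, 1)
+def unif_cdf(x):
+    return min(max(x, 0.0), 1.0)
+
+
+def unif_sample(rng):
+    return rng.random()
+
+
+if __name__ == "__main__":
+    n, x = 3, 0.8
+    print(max_cdf(unif_cdf, x, n), empirical_max_cdf(unif_sample, x, n))
+```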
+
+### (3)
+
+To find the distribution function of the $k$-th order statistic $\mathbf{X_{(k)}}$, we need to determine the probability $F_{\mathbf{X_{(k)}}}(x) = P(\mathbf{X_{(k)}} \leq x)$. This represents the probability that the $k$-th smallest value among $\mathbf{X_1}, \ldots, \mathbf{X_n}$ is less than or equal to $x$.
+
+#### Step 1: Basic Concepts and Binomial Probability
+
+Since $\mathbf{X_i}$ are independent and identically distributed, the probability that any particular $\mathbf{X_i}$ is less than or equal to $x$ is $F(x)$. Similarly, the probability that $\mathbf{X_i}$ is greater than $x$ is $1 - F(x)$.
+
+#### Step 2: Using Binomial Distribution
+
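+The event $\mathbf{X_{(k)}} \leq x$ occurs if and only if at least $k$ of the $n$ values $\mathbf{X_i}$ are less than or equal to $x$. Each $\mathbf{X_i}$ independently satisfies $\mathbf{X_i} \leq x$ with probability $F(x)$, so the number of such values follows the binomial distribution $\mathrm{Bin}(n, F(x))$. Summing its upper tail gives: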
+
+$$
+F_{\mathbf{X_{(k)}}}(x) = P(\mathbf{X_{(k)}} \leq x) = \sum_{j=k}^{n} \binom{n}{j} (F(x))^j (1 - F(x))^{n-j}.
+$$
+
+Here, $\binom{n}{j}$ is the binomial coefficient, representing the number of ways to choose $j$ successes (values $\leq x$) out of $n$ trials.
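+
+The binomial tail translates directly into code. In this sketch (function names hypothetical), `kth_order_cdf` takes the value $F(x)$ rather than the function $F$; setting $k = 1$ or $k = n$ should recover the formulas from (1) and (2), and a middle $k$ is compared against simulated Uniform(0, 1) samples, for which $F(x) = x$.
+
+```python
+import random
+from math import comb
+
+
+def kth_order_cdf(Fx, n, k):
+    """P(X_(k) <= x) = sum_{j=k}^{n} C(n, j) Fx**j (1 - Fx)**(n - j), with Fx = F(x)."""
+    return sum(comb(n, j) * Fx**j * (1 - Fx) ** (n - j) for j in range(k, n + 1))
+
+
+def empirical_kth_cdf_uniform(x, n, k, trials=50_000, seed=0):
+    """Simulated P(X_(k) <= x) for n i.i.d. Uniform(0, 1) draws (so F(x) = x)."""
+    rng = random.Random(seed)
+    hits = sum(sorted(rng.random() for _ in range(n))[k - 1] <= x for _ in range(trials))
+    return hits / trials
+
+
+if __name__ == "__main__":
+    n, Fx = 6, 0.3
+    # k = 1 and k = n should reproduce the formulas of (1) and (2)
+    print(kth_order_cdf(Fx, n, 1), 1 - (1 - Fx) ** n)
+    print(kth_order_cdf(Fx, n, n), Fx**n)
+    # a middle k, checked against simulation with uniform samples
+    print(kth_order_cdf(0.4, 7, 3), empirical_kth_cdf_uniform(0.4, 7, 3))
+```
+
+Sorting each simulated sample and reading index `k - 1` picks out the $k$-th smallest value, which is exactly the definition of $\mathbf{X_{(k)}}$.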
+
+### (4)
+
+If the $\mathbf{X_i}$ are uniformly distributed on $[0,1]$, then $F(x) = x$ for $x \in [0,1]$, and the result of (1) becomes:
+
+$$
+F_{\mathbf{X_{(1)}}}(x) = 1 - (1 - x)^n.
+$$
+
+The expectation of $\mathbf{X_{(1)}}$ is given by:
+
+$$
+\mathbb{E}[\mathbf{X_{(1)}}] = \int_{0}^{1} x f_{\mathbf{X_{(1)}}}(x) \, \mathrm{d}x,
+$$
+
+where $f_{\mathbf{X_{(1)}}}(x)$ is the derivative of $F_{\mathbf{X_{(1)}}}(x)$:
+
+$$
+f_{\mathbf{X_{(1)}}}(x) = \frac{\mathrm{d}}{\mathrm{d}x} \left[ 1 - (1 - x)^n \right] = n (1 - x)^{n-1}.
+$$
+
+Therefore:
+
+$$
+\mathbb{E}[\mathbf{X_{(1)}}] = \int_{0}^{1} x \cdot n (1 - x)^{n-1} \, \mathrm{d}x.
+$$
+
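+Pulling the constant $n$ out of the integral:
+
+$$
+\mathbb{E}[\mathbf{X_{(1)}}] = n \int_{0}^{1} x (1 - x)^{n-1} \, \mathrm{d}x.
+$$
+
+The remaining integral is the Beta function $B(a, b) = \int_{0}^{1} x^{a-1} (1 - x)^{b-1} \, \mathrm{d}x = \frac{\Gamma(a)\Gamma(b)}{\Gamma(a+b)}$ with $a = 2$ and $b = n$:
+
+$$
+\int_{0}^{1} x (1 - x)^{n-1} \, \mathrm{d}x = B(2, n) = \frac{1! \, (n-1)!}{(n+1)!} = \frac{1}{n(n+1)}.
+$$
+
+Thus:
+
+$$
+\mathbb{E}[\mathbf{X_{(1)}}] = n \cdot \frac{1}{n(n+1)} = \frac{1}{n+1}.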
+$$
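+
+To cross-check the expectation, the sketch below computes $n\,B(2, n)$ exactly with `fractions.Fraction`, using $B(a, b) = \frac{(a-1)!\,(b-1)!}{(a+b-1)!}$ for positive integers $a, b$, and compares it with the simulated mean of the minimum of $n$ Uniform(0, 1) draws. The function names are hypothetical.
+
+```python
+import random
+from fractions import Fraction
+from math import factorial
+
+
+def beta_int(a, b):
+    """B(a, b) = (a-1)! (b-1)! / (a+b-1)! for positive integers a, b."""
+    return Fraction(factorial(a - 1) * factorial(b - 1), factorial(a + b - 1))
+
+
+def expected_min_uniform(n):
+    """E[X_(1)] = n * B(2, n) for n i.i.d. Uniform(0, 1) variables."""
+    return n * beta_int(2, n)
+
+
+def simulated_min_mean(n, trials=100_000, seed=0):
+    """Average of min(U_1, ..., U_n) over simulated Uniform(0, 1) samples."""
+    rng = random.Random(seed)
+    return sum(min(rng.random() for _ in range(n)) for _ in range(trials)) / trials
+
+
+if __name__ == "__main__":
+    for n in (1, 2, 5, 10):
+        print(n, expected_min_uniform(n), round(simulated_min_mean(n), 4))
+```
+
+Exact rational arithmetic keeps the closed form free of rounding error; the simulation is only a rough cross-check whose accuracy depends on `trials`.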
+
+## **Knowledge**
+
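+#order-statistics #distribution-function #expectation #Beta-distribution
+
+### Key Difficulty
+
+Part (3), the distribution function of an arbitrary $k$-th order statistic, is the hardest step: it requires recognizing that the number of $\mathbf{X_i} \leq x$ is binomially distributed and summing the upper tail of that distribution.
+
+### Solution Techniques and Key Information
+
+For order statistics, it is essential to know how to express the distribution functions of the minimum and maximum in terms of $F(x)$. For the uniform distribution, properties of the Beta function simplify the computation of the expectation.
+
+### Key Vocabulary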
+
+- order statistic 顺序统计量
+- distribution function 分布函数
+- expectation 期望值
+- uniform distribution 均匀分布
+
diff --git a/docs/kakomonn/tokyo_university/index.md b/docs/kakomonn/tokyo_university/index.md
@@ -251,6 +251,7 @@ tags:
             - [8月 問題8](frontier_sciences/cbms_201808_8.md)
             - [8月 問題9](frontier_sciences/cbms_201808_9.md)
             - [8月 問題10](frontier_sciences/cbms_201808_10.md)
+            - [8月 問題11](frontier_sciences/cbms_201808_11.md)
     - 海洋技術環境学専攻:
         - 2022年度:
             - [第1~6問](frontier_sciences/otpe_2022_all.md)

diff --git a/mkdocs.yml b/mkdocs.yml
@@ -372,6 +372,7 @@ nav:
             - 8月 問題8: kakomonn/tokyo_university/frontier_sciences/cbms_201808_8.md
             - 8月 問題9: kakomonn/tokyo_university/frontier_sciences/cbms_201808_9.md
             - 8月 問題10: kakomonn/tokyo_university/frontier_sciences/cbms_201808_10.md
+            - 8月 問題11: kakomonn/tokyo_university/frontier_sciences/cbms_201808_11.md
         - 海洋技術環境学専攻:
           - 2022年度:
             - 第1~6問: kakomonn/tokyo_university/frontier_sciences/otpe_2022_all.md