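The University of Tokyo, Graduate School of Frontier Sciences, Department of Computational Biology and Medical Sciences, August 2018 Exam, Problem 11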
Showing 3 changed files with 229 additions and 0 deletions.
227 changes: 227 additions & 0 deletions
docs/kakomonn/tokyo_university/frontier_sciences/cbms_201808_11.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,227 @@ | ||
--- | ||
comments: false | ||
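title: The University of Tokyo, Graduate School of Frontier Sciences, Department of Computational Biology and Medical Sciences, August 2018 Exam, Problem 11 | |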
tags: | ||
- Tokyo-University | ||
--- | ||
|
||
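# The University of Tokyo, Graduate School of Frontier Sciences, Department of Computational Biology and Medical Sciences, August 2018 Exam, Problem 11 | |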
|
||
## **Author** | ||
[zephyr](https://inshi-notes.zephyr-zdz.space/) | ||
|
||
## **Description** | ||
Suppose that a sequence $\mathbf{x} = x_1 x_2 \cdots x_n$ is generated from a first-order stationary Markov model that has initial probabilities $\{\pi_k\}$ and transition probabilities $\{a_{ij}\}$ as follows. | ||
|
||
Initial probabilities: | ||
|
||
$$ | ||
P(x_1 = k) = \pi_k, \quad \text{for } k \in \{0, 1\} | ||
$$ | ||
|
||
Transition probabilities: | ||
|
||
$$ | ||
P(x_t = j \mid x_{t-1} = i) = a_{ij}, \quad \text{for } i, j \in \{0, 1\}, \quad t = 2, \ldots, n | ||
$$ | ||
|
||
Note that $P(X)$ is the probability that $X$ is true, and $P(X \mid Y)$ is the conditional probability that $X$ is true when $Y$ is true. | ||
|
||
Answer the following questions: | ||
|
||
(1) Assume $n = 4$. Show the probability that $101$ is included as a continuous substring in $\mathbf{x}$. Show also the probability that $111$ is included as a continuous substring in $\mathbf{x}$. | ||
|
||
(2) Assume $n = 4$. Show the expected number of 1s in $\mathbf{x}$ when $101$ is included as a continuous substring in $\mathbf{x}$. | ||
|
||
In the following questions, use the following transition probabilities. | ||
|
||
$$ | ||
a_{00} = 0.8, \quad a_{01} = 0.2, \quad a_{10} = 0.3, \quad a_{11} = 0.7 | ||
$$ | ||
|
||
(3) Calculate the expected proportion of 1s in $\mathbf{x}$ when $n \to \infty$. | ||
|
||
(4) Suppose that $n = 2m$ ($m$ is a positive integer). When every two letters of $\mathbf{x} = x_1 x_2 \cdots x_{2m}$ is converted to a letter of $\mathbf{y} = y_1 \cdots y_m$ using the following rule, calculate the expected proportions of $a, c, g, t$ in $\mathbf{y}$ when $m \to \infty$. | ||
|
||
$$ | ||
y_i = | ||
\begin{cases} | ||
a, & \text{if } x_{2i-1} = 0 \text{ and } x_{2i} = 0, \\ | ||
c, & \text{if } x_{2i-1} = 0 \text{ and } x_{2i} = 1, \\ | ||
g, & \text{if } x_{2i-1} = 1 \text{ and } x_{2i} = 0, \\ | ||
t, & \text{if } x_{2i-1} = 1 \text{ and } x_{2i} = 1, \\ | ||
\end{cases} | ||
\quad \text{for } i = 1, \ldots, m | ||
$$ | ||
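
The model in the problem statement can be explored numerically. The following is a minimal Python sketch, not a reference solution: it enumerates all binary sequences of length $n$ to get the probability that a pattern occurs, and it uses the stationary distribution of the two-state chain to estimate the long-run proportions asked for in (3) and (4). The function names (`stationary`, `seq_prob`, `prob_contains`, `pair_proportions`) are hypothetical, and using the stationary distribution as the initial distribution $\{\pi_k\}$ is an assumption made only for illustration, since the problem leaves $\{\pi_k\}$ general.

```python
from itertools import product

# Transition probabilities given for questions (3) and (4): A[i][j] = a_ij.
A = [[0.8, 0.2],
     [0.3, 0.7]]


def stationary(a):
    """Stationary distribution of a two-state chain: pi_1 = a01 / (a01 + a10)."""
    p1 = a[0][1] / (a[0][1] + a[1][0])
    return [1.0 - p1, p1]


def seq_prob(x, pi, a):
    """P(x) = pi[x_1] * product over t >= 2 of a[x_{t-1}][x_t]."""
    p = pi[x[0]]
    for prev, cur in zip(x, x[1:]):
        p *= a[prev][cur]
    return p


def prob_contains(pattern, n, pi, a):
    """Probability that `pattern` occurs as a contiguous substring of x_1..x_n."""
    total = 0.0
    for x in product((0, 1), repeat=n):
        if pattern in "".join(map(str, x)):
            total += seq_prob(x, pi, a)
    return total


def pair_proportions(pi, a):
    """Long-run proportions of a, c, g, t, assuming x_{2i-1} follows pi."""
    pairs = product((0, 1), repeat=2)  # (0,0), (0,1), (1,0), (1,1)
    return {letter: pi[i] * a[i][j] for letter, (i, j) in zip("acgt", pairs)}


if __name__ == "__main__":
    pi = stationary(A)  # assumption: start the chain in its stationary distribution
    print("stationary distribution:", pi)
    print("P(101 in x), n = 4:", prob_contains("101", 4, pi, A))
    print("P(111 in x), n = 4:", prob_contains("111", 4, pi, A))
    print("pair proportions:", pair_proportions(pi, A))
```

Enumeration is feasible here because $n = 4$ gives only 16 sequences. For general $\{\pi_k\}$ and $\{a_{ij}\}$, the same enumeration yields symbolic sums over the sequences that contain the pattern.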
|
||
## **Kai** | ||
### (1) | ||
|
||
To find the distribution function of the smallest order statistic $\mathbf{X_{(1)}}$, we consider: | ||
|
||
$$ | ||
F_{\mathbf{X_{(1)}}}(x) = P(\mathbf{X_{(1)}} \leq x). | ||
$$ | ||
|
||
It is easier to work with the complementary event: | |
|
||
$$ | |
F_{\mathbf{X_{(1)}}}(x) = 1 - P(\mathbf{X_{(1)}} > x). | |
$$ | |
|
||
Since $\mathbf{X_{(1)}}$ is the smallest of $\mathbf{X_1}, \ldots, \mathbf{X_n}$, we have $\mathbf{X_{(1)}} > x$ if and only if every $\mathbf{X_i} > x$. Because the $\mathbf{X_i}$ are independent and identically distributed with distribution function $F$: | |
|
||
$$ | ||
P(\mathbf{X_{(1)}} > x) = P(\mathbf{X_1} > x, \mathbf{X_2} > x, \ldots, \mathbf{X_n} > x) = \left( P(\mathbf{X_1} > x) \right)^n = (1 - F(x))^n. | ||
$$ | ||
|
||
Thus, the distribution function of $\mathbf{X_{(1)}}$ is: | ||
|
||
$$ | ||
F_{\mathbf{X_{(1)}}}(x) = 1 - (1 - F(x))^n. | ||
$$ | ||
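
The formula for the minimum can be checked exactly on a small discrete example. The sketch below assumes, purely for illustration, that each $\mathbf{X_i}$ is a fair six-sided die roll (this distribution and the function names are hypothetical, not part of the original problem). It compares $1 - (1 - F(x))^n$ with a brute-force count over all outcomes, using exact fractions.

```python
from fractions import Fraction
from itertools import product

VALUES = range(1, 7)  # faces of a fair die (assumed example distribution)


def cdf(x):
    """F(x) = P(X <= x) for a single fair die roll."""
    return Fraction(sum(1 for v in VALUES if v <= x), len(VALUES))


def min_cdf_formula(x, n):
    """Closed form: P(X_(1) <= x) = 1 - (1 - F(x))^n."""
    return 1 - (1 - cdf(x)) ** n


def min_cdf_enumerated(x, n):
    """Same probability by counting all equally likely outcomes of n rolls."""
    outcomes = list(product(VALUES, repeat=n))
    hits = sum(1 for o in outcomes if min(o) <= x)
    return Fraction(hits, len(outcomes))


if __name__ == "__main__":
    for x in VALUES:
        print(x, min_cdf_formula(x, 3), min_cdf_enumerated(x, 3))
```

The two columns agree for every $x$, as expected, because the derivation only uses independence and the common distribution function $F$.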
|
||
### (2) | ||
|
||
To find the distribution function of the largest order statistic $\mathbf{X_{(n)}}$, we consider: | ||
|
||
$$ | ||
F_{\mathbf{X_{(n)}}}(x) = P(\mathbf{X_{(n)}} \leq x). | ||
$$ | ||
|
||
Since $\mathbf{X_{(n)}}$ is the largest of $\mathbf{X_1}, \ldots, \mathbf{X_n}$, $\mathbf{X_{(n)}} \leq x$ holds if and only if every $\mathbf{X_i} \leq x$. By independence: | |
|
||
$$ | ||
F_{\mathbf{X_{(n)}}}(x) = P(\mathbf{X_1} \leq x, \mathbf{X_2} \leq x, \ldots, \mathbf{X_n} \leq x) = \left( P(\mathbf{X_1} \leq x) \right)^n = (F(x))^n. | ||
$$ | ||
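
The same kind of exact check works for the maximum. This sketch again assumes fair die rolls as a hypothetical example distribution and compares $(F(x))^n$ with a brute-force count. The key condition is that the maximum is at most $x$ exactly when all values are at most $x$.

```python
from fractions import Fraction
from itertools import product

VALUES = range(1, 7)  # faces of a fair die (assumed example distribution)


def cdf(x):
    """F(x) = P(X <= x) for a single fair die roll."""
    return Fraction(sum(1 for v in VALUES if v <= x), len(VALUES))


def max_cdf_formula(x, n):
    """Closed form: P(X_(n) <= x) = F(x)^n."""
    return cdf(x) ** n


def max_cdf_enumerated(x, n):
    """Count outcomes in which every roll (hence the maximum) is <= x."""
    outcomes = list(product(VALUES, repeat=n))
    hits = sum(1 for o in outcomes if max(o) <= x)
    return Fraction(hits, len(outcomes))


if __name__ == "__main__":
    for x in VALUES:
        print(x, max_cdf_formula(x, 3), max_cdf_enumerated(x, 3))
```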
|
||
### (3) | ||
|
||
To find the distribution function of the $k$-th order statistic $\mathbf{X_{(k)}}$, we need to determine the probability $F_{\mathbf{X_{(k)}}}(x) = P(\mathbf{X_{(k)}} \leq x)$. This represents the probability that the $k$-th smallest value among $\mathbf{X_1}, \ldots, \mathbf{X_n}$ is less than or equal to $x$. | ||
|
||
#### Step 1: Basic Concepts and Binomial Probability | ||
|
||
Since $\mathbf{X_i}$ are independent and identically distributed, the probability that any particular $\mathbf{X_i}$ is less than or equal to $x$ is $F(x)$. Similarly, the probability that $\mathbf{X_i}$ is greater than $x$ is $1 - F(x)$. | ||
|
||
#### Step 2: Using Binomial Distribution | ||
|
||
This is a binomial counting problem. The event $\mathbf{X_{(k)}} \leq x$ occurs exactly when at least $k$ of the $n$ values $\mathbf{X_i}$ are less than or equal to $x$, and the number of such values follows a binomial distribution with parameters $n$ and $F(x)$. Therefore: | |
|
||
$$ | ||
F_{\mathbf{X_{(k)}}}(x) = P(\mathbf{X_{(k)}} \leq x) = \sum_{j=k}^{n} \binom{n}{j} (F(x))^j (1 - F(x))^{n-j}. | ||
$$ | ||
|
||
Here, $\binom{n}{j}$ is the binomial coefficient, representing the number of ways to choose $j$ successes (values $\leq x$) out of $n$ trials. | ||
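
The binomial sum can be verified in the same way. The sketch below (fair die rolls and function names are assumptions for illustration only) evaluates the sum for the $k$-th order statistic and compares it with a brute-force count. It also confirms that $k = 1$ and $k = n$ reduce to the formulas for the minimum and the maximum.

```python
from fractions import Fraction
from itertools import product
from math import comb

VALUES = range(1, 7)  # faces of a fair die (assumed example distribution)


def cdf(x):
    """F(x) = P(X <= x) for a single fair die roll."""
    return Fraction(sum(1 for v in VALUES if v <= x), len(VALUES))


def kth_cdf_formula(x, k, n):
    """P(X_(k) <= x) = sum_{j=k}^{n} C(n, j) F(x)^j (1 - F(x))^(n - j)."""
    p = cdf(x)
    return sum(comb(n, j) * p ** j * (1 - p) ** (n - j) for j in range(k, n + 1))


def kth_cdf_enumerated(x, k, n):
    """Count outcomes whose k-th smallest value is <= x."""
    outcomes = list(product(VALUES, repeat=n))
    hits = sum(1 for o in outcomes if sorted(o)[k - 1] <= x)
    return Fraction(hits, len(outcomes))


if __name__ == "__main__":
    n = 3
    for k in range(1, n + 1):
        print(k, [str(kth_cdf_formula(x, k, n)) for x in VALUES])
```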
|
||
### (4) | ||
|
||
If the $\mathbf{X_i}$ are uniformly distributed on $[0,1]$, then $F(x) = x$ for $x \in [0,1]$. Therefore: | |
|
||
$$ | ||
F_{\mathbf{X_{(1)}}}(x) = 1 - (1 - x)^n. | ||
$$ | ||
|
||
The expectation of $\mathbf{X_{(1)}}$ is given by: | ||
|
||
$$ | ||
\mathbb{E}[\mathbf{X_{(1)}}] = \int_{0}^{1} x f_{\mathbf{X_{(1)}}}(x) \, \mathrm{d}x, | ||
$$ | ||
|
||
where $f_{\mathbf{X_{(1)}}}(x)$ is the derivative of $F_{\mathbf{X_{(1)}}}(x)$: | ||
|
||
$$ | ||
f_{\mathbf{X_{(1)}}}(x) = \frac{\mathrm{d}}{\mathrm{d}x} \left[ 1 - (1 - x)^n \right] = n (1 - x)^{n-1}, \quad 0 \leq x \leq 1. | |
$$ | ||
|
||
Therefore: | ||
|
||
$$ | ||
\mathbb{E}[\mathbf{X_{(1)}}] = \int_{0}^{1} x \cdot n (1 - x)^{n-1} \, \mathrm{d}x. | ||
$$ | ||
|
||
Pulling out the constant $n$ leaves a Beta function integral: | |
|
||
$$ | |
\mathbb{E}[\mathbf{X_{(1)}}] = n \int_{0}^{1} x (1 - x)^{n-1} \, \mathrm{d}x. | |
$$ | |
|
||
Using the Beta function $B(2, n) = \frac{\Gamma(2)\,\Gamma(n)}{\Gamma(n+2)}$, we get: | |
|
||
$$ | |
\int_{0}^{1} x (1 - x)^{n-1} \, \mathrm{d}x = B(2, n) = \frac{1! \, (n-1)!}{(n+1)!} = \frac{1}{n(n+1)}. | |
$$ | |
|
||
Thus: | |
|
||
$$ | |
\mathbb{E}[\mathbf{X_{(1)}}] = n \cdot \frac{1}{n(n+1)} = \frac{1}{n+1}. | |
$$ | ||
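
The value of the Beta integral can be confirmed with exact arithmetic. The following sketch (function names are hypothetical) expands $x(1-x)^{n-1}$ with the binomial theorem, integrates term by term, and checks that $\int_0^1 x(1-x)^{n-1}\,\mathrm{d}x = \frac{1}{n(n+1)}$, so that the expected minimum of $n$ uniform variables on $[0,1]$ is $\frac{1}{n+1}$.

```python
from fractions import Fraction
from math import comb


def beta_integral(n):
    """Exact integral of x (1 - x)^(n - 1) over [0, 1].

    Expands (1 - x)^(n - 1) = sum_k C(n - 1, k) (-1)^k x^k, so the integrand is
    sum_k C(n - 1, k) (-1)^k x^(k + 1), and each term integrates to 1 / (k + 2).
    """
    return sum(Fraction((-1) ** k * comb(n - 1, k), k + 2) for k in range(n))


def expected_min(n):
    """E[X_(1)] = n * integral of x (1 - x)^(n - 1) over [0, 1]."""
    return n * beta_integral(n)


if __name__ == "__main__":
    for n in range(1, 6):
        print(n, beta_integral(n), expected_min(n))
```

As a sanity check, $n = 1$ gives $\mathbb{E}[\mathbf{X_{(1)}}] = \frac{1}{2}$, the mean of a single uniform variable on $[0,1]$.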
|
||
## **Knowledge** | ||
|
||
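#order-statistics #distribution-function #expectation #Beta-distribution | |
|
||
### Key Difficulties | |
|
||
Question (3), the distribution function of the $k$-th order statistic for arbitrary $k$, is the hardest part: it requires understanding the binomial distribution and summing its probabilities. | |
|
||
### Problem-Solving Tips | |
|
||
For order statistics, it is important to know how to express the distribution functions of the smallest and largest order statistics in terms of the distribution function $F(x)$. In the uniform case, properties of the Beta distribution simplify the calculation of the expectation. | |
|
||
### Key Vocabulary | |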
|
||
- order statistic 顺序统计量 | ||
- distribution function 分布函数 | ||
- expectation 期望值 | ||
- uniform distribution 均匀分布 | ||