東京大学 新領域創成科学研究科 メディカル情報生命専攻 2018年8月実施 問題11
Myyura committed Jan 30, 2025
1 parent 5038f62 commit 3449722
Showing 3 changed files with 229 additions and 0 deletions.
227 changes: 227 additions & 0 deletions docs/kakomonn/tokyo_university/frontier_sciences/cbms_201808_11.md
@@ -0,0 +1,227 @@
---
comments: false
title: 東京大学 新領域創成科学研究科 メディカル情報生命専攻 2018年8月実施 問題11
tags:
- Tokyo-University
---

# 東京大学 新領域創成科学研究科 メディカル情報生命専攻 2018年8月実施 問題11

## **Author**
[zephyr](https://inshi-notes.zephyr-zdz.space/)

## **Description**
Suppose that a sequence $\mathbf{x} = x_1 x_2 \cdots x_n$ is generated from a first-order stationary Markov model that has initial probabilities $\{\pi_k\}$ and transition probabilities $\{a_{ij}\}$ as follows.

Initial probabilities:

$$
P(x_1 = k) = \pi_k, \quad \text{for } k \in \{0, 1\}
$$

Transition probabilities:

$$
P(x_t = j \mid x_{t-1} = i) = a_{ij}, \quad \text{for } i, j \in \{0, 1\}, \quad t = 2, \ldots, n
$$

Note that $P(X)$ is the probability that $X$ is true, and $P(X \mid Y)$ is the conditional probability that $X$ is true when $Y$ is true.

Answer the following questions:

(1) Assume $n = 4$. Show the probability that $101$ is included as a continuous substring in $\mathbf{x}$. Show also the probability that $111$ is included as a continuous substring in $\mathbf{x}$.

(2) Assume $n = 4$. Show the expected number of 1s in $\mathbf{x}$ when $101$ is included as a continuous substring in $\mathbf{x}$.

In the following questions, use the following transition probabilities.

$$
a_{00} = 0.8, \quad a_{01} = 0.2, \quad a_{10} = 0.3, \quad a_{11} = 0.7
$$

(3) Calculate the expected proportion of 1s in $\mathbf{x}$ when $n \to \infty$.

(4) Suppose that $n = 2m$ ($m$ is a positive integer). When each pair of letters $x_{2i-1} x_{2i}$ of $\mathbf{x} = x_1 x_2 \cdots x_{2m}$ is converted to one letter $y_i$ of $\mathbf{y} = y_1 \cdots y_m$ using the following rule, calculate the expected proportions of $a, c, g, t$ in $\mathbf{y}$ when $m \to \infty$.

$$
y_i =
\begin{cases}
a, & \text{if } x_{2i-1} = 0 \text{ and } x_{2i} = 0, \\
c, & \text{if } x_{2i-1} = 0 \text{ and } x_{2i} = 1, \\
g, & \text{if } x_{2i-1} = 1 \text{ and } x_{2i} = 0, \\
t, & \text{if } x_{2i-1} = 1 \text{ and } x_{2i} = 1, \\
\end{cases}
\quad \text{for } i = 1, \ldots, m
$$
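
To make the model concrete, the following sketch enumerates all $2^4$ binary sequences for $n = 4$ and sums the probabilities of those that contain a given substring. The function names, the initial distribution `pi = [0.5, 0.5]`, and the reuse of the transition probabilities from questions (3) and (4) for the $n = 4$ case are illustrative assumptions, not part of the problem statement. It is only a numerical sanity check, not the symbolic answer the questions ask for.

```python
from itertools import product

def sequence_probability(x, pi, a):
    """P(x) = pi[x_1] * product over t of a[x_{t-1}][x_t]."""
    p = pi[x[0]]
    for prev, cur in zip(x, x[1:]):
        p *= a[prev][cur]
    return p

def substring_probability(pattern, n, pi, a):
    """Sum P(x) over all length-n binary sequences containing `pattern`."""
    total = 0.0
    for x in product((0, 1), repeat=n):
        if pattern in "".join(map(str, x)):
            total += sequence_probability(x, pi, a)
    return total

# Assumed values for illustration only.
a = [[0.8, 0.2], [0.3, 0.7]]   # a[i][j] = P(x_t = j | x_{t-1} = i)
pi = [0.5, 0.5]                # hypothetical initial probabilities

p101 = substring_probability("101", 4, pi, a)
p111 = substring_probability("111", 4, pi, a)
print(f"P(contains 101) = {p101:.4f}, P(contains 111) = {p111:.4f}")
```

Brute-force enumeration is feasible here because there are only 16 sequences; for large $n$ it would need to be replaced by a dynamic-programming approach.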


## **Kai**
### (1)

To find the distribution function of the smallest order statistic $\mathbf{X_{(1)}}$, we consider:

$$
F_{\mathbf{X_{(1)}}}(x) = P(\mathbf{X_{(1)}} \leq x).
$$

It is easier to work with the complementary event:

$$
F_{\mathbf{X_{(1)}}}(x) = 1 - P(\mathbf{X_{(1)}} > x).
$$

Since $\mathbf{X_{(1)}}$ is the smallest of $\mathbf{X_1}, \ldots, \mathbf{X_n}$, we have $\mathbf{X_{(1)}} > x$ if and only if every $\mathbf{X_i} > x$. Because the $\mathbf{X_i}$ are independent and identically distributed with distribution function $F$:

$$
P(\mathbf{X_{(1)}} > x) = P(\mathbf{X_1} > x, \mathbf{X_2} > x, \ldots, \mathbf{X_n} > x) = \left( P(\mathbf{X_1} > x) \right)^n = (1 - F(x))^n.
$$

Thus, the distribution function of $\mathbf{X_{(1)}}$ is:

$$
F_{\mathbf{X_{(1)}}}(x) = 1 - (1 - F(x))^n.
$$
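
As a rough check of this formula, the sketch below compares the closed form with a Monte Carlo estimate. It assumes the $\mathbf{X_i}$ are Uniform(0, 1), so that $F(x) = x$; the function names, sample size, and seed are illustrative choices.

```python
import random

def min_cdf(F_x, n):
    """Closed form 1 - (1 - F(x))^n for the smallest of n i.i.d. values."""
    return 1.0 - (1.0 - F_x) ** n

def simulate_min_cdf(x, n, trials=20000, seed=0):
    """Estimate P(min <= x) for n i.i.d. Uniform(0, 1) draws (assumed distribution)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        if min(rng.random() for _ in range(n)) <= x:
            hits += 1
    return hits / trials

n, x = 5, 0.2
exact = min_cdf(x, n)              # F(x) = x for Uniform(0, 1)
estimate = simulate_min_cdf(x, n)
print(f"formula: {exact:.4f}, simulation: {estimate:.4f}")
```

The two numbers should agree up to sampling noise, which shrinks roughly like $1/\sqrt{\text{trials}}$.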

### (2)

To find the distribution function of the largest order statistic $\mathbf{X_{(n)}}$, we consider:

$$
F_{\mathbf{X_{(n)}}}(x) = P(\mathbf{X_{(n)}} \leq x).
$$

Since $\mathbf{X_{(n)}}$ is the largest of $\mathbf{X_1}, \ldots, \mathbf{X_n}$, $\mathbf{X_{(n)}} \leq x$ holds if and only if every $\mathbf{X_i} \leq x$. By independence:

$$
F_{\mathbf{X_{(n)}}}(x) = P(\mathbf{X_1} \leq x, \mathbf{X_2} \leq x, \ldots, \mathbf{X_n} \leq x) = \left( P(\mathbf{X_1} \leq x) \right)^n = (F(x))^n.
$$
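
The formula $(F(x))^n$ can be verified exactly on a small discrete example. The sketch below assumes each $\mathbf{X_i}$ is a fair six-sided die roll, so $F(x) = x/6$ for $x \in \{1, \ldots, 6\}$, and enumerates every outcome; the die model and function names are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product

def max_cdf(F_x, n):
    """Closed form F(x)^n for the largest of n i.i.d. values."""
    return F_x ** n

def enumerate_max_cdf(x, n, faces=6):
    """Exact P(max <= x) for n fair dice, by listing all faces**n outcomes."""
    outcomes = list(product(range(1, faces + 1), repeat=n))
    favourable = sum(1 for rolls in outcomes if max(rolls) <= x)
    return Fraction(favourable, len(outcomes))

n, x = 3, 4
formula = max_cdf(Fraction(x, 6), n)   # F(4) = 4/6 for a fair die
brute_force = enumerate_max_cdf(x, n)
print(formula, brute_force)
```

Using `Fraction` keeps the comparison exact, so the two values can be tested for equality rather than approximate agreement.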

### (3)

To find the distribution function of the $k$-th order statistic $\mathbf{X_{(k)}}$, we need to determine the probability $F_{\mathbf{X_{(k)}}}(x) = P(\mathbf{X_{(k)}} \leq x)$. This represents the probability that the $k$-th smallest value among $\mathbf{X_1}, \ldots, \mathbf{X_n}$ is less than or equal to $x$.

#### Step 1: Basic Concepts and Binomial Probability

Since $\mathbf{X_i}$ are independent and identically distributed, the probability that any particular $\mathbf{X_i}$ is less than or equal to $x$ is $F(x)$. Similarly, the probability that $\mathbf{X_i}$ is greater than $x$ is $1 - F(x)$.

#### Step 2: Using Binomial Distribution

The event $\mathbf{X_{(k)}} \leq x$ occurs if and only if at least $k$ of the $n$ values $\mathbf{X_i}$ are less than or equal to $x$. The number of such values follows a binomial distribution with $n$ trials and success probability $F(x)$, so summing over the possible counts $j = k, \ldots, n$ gives:

$$
F_{\mathbf{X_{(k)}}}(x) = P(\mathbf{X_{(k)}} \leq x) = \sum_{j=k}^{n} \binom{n}{j} (F(x))^j (1 - F(x))^{n-j}.
$$

Here, $\binom{n}{j}$ is the binomial coefficient, representing the number of ways to choose $j$ successes (values $\leq x$) out of $n$ trials.
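
A minimal sketch of this sum, taking the value $F(x)$ as input, is shown below; the function name `kth_order_cdf` is a hypothetical choice. The checks confirm that it reduces to $1 - (1 - F(x))^n$ for $k = 1$ and to $(F(x))^n$ for $k = n$.

```python
from math import comb

def kth_order_cdf(F_x, k, n):
    """P(X_(k) <= x): probability that at least k of n i.i.d. values are <= x."""
    return sum(comb(n, j) * F_x**j * (1 - F_x) ** (n - j) for j in range(k, n + 1))

F_x, n = 0.3, 5
smallest = kth_order_cdf(F_x, 1, n)   # should match 1 - (1 - F_x)^n
largest = kth_order_cdf(F_x, n, n)    # should match F_x^n
print(smallest, 1 - (1 - F_x) ** n)
print(largest, F_x ** n)
```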

### (4)

If $F$ is the distribution function of the uniform distribution on $[0,1]$, then $F(x) = x$ for $x \in [0,1]$. Therefore:

$$
F_{\mathbf{X_{(1)}}}(x) = 1 - (1 - x)^n.
$$

The expectation of $\mathbf{X_{(1)}}$ is given by:

$$
\mathbb{E}[\mathbf{X_{(1)}}] = \int_{0}^{1} x f_{\mathbf{X_{(1)}}}(x) \, \mathrm{d}x,
$$

where $f_{\mathbf{X_{(1)}}}(x)$ is the derivative of $F_{\mathbf{X_{(1)}}}(x)$:

$$
f_{\mathbf{X_{(1)}}}(x) = \frac{\mathrm{d}}{\mathrm{d}x} \left[ 1 - (1 - x)^n \right] = n (1 - x)^{n-1}.
$$

Therefore:

$$
\mathbb{E}[\mathbf{X_{(1)}}] = \int_{0}^{1} x \cdot n (1 - x)^{n-1} \, \mathrm{d}x.
$$

Factoring out $n$, the remaining integral is a Beta function:

$$
\int_{0}^{1} x (1 - x)^{n-1} \, \mathrm{d}x = B(2, n) = \frac{\Gamma(2)\,\Gamma(n)}{\Gamma(n+2)} = \frac{1! \, (n-1)!}{(n+1)!} = \frac{1}{n(n+1)}.
$$

Thus:

$$
\mathbb{E}[\mathbf{X_{(1)}}] = n \cdot \frac{1}{n(n+1)} = \frac{1}{n+1}.
$$
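
As a numerical cross-check, the sketch below approximates $n \int_0^1 x (1 - x)^{n-1} \, \mathrm{d}x$ with a midpoint Riemann sum and prints it next to the closed form $1/(n+1)$, which follows from $B(2, n) = \frac{1}{n(n+1)}$. The function name and step count are illustrative assumptions.

```python
def expected_min_uniform(n, steps=10000):
    """Midpoint-rule approximation of the integral of x * n * (1 - x)^(n - 1) over [0, 1]."""
    h = 1.0 / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * h                      # midpoint of the i-th subinterval
        total += x * n * (1.0 - x) ** (n - 1)  # x times the density of the minimum
    return total * h

for n in (1, 2, 4, 9):
    print(n, expected_min_uniform(n), 1 / (n + 1))
```

For $n = 1$ the minimum is just a single uniform draw, so both columns should be close to $0.5$.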

## **Knowledge**

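#order-statistics #distribution-function #expectation #Beta-distribution

### Key Difficulties

Question (3), on the distribution function of an arbitrary $k$-th order statistic, is the hardest part: it requires understanding the binomial distribution and summing its probabilities.

### Problem-Solving Tips

For order statistics, it is important to know how to express the distribution functions of the smallest and largest order statistics in terms of $F(x)$. For the uniform distribution, properties of the Beta function simplify the computation of the expectation.

### Key Vocabulary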

- order statistic 顺序统计量
- distribution function 分布函数
- expectation 期望值
- uniform distribution 均匀分布

1 change: 1 addition & 0 deletions docs/kakomonn/tokyo_university/index.md
@@ -251,6 +251,7 @@ tags:
- [8月 問題8](frontier_sciences/cbms_201808_8.md)
- [8月 問題9](frontier_sciences/cbms_201808_9.md)
- [8月 問題10](frontier_sciences/cbms_201808_10.md)
- [8月 問題11](frontier_sciences/cbms_201808_11.md)
- 海洋技術環境学専攻:
- 2022年度:
- [第1~6問](frontier_sciences/otpe_2022_all.md)
1 change: 1 addition & 0 deletions mkdocs.yml
@@ -372,6 +372,7 @@ nav:
- 8月 問題8: kakomonn/tokyo_university/frontier_sciences/cbms_201808_8.md
- 8月 問題9: kakomonn/tokyo_university/frontier_sciences/cbms_201808_9.md
- 8月 問題10: kakomonn/tokyo_university/frontier_sciences/cbms_201808_10.md
- 8月 問題11: kakomonn/tokyo_university/frontier_sciences/cbms_201808_11.md
- 海洋技術環境学専攻:
- 2022年度:
- 第1~6問: kakomonn/tokyo_university/frontier_sciences/otpe_2022_all.md
