
Commit 752ea28: update image
1 parent fd91225

File tree: 1 file changed (+8 −8 lines)

机器学习/291.机器学习简介: 寻找函数的艺术.md

Lines changed: 8 additions & 8 deletions
@@ -6,7 +6,7 @@

## Machine learning is finding functions

-<img width=300 src="https://private-user-images.githubusercontent.com/7970947/303656877-67390915-f4a9-464d-a712-9958ffbf6703.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MDc0ODM2ODIsIm5iZiI6MTcwNzQ4MzM4MiwicGF0aCI6Ii83OTcwOTQ3LzMwMzY1Njg3Ny02NzM5MDkxNS1mNGE5LTQ2NGQtYTcxMi05OTU4ZmZiZjY3MDMucG5nP1gtQW16LUFsZ29yaXRobT1BV1M0LUhNQUMtU0hBMjU2JlgtQW16LUNyZWRlbnRpYWw9QUtJQVZDT0RZTFNBNTNQUUs0WkElMkYyMDI0MDIwOSUyRnVzLWVhc3QtMSUyRnMzJTJGYXdzNF9yZXF1ZXN0JlgtQW16LURhdGU9MjAyNDAyMDlUMTI1NjIyWiZYLUFtei1FeHBpcmVzPTMwMCZYLUFtei1TaWduYXR1cmU9ZWM3NWVhYTZhOTU0MTM2YTFlM2M1MTQzZDg2M2UyZGJkNjkzMDU4MGZlMmRlZGJlNjMzMmM5ODMyNDc3NjNlOSZYLUFtei1TaWduZWRIZWFkZXJzPWhvc3QmYWN0b3JfaWQ9MCZrZXlfaWQ9MCZyZXBvX2lkPTAifQ.ppU_jvck--6Zd8_tHkVPj2plOvVQ5ywVowPjBz0wroI">
+<img width=300 src="https://github.com/ascoders/blog/assets/7970947/67390915-f4a9-464d-a712-9958ffbf6703">

As I understand it, the essence of machine learning is **finding functions**. I'll explain, from two angles, why machine learning is function finding.

@@ -69,7 +69,7 @@ define model function means defining a function, though not its final form in one step

Suppose we define a simple linear function of one variable:

-<img width=110 src="https://private-user-images.githubusercontent.com/7970947/303662928-167fac7d-ee5e-4aaa-b04e-e6f27b6ecda6.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MDc0ODUyNzQsIm5iZiI6MTcwNzQ4NDk3NCwicGF0aCI6Ii83OTcwOTQ3LzMwMzY2MjkyOC0xNjdmYWM3ZC1lZTVlLTRhYWEtYjA0ZS1lNmYyN2I2ZWNkYTYucG5nP1gtQW16LUFsZ29yaXRobT1BV1M0LUhNQUMtU0hBMjU2JlgtQW16LUNyZWRlbnRpYWw9QUtJQVZDT0RZTFNBNTNQUUs0WkElMkYyMDI0MDIwOSUyRnVzLWVhc3QtMSUyRnMzJTJGYXdzNF9yZXF1ZXN0JlgtQW16LURhdGU9MjAyNDAyMDlUMTMyMjU0WiZYLUFtei1FeHBpcmVzPTMwMCZYLUFtei1TaWduYXR1cmU9ZDY2MTRiODE1YWE1NDc2ZTc1NmE0NDFiYzI0NWRiMzFjN2RiMTdhNDViNDZjMzM4NDkwMTg4ZmZkOWU0OWY0YiZYLUFtei1TaWduZWRIZWFkZXJzPWhvc3QmYWN0b3JfaWQ9MCZrZXlfaWQ9MCZyZXBvX2lkPTAifQ.nQN0NNCXZ9SWnBz6K0ZcoRdzIKaGx54GGbSHkMNHkSo">
+<img width=110 src="https://github.com/ascoders/blog/assets/7970947/167fac7d-ee5e-4aaa-b04e-e6f27b6ecda6">

The unknown parameters are w and b. That is, we assume the function we are ultimately looking for can be written as b + wx, but the concrete values of w and b still have to be found. We can define it like this:

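The definition the article gives next falls outside this hunk. Judging from the curried call `modelFunction(b,w)(x)` used later in the diff, it plausibly looks like the following sketch; the body here is an assumption, not the article's verbatim code:

```js
// Sketch (assumed): a model factory that fixes the unknown parameters b and w
// and returns the one-variable function x -> b + w * x.
const modelFunction = (b, w) => (x) => b + w * x;

// Example: with b = 0 and w = 3 the model is exactly y = 3x.
const f = modelFunction(0, 3);
console.log(f(2)); // 6
```
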
@@ -91,7 +91,7 @@ define loss function means defining the loss function, where the loss can be understood as the distance

There are many ways to define a loss function; the most straightforward is the mean squared error:

-<img width=190 src="https://private-user-images.githubusercontent.com/7970947/303665423-2a6f8788-bb55-4072-89ea-c3f5db940f87.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MDc0ODU4OTAsIm5iZiI6MTcwNzQ4NTU5MCwicGF0aCI6Ii83OTcwOTQ3LzMwMzY2NTQyMy0yYTZmODc4OC1iYjU1LTQwNzItODllYS1jM2Y1ZGI5NDBmODcucG5nP1gtQW16LUFsZ29yaXRobT1BV1M0LUhNQUMtU0hBMjU2JlgtQW16LUNyZWRlbnRpYWw9QUtJQVZDT0RZTFNBNTNQUUs0WkElMkYyMDI0MDIwOSUyRnVzLWVhc3QtMSUyRnMzJTJGYXdzNF9yZXF1ZXN0JlgtQW16LURhdGU9MjAyNDAyMDlUMTMzMzEwWiZYLUFtei1FeHBpcmVzPTMwMCZYLUFtei1TaWduYXR1cmU9YmViZDQ5OTdkYWVjN2EzMGY1YmY1ZjE3ZDRhYzU1MzMzYTQ3OWJjMmE0ZjEwMWYyYjExMTk3Y2M3M2FmNzhjMCZYLUFtei1TaWduZWRIZWFkZXJzPWhvc3QmYWN0b3JfaWQ9MCZrZXlfaWQ9MCZyZXBvX2lkPTAifQ.a4yVUqCp9YGNzjgEgoU8uWAGxs3-cm-17HExe1TiFbg">
+<img width=190 src="https://github.com/ascoders/blog/assets/7970947/2a6f8788-bb55-4072-89ea-c3f5db940f87">

That is, compute the squared difference between the current prediction `modelFunction(b,w)(x)` and the target value `3x`. The loss function can then be defined like this:

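The actual definition again falls outside the hunk. For a single example the squared error is (modelFunction(b,w)(x) − 3x)²; averaged over all examples this gives the mean squared error. A minimal sketch, assuming training data shaped as `{ x, y }` pairs (a shape this hunk does not confirm):

```js
// Sketch (assumed): mean squared error of the model over the training data.
// trainingData is assumed to be an array of { x, y } pairs sampled from y = 3x.
const lossFunction = (b, w, trainingData) =>
  trainingData.reduce(
    (sum, { x, y }) => sum + (modelFunction(b, w)(x) - y) ** 2,
    0
  ) / trainingData.length;
```
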
@@ -121,19 +121,19 @@ optimization means optimizing the function's parameters to minimize the loss.

Finding the minimum of the loss function requires continually updating the unknown parameters. If we plot the loss function as a graph and want to move toward lower values on that graph, we take the partial derivative at the current point to determine the direction of the parameter update:

-<img width=300 src="https://private-user-images.githubusercontent.com/7970947/303600726-6e3544a2-c4b6-4874-b1be-d3969207406b.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MDc0NjkyODcsIm5iZiI6MTcwNzQ2ODk4NywicGF0aCI6Ii83OTcwOTQ3LzMwMzYwMDcyNi02ZTM1NDRhMi1jNGI2LTQ4NzQtYjFiZS1kMzk2OTIwNzQwNmIucG5nP1gtQW16LUFsZ29yaXRobT1BV1M0LUhNQUMtU0hBMjU2JlgtQW16LUNyZWRlbnRpYWw9QUtJQVZDT0RZTFNBNTNQUUs0WkElMkYyMDI0MDIwOSUyRnVzLWVhc3QtMSUyRnMzJTJGYXdzNF9yZXF1ZXN0JlgtQW16LURhdGU9MjAyNDAyMDlUMDg1NjI3WiZYLUFtei1FeHBpcmVzPTMwMCZYLUFtei1TaWduYXR1cmU9NGExZTkwYjQ3YjhiNWE4ZTdmNTcwMzZiNDk1MjIwYzdlZDkzNDcwOGUzYTlkN2QxYzQ5MjM3OTU3NzA5ZTY0OSZYLUFtei1TaWduZWRIZWFkZXJzPWhvc3QmYWN0b3JfaWQ9MCZrZXlfaWQ9MCZyZXBvX2lkPTAifQ.P9h5W0FD_q76Ih_V0rkkn4UjFCjxS1Edi4lLAwJrzzs">
+<img width=300 src="https://github.com/ascoders/blog/assets/7970947/6e3544a2-c4b6-4874-b1be-d3969207406b">

As the figure shows, if the x-axis is the parameter w and the y-axis is the loss over all training data, then taking the partial derivative of the loss with respect to w tells us how w should change to make the loss smaller (when the derivative is negative, move right: increasing w decreases the loss, and vice versa).

From the definition of the loss function, we can write out its partial derivatives with respect to b and w:

Partial derivative with respect to b:

-<img width=340 src="https://private-user-images.githubusercontent.com/7970947/303667829-d5ea9819-2f33-4ea6-88b6-0d1d5beba31e.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MDc0ODY0NzEsIm5iZiI6MTcwNzQ4NjE3MSwicGF0aCI6Ii83OTcwOTQ3LzMwMzY2NzgyOS1kNWVhOTgxOS0yZjMzLTRlYTYtODhiNi0wZDFkNWJlYmEzMWUucG5nP1gtQW16LUFsZ29yaXRobT1BV1M0LUhNQUMtU0hBMjU2JlgtQW16LUNyZWRlbnRpYWw9QUtJQVZDT0RZTFNBNTNQUUs0WkElMkYyMDI0MDIwOSUyRnVzLWVhc3QtMSUyRnMzJTJGYXdzNF9yZXF1ZXN0JlgtQW16LURhdGU9MjAyNDAyMDlUMTM0MjUxWiZYLUFtei1FeHBpcmVzPTMwMCZYLUFtei1TaWduYXR1cmU9OWZhMGQ1YzNiMjRjM2RmOTJiMjVlYTkyYjhlNzcxMGJlZDk2YjllZTFjZTY2MjgzNDA2ZjhmODMwYmZhYzI3MCZYLUFtei1TaWduZWRIZWFkZXJzPWhvc3QmYWN0b3JfaWQ9MCZrZXlfaWQ9MCZyZXBvX2lkPTAifQ.EQo5hX9CMcvd6kWheBF3jpyuuTBCX8DPpVCWIsMcDP4">
+<img width=340 src="https://github.com/ascoders/blog/assets/7970947/d5ea9819-2f33-4ea6-88b6-0d1d5beba31e">

Partial derivative with respect to w:

-<img width=360 src="https://private-user-images.githubusercontent.com/7970947/303667857-92ef497d-6e94-4c98-ac23-476aa31642fe.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MDc0ODY0NzEsIm5iZiI6MTcwNzQ4NjE3MSwicGF0aCI6Ii83OTcwOTQ3LzMwMzY2Nzg1Ny05MmVmNDk3ZC02ZTk0LTRjOTgtYWMyMy00NzZhYTMxNjQyZmUucG5nP1gtQW16LUFsZ29yaXRobT1BV1M0LUhNQUMtU0hBMjU2JlgtQW16LUNyZWRlbnRpYWw9QUtJQVZDT0RZTFNBNTNQUUs0WkElMkYyMDI0MDIwOSUyRnVzLWVhc3QtMSUyRnMzJTJGYXdzNF9yZXF1ZXN0JlgtQW16LURhdGU9MjAyNDAyMDlUMTM0MjUxWiZYLUFtei1FeHBpcmVzPTMwMCZYLUFtei1TaWduYXR1cmU9Yjg4ZDI5OGRhNjQ3ZWQ2ZDZkZDVkOWViODcyYWU5MGVlZTY1Nzc2Yzg5MjMzYmNkMDM1NzIzMGJkNzg2MjU4NSZYLUFtei1TaWduZWRIZWFkZXJzPWhvc3QmYWN0b3JfaWQ9MCZrZXlfaWQ9MCZyZXBvX2lkPTAifQ.boyXK17UIbV2fiRx-n2C0Y9ZaznM6k3vsyhZsLcZj5A">
+<img width=360 src="https://github.com/ascoders/blog/assets/7970947/92ef497d-6e94-4c98-ac23-476aa31642fe">

> Note: here we compute the partial derivative for a single training example only, without summing the derivatives over all training data, because there are different strategies for how these per-example derivatives are used later.
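
The two images carry the formulas; for the per-example squared error L = (b + wx − y)², the chain rule gives ∂L/∂b = 2(b + wx − y) and ∂L/∂w = 2x(b + wx − y). A sketch of both derivatives as code, derived from the definitions above rather than transcribed from the images:

```js
// Sketch (derived): gradient of the per-example loss L = (b + w*x - y)^2.
const gradient = (b, w, { x, y }) => {
  const error = b + w * x - y; // prediction minus target
  return {
    db: 2 * error,     // ∂L/∂b = 2(b + wx − y)
    dw: 2 * x * error, // ∂L/∂w = 2x(b + wx − y)
  };
};
```
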
@@ -203,7 +203,7 @@ for (let i = 0; i < 500; i++) {

Visualizing the function-finding process gives the figure below:

-<img width=500 src="https://private-user-images.githubusercontent.com/7970947/303650736-fa3cb64a-426c-4bec-a6f9-674ba84ec6e6.gif?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MDc0ODIyMTksIm5iZiI6MTcwNzQ4MTkxOSwicGF0aCI6Ii83OTcwOTQ3LzMwMzY1MDczNi1mYTNjYjY0YS00MjZjLTRiZWMtYTZmOS02NzRiYTg0ZWM2ZTYuZ2lmP1gtQW16LUFsZ29yaXRobT1BV1M0LUhNQUMtU0hBMjU2JlgtQW16LUNyZWRlbnRpYWw9QUtJQVZDT0RZTFNBNTNQUUs0WkElMkYyMDI0MDIwOSUyRnVzLWVhc3QtMSUyRnMzJTJGYXdzNF9yZXF1ZXN0JlgtQW16LURhdGU9MjAyNDAyMDlUMTIzMTU5WiZYLUFtei1FeHBpcmVzPTMwMCZYLUFtei1TaWduYXR1cmU9ZTcxZDY2NDY0OTBmMmMyZjAyMzI3OWRlOWExZjdiNTc4YzRlOGI4NzRlNGJjYzljM2Y3YmUzYjQ2MzdkYzRjMyZYLUFtei1TaWduZWRIZWFkZXJzPWhvc3QmYWN0b3JfaWQ9MCZrZXlfaWQ9MCZyZXBvX2lkPTAifQ.AEU4WivgSPXQVsgu2889U69U8SMT9TGwiwSQFtvMB1c">
+<img width=500 src="https://github.com/ascoders/blog/assets/7970947/fa3cb64a-426c-4bec-a6f9-674ba84ec6e6">

Notice that no matter how the initial values of b and w are chosen, by the time the loss converges, b approaches 0 and w approaches 3: the function gets arbitrarily close to y = 3x.

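The loop named in the hunk header, `for (let i = 0; i < 500; i++) {`, is the article's training iteration. A sketch of how such a loop might combine the pieces above with per-example gradient descent; the learning rate, the random initialization, and the per-example update scheme are all assumptions:

```js
// Sketch (assumed): plain per-example gradient descent for 500 iterations.
let b = Math.random();
let w = Math.random();
const learningRate = 0.01; // assumed value

for (let i = 0; i < 500; i++) {
  for (const sample of trainingData) {
    const { db, dw } = gradient(b, w, sample);
    b -= learningRate * db; // step against the slope to reduce the loss
    w -= learningRate * dw;
  }
}
// With data sampled from y = 3x, b should drift toward 0 and w toward 3.
```
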
@@ -217,6 +217,6 @@ for (let i = 0; i < 500; i++) {

You may already have noticed that our chosen function architecture, y = b + wx, is far too simple: it can only solve linear problems. Modify the training data slightly so it becomes nonlinear, and the loss stops decreasing once it reaches a certain value. The figure makes it obvious that the problem is not in our define loss function or optimization steps; the architecture defined in define model function simply cannot match the training data perfectly:

-<img width=500 src="https://private-user-images.githubusercontent.com/7970947/303651368-7625f0f3-2fc0-49f1-8638-0c9b9bb3cd76.gif?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MDc0ODM0NDgsIm5iZiI6MTcwNzQ4MzE0OCwicGF0aCI6Ii83OTcwOTQ3LzMwMzY1MTM2OC03NjI1ZjBmMy0yZmMwLTQ5ZjEtODYzOC0wYzliOWJiM2NkNzYuZ2lmP1gtQW16LUFsZ29yaXRobT1BV1M0LUhNQUMtU0hBMjU2JlgtQW16LUNyZWRlbnRpYWw9QUtJQVZDT0RZTFNBNTNQUUs0WkElMkYyMDI0MDIwOSUyRnVzLWVhc3QtMSUyRnMzJTJGYXdzNF9yZXF1ZXN0JlgtQW16LURhdGU9MjAyNDAyMDlUMTI1MjI4WiZYLUFtei1FeHBpcmVzPTMwMCZYLUFtei1TaWduYXR1cmU9NTJmNGYyY2YwYzM0OGZjMmI3ZTU4ZjQ3ZTk1OWYxOTJmODI0MDIzNWU1ZDZlNzQ1OWNjYzZmMTliZjY2NzZlOCZYLUFtei1TaWduZWRIZWFkZXJzPWhvc3QmYWN0b3JfaWQ9MCZrZXlfaWQ9MCZyZXBvX2lkPTAifQ.1soHCDrRdXc5aklFAYnAwOH4emQvYcveW8c30LLnznI">
+<img width=500 src="https://github.com/ascoders/blog/assets/7970947/7625f0f3-2fc0-49f1-8638-0c9b9bb3cd76">

This situation is called model bias. We then have to increase the model function's complexity, but the more complex function becomes hard to train, setting off a cycle of solving one problem, discovering a new one, and solving that in turn. This cycle is the history of machine learning, and it is a fascinating one. If, having read this far, you are curious about the challenges ahead and how they are solved, you already have the basic curiosity needed to get into machine learning. In the next article we'll look at how to define a function that can, in theory, approximate anything.
