Commit 49be5ff

divers - IA (#79)
1 parent 26b8d5d commit 49be5ff

488 files changed: 865 additions, 752 deletions

content/.vuepress/sidebar-config/divers.js

Lines changed: 8 additions & 0 deletions
@@ -22,6 +22,14 @@ function getDivers() {
         'freebox/freebox-videos'
       ]
     },
+    {
+      title: 'IA',
+      collapsable: false,
+      sidebarDepth: 2,
+      children: [
+        'ia/ia'
+      ]
+    },
     {
       title: 'Licences',
       collapsable: false,

content/divers/ia/ia.md

Lines changed: 55 additions & 0 deletions
@@ -0,0 +1,55 @@
# IA

## training

[Yann LeCun on AI training (LinkedIn)](https://www.linkedin.com/feed/update/urn:li:activity:7133567569684238336/)

```text
Animals and humans get very smart very quickly with vastly smaller amounts of training data than current AI systems.

Current LLMs are trained on text data that would take 20,000 years for a human to read.
And still, they haven't learned that if A is the same as B, then B is the same as A.
Humans get a lot smarter than that with comparatively little training data.
Even corvids, parrots, dogs, and octopuses get smarter than that very, very quickly, with only 2 billion neurons and a few trillion "parameters."

My money is on new architectures that would learn as efficiently as animals and humans.
Using more text data (synthetic or not) is a temporary stopgap made necessary by the limitations of our current approaches.
The salvation is in using sensory data, e.g. video, which has higher bandwidth and more internal structure.

The total amount of visual data seen by a 2 year-old is larger than the amount of data used to train LLMs, but still pretty reasonable.
2 years = 2x365x12x3600 or roughly 32 million seconds.
We have 2 million optical nerve fibers, carrying roughly ten bytes per second each.
That's a total of 6E14 bytes. The volume of data for LLM training is typically 1E13 tokens, which is about 2E13 bytes.
It's a factor of 30.

Importantly, there is more to learn from video than from text because it is more redundant.
It tells you a lot about the structure of the world.
```
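
The arithmetic in the quote can be reproduced with a short Python sketch. Every input is one of the quote's own rough figures; reading the `12` in `2x365x12x3600` as waking hours per day and the 2 bytes per token as the implied conversion are interpretations, and all names below are illustrative.

```python
# Back-of-the-envelope check of the figures in the quote above.
# All inputs are the quote's rough numbers, not measurements.

WAKING_HOURS_PER_DAY = 12        # assumption: the "12" in "2x365x12x3600"
OPTIC_NERVE_FIBERS = 2_000_000   # "2 million optical nerve fibers"
BYTES_PER_FIBER_PER_SECOND = 10  # "roughly ten bytes per second each"
LLM_TRAINING_TOKENS = 1e13       # "typically 1E13 tokens"
BYTES_PER_TOKEN = 2              # implied by "1E13 tokens ... about 2E13 bytes"


def seconds_awake(years: int) -> int:
    """Waking seconds accumulated over the given number of years."""
    return years * 365 * WAKING_HOURS_PER_DAY * 3600


def visual_bytes(years: int) -> int:
    """Bytes carried by the optic nerves over the given number of years."""
    return seconds_awake(years) * OPTIC_NERVE_FIBERS * BYTES_PER_FIBER_PER_SECOND


def llm_bytes() -> float:
    """Approximate size in bytes of a typical LLM training corpus."""
    return LLM_TRAINING_TOKENS * BYTES_PER_TOKEN


if __name__ == "__main__":
    print(f"waking seconds in 2 years: {seconds_awake(2):,}")
    print(f"visual data: {visual_bytes(2):.1e} bytes")
    print(f"LLM training text: {llm_bytes():.1e} bytes")
    print(f"ratio: {visual_bytes(2) / llm_bytes():.1f}")
```

With these inputs the ratio comes out at about 31.5, which the quote rounds to 30.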

TL;DR: next-generation AI needs to learn from video rather than text.

For comparison, see [this Jean-Baptiste Kempf (VLC) interview about how video works](https://www.youtube.com/watch?v=Kv4FzAdxclA).

- An image is an array of pixels; each pixel is a color.
- A video is a sequence of images (typically 24 to 60 images per second).
- Codec = compression/decompression algorithm used to transmit video.
- Uncompressed video, sent pixel by pixel, takes around 10 to 40 Gb/s.
- The goal of a codec is to divide the bandwidth used by 100, 200, ... up to 1,000.
- Dividing the bandwidth that much means destroying information.
- The underlying techniques rely on how the human eye behaves: some colors are perceived better than others, so some color information can be dropped without visibly degrading the image.
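
The orders of magnitude in the list above can be checked with a small Python sketch. The resolution, frame rate, and 24 bits per pixel are assumptions chosen for illustration (they are not taken from the interview), and the function names are hypothetical.

```python
def raw_bitrate_bps(width: int, height: int, fps: int, bits_per_pixel: int = 24) -> int:
    """Bits per second needed to send every pixel of every image uncompressed."""
    return width * height * bits_per_pixel * fps


def compressed_bitrate_bps(raw_bps: float, ratio: float) -> float:
    """Bitrate left after a codec divides the bandwidth by `ratio`."""
    return raw_bps / ratio


if __name__ == "__main__":
    # Assumed example: 3840x2160 video at 60 images per second, 24 bits per pixel.
    raw = raw_bitrate_bps(3840, 2160, 60)
    print(f"raw: {raw / 1e9:.1f} Gb/s")
    for ratio in (100, 200, 1000):
        print(f"divided by {ratio}: {compressed_bitrate_bps(raw, ratio) / 1e6:.1f} Mb/s")
```

Under these assumptions the raw stream is about 11.9 Gb/s, at the low end of the 10 to 40 Gb/s range quoted above, and a 1,000x reduction brings it down to roughly 12 Mb/s.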

All codecs work on the same principle: they discard data the eye does not perceive, and they look for blocks of data that are redundant within an image or between successive images.
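
As a toy illustration of redundancy between images, the Python sketch below stores only the blocks that changed from one image to the next. It is a deliberately simplified, lossless example and not how any real codec is implemented (real codecs add motion estimation, transforms, and quantization); the block size and function names are assumptions made for the example.

```python
BLOCK = 2  # toy block size; chosen only to keep the example small


def blocks(frame):
    """Yield (row, col, tile) for each BLOCK x BLOCK tile of a frame."""
    for r in range(0, len(frame), BLOCK):
        for c in range(0, len(frame[0]), BLOCK):
            tile = tuple(tuple(row[c:c + BLOCK]) for row in frame[r:r + BLOCK])
            yield r, c, tile


def encode_delta(previous, current):
    """Keep only the tiles of `current` that differ from `previous`."""
    prev_tiles = {(r, c): tile for r, c, tile in blocks(previous)}
    return [(r, c, tile) for r, c, tile in blocks(current)
            if prev_tiles[(r, c)] != tile]


def decode_delta(previous, delta):
    """Rebuild the current frame from the previous frame plus the changed tiles."""
    frame = [list(row) for row in previous]
    for r, c, tile in delta:
        for dr, tile_row in enumerate(tile):
            frame[r + dr][c:c + len(tile_row)] = tile_row
    return frame


if __name__ == "__main__":
    first = [[0] * 4 for _ in range(4)]   # 4x4 all-black image
    second = [row[:] for row in first]
    second[3][3] = 255                    # a single pixel changes
    delta = encode_delta(first, second)
    print(f"{len(delta)} of {len(list(blocks(second)))} blocks need to be sent")
    assert decode_delta(first, delta) == second
```

When most of the picture stays the same between two images, only a small fraction of the blocks has to be transmitted.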

```text
MPEG-1 (1993) ---> MPEG-2 (1995) = DVD ---> DIVX (1999) (=MPEG-4) ---> H.264 (2003) ---> HEVC (2013) ---> VP9 (2013)
```

- H.264 is the most widely used codec in the world, with around 80% of usage.
- HEVC is crippled by royalties: unlike in television, it remains almost unused on the web, at around 5%.
- VP9 was created by Google; it is royalty-free and open source, and YouTube and Facebook use it.
- AV1, then AV2, come from the Alliance for Open Media, initiated by Google.
- AV1 decoding is implemented by [dav1d](https://github.com/videolan/dav1d), a VideoLAN (VLC) project, with around 210K lines of assembly and 30K lines of C. This implementation is widely used by the GAFAM companies.

## misc

[ChatGPT guide for developers (in French)](https://gen-ai.fr/outils/generation-code/chatgpt-pour-developpeurs/)

docs/404.html

Lines changed: 3 additions & 3 deletions

docs/assets/js/100.6a2b43d6.js

Lines changed: 1 addition & 0 deletions

docs/assets/js/100.44086804.js renamed to docs/assets/js/101.c2c7bbe7.js

Lines changed: 1 addition & 1 deletion
