# AI

## training

[Yann LeCun about AI training on LinkedIn](https://www.linkedin.com/feed/update/urn:li:activity:7133567569684238336/)

```text
Animals and humans get very smart very quickly with vastly smaller amounts of training data than current AI systems.

Current LLMs are trained on text data that would take 20,000 years for a human to read.
And still, they haven't learned that if A is the same as B, then B is the same as A.
Humans get a lot smarter than that with comparatively little training data.
Even corvids, parrots, dogs, and octopuses get smarter than that very, very quickly, with only 2 billion neurons and a few trillion "parameters."

My money is on new architectures that would learn as efficiently as animals and humans.
Using more text data (synthetic or not) is a temporary stopgap made necessary by the limitations of our current approaches.
The salvation is in using sensory data, e.g. video, which has higher bandwidth and more internal structure.

The total amount of visual data seen by a 2 year-old is larger than the amount of data used to train LLMs, but still pretty reasonable.
2 years = 2x365x12x3600 or roughly 32 million seconds.
We have 2 million optical nerve fibers, carrying roughly ten bytes per second each.
That's a total of 6E14 bytes. The volume of data for LLM training is typically 1E13 tokens, which is about 2E13 bytes.
It's a factor of 30.

Importantly, there is more to learn from video than from text because it is more redundant.
It tells you a lot about the structure of the world.
```

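As a sanity check on the arithmetic in the quote, here is a minimal back-of-the-envelope sketch (the constants are the ones from the post; the `12` in `2x365x12x3600` is an implicit ~12 waking hours per day):

```python
# Back-of-the-envelope check of LeCun's data-volume comparison.
SECONDS_AWAKE = 2 * 365 * 12 * 3600   # 2 years at ~12 waking hours/day
FIBERS = 2_000_000                    # optic nerve fibers
BYTES_PER_FIBER_PER_S = 10            # rough per-fiber bandwidth

visual_bytes = SECONDS_AWAKE * FIBERS * BYTES_PER_FIBER_PER_S
llm_bytes = 2 * 10**13                # ~1E13 tokens at ~2 bytes per token

print(f"seconds awake: {SECONDS_AWAKE:.1e}")              # ~3.2e7 (32 million)
print(f"visual bytes:  {visual_bytes:.1e}")               # ~6.3e14
print(f"LLM bytes:     {llm_bytes:.1e}")                  # 2.0e13
print(f"ratio:         {visual_bytes / llm_bytes:.0f}x")  # ~32, "a factor of 30"
```
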
TL;DR: Next-gen AI needs to learn from video instead of text.

For context, see [this Jean-Baptiste Kempf (VLC) interview about how video works](https://www.youtube.com/watch?v=Kv4FzAdxclA).

- an image is an array of pixels; each pixel is a color
- a video is a sequence of images (somewhere between 24 and 60 images per second)
- a CODEC is a COmpression/DECompression algorithm used to transmit video
- raw, pixel-by-pixel video weighs around 10 to 40 Gb/s (see the sketch after this list)
- the goal of a CODEC is to divide the bandwidth used by 100, 200, ... up to 1,000
- dividing bandwidth means destroying information
- the technique is based on how human eyes behave: some colors are perceived better than others, so some color information can be deleted without visibly degrading the image

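To see where the 10 to 40 Gb/s figure comes from, a quick back-of-the-envelope calculation (assuming 4K at 60 frames per second and 3 bytes per pixel; higher resolutions and bit depths push toward the upper end of the range):

```python
# Raw (uncompressed) video bitrate: every pixel of every frame, no CODEC.
WIDTH, HEIGHT = 3840, 2160   # 4K UHD
FPS = 60                     # frames per second
BYTES_PER_PIXEL = 3          # 8-bit R, G, B

bits_per_second = WIDTH * HEIGHT * BYTES_PER_PIXEL * 8 * FPS
print(f"{bits_per_second / 1e9:.1f} Gb/s")  # ~11.9 Gb/s for plain 4K
```
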
Every CODEC behaves the same way: it deletes data the eye cannot see, and it looks for data blocks that are redundant within a single image or between successive images.

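As a toy illustration of those two ideas, here is a minimal sketch (the 4:2:0 layout is the real scheme used by most consumer video; the frame-delta part is deliberately simplified and not how a real CODEC encodes motion):

```python
import random

# Idea 1: 4:2:0 chroma subsampling -- keep brightness (Y) at full resolution,
# but only one pair of color samples (Cb, Cr) per 2x2 block of pixels,
# because the eye is far less sensitive to color detail than to brightness.
W, H = 1920, 1080
rgb_bytes = W * H * 3                            # raw RGB: 3 bytes per pixel
yuv420_bytes = W * H + 2 * (W // 2) * (H // 2)   # Y + subsampled Cb + Cr
print(f"4:2:0 alone: {rgb_bytes / yuv420_bytes:.1f}x smaller")  # 2.0x

# Idea 2: inter-frame redundancy -- consecutive frames are nearly identical,
# so storing only what changed is far cheaper than re-sending every pixel.
frame1 = [random.randint(0, 255) for _ in range(1000)]
frame2 = frame1[:]                       # next frame: same scene...
frame2[500] = (frame2[500] + 10) % 256   # ...with one tiny change
delta = [(i, b) for i, (a, b) in enumerate(zip(frame1, frame2)) if a != b]
print(f"full frame: {len(frame2)} values, delta: {len(delta)} value(s)")
```
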
```text
MPEG-1 (1993) ---> MPEG-2 (1995) = DVD ---> DIVX (1999) (= MPEG-4) ---> H.264 (2003) ---> HEVC (2013) ---> VP9 (2013)
```

- H.264 is the most common CODEC in the world, at around 80% of usage.
- HEVC is crippled by royalties: it is used on television but remains almost unused on the web, at around 5%.
- VP9, created by Google, is royalty-free and open source; YouTube and Facebook use it.
- AV1, then AV2, were created by the Alliance for Open Media, initiated by Google.
- AV1 decoding is implemented by [dav1d](https://github.com/videolan/dav1d), a VideoLAN project of around 210K lines of assembly plus 30K lines of C. This implementation is widely used by GAFAM.

## misc

[ChatGPT guide for developers](https://gen-ai.fr/outils/generation-code/chatgpt-pour-developpeurs/)