This repository is a mini-course for beginners covering AI concepts and terms, foundational AI and Machine Learning research papers and frameworks, helpful video tutorials, and a comprehensive AI Engineer roadmap:
- A Glossary of Basic AI Terms and Concepts
- Videos about AI Concepts and Terms
- Important Research Papers on AI and Machine Learning
- AI Roadmap
A Neural Network is a network of artificial neurons arranged in two or more layers. Each neuron carries learnable values called "weights" and a "bias". Input values are passed to the first layer, where each neuron's output is a weighted sum of its inputs plus its bias (usually passed through an activation function). That output is passed on to the neurons in the next layer, and so on, until the final layer of the network produces the final output or decision.
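As a minimal sketch of that forward pass (using NumPy, with made-up layer sizes), each layer computes a weighted sum of its inputs plus a bias:

```python
import numpy as np

def forward(x, W1, b1, W2, b2):
    """Two-layer network: weighted sum plus bias at each layer, ReLU in between."""
    hidden = np.maximum(0, x @ W1 + b1)   # first layer: weighted sum + bias, then ReLU
    return hidden @ W2 + b2               # final layer: the network's output / decision

rng = np.random.default_rng(0)
x = rng.normal(size=(1, 4))                        # one input with 4 features (arbitrary)
W1, b1 = rng.normal(size=(4, 8)), np.zeros(8)      # first-layer weights and biases
W2, b2 = rng.normal(size=(8, 2)), np.zeros(2)      # output-layer weights and biases
print(forward(x, W1, b1, W2, b2))
```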
Large Language Models (LLMs) are predictive models, built with various neural network architectures, that are trained to predict the next word (or token) in a text sequence (e.g., a sentence).
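To make next-token prediction concrete, here is a toy sketch with a made-up context and hand-picked probabilities (a real LLM computes such a distribution over its whole vocabulary):

```python
# Toy next-word prediction: the model assigns a probability to each candidate token
# given the context; greedy decoding simply picks the most likely one.
context = "the cat sat on the"
candidate_probs = {"mat": 0.62, "sofa": 0.21, "roof": 0.12, "banana": 0.05}  # made-up values
next_word = max(candidate_probs, key=candidate_probs.get)
print(context, next_word)  # -> the cat sat on the mat
```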
An Optimizer is an algorithm that adjusts the neurons' weights and biases, i.e., the parameters of a neural network. The optimization of the parameters is meant to minimize the loss function during the training stage.
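A minimal sketch of what an optimizer does, here plain gradient descent on a single parameter minimizing a squared-error loss (the numbers are illustrative only):

```python
# Gradient descent: repeatedly nudge the parameter against the gradient of the loss.
w, target, lr = 0.0, 3.0, 0.1           # parameter, desired output, learning rate (made up)
for step in range(50):
    loss = (w - target) ** 2             # loss function to minimize
    grad = 2 * (w - target)              # derivative of the loss with respect to w
    w -= lr * grad                       # the optimizer's update rule
print(round(w, 3))                       # w approaches 3.0 as the loss shrinks
```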
Fine-tuning is the process of making new information or a new task accessible to a pre-trained generative model by updating its weights, typically by training the layers closer to the output while the earlier layers are kept frozen.
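A hedged PyTorch sketch of that idea (the toy model and layer sizes are assumptions, not a specific recipe): the earlier layers are frozen and only the layer closest to the output is trained.

```python
import torch
import torch.nn as nn

# Toy "pre-trained" model: two early layers plus a final layer close to the output.
model = nn.Sequential(
    nn.Linear(16, 32), nn.ReLU(),
    nn.Linear(32, 32), nn.ReLU(),
    nn.Linear(32, 4),
)

# Freeze everything except the final layer, then fine-tune only that layer.
for param in model[:-1].parameters():
    param.requires_grad = False
optimizer = torch.optim.Adam(model[-1].parameters(), lr=1e-3)

x, y = torch.randn(8, 16), torch.randint(0, 4, (8,))   # stand-in fine-tuning batch
loss = nn.CrossEntropyLoss()(model(x), y)
loss.backward()
optimizer.step()    # only the final layer's weights and bias are updated
```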
In-Context Learning (ICL) is an alternative to fine-tuning whereby the user provides the generative model with additional examples of the desired response directly in the prompt. Unlike fine-tuning, ICL does not require any model training or parameter updates.
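A sketch of an in-context (few-shot) prompt: the "learning" happens entirely inside the prompt, which here uses made-up sentiment examples.

```python
# Few-shot / in-context learning: examples of the desired response go in the prompt
# itself; the model's weights are never updated.
prompt = """Classify the sentiment of each review as Positive or Negative.

Review: "Loved every minute of it." Sentiment: Positive
Review: "A complete waste of money." Sentiment: Negative
Review: "The acting was superb." Sentiment:"""

# `prompt` would be sent as-is to any LLM; the expected completion is "Positive".
print(prompt)
```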
RAG, or Retrieval Augmented Generation, is another alternative to fine-tuning whereby the pre-trained model is supplied with additional information retrieved from external data sources, e.g., a PDF file, spreadsheet data, a web link, or even a code repository.
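A minimal sketch of the RAG pattern; the tiny document store and keyword-overlap "retriever" below are illustrative stand-ins for a real vector database with embedding search.

```python
# RAG in three steps: retrieve relevant text, add it to the prompt, then generate.
documents = {
    "refund_policy.pdf": "Refunds are issued within 14 days of purchase.",
    "shipping.pdf": "Orders ship within 2 business days.",
}

def retrieve(question: str) -> str:
    """Stand-in retriever: return the document sharing the most words with the question."""
    words = set(question.lower().split())
    return max(documents.values(), key=lambda text: len(words & set(text.lower().split())))

question = "How long do refunds take?"
context = retrieve(question)
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
print(prompt)   # this augmented prompt is what the pre-trained model actually sees
```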
The Backpropagation algorithm, used in most neural network architectures, applies the chain rule (typically combined with a gradient descent optimizer) to propagate the error term (the difference between the prediction and the expected outcome) backwards through the network. Weights and biases are adjusted in a way that reduces the loss/cost function, so that the network produces more accurate responses.
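A worked one-neuron example of that idea (numbers are made up): the chain rule carries the error back to the weight and bias, and one gradient step reduces the loss.

```python
# One neuron: prediction = w * x + b, loss = (prediction - target) ** 2
x, target = 2.0, 2.0
w, b, lr = 0.5, 0.0, 0.1

pred = w * x + b                          # forward pass: 1.0
error = pred - target                     # error term: -1.0
loss = error ** 2                         # loss before the update: 1.0

# Chain rule: dloss/dw = 2 * error * x, dloss/db = 2 * error
grad_w, grad_b = 2 * error * x, 2 * error
w, b = w - lr * grad_w, b - lr * grad_b   # gradient descent step

new_loss = (w * x + b - target) ** 2
print(loss, new_loss)                     # the loss drops after a single update
```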
Natural Language Processing (NLP) has traditionally been a branch of computational linguistics that uses various machine learning algorithms to process and analyse texts and corpora (plural of corpus, a collection of texts). It works by finding patterns in natural, human-generated language (both texts and transcribed speech) and in the underlying structure of languages that carries meaning. With the advancement of Artificial Intelligence (AI), generative models have added a new dimension to NLP: natural language generation, which has become the core method behind chatbots' textual/conversational ability as well as generative models for translation, sentiment analysis, and speech recognition.
Data Augmentation is a technique used in neural network training to artificially increase the amount of training data by changing some aspects of the original input data. For example, in the context of image classifiers such as CNNs (Convolutional Neural Networks), data augmentation is done by shifting, rotating, flipping, and resizing the original input images.
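A small NumPy sketch of image-style augmentation; the "image" here is just a toy 3x3 array.

```python
import numpy as np

image = np.arange(9).reshape(3, 3)        # toy 3x3 "image"
augmented = [
    np.fliplr(image),                     # horizontal flip
    np.flipud(image),                     # vertical flip
    np.rot90(image),                      # 90-degree rotation
    np.roll(image, shift=1, axis=1),      # shift one pixel to the right
]
# The original image plus these variants gives the model more training examples.
for variant in augmented:
    print(variant)
```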
These ML- AI Papers/Algorithms Will Get You Ahead Of 99% in AI
Transformer Language Models Simplified in JUST 3 MINUTES!
Beginner's guide to how language models work: full history of language models
Best LLM? How to Evaluate Language Models In Hugging Face
The Concept of Backpropagation Simplified in JUST 2 MINUTES! --Neural Networks
Mamba Language Model Simplified In JUST 5 MINUTES!
Is Mamba LLM Destroying Transformers? Language Model Comparison in AI
What Is An Agentic Workflow? Six Main Systems For AI Agents
AI Terms and Concepts Explained!
A collection of the most important AI and Machine Learning (ML) papers, concepts, and algorithms you need to get started in your AI journey. The full list of References is included at the end of this repository.
This is Transformers simplified for beginners, based on the 2017 paper ‘Attention Is All You Need’ by Google researchers.
Main terms and concepts: Attention mechanism, contextual understanding, encoder and decoder, multi-head, positional encoding, self-attention.
This is LLMs simplified for beginners: the fascinating journey of the evolution of language models, from rule-based models to statistical models, neural networks, and transformers.
Main terms and concepts: probability of the next word, word sequences, rule-based algorithms, deterministic algorithms, contextual understanding, probabilistic algorithm, chain rule, natural language processing.
A full explanation of the Hugging Face LLM evaluation scheme and tests: Measuring Massive Multitask Language Understanding (MMLU); the AI2 Reasoning Challenge (ARC); TruthfulQA, which measures whether a language model is truthful in generating answers to questions; HellaSwag, which challenges LLMs with a commonsense natural language inference task; Winogrande, a test of pronoun resolution problems; and GSM8K, a multi-step test of mathematical reasoning.
A beginner-friendly, easy-to-follow explanation of Backpropagation in Neural Networks and how it helps to reduce the error in predicting the next word in a text sequence.
A super simplified explanation of the Mamba language model with the Selective State Space Model (Selective SSM) architecture. It shows how Mamba’s architecture uses the Selective State Space Model to figure out which parts of the data (e.g., which words in a word sequence) are connected and how they might affect what happens next, e.g., to predict which word comes next.
The model architectures and performance differences of the Transformer and Mamba language models. This video compares the functionalities of the main AI and machine learning models and shows the improvements in the Mamba model over its Recurrent Neural Network predecessors, such as Long Short-Term Memory (LSTM) and gated networks.
Researchers in artificial intelligence believe that the path to AGI, or Artificial General Intelligence, is by leveraging the agentic AI workflow. But what exactly is an agentic AI workflow, and how can we use it in our daily tasks and business decision-making?
Key terms and concepts: multi-agent, self-refine, refinement method, Reflexion method, verbal reinforcement learning, HuggingGPT, chain-of-thought prompting, AutoGen, retrieval-augmented chat, multi-agent coding.
Retrieval Augmented Generation (RAG) vs In-Context Learning (ICL) vs Fine-Tuning LLMs: prompt engineering, information retrieval, natural language processing.
Here's a list of foundational research papers in Machine Learning (ML) and AI:
David Rumelhart, Geoffrey Hinton, and Ronald Williams. 1986. Learning Representations by Back-propagating Errors. Nature.
Hinton, 2022. The Forward-Forward Algorithm: Some Preliminary Investigations.
Aytekin, et al., 2022. Neural Networks are Decision Trees. https://arxiv.org/pdf/2210.05189
Schmidhuber, 2014. Deep Learning in Neural Networks: An Overview. https://arxiv.org/pdf/1404.7828
Dosovitskiy, et al. 2021. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale.
Beltagy et al. 2020. Longformer: The Long-Document Transformer.
Wang et al., 2020. Linformer: Self-Attention with Linear Complexity.
Zaheer et al. 2021. Big Bird: Transformers for Longer Sequences.
OpenAI. 2023. GPT-4 Technical Report.
Dai et al. 2019. Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context.
Fukushima, K. and Miyake, S., 1982. Neocognitron: A self-organizing neural network model for a mechanism of visual pattern recognition. In Competition and cooperation in neural nets (pp. 267–285). Springer, Berlin, Heidelberg.
LeCun, Y., Bottou, L., Bengio, Y. and Haffner, P., 1998. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11), pp.2278–2324.
Krizhevsky, A., Sutskever, I. and Hinton, G.E., 2017. ImageNet classification with deep convolutional neural networks. Communications of the ACM, 60(6), pp.84–90.
C. Szegedy et al., “Going deeper with convolutions,” 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015, pp. 1–9, doi: 10.1109/CVPR.2015.7298594.
He, K., Zhang, X., Ren, S. and Sun, J., 2016. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 770–778).
Schonfeld et al., 2020. A U-Net Based Discriminator for Generative Adversarial Networks.
Hu et al., 2021. LoRA: Low-Rank Adaptation of Large Language Models.
Lewis et al., 2021. Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks. https://arxiv.org/pdf/2005.11401
Xue et al. 2024. OpenMoE: An Early Effort on Open Mixture-of-Experts Language Models.
Jiang, et al. 2024. Many-Shot In-Context Learning in Multimodal Foundation Models.
Hollein et al. 2024. ViewDiff: 3D-Consistent Image Generation with Text-to-Image Models.
Crowson et al. 2024. Scalable High-Resolution Pixel-Space Image Synthesis with Hourglass Diffusion Transformers.
Gupta et al. 2023. Photorealistic Video Generation with Diffusion Models.
Dehghani et al. 2023. Patch n’ Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution.
Ho et al. 2020. Denoising Diffusion Probabilistic Models.
Chen et al. 2020. Generative Pretraining from Pixels. Proceedings of the 37th International Conference on Machine Learning, Vienna, Austria, PMLR 119.
Chhabra et al. 2019. Data Fine-Tuning. The Thirty-Third AAAI Conference on Artificial Intelligence (AAAI-19).
Ma et al. 2024. SiT: Exploring Flow and Diffusion-based Generative Models with Scalable Interpolant Transformers
Croitoru et al. 2023. Diffusion Models in Vision: A Survey.
Blattmann et al. 2023. Align your Latents: High-Resolution Video Synthesis with Latent Diffusion Models.
Bao et al. 2022. Why Are Conditional Generative Models Better Than Unconditional Ones?
Zhu & Zhao. 2023. Diffusion Models in NLP: A Survey.
Jiang et al. 2024. Investigating Data Contamination for Pre-training Language Models.
Jabri et al. 2023. Scalable Adaptive Computation for Iterative Generation.