This repository presents Machine Learning concepts acquired from my AI Engineering Professional Certificate specialization course, which I have completed.
Environment setup:

```bash
conda create -n ml_env python=3.12
conda activate ml_env
```
Artificial Intelligence (AI) is a branch of computer science focused on creating systems capable of simulating aspects of human intelligence. This includes activities such as learning, reasoning, perception, language comprehension, and problem-solving.
AI is based on algorithms and mathematical models that allow machines to analyze data, recognize patterns, and make decisions autonomously or with human assistance.
AI can be divided into three main categories based on the level of autonomy and reasoning capability.
Weak AI (Narrow AI) is the current form of AI, designed to perform specific tasks. It does not have general intelligence like a human.
- Examples: Voice assistants (Siri, Alexa), recommendation engines (Netflix, Spotify), chatbots, facial recognition.
Strong AI (General AI) is a hypothetical AI that would be capable of thinking and learning autonomously, like a human.
- Examples: None exist yet; such an AI would be capable of adapting to any intellectual task without requiring specific training for each.
Super AI is a theoretical level at which AI would surpass human intelligence in all fields, including creativity, problem-solving, and intuition.
- Examples: None exist yet; this level is the basis of many theories about AI’s future.
AI is a vast field that encompasses many disciplines. Here are the primary ones:
Machine Learning is a subfield of AI that allows machines to learn from data without explicit programming.
- Main types of ML:
- Supervised Learning → AI learns from labeled data (e.g., image recognition with predefined descriptions).
- Unsupervised Learning → AI finds patterns in data without human guidance (e.g., user clustering on streaming platforms).
- Reinforcement Learning → AI improves its decisions by receiving rewards or penalties (e.g., AI in games like AlphaGo).
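The training signal is what separates these paradigms: labels, no labels, or rewards. Below is a minimal scikit-learn sketch of the first two (the toy data and the LogisticRegression/KMeans choices are illustrative assumptions; reinforcement learning is omitted because it needs an environment and reward loop):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

# Toy 2-D points forming two loose groups around (0, 0) and (5, 5).
X = np.array([[0, 0], [1, 0], [0, 1], [5, 5], [6, 5], [5, 6]], dtype=float)

# Supervised: labels are given, and the model learns the input-to-label mapping.
y = np.array([0, 0, 0, 1, 1, 1])
clf = LogisticRegression().fit(X, y)
print(clf.predict([[0.5, 0.5], [5.5, 5.5]]))  # -> [0 1]

# Unsupervised: no labels; the model groups similar points on its own.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_)  # two clusters discovered from the data alone
```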
Natural Language Processing (NLP) allows AI to understand, analyze, and generate text or speech, enabling human-machine interaction.
- Examples: ChatGPT, automatic translators, sentiment analysis in social media.
Computer Vision enables AI to analyze images and videos, recognizing objects, faces, or scenes.
- Examples: Facial recognition, AI-powered medical diagnostics, self-driving cars.
Generative AI allows machines to create text, images, music, and videos autonomously.
- Examples:
- Text: GPT-4 for automated writing.
- Images: DALL·E, Stable Diffusion for creating digital art.
- Music: OpenAI’s Jukebox for generating music tracks.
- Video: AI models capable of generating realistic videos from scratch.
AI is applied to robotics to create machines capable of moving and interacting with the real world.
- Examples: Humanoid robots (Boston Dynamics), industrial robotic arms, Mars exploration robots.
AI operates through three main phases:
- Data Collection → AI needs data to learn (texts, images, sounds, etc.).
- Model Training → An algorithm analyzes data and looks for patterns to make predictions or decisions.
- Application and Optimization → The model is applied in the real world and continuously improved.
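A minimal sketch of these three phases with scikit-learn (the Iris dataset and the decision tree are illustrative assumptions, not a prescribed pipeline):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# 1. Data Collection: load a labeled dataset (here, the classic Iris flowers).
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# 2. Model Training: the algorithm searches the training data for patterns.
model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# 3. Application and Optimization: apply the model to unseen data, measure
#    the error, and iterate on the data and model to improve it.
print("accuracy:", accuracy_score(y_test, model.predict(X_test)))
```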
AI also presents ethical challenges and risks, including:
- Bias and Discrimination → If AI is trained on biased data, it can produce unfair outcomes (e.g., facial recognition accuracy varying by ethnicity).
- Job Loss → Automation may replace certain human jobs.
- Privacy and Security → AI can collect and use personal data in intrusive ways.
- Manipulation and Fake News → Generative AI can be used to create misleading or false content.
AI will continue to evolve and impact more sectors, including:
- Healthcare → More accurate diagnostics with AI.
- Industrial Automation → Autonomous robots for manufacturing.
- Entertainment → AI in video games, movies, and music.
- Sustainability → AI optimizing energy consumption and reducing pollution.
AI is a powerful, revolutionary, and continuously evolving technology, shaping the modern world.
A concise overview of Artificial Intelligence (AI) with its main categories and subcategories.
AI is a field of computer science that develops systems capable of simulating human cognitive processes.
- Weak AI (Narrow AI) → Specialized in specific tasks (e.g., voice assistants, chatbots).
- Strong AI (General AI) → A theoretical AI capable of learning and adapting like a human.
- Super AI → A hypothetical form of AI that surpasses human intelligence.
Machine Learning (ML): AI that learns from data without explicit programming.
- Supervised Learning → Learns from labeled data.
- Unsupervised Learning → Finds patterns in unlabeled data.
- Reinforcement Learning → Learns through rewards and penalties.
Deep Learning (DL): An advanced subset of ML that uses deep neural networks.
- Feedforward Neural Networks (FNN) → Basic structure for pattern recognition.
- Convolutional Neural Networks (CNN) → Used for computer vision (image analysis).
- Recurrent Neural Networks (RNN) → Ideal for sequential data (text, audio).
- Transformers → Successors to RNNs that replace recurrence with attention; used for NLP (ChatGPT, BERT).
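To make the first two architectures concrete, here is a minimal Keras sketch (layer counts and sizes are illustrative assumptions, not a tuned design):

```python
from tensorflow import keras
from tensorflow.keras import layers

# Feedforward Neural Network (FNN): dense layers over a flat feature vector.
fnn = keras.Sequential([
    keras.Input(shape=(784,)),               # e.g. a flattened 28x28 image
    layers.Dense(64, activation="relu"),     # hidden layer with ReLU
    layers.Dense(10, activation="softmax"),  # class probabilities via Softmax
])

# Convolutional Neural Network (CNN): convolution + pooling extract spatial
# features before the dense classification head.
cnn = keras.Sequential([
    keras.Input(shape=(28, 28, 1)),
    layers.Conv2D(16, (3, 3), activation="relu"),  # feature extraction
    layers.MaxPooling2D((2, 2)),                   # spatial downsampling
    layers.Flatten(),                              # to a vector for the dense layer
    layers.Dense(10, activation="softmax"),
])

fnn.summary()
cnn.summary()
```

The FNN treats every input position independently, while the CNN's filters share weights across the image, which is why CNNs dominate computer vision tasks.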
Natural Language Processing (NLP): AI for understanding and generating human language.
- Examples: Chatbots, automatic translation, sentiment analysis.
Computer Vision: AI that analyzes images and videos.
- Examples: Facial recognition, self-driving cars, medical diagnostics.
Generative AI: AI that creates new content (text, images, music, video).
- Examples: DALL·E (images), GPT (text), Jukebox (music).
Robotics: AI applied to robots and autonomous systems.
- Examples: Industrial robots, self-driving vehicles.
Rule-Based AI (Expert Systems): AI based on predefined logical rules, without learning.
- Examples: Medical diagnosis systems using symptom databases.
Edge AI: AI that processes data directly on devices, without relying on the cloud.
- Examples: Offline voice assistants, AI in smartphones.
Key ethical challenges and open issues include:
- Bias and discrimination in data.
- Privacy and security in data usage.
- AI ethics and regulatory concerns.
- Automation and its impact on jobs.
AI is a vast field with many specializations. Machine Learning and Deep Learning are fundamental, but other crucial areas include NLP, Computer Vision, and Generative AI.
- Activation Function: A mathematical function that introduces non-linearity into the model, allowing it to solve complex problems.
- Adam Optimizer: An advanced optimization algorithm that improves gradient descent.
- Algorithm: A set of step-by-step instructions for solving a problem or making a prediction in machine learning.
- Argmax: A function that returns the index of the maximum value in a set. In multi-class classification, it determines the class with the highest probability.
- Artificial Neuron: A mathematical model inspired by biological neurons, used in artificial neural networks.
- Autoencoder: An unsupervised neural network that learns to compress and decompress data without human intervention.
- Automatic Translation: The use of neural networks to translate text between different languages.
- Average-Pooling: Computes the average value within a region of the image.
- Backpropagation: An algorithm that optimizes weights and biases by correcting the model’s errors.
- Bagging (Bootstrap Aggregating): An ensemble learning method that reduces variance by training multiple models on random subsets of data and averaging their predictions. Used in Random Forests.
- BatchNormalization (Normalization): A technique that improves training stability and speed by normalizing layer outputs to have a mean of zero and a variance of one; applied during both training and inference.
- Bias: Systematic error of the model, indicating how far the predictions are from the actual values. High bias causes underfitting.
- Bias (b): A constant added to the equation to improve the model’s learning ability.
- Binary Classifier: A model that distinguishes between only two classes (e.g., positive/negative). It serves as the basis for multi-class strategies like One-vs-All and One-vs-One.
- Boosting: An ensemble learning technique that reduces bias by sequentially training models, each correcting the errors of the previous one. Used in XGBoost, AdaBoost, Gradient Boosting.
- Categorical Cross-Entropy: A loss function used for multi-class classification.
- Centroid: The central point of a cluster.
- Churn Prediction: The process of predicting whether a customer will abandon a service or subscription.
- Classes: The possible outcomes or output categories predicted by the model.
- Class Imbalance: When some classes are much more frequent than others in the dataset, influencing predictions.
- Classification: Predicting which category a piece of data belongs to by assigning it a discrete label.
- Classification with KNN: A method that assigns the class based on the majority vote of the K nearest neighbors.
- Clustering: An unsupervised learning technique for grouping similar data.
- CNN (Convolutional Neural Network): A neural network excellent for processing images and static objects, though it does not consider temporal context.
- Conv2D (Convolutional Layer): A convolutional layer that applies filters to the input to extract important features.
- Cropping2D (Output Cropping): Trims parts of the output to correct mismatches in dimensions.
- Data Denoising: Automatic noise removal from data via autoencoders.
- Decoder: The part of an autoencoder that reconstructs the original input from the compressed representation.
- Deep Neural Network (DNN): A neural network with three or more hidden layers, capable of processing raw data such as images and text.
- Dependent Variable (Target/Output): The variable that the model is intended to predict (e.g., churn: yes/no).
- Derivative: Measures the rate of change of a function; used to calculate the slope of the cost function.
- Dimensionality Reduction: A technique to reduce the number of features in the data, improving efficiency and interpretability.
- Manhattan Distance: A metric based on orthogonal (grid-like) paths between two points in multidimensional space; an alternative to Euclidean distance.
- Dropout: Regularization technique that reduces overfitting by randomly deactivating a fraction of neurons during training.
- Dummy Class: A fictitious class used in One-vs-All to separate a single class from the others.
- Elbow Method: A method for finding the optimal number of clusters in K-Means.
- Encoder: The part of an autoencoder that reduces the dimensionality of data into a more compact representation.
- Epoch: A complete training cycle where the model has seen all input data once.
- Epsilon-Tube: The margin around the prediction in SVR, within which points are not penalized.
- Euclidean Distance: Measures the straight-line distance between two points. Used in K-Means, KNN, Image Analysis.
- Feature: An independent variable used as input for the model.
- Irrelevant Features: Useless or redundant variables that increase noise in the model and reduce accuracy.
- Relevant Features: Input variables that help the model improve prediction accuracy.
- Feature Scaling: The process of normalizing features to improve model performance.
- Feature Selection: The process of choosing the most relevant features to improve model accuracy.
- Feature Standardization: The process of scaling features so that they are comparable, reducing their unbalanced impact on predictions.
- Features: The input (independent) variables that describe the observations.
- Flattening: Transforming the convolutional output into a vector for the dense layer.
- Forward Propagation: The process by which data passes through the neural network, from input to output.
- Fully Connected Layer: Also known as the Dense Layer; the final layer for classification using Softmax.
- Functional API: An alternative to the Sequential API that allows for creating more complex and flexible models, with multiple inputs/outputs and non-linear connections.
- Gamma: A parameter of RBF and polynomial kernels that controls how much a single data point influences the decision boundary.
- Gradient Descent: An iterative algorithm used to minimize the cost function.
- Gradients: Values that indicate how much the network's weights should be updated. Values that are too small make learning slow.
- Ground Truth: The actual or correct value that the model is intended to predict.
- Hard Margin: A requirement for perfect separation between classes with a rigid margin.
- Hidden Layer: An intermediate layer in the neural network that processes information.
- Hidden Layers: Multiple intermediate layers that process data between the input and output layers.
- Hierarchical Clustering: A clustering technique that creates a hierarchical structure of groups.
- Hyperbolic Tangent (Tanh): A sigmoid variant with outputs ranging from -1 to 1, providing more balanced values.
- Hyperplane: A multidimensional surface that separates data into different classes.
- Image Classification: An application of neural networks where images are categorized into different classes.
- Independent Variables (Feature/Input): The variables used for making predictions (e.g., age, income, purchasing habits).
- Inference: The process of using a trained model to make predictions on new, unseen data.
- Input Layer: The first layer of the neural network that receives the initial data.
- Iteration: A cycle in the algorithm where weights are updated to approach the optimal value.
- K Classes: The total number of classes in a multi-class classification problem.
- K-Means: A clustering algorithm that divides data into k groups based on similarity.
- K-Nearest Neighbors (KNN): A supervised learning algorithm that classifies or predicts based on the nearest neighbors.
- Kernel: A function that transforms data, making it separable in high-dimensional spaces.
- Keras: A deep learning library used to build neural networks quickly and easily.
- Labeled Data: A dataset in which each example has an assigned class for training purposes.
- Learning Rate (α): A parameter that controls the speed at which model parameters are updated.
- Linear Combination (z): The weighted sum of inputs and weights, plus a bias: [ z = (x_1 \cdot w_1) + (x_2 \cdot w_2) + b ]
- Linear Kernel: Uses a simple hyperplane to separate classes.
- Linear Regression: A regression algorithm that predicts a continuous value based on a linear relationship between variables.
- LSTM (Long Short-Term Memory): An advanced type of recurrent neural network (RNN) that handles long-term dependencies more effectively by mitigating the vanishing gradient problem. Applications include image generation, automated writing, and the automatic description of images and videos.
- Log-Loss (Loss Function): A loss function used to measure the error in logistic regression.
- Logistic Regression: A classification algorithm that predicts the probability that an observation belongs to a class.
- Logit: The logarithm of the odds ratio, used to model log-linear relationships.
- Logit Function: The inverse of the sigmoid; transforms a probability between 0 and 1 into a real-valued log-odds score.
- loss: Measures the error on the training data.
- Majority Voting: A method used in One-vs-One classification where the final class is determined by the most votes among binary classifiers.
- Margin: The distance between the hyperplane and the nearest data points (support vectors).
- Max-Pooling: A technique that selects the maximum value within a region of the image.
- Mean Squared Error (MSE): A loss function that measures error in regression models.
- Minkowski Distance: A distance metric that generalizes Euclidean and Manhattan distances. Used in Clustering, KNN, Geometry.
- Global Minimum: The lowest point of the cost function, representing the smallest possible error.
- Local Minimum: A low point in the cost function, not necessarily the absolute minimum, where the model may become stuck.
- Multicollinearity: A phenomenon where two or more features are strongly correlated, negatively affecting the model.
- Multinomial Logistic Regression: A statistical model that generalizes binary logistic regression for multi-class classification.
- Multi-Class Classification: A problem in which a data point must be assigned to one of K available classes.
- Neural Network: A computational model inspired by the human brain, composed of interconnected artificial neurons.
- Neuron: The basic unit of the brain and nervous system, responsible for transmitting information.
- Neuron Output (a): The final value of a neuron after applying the activation function.
- Non-linearity: A property that enables a model to learn complex relationships between variables.
- Nucleus: The part of the neuron that contains the cell’s genetic material and processes received information.
- Observations: The rows in a dataset, each containing information about a single example.
- Odds Ratio: The ratio between the probability of success and the probability of failure.
- One-Hot Encoding: A technique to convert categorical variables into numeric form for machine learning models such as logistic regression.
- One-vs-All (One-vs-Rest): A multi-class classification strategy where a binary classifier is built for each class, distinguishing it from all other classes.
- One-vs-One: A classification strategy in which a binary classifier is trained for each pair of classes, and the final decision is made based on the majority vote.
- Outlier Detection: The process of identifying anomalous data points in a dataset.
- Output Layer: The final layer of a neural network that produces the result.
- Overfitting: When a model is overly complex and fits the training data too closely, leading to poor performance on new data.
- Parameter: A value that the model learns during the training process.
- Parameter C: A parameter in SVM models that controls the trade-off between a strict separation and a softer margin.
- Parameters (θ): The model coefficients that are optimized during training.
- PCA (Principal Component Analysis): A traditional algorithm for dimensionality reduction, limited to linear transformations.
- Polynomial Kernel: Maps data into a more complex space using polynomial functions.
- Weighting of Neighbors: A technique in KNN classification that assigns greater weight to nearer neighbors.
- Pooling Layer: A layer that reduces the dimensions of data (e.g., images) to optimize the network.
- RBF (Radial Basis Function) Kernel: A kernel that uses a transformation based on the distance between points to separate complex data.
- RBM (Restricted Boltzmann Machine): An advanced unsupervised model used to generate missing data, balance datasets, and extract features.
- ReLU (Rectified Linear Unit): The most commonly used activation function, which activates only neurons with positive input.
- Recommendation Systems: Applications that suggest content based on the clustering of users or products.
- Recurrent Neural Networks (RNNs): Deep neural networks designed to process sequential data by using previous outputs as inputs for subsequent steps.
- Regression: A statistical technique that estimates the relationship between a continuous dependent variable and one or more independent variables.
- Regression Model: A model that predicts a continuous numerical value, such as concrete strength.
- Regression with KNN: A method that predicts a numerical value by taking the mean or median of the values of the K nearest neighbors.
- Santiago Ramón y Cajal: The Spanish scientist considered the father of modern neuroscience.
- Sequential Data: Data organized in a specific order where context from previous elements is crucial.
- Sequential Model-API: A type of model in Keras where layers are stacked sequentially.
- Shallow Neural Network: A neural network with only one or two hidden layers that primarily processes input as vectors.
- Sigmoid: A mathematical function that transforms inputs into a value between 0 and 1.
- Soft Margin: A margin that allows some misclassifications to improve the model's generalization.
- Softmax: Converts the output of a layer into probabilities for class membership.
- Softmax Probability: The probability assigned to each class in a Softmax model, computed by transforming the dot products of data and model parameters.
- Softmax Regression: A variant of logistic regression that assigns probabilities to multiple classes by transforming outputs into a probability distribution.
- Soma: The main body of a neuron that contains its nucleus.
- Stride: The step size with which the convolutional filter moves across the image.
- Stochastic Gradient Descent (SGD): A variant of gradient descent that updates weights using one sample at a time.
- Supervised Learning: A learning method in which the model is trained on labeled data.
- Support Vector Machines (SVM): A supervised machine learning algorithm used for classification and regression.
- Support Vector Regression (SVR): A variant of SVM used for predicting continuous values.
- Support Vectors: Data points that are closest to the hyperplane and influence class separation.
- Target: The dependent or output variable that the model is intended to predict.
- Temporal Context: Relevant information over time that influences the processing of sequential data.
- Tensors: The fundamental data structures in artificial intelligence, used to store both the input and output data of a model.
- Theta Coefficient: Values that indicate how much each feature affects the prediction.
- Threshold (Decision Threshold): The value (e.g., 0.5) beyond which an observation is assigned to a class.
- Training Set: The dataset used to train the model.
- Underfitting: When the model is too simple to capture the underlying patterns in the data, leading to inaccurate predictions.
- Unsupervised Learning: Machine learning without labels, where the model finds patterns in the data.
- UpSampling2D (Decoding for Autoencoders): The inverse operation of pooling, which increases the input size by replicating its values.
- val_loss: Measures the error on the test/validation data.
- Values of K: The number of neighbors considered when determining the class or target value in KNN.
- Vanishing Gradient Problem: A problem where gradients become too small during training, making the learning process slow and ineffective.
- Variance: A measure of how much the model’s predictions fluctuate when trained on different subsets of the dataset. High variance often leads to overfitting.
- Weight (w): A numerical value that determines the importance of an input in a neuron.
- Weight Update: The process of updating a weight using the formula:
[ w_{\text{new}} = w_{\text{old}} - \alpha \cdot \text{gradient} ]
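As a worked example of this update rule, a minimal gradient-descent sketch in NumPy (the one-parameter model, synthetic data, and learning rate are illustrative assumptions):

```python
import numpy as np

# One-parameter model y = w * x, trained with gradient descent on MSE.
x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.0, 6.0])  # generated by the "true" weight w = 2

w = 0.0       # initial weight
alpha = 0.05  # learning rate (α)

for epoch in range(50):
    y_pred = w * x
    # Gradient of MSE = mean((w*x - y)^2) with respect to w.
    gradient = np.mean(2 * (y_pred - y) * x)
    # Weight update: w_new = w_old - α · gradient
    w = w - alpha * gradient

print(f"learned w = {w:.3f}")  # converges toward 2.0
```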