Add sgd docs #5208
Conversation
@SaviDahegaonkar I am facing an issue related to the workflow awaiting approval. How can I solve it?
Hi @SaviDahegaonkar, could you please review this PR when you get a chance? Let me know if any adjustments are needed.
Hey @pratik305,
I had a look at your file and suggested some changes; please make them as soon as possible. Also include an example section and a syntax section, as the related issue requires, and place your file in the correct path.
@@ -0,0 +1,61 @@
---
Title: 'Stochastic Gradient Desent'
Suggested change:
Title: 'Stochastic Gradient Desent'
Title: 'Stochastic Gradient Descent'
@@ -0,0 +1,61 @@
---
Title: 'Stochastic Gradient Desent'
Description: 'Stochastic Gradient Desent is optimizer algorithm that minimizes the loss functions in machine learning and deep learning models.'
Suggested change:
Description: 'Stochastic Gradient Desent is optimizer algorithm that minimizes the loss functions in machine learning and deep learning models.'
Description: 'Stochastic Gradient Descent is an optimizer algorithm that minimizes the loss functions in machine learning and deep learning models.'
Subjects:
- 'Machine Learning'
- 'Deep Learning'
- 'Computer Science'
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Include only those subjects that are part of the subjects.md file; you can add one if it is not in the list. Here, Deep Learning is not part of the subjects.md file, so you can add it there if you want to include it in your PR.
- 'AI'
- 'Neural Network'
- 'Optimizer'
Include only those tags that are part of the tags.md list. Here, Optimizer is not part of the tags list, so add it to tags.md so that you can use it in your PR.
- 'paths/data-science'
---

**Stochastic Gradient Descent** (SGD) is a optimization algorithm. It is variant of gradient descent optimizer. The SGD minimize the loss function of machine learning algorithms and deep learning algorithms during backpropagation to update the weight and bias in Artificial Neural Networks.
Suggested change:
**Stochastic Gradient Descent** (SGD) is a optimization algorithm. It is variant of gradient descent optimizer. The SGD minimize the loss function of machine learning algorithms and deep learning algorithms during backpropagation to update the weight and bias in Artificial Neural Networks.
**Stochastic Gradient Descent** (SGD) is an optimization algorithm. It is variant of gradient descent optimizer. The SGD minimizes the loss function of machine learning algorithms and deep learning algorithms during backpropagation to update the weights and biases in Artificial Neural Networks.
The term stochastic mean randomness on which algorithm based upon. In this algorithm instead of taking whole dataset like grdient descent we take single randomly selected data point or small batch of data.suppose if the data set contains 500 rows SGD update the model parameters 500 times in one cycle or one epoch.
Suggested change:
The term stochastic mean randomness on which algorithm based upon. In this algorithm instead of taking whole dataset like grdient descent we take single randomly selected data point or small batch of data.suppose if the data set contains 500 rows SGD update the model parameters 500 times in one cycle or one epoch.
The term `stochastic` means randomness on which the algorithm is based. In this algorithm, instead of taking whole datasets like `gradient descent`, we take single randomly selected data points or small batches of data. Suppose if the data set contains 500 rows SGD updates the model parameters 500 times in one cycle or one epoch.
Suggested change:
$$
\large \theta = \theta - \alpha * \nabla J((\theta ; x_iy_i))
$$
$$
\large \theta = \theta - \alpha \cdot \nabla J(\theta ; x_i, y_i)
$$
Typically, \cdot is used for multiplication to keep the LaTeX notation standard; the * symbol is not used.
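As a purely illustrative aside (the numbers here are hypothetical, not from the PR), a single step of this update rule with a scalar parameter $\theta = 0.50$, learning rate $\alpha = 0.01$, and a per-sample gradient $\nabla J(\theta ; x_i, y_i) = 2.0$ works out to:

$$
\theta \leftarrow 0.50 - 0.01 \cdot 2.0 = 0.48
$$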
Suggested change:
## Advantages
- **Faster convergence:** SGD updates parameters more frequently hence it takes less time to converge especially for large datasets.
- **Reduced Computation Time:** SDD takes only subset of dataset or batch for each update. This makes it easy to handle large datasets and compute faster.
- **Avoid Local Minima:** The noise introduced by updating parameters with individual data points or small batches can help escape local minima.This can potentially lead to better solutions in complex, non-convex optimization problems.
- **Online Learning:** SGD can be used in scenarios where data is arriving sequentially (online learning).- It allows models to be updated continuously as new data comes in.
## Advantages
- **Faster convergence:** SGD updates parameters more frequently hence it takes less time to converge especially for large datasets.
- **Reduced Computation Time:** SGD takes only a subset of dataset or batch for each update. This makes it easy to handle large datasets and compute faster.
- **Avoid Local Minima:** The noise introduced by updating parameters with individual data points or small batches can help escape local minima.This can potentially lead to better solutions in complex, non-convex optimization problems.
- **Online Learning:** SGD can be used in scenarios where data is arriving sequentially (online learning).- It allows models to be updated continuously as new data comes in.
Suggested change:
## Practical Tips And Tricks When Using SGD
- Shuffle data before training
- Use mini batches(batch size 32)
- Normalize input
- Choose suitable learning rate (0.01)
## Practical Tips And Tricks When Using SGD
- Shuffle data before training
- Use mini batches(batch size 32)
- Normalize input
- Choose a suitable learning rate (0.01)
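To make these tips concrete, here is a minimal sketch using scikit-learn's SGDClassifier; the dataset and hyperparameters are illustrative assumptions, not part of the PR, and the same ideas apply to other libraries:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Toy dataset standing in for real training data.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# StandardScaler normalizes the inputs; SGDClassifier shuffles the data
# each epoch (shuffle=True) and uses a constant learning rate of 0.01.
model = make_pipeline(
    StandardScaler(),
    SGDClassifier(learning_rate="constant", eta0=0.01, shuffle=True, max_iter=1000),
)
model.fit(X, y)
print(model.score(X, y))
```

Mini-batch updates (for example, batch size 32) are not exposed directly through fit here, but they can be approximated by calling partial_fit on successive batches.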
@SaviDahegaonkar all changes are done.
Hey @pratik305,
I have suggested a few grammatical corrections and also included a relevant syntax and example section. Please make the changes as soon as possible.
Thanks,
Savi
- 'Computer Science'
Tags:
- 'AI'
- 'Neural Network'
Suggested change:
- 'Neural Network'
- 'Neural Networks'
@@ -0,0 +1,106 @@
---
Title: 'Stochastic Gradient Descent'
Description: 'Stochastic Gradient Descent is an optimizer algorithm that minimizes the loss functions in machine learning and deep learning models.'
Suggested change:
Description: 'Stochastic Gradient Descent is an optimizer algorithm that minimizes the loss functions in machine learning and deep learning models.'
Description: 'Stochastic Gradient Descent is an optimizer algorithm that minimizes the loss function in machine learning and deep learning models.'
- 'paths/data-science'
---

**Stochastic Gradient Descent** (SGD) is an optimization algorithm. It is variant of gradient descent optimizer. The SGD minimizes the loss function of machine learning algorithms and deep learning algorithms during backpropagation to update the weights and biases in Artificial Neural Networks.
Suggested change:
**Stochastic Gradient Descent** (SGD) is an optimization algorithm. It is variant of gradient descent optimizer. The SGD minimizes the loss function of machine learning algorithms and deep learning algorithms during backpropagation to update the weights and biases in Artificial Neural Networks.
**Stochastic Gradient Descent** (SGD) is an optimization algorithm. It is a variant of gradient descent optimizer. The SGD minimizes the loss function of machine learning algorithms and deep learning algorithms during backpropagation to update the weights and biases in Artificial Neural Networks.
- At each iteration, a random sample is selected from the training dataset.
- The gradient of the cost function with respect to the model parameters is computed based on the selected sample.
- The model parameters are updated using the computed gradient and the learning rate.
- The process is repeated for multiple iterations until convergence or a specified number of epochs.
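As an illustration only (this sketch is not part of the reviewed file), the four steps above for a linear model with a squared-error loss could be written in NumPy as:

```python
import numpy as np

rng = np.random.default_rng(0)

def sgd_linear_regression(X, y, learning_rate=0.01, n_iterations=5000):
    theta = np.zeros(X.shape[1])
    for _ in range(n_iterations):
        # 1. Select a random sample from the training dataset.
        i = rng.integers(len(y))
        # 2. Compute the gradient of the squared error for that sample.
        error = X[i] @ theta - y[i]
        gradient = 2 * error * X[i]
        # 3. Update the parameters using the gradient and the learning rate.
        theta -= learning_rate * gradient
    # 4. The loop repeats until the iteration budget (or convergence) is reached.
    return theta

# Tiny noiseless example: y = 1 + 2 * x, with a column of ones as the bias term.
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([3.0, 5.0, 7.0])
print(sgd_linear_regression(X, y))  # Approaches [1.0, 2.0]
```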
## Syntax
- Learning Rate (α): A hyperparameter that controls the size of the update step.
- Number of Iterations: The number of times the algorithm will iterate over the dataset.
- Loss Function: The function that measures the error of the model predictions.
- Gradient Calculation: The method for computing gradients based on the loss function.
The syntax must be included inside backticks in a pseudo code block. I can't find any syntax here. Please include one.
@SaviDahegaonkar there is no single syntax for SGD. We can use SGD with TensorFlow, PyTorch, NumPy, or scikit-learn, and each one has a different syntax. What should I do?
Can I add syntax like this:
SGD(learning_rate, n_iterations, loss_function, gradient_calculation)
and then add code like:
def stochastic_gradient_descent(X, y, theta, learning_rate, n_iterations):
    for iteration in range(n_iterations):
        for i in range(len(y)):
            gradient = compute_gradient(X[i], y[i], theta)
            theta -= learning_rate * gradient
    return theta
@SaviDahegaonkar sir, please reply soon so that I can make the changes.
Yes, sure you can; I will approve this PR.
@SaviDahegaonkar all done.
Hey @pratik305,
LGTM!
Thanks,
Savi
Suggested change:
def stochastic_gradient_descent(X, y, theta, learning_rate, n_iterations):
for iteration in range(n_iterations):
for i in range(len(y)):
gradient = compute_gradient(X[i], y[i], theta)
theta -= learning_rate * gradient
return theta
def stochastic_gradient_descent(X, y, theta, learning_rate, n_iterations):
    for iteration in range(n_iterations):
        for i in range(len(y)):
            gradient = compute_gradient(X[i], y[i], theta)
            theta -= learning_rate * gradient
    return theta
Added some proper indentation.
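For readers following along, a self-contained version of this snippet might look as follows; the compute_gradient helper is a hypothetical stand-in that assumes a squared-error loss for a linear model:

```python
import numpy as np

# Hypothetical helper: gradient of the squared-error loss for one sample.
def compute_gradient(x_i, y_i, theta):
    prediction = np.dot(x_i, theta)
    return 2 * (prediction - y_i) * x_i

def stochastic_gradient_descent(X, y, theta, learning_rate, n_iterations):
    for iteration in range(n_iterations):
        for i in range(len(y)):
            gradient = compute_gradient(X[i], y[i], theta)
            theta -= learning_rate * gradient
    return theta

# Toy data where y = 2 * x, so theta should move toward [2.0].
X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([2.0, 4.0, 6.0, 8.0])
theta = stochastic_gradient_descent(X, y, np.zeros(1), learning_rate=0.01, n_iterations=100)
print(theta)  # Roughly [2.0]
```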
@avdhoottt all changes are done. Can you review this, please?
@avdhoottt when will it get merged?
@avdhoottt When will this get merged? Tell me if there is anything to change.
@Maheshwaran17 can you please check this?
@avdhoottt when will it get merged, sir? Is there any problem?
LGTM!
Hey @pratik305, sorry for the late reply. I'm merging this PR. Thank you so much for contributing!
👋 @pratik305 🎉 Your contribution(s) can be seen here: https://www.codecademy.com/resources/docs/ai/neural-networks/stochastic-gradient-descent Please note it may take a little while for changes to become visible.
@avdhoottt thank you, sir.
Description
This pull request introduces a new documentation file on Stochastic Gradient Descent (SGD) in the stochastic-gradient-descent.md file. SGD is a widely used optimization algorithm in machine learning and deep learning due to its efficiency and scalability, particularly for large datasets. However, its effectiveness and stability can be significantly influenced by how it is implemented and tuned. This new documentation aims to offer clear, comprehensive insights into SGD’s operation, benefits, and limitations, helping users to better understand and apply this algorithm in their projects.
Issue Solved
Closes #4527
Type of Change
Checklist
- My changes are based off of the main branch.
- The related issue is linked under the Issues Solved section.