
Add sgd docs #5208

Merged

merged 21 commits into Codecademy:main from add-sgd-docs on Nov 14, 2024

Conversation

pratik305
Contributor

Description

This pull request introduces a new documentation file on Stochastic Gradient Descent (SGD) in the stochastic-gradient-descent.md file.
SGD is a widely used optimization algorithm in machine learning and deep learning due to its efficiency and scalability, particularly for large datasets. However, its effectiveness and stability can be significantly influenced by how it is implemented and tuned. This new documentation aims to offer clear, comprehensive insights into SGD’s operation, benefits, and limitations, helping users to better understand and apply this algorithm in their projects.

Issue Solved

Closes #4527

Type of Change

  • Adding a new entry
  • Editing an existing entry (fixing a typo, bug, issues, etc)
  • Updating the documentation

Checklist

  • All writings are my own.
  • My entry follows the Codecademy Docs style guide.
  • My changes generate no new warnings.
  • I have performed a self-review of my own writing and code.
  • I have checked my entry and corrected any misspellings.
  • I have made corresponding changes to the documentation if needed.
  • I have confirmed my changes are not being pushed from my forked main branch.
  • I have confirmed that I'm pushing from a new branch named after the changes I'm making.
  • I have linked any issues that are relevant to this PR in the Issues Solved section.

@CLAassistant

CLAassistant commented Sep 11, 2024

CLA assistant check
All committers have signed the CLA.

@SaviDahegaonkar SaviDahegaonkar self-assigned this Sep 11, 2024
@SaviDahegaonkar SaviDahegaonkar added the new entry, status: under review, and neural-networks labels Sep 11, 2024
@pratik305
Contributor Author

@SaviDahegaonkar I am facing an issue with the workflow awaiting approval. How can I solve it?

@pratik305
Contributor Author

Hi @SaviDahegaonkar, could you please review this PR when you get a chance? Let me know if any adjustments are needed.
Thanks

Collaborator

@SaviDahegaonkar SaviDahegaonkar left a comment

Hey @pratik305,
I had a look at your file and suggested some changes; please make them as soon as possible. Also include an example section and a syntax section, as the related issue requests, and place your file in the correct path.

@@ -0,0 +1,61 @@
---
Title: 'Stochastic Gradient Desent'

Suggested change
Title: 'Stochastic Gradient Desent'
Title: 'Stochastic Gradient Descent'

@@ -0,0 +1,61 @@
---
Title: 'Stochastic Gradient Desent'
Description: 'Stochastic Gradient Desent is optimizer algorithm that minimizes the loss functions in machine learning and deep learning models.'

Suggested change
Description: 'Stochastic Gradient Desent is optimizer algorithm that minimizes the loss functions in machine learning and deep learning models.'
Description: 'Stochastic Gradient Descent is an optimizer algorithm that minimizes the loss functions in machine learning and deep learning models.'

Comment on lines 4 to 7
Subjects:
- 'Machine Learning'
- 'Deep Learning'
- 'Computer Science'

Suggested change
Subjects:
- 'Machine Learning'
- 'Deep Learning'
- 'Computer Science'
Subjects:
- 'Machine Learning'
- 'Deep Learning'
- 'Computer Science'

Include only those subjects that are part of the subjects.md file; you can add one if it is not in the list. Here, 'Deep Learning' is not part of the subjects.md file, so add it there if you want to include it in your PR.

Comment on lines 9 to 11
- 'AI'
- 'Neural Network'
- 'Optimizer'

Suggested change
- 'AI'
- 'Neural Network'
- 'Optimizer'
- 'AI'
- 'Neural Network'
- 'Optimizer'

Include only those tags that are part of the tags.md list. Here, 'Optimizer' is not part of the tags list, so add it to tags.md so that you can use it in your PR.

- 'paths/data-science'
---

**Stochastic Gradient Descent** (SGD) is a optimization algorithm. It is variant of gradient descent optimizer. The SGD minimize the loss function of machine learning algorithms and deep learning algorithms during backpropagation to update the weight and bias in Artificial Neural Networks.

Suggested change
**Stochastic Gradient Descent** (SGD) is a optimization algorithm. It is variant of gradient descent optimizer. The SGD minimize the loss function of machine learning algorithms and deep learning algorithms during backpropagation to update the weight and bias in Artificial Neural Networks.
**Stochastic Gradient Descent** (SGD) is an optimization algorithm. It is variant of gradient descent optimizer. The SGD minimizes the loss function of machine learning algorithms and deep learning algorithms during backpropagation to update the weights and biases in Artificial Neural Networks.


**Stochastic Gradient Descent** (SGD) is a optimization algorithm. It is variant of gradient descent optimizer. The SGD minimize the loss function of machine learning algorithms and deep learning algorithms during backpropagation to update the weight and bias in Artificial Neural Networks.

The term stochastic mean randomness on which algorithm based upon. In this algorithm instead of taking whole dataset like grdient descent we take single randomly selected data point or small batch of data.suppose if the data set contains 500 rows SGD update the model parameters 500 times in one cycle or one epoch.

Suggested change
The term stochastic mean randomness on which algorithm based upon. In this algorithm instead of taking whole dataset like grdient descent we take single randomly selected data point or small batch of data.suppose if the data set contains 500 rows SGD update the model parameters 500 times in one cycle or one epoch.
The term `stochastic` means randomness on which the algorithm is based. In this algorithm, instead of taking whole datasets like `gradient descent`, we take single randomly selected data points or small batches of data. Suppose if the data set contains 500 rows SGD updates the model parameters 500 times in one cycle or one epoch.

Comment on lines 35 to 37
$$
\large \theta = \theta - \alpha * \nabla J((\theta ; x_iy_i))
$$

Suggested change
$$
\large \theta = \theta - \alpha * \nabla J((\theta ; x_iy_i))
$$
$$
\large \theta = \theta - \alpha \cdot \nabla J(\theta ; x_i, y_i)
$$

Typically, \cdot is used for multiplication to keep the LaTeX notation standard; the * symbol is not used.
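
For readers following the thread, here is a minimal NumPy sketch of the corrected update rule applied to one randomly selected sample. It assumes a linear model with squared loss, and the sgd_update helper is purely illustrative rather than code from the entry under review.

import numpy as np

# One SGD step for a linear model with squared loss:
# prediction = x_i . theta, loss = 0.5 * (prediction - y_i)^2,
# so the gradient with respect to theta is (prediction - y_i) * x_i.
def sgd_update(theta, x_i, y_i, alpha=0.01):
    prediction = np.dot(x_i, theta)
    gradient = (prediction - y_i) * x_i
    return theta - alpha * gradient  # theta = theta - alpha * grad J(theta; x_i, y_i)

theta = np.zeros(2)
x_i = np.array([1.0, 2.0])  # one randomly selected data point
y_i = 3.0
theta = sgd_update(theta, x_i, y_i)
print(theta)  # [0.03 0.06]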

Comment on lines 45 to 49
## Advantages
- **Faster convergence:** SGD updates parameters more frequently hence it takes less time to converge especially for large datasets.
- **Reduced Computation Time:** SDD takes only subset of dataset or batch for each update. This makes it easy to handle large datasets and compute faster.
- **Avoid Local Minima:** The noise introduced by updating parameters with individual data points or small batches can help escape local minima.This can potentially lead to better solutions in complex, non-convex optimization problems.
- **Online Learning:** SGD can be used in scenarios where data is arriving sequentially (online learning).- It allows models to be updated continuously as new data comes in.

Suggested change
## Advantages
- **Faster convergence:** SGD updates parameters more frequently hence it takes less time to converge especially for large datasets.
- **Reduced Computation Time:** SDD takes only subset of dataset or batch for each update. This makes it easy to handle large datasets and compute faster.
- **Avoid Local Minima:** The noise introduced by updating parameters with individual data points or small batches can help escape local minima.This can potentially lead to better solutions in complex, non-convex optimization problems.
- **Online Learning:** SGD can be used in scenarios where data is arriving sequentially (online learning).- It allows models to be updated continuously as new data comes in.
## Advantages
- **Faster convergence:** SGD updates parameters more frequently hence it takes less time to converge especially for large datasets.
- **Reduced Computation Time:** SGD takes only a subset of dataset or batch for each update. This makes it easy to handle large datasets and compute faster.
- **Avoid Local Minima:** The noise introduced by updating parameters with individual data points or small batches can help escape local minima.This can potentially lead to better solutions in complex, non-convex optimization problems.
- **Online Learning:** SGD can be used in scenarios where data is arriving sequentially (online learning).- It allows models to be updated continuously as new data comes in.

Comment on lines 56 to 60
## Practical Tips And Tricks When Using SGD
- Shuffle data before training
- Use mini batches(batch size 32)
- Normalize input
- Choose suitable learning rate (0.01)

Suggested change
## Practical Tips And Tricks When Using SGD
- Shuffle data before training
- Use mini batches(batch size 32)
- Normalize input
- Choose suitable learning rate (0.01)
## Practical Tips And Tricks When Using SGD
- Shuffle data before training
- Use mini batches(batch size 32)
- Normalize input
- Choose a suitable learning rate (0.01)
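
As a side note for readers of this thread, the tips above can be combined into a short NumPy sketch like the one below. The minibatch_sgd function and its squared-error gradient are illustrative assumptions, not code from the entry under review.

import numpy as np

# Mini-batch SGD illustrating the tips above: shuffle every epoch,
# use mini-batches of 32, normalize the inputs, and use a 0.01 learning rate.
def minibatch_sgd(X, y, epochs=10, batch_size=32, lr=0.01):
    X = (X - X.mean(axis=0)) / X.std(axis=0)   # normalize input features
    theta = np.zeros(X.shape[1])
    for _ in range(epochs):
        indices = np.random.permutation(len(y))  # shuffle before each epoch
        for start in range(0, len(y), batch_size):
            batch = indices[start:start + batch_size]
            error = X[batch] @ theta - y[batch]
            gradient = X[batch].T @ error / len(batch)  # mean squared-error gradient
            theta -= lr * gradient
    return theta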

@pratik305
Contributor Author

@SaviDahegaonkar all changes are done

Collaborator

@SaviDahegaonkar SaviDahegaonkar left a comment

Hey @pratik305,
I have suggested a few grammatical corrections and also included a relevant syntax and example section. Please make the changes as soon as possible.

Thanks,
Savi

- 'Computer Science'
Tags:
- 'AI'
- 'Neural Network'

Suggested change
- 'Neural Network'
- 'Neural Networks'

@@ -0,0 +1,106 @@
---
Title: 'Stochastic Gradient Descent'
Description: 'Stochastic Gradient Descent is an optimizer algorithm that minimizes the loss functions in machine learning and deep learning models.'

Suggested change
Description: 'Stochastic Gradient Descent is an optimizer algorithm that minimizes the loss functions in machine learning and deep learning models.'
Description: 'Stochastic Gradient Descent is an optimizer algorithm that minimizes the loss function in machine learning and deep learning models.'

- 'paths/data-science'
---

**Stochastic Gradient Descent** (SGD) is an optimization algorithm. It is variant of gradient descent optimizer. The SGD minimizes the loss function of machine learning algorithms and deep learning algorithms during backpropagation to update the weights and biases in Artificial Neural Networks.

Suggested change
**Stochastic Gradient Descent** (SGD) is an optimization algorithm. It is variant of gradient descent optimizer. The SGD minimizes the loss function of machine learning algorithms and deep learning algorithms during backpropagation to update the weights and biases in Artificial Neural Networks.
**Stochastic Gradient Descent** (SGD) is an optimization algorithm. It is a variant of gradient descent optimizer. The SGD minimizes the loss function of machine learning algorithms and deep learning algorithms during backpropagation to update the weights and biases in Artificial Neural Networks.

Comment on lines 26 to 29
- At each iteration, a random sample is selected from the training dataset.
- The gradient of the cost function with respect to the model parameters is computed based on the selected sample.
- The model parameters are updated using the computed gradient and the learning rate.
- The process is repeated for multiple iterations until convergence or a specified number of epochs.

Suggested change
- At each iteration, a random sample is selected from the training dataset.
- The gradient of the cost function with respect to the model parameters is computed based on the selected sample.
- The model parameters are updated using the computed gradient and the learning rate.
- The process is repeated for multiple iterations until convergence or a specified number of epochs.
- At each iteration, a random sample is selected from the training dataset.
- The gradient of the cost function with respect to the model parameters is computed based on the selected sample.
- The model parameters are updated using the computed gradient and the learning rate.
- The process is repeated for multiple iterations until convergence or a specified number of epochs.

Comment on lines 60 to 64
## Syntax
- Learning Rate (α): A hyperparameter that controls the size of the update step.
- Number of Iterations: The number of times the algorithm will iterate over the dataset.
- Loss Function: The function that measures the error of the model predictions.
- Gradient Calculation: The method for computing gradients based on the loss function.

Suggested change
## Syntax
- Learning Rate (α): A hyperparameter that controls the size of the update step.
- Number of Iterations: The number of times the algorithm will iterate over the dataset.
- Loss Function: The function that measures the error of the model predictions.
- Gradient Calculation: The method for computing gradients based on the loss function.
## Syntax
- Learning Rate (α): A hyperparameter that controls the size of the update step.
- Number of Iterations: The number of times the algorithm will iterate over the dataset.
- Loss Function: The function that measures the error of the model predictions.
- Gradient Calculation: The method for computing gradients based on the loss function.

The syntax must be included inside backticks in a pseudo code block. I can't find any syntax here. Please include one.

Contributor Author

@SaviDahegaonkar there is no single syntax for SGD. We can use SGD with TensorFlow, PyTorch, NumPy, or scikit-learn, and each of them has a different syntax. What should I do?

Contributor Author

Can I add syntax like this:
SGD(learning_rate, n_iterations, loss_function, gradient_calculation)

and then add code like:
def stochastic_gradient_descent(X, y, theta, learning_rate, n_iterations):
    for iteration in range(n_iterations):
        for i in range(len(y)):
            gradient = compute_gradient(X[i], y[i], theta)
            theta -= learning_rate * gradient
    return theta

@pratik305
Contributor Author

@SaviDahegaonkar sir, please reply soon: can I make these changes?

@SaviDahegaonkar
Collaborator

@SaviDahegaonkar sir, please reply soon: can I make these changes?

Yes, sure you can. I will approve this PR.

@pratik305
Contributor Author

@SaviDahegaonkar all done

Collaborator

@SaviDahegaonkar SaviDahegaonkar left a comment

Hey @pratik305,
LGTM!

Thanks,
Savi

Comment on lines 71 to 76
def stochastic_gradient_descent(X, y, theta, learning_rate, n_iterations):
for iteration in range(n_iterations):
for i in range(len(y)):
gradient = compute_gradient(X[i], y[i], theta)
theta -= learning_rate * gradient
return theta

Suggested change
def stochastic_gradient_descent(X, y, theta, learning_rate, n_iterations):
for iteration in range(n_iterations):
for i in range(len(y)):
gradient = compute_gradient(X[i], y[i], theta)
theta -= learning_rate * gradient
return theta
def stochastic_gradient_descent(X, y, theta, learning_rate, n_iterations):
    for iteration in range(n_iterations):
        for i in range(len(y)):
            gradient = compute_gradient(X[i], y[i], theta)
            theta -= learning_rate * gradient
    return theta

Added some proper indentation.
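
To make the accepted snippet self-contained for anyone reading along, here is a runnable sketch. The compute_gradient helper is not defined in the entry, so a squared-error gradient for a linear model is assumed here purely for illustration.

import numpy as np

# compute_gradient is assumed to be the squared-error gradient of a linear model.
def compute_gradient(x_i, y_i, theta):
    return (np.dot(x_i, theta) - y_i) * x_i

def stochastic_gradient_descent(X, y, theta, learning_rate, n_iterations):
    for iteration in range(n_iterations):
        for i in range(len(y)):
            gradient = compute_gradient(X[i], y[i], theta)
            theta -= learning_rate * gradient
    return theta

# Tiny usage example: fit y = 2x from four points.
X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([2.0, 4.0, 6.0, 8.0])
theta = stochastic_gradient_descent(X, y, np.zeros(1), learning_rate=0.01, n_iterations=200)
print(theta)  # approaches [2.0]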

@avdhoottt avdhoottt added the status: under review label and removed the status: ready for next review label Oct 2, 2024
@avdhoottt avdhoottt self-assigned this Oct 2, 2024
@pratik305
Contributor Author

@avdhoottt all changes are done. Can you review this, please?

@pratik305
Contributor Author

@avdhoottt when will it get merged?

@pratik305
Contributor Author

@avdhoottt When will this get merged? Please tell me if there is anything to change.

@pratik305
Contributor Author

@Maheshwaran17 can you please check this?

@pratik305
Contributor Author

@avdhoottt when will it get merged, sir? Is there any problem?

Collaborator

@avdhoottt avdhoottt left a comment

LGTM!

@avdhoottt avdhoottt added the status: review 2️⃣ completed label and removed the status: under review label Nov 14, 2024
@avdhoottt
Collaborator

@avdhoottt when will it get merged, sir? Is there any problem?

Hey @pratik305, sorry for the late reply. I'm merging this PR. Thank you so much for contributing!

@avdhoottt avdhoottt merged commit c177d5f into Codecademy:main Nov 14, 2024
6 checks passed

👋 @pratik305
You have contributed to Codecademy Docs, and we would like to know more about you and your experience.
Please take a minute to fill out this four question survey to help us better understand Docs contributions and how we can improve the experience for you and our learners.
Thank you for your help!

🎉 Your contribution(s) can be seen here:

https://www.codecademy.com/resources/docs/ai/neural-networks/stochastic-gradient-descent

Please note it may take a little while for changes to become visible.
If you're appearing as anonymous and want to be credited, see here.

@pratik305 pratik305 deleted the add-sgd-docs branch November 15, 2024 03:51
@pratik305
Contributor Author

@avdhoottt thank you sir

Development

Successfully merging this pull request may close these issues.

[Term Entry] Neural Networks ai stochastic gradient descent
4 participants