How does gradient descent work?
Gradient descent is one of the most popular algorithms to perform optimization and by far the most common way to optimize neural networks. Gradient descent is a way to minimize an objective function J(θ), parameterized by a model's parameters θ ∈ R^d, by updating the parameters in the opposite direction of the gradient of the objective function ∇_θ J(θ) w.r.t. the parameters. The learning rate η determines the size of the steps we take to reach a (local) minimum. If you are interested in learning more about optimization of neural networks, see http://cs231n.github.io/optimization-1/
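The update rule above (θ ← θ − η ∇_θ J(θ)) can be sketched in a few lines. This is a minimal illustration, not code from this repository; the objective J(θ) = ||θ||²/2 and the function names are chosen here just for demonstration.

```python
import numpy as np

def gradient_descent(grad, theta0, eta=0.1, n_steps=100):
    """Repeatedly step in the direction opposite the gradient.

    grad    -- function returning the gradient of J at theta
    theta0  -- initial parameter vector
    eta     -- learning rate (step size)
    n_steps -- number of update steps
    """
    theta = np.asarray(theta0, dtype=float)
    for _ in range(n_steps):
        theta = theta - eta * grad(theta)  # theta <- theta - eta * grad J(theta)
    return theta

# For J(theta) = ||theta||^2 / 2 the gradient is simply theta,
# so the iterates shrink toward the minimum at the origin.
theta_min = gradient_descent(lambda th: th, theta0=[3.0, -2.0])
```

With a small enough η the iterates contract toward the minimum each step; with too large an η they can overshoot and diverge, which is why the learning rate is the key hyperparameter here.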
In the next meeting we will discuss three variants of gradient descent:
(I) Batch gradient descent, (II) Stochastic gradient descent, (III) Mini-batch gradient descent
and the most popular gradient descent optimization algorithms:
(1) Momentum, (2) Nesterov accelerated gradient, (3) Adagrad, (4) Adadelta, (5) RMSprop, (6) Adam, (7) Visualization of algorithms
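As a preview of the three variants, the sketch below shows how they differ only in how much data is used to compute each gradient step. The least-squares objective, the toy data, and all names are illustrative assumptions, not part of this repository.

```python
import numpy as np

# Toy linear-regression data: y = X @ true_theta (no noise, for clarity).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
true_theta = np.array([1.5, -0.5])
y = X @ true_theta

def grad(theta, Xb, yb):
    # Gradient of J(theta) = ||Xb @ theta - yb||^2 / (2 * len(yb))
    return Xb.T @ (Xb @ theta - yb) / len(yb)

eta = 0.1

# (I) Batch gradient descent: the full dataset for every step.
theta_batch = np.zeros(2)
for _ in range(200):
    theta_batch -= eta * grad(theta_batch, X, y)

# (II) Stochastic gradient descent: one random example per step.
theta_sgd = np.zeros(2)
for _ in range(2000):
    i = rng.integers(len(y))
    theta_sgd -= eta * grad(theta_sgd, X[i:i+1], y[i:i+1])

# (III) Mini-batch gradient descent: a small random batch per step.
theta_mb = np.zeros(2)
for _ in range(500):
    idx = rng.choice(len(y), size=16, replace=False)
    theta_mb -= eta * grad(theta_mb, X[idx], y[idx])
```

Batch updates are exact but expensive per step; stochastic updates are cheap but noisy; mini-batches trade between the two, which is why they dominate in practice.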