UpdatedNovember 14, 2022

algorithm

algorithm : enum, (name of optimizer) for optimizer instance.
Default value “adam”.

Adadelta

Adadelta scales the learning rate based on the historical gradient while only taking into account the recent time window and not the entire history, like AdaGrad. Also uses a component that serves as an acceleration term, which accumulates historical updates (similar to momentum).

are accumulated gradients.
are accumulated updates.
are a decay constant.
are gradients.
are learning rate.
are numerical stability (e-7).
are rescaled gradients.
are weight.

$v_{t}$

Adagrad

Adagrad is an optimizer with parameter-specific learning rates, which are adapted relative to how frequently a parameter gets updated during training. The more updates a parameter receives, the smaller the updates.

are momentum.
are gradients of the parameters we want to update.
are learning rate.
are smoothing term (avoids division by zero).
are weight.

$v_{t}$

Adam

Adam optimization is a stochastic gradient descent method that is based on adaptive estimation of first-order and second-order moments.

and $v_{t}$ and are bias-corrected first and second moment estimates.
and are momentum coefficient.
are gradients of the parameters we want to update.
are learning rate.
are smoothing term (avoids division by zero).
are weight.

$v_{t}$

Adamax

AdaMax algorithm is an extension to the Adaptive Movement Estimation (Adam) Optimization algorithm. More broadly, is an extension to the Gradient Descent Optimization algorithm. Adam can be understood as updating weights inversely proportional to the scaled L2 norm (squared) of past gradients. AdaMax extends this to the so-called infinite norm (max) of past gradients.

and $v_{t}$
and are momentum coefficient.
are gradients of the parameters we want to update.
are learning rate.
are updated learning rate.
are smoothing term (avoids division by zero).
are weight.

$v_{t}$

Inertia

are momentum
are momentum coefficient.
are gradients of the parameters we want to update.
are learning rate.
are weight.

$v_{t}$

NAdam

Much like Adam is essentially RMSprop with momentum, Nadam is Adam with Nesterov momentum.

$v_{t}$

Nesterov

Nesterov momentum is an extension of momentum that involves calculating the decaying moving average of the gradients of projected positions in the search space rather than the actual positions themselves.

are momentum
are momentum coefficient.
are gradients of the parameters we want to update.
are learning rate.
are weight.

$v_{t}$

RMSprop

The gist of RMSprop is to:

Maintain a moving (discounted) average of the square of gradients
Divide the gradient by the root of this average

This implementation of RMSprop uses plain momentum, not Nesterov momentum.
The centered version additionally maintains a moving average of the gradients, and uses that average to estimate the variance.

$v_{t}$
are momentum coefficient.
are gradients of the parameters we want to update.
are learning rate.
are smoothing term (avoids division by zero).
are weight.

$v_{t}$

SGD

are gradients of the parameters we want to update.
are learning rate.
are weight.

$v_{t}$

This parameter is used in add_to_graph an define VIs of the AdditiveAttention, Attention, BatchNormalization, Conv1D, Conv1DTranspose, Conv2D, Conv2DTranspose, Conv3D, Conv3DTranspose, Dense, DepthwiseConv2D, Embedding, GRU, LayerNormalization, LSTM, MultiHeadAttention, SeparableConv1D, SeparableConv2D, SimpleRNN, PReLU, ConvLSTM1DCell, ConvLSTM2DCell, ConvLSTM3DCell, GRUCell, LSTMCell, SimpleRNNCell layers.

Tags:

SOTA

Installation guide

General

Accelerator Toolkit

Installation guide

Execution providers

General

Execution

CUDA Advanced

Construct Ptr Input Data

Index

Name

Exec

Exec

Mono

Multi

Input

Deep Learning Toolkit

Installation guide

Execution providers

General

Architecture

Layers

Nodes

Nodes

Activation

Mono Input

Parameters

Graph Function

Graph

File

Get & Set

Runtime

Create

Academic Training

Inference

Training

Exec

Academic

Input

Advanced

Add Weight

Index

Name

Convert

From ONNX

To ONNX

Format Weight

Get Weight

More

Layers parameters

Nodes Parameters

Computer Vision Toolkit

Installation guide

General

Tools

Image Manipulation

Image Modification

Files

Pixel Editing

Draw

Grayscale

Color

Region of Interest

Additional Windows

Session

Functions

Filters

Form

Inspection

Operators

Pattern

Treatment

Video Writter

CUDA Toolkit

Installation guide

General

Array

CuBLAS

Elementary