-                 
		Deep Learning
- 
							 			- 
						 
		- Resume
- Add
- AlphaDropout
- AdditiveAttention
- Attention
- Average
- AvgPool1D
- AvgPool2D
- AvgPool3D
- BatchNormalization
- Bidirectional
- Concatenate
- Conv1D
- Conv1DTranspose
- Conv2D
- Conv2DTranspose
- Conv3D
- Conv3DTranspose
- ConvLSTM1D
- ConvLSTM2D
- ConvLSTM3D
- Dense
- Cropping1D
- Cropping2D
- Cropping3D
- DepthwiseConv2D
- Dropout
- Embedding
- Flatten
- ELU
- Exponential
- GaussianDropout
- GaussianNoise
- GlobalAvgPool1D
- GlobalAvgPool2D
- GlobalAvgPool3D
- GlobalMaxPool1D
- GlobalMaxPool2D
- GlobalMaxPool3D
- GRU
- GELU
- Input
- LayerNormalization
- LSTM
- MaxPool1D
- MaxPool2D
- MaxPool3D
- MultiHeadAttention
- HardSigmoid
- LeakyReLU
- Linear
- Multiply
- Permute3D
- Reshape
- RNN
- PReLU
- ReLU
- SELU
- Output Predict
- Output Train
- SeparableConv1D
- SeparableConv2D
- SimpleRNN
- SpatialDropout
- Sigmoid
- SoftMax
- SoftPlus
- SoftSign
- Split
- UpSampling1D
- UpSampling2D
- UpSampling3D
- ZeroPadding1D
- ZeroPadding2D
- ZeroPadding3D
- Swish
- TanH
- ThresholdedReLU
- Substract
- Show All Articles (63) Collapse Articles
 
- 
						 
		 			- 
						 
		 			
- 
						 			- 
						 
		- Exp
- Identity
- Abs
- Acos
- Acosh
- ArgMax
- ArgMin
- Asin
- Asinh
- Atan
- Atanh
- AveragePool
- Bernouilli
- BitwiseNot
- BlackmanWindow
- Cast
- Ceil
- Celu
- ConcatFromSequence
- Cos
- Cosh
- DepthToSpace
- Det
- DynamicTimeWarping
- Erf
- EyeLike
- Flatten
- Floor
- GlobalAveragePool
- GlobalLpPool
- GlobalMaxPool
- HammingWindow
- HannWindow
- HardSwish
- HardMax
- lrfft
- lsNaN
- Log
- LogSoftmax
- LpNormalization
- LpPool
- LRN
- MeanVarianceNormalization
- MicrosoftGelu
- Mish
- Multinomial
- MurmurHash3
- Neg
- NhwcMaxPool
- NonZero
- Not
- OptionalGetElement
- OptionalHasElement
- QuickGelu
- RandomNormalLike
- RandomUniformLike
- RawConstantOfShape
- Reciprocal
- ReduceSumInteger
- RegexFullMatch
- Rfft
- Round
- SampleOp
- Shape
- SequenceLength
- Shrink
- Sin
- Sign
- Sinh
- Size
- SpaceToDepth
- Sqrt
- StringNormalizer
- Tan
- TfldfVectorizer
- Tokenizer
- Transpose
- UnfoldTensor
- lslnf
- ImageDecoder
- Inverse
- Show All Articles (65) Collapse Articles
 
 
- 
						 
		
- 
						 			- 
						 
		- Add
- AffineGrid
- And
- BiasAdd
- BiasGelu
- BiasSoftmax
- BiasSplitGelu
- BitShift
- BitwiseAnd
- BitwiseOr
- BitwiseXor
- CastLike
- CDist
- CenterCropPad
- Clip
- Col2lm
- ComplexMul
- ComplexMulConj
- Compress
- ConvInteger
- Conv
- ConvTranspose
- ConvTransposeWithDynamicPads
- CropAndResize
- CumSum
- DeformConv
- DequantizeBFP
- DequantizeLinear
- DequantizeWithOrder
- DFT
- Div
- DynamicQuantizeMatMul
- Equal
- Expand
- ExpandDims
- FastGelu
- FusedConv
- FusedGemm
- FusedMatMul
- FusedMatMulActivation
- GatedRelativePositionBias
- Gather
- GatherElements
- GatherND
- Gemm
- GemmFastGelu
- GemmFloat8
- Greater
- GreaterOrEqual
- GreedySearch
- GridSample
- GroupNorm
- InstanceNormalization
- Less
- LessOrEqual
- LongformerAttention
- MatMul
- MatMulBnb4
- MatMulFpQ4
- MatMulInteger
- MatMulInteger16
- MatMulIntergerToFloat
- MatMulNBits
- MaxPoolWithMask
- MaxRoiPool
- MaxUnPool
- MelWeightMatrix
- MicrosoftDequantizeLinear
- MicrosoftGatherND
- MicrosoftGridSample
- MicrosoftPad
- MicrosoftQLinearConv
- MicrosoftQuantizeLinear
- MicrosoftRange
- MicrosoftTrilu
- Mod
- MoE
- Mul
- MulInteger
- NegativeLogLikelihoodLoss
- NGramRepeatBlock
- NhwcConv
- NhwcFusedConv
- NonMaxSuppression
- OneHot
- Or
- PackedAttention
- PackedMultiHeadAttention
- Pad
- Pow
- QGemm
- QLinearAdd
- QLinearAveragePool
- QLinearConcat
- QLinearConv
- QLinearGlobalAveragePool
- QLinearLeakyRelu
- QLinearMatMul
- QLinearMul
- QLinearReduceMean
- QLinearSigmoid
- QLinearSoftmax
- QLinearWhere
- QMoE
- QOrderedAttention
- QOrderedGelu
- QOrderedLayerNormalization
- QOrderedLongformerAttention
- QOrderedMatMul
- QuantizeLinear
- QuantizeWithOrder
- Range
- ReduceL1
- ReduceL2
- ReduceLogSum
- ReduceLogSumExp
- ReduceMax
- ReduceMean
- ReduceMin
- ReduceProd
- ReduceSum
- ReduceSumSquare
- RelativePositionBias
- Reshape
- Resize
- RestorePadding
- ReverseSequence
- RoiAlign
- RotaryEmbedding
- ScatterElements
- ScatterND
- SequenceAt
- SequenceErase
- SequenceInsert
- Sinh
- Slice
- SparseToDenseMatMul
- SplitToSequence
- Squeeze
- STFT
- StringConcat
- Sub
- Tile
- TorchEmbedding
- TransposeMatMul
- Trilu
- Unsqueeze
- Where
- WordConvEmbedding
- Xor
- Show All Articles (134) Collapse Articles
 
- 
						 
		- Attention
- AttnLSTM
- BatchNormalization
- BiasDropout
- BifurcationDetector
- BitmaskBiasDropout
- BitmaskDropout
- DecoderAttention
- DecoderMaskedMultiHeadAttention
- DecoderMaskedSelfAttention
- Dropout
- DynamicQuantizeLinear
- DynamicQuantizeLSTM
- EmbedLayerNormalization
- GemmaRotaryEmbedding
- GroupQueryAttention
- GRU
- LayerNormalization
- LSTM
- MicrosoftMultiHeadAttention
- QAttention
- RemovePadding
- RNN
- Sampling
- SkipGroupNorm
- SkipLayerNormalization
- SkipSimplifiedLayerNormalization
- SoftmaxCrossEntropyLoss
- SparseAttention
- TopK
- WhisperBeamSearch
- Show All Articles (15) Collapse Articles
 
 
- 
						 
		
 
 
- 
						 
		 			
- 
						 
		 			
- 
						 			
- 
						 			
 
- 
						 
		
- 
							 
		 			
- 
							 
		 			- 
						 
		 			- 
						 
		- AdditiveAttention
- Attention
- BatchNormalization
- Bidirectional
- Conv1D
- Conv2D
- Conv1DTranspose
- Conv2DTranspose
- Conv3DTranspose
- Conv3D
- ConvLSTM1D
- ConvLSTM2D
- ConvLSTM3D
- Dense
- DepthwiseConv2D
- Embedding
- LayerNormalization
- GRU
- LSTM
- PReLU 2D
- PReLU 3D
- PReLU 4D
- PReLU 5D
- MutiHeadAttention
- SeparableConv1D
- SeparableConv2D
- MultiHeadAttention
- RNN (GRU)
- RNN (LSTM)
- RNN (SimpleRNN)
- SimpleRNN
- 1D
- 2D
- 3D
- 4D
- 5D
- 6D
- Scalar
- Show All Articles (22) Collapse Articles
 
- 
						 
		- AdditiveAttention
- Attention
- BatchNormalization
- Conv1D
- Conv2D
- Conv1DTranspose
- Conv2DTranspose
- Bidirectional
- Conv3D
- ConvLSTM1D
- ConvLSTM2D
- ConvLSTM3D
- Conv3DTranspose
- DepthwiseConv2D
- Dense
- Embedding
- LayerNormalization
- GRU
- PReLU 2D
- PReLU 3D
- PReLU 4D
- MultiHeadAttention
- LSTM
- PReLU 5D
- SeparableConv1D
- SeparableConv2D
- SimpleRNN
- RNN (GRU)
- RNN (LSTM)
- RNN (SimpleRNN)
- 1D
- 2D
- 3D
- 4D
- 5D
- 6D
- Scalar
- Show All Articles (21) Collapse Articles
 
 
- 
						 
		
- 
						 
		- AdditiveAttention
- Attention
- BatchNormalization
- Bidirectional
- Conv1D
- Conv2D
- Conv3D
- Conv1DTranspose
- Conv2DTranspose
- Conv3DTranspose
- ConvLSTM1D
- ConvLSTM2D
- ConvLSTM3D
- Dense
- DepthwiseConv2D
- Embedding
- GRU
- LayerNormalization
- LSTM
- MultiHeadAttention
- PReLU 2D
- PReLU 3D
- PReLU 4D
- PReLU 5D
- Resume
- SeparableConv1D
- SeparableConv2D
- SimpleRNN
- Show All Articles (12) Collapse Articles
 
- 
						 
		- Accuracy
- BinaryAccuracy
- BinaryCrossentropy
- BinaryIoU
- CategoricalAccuracy
- CategoricalCrossentropy
- CategoricalHinge
- CosineSimilarity
- FalseNegatives
- FalsePositives
- Hinge
- Huber
- IoU
- KLDivergence
- LogCoshError
- Mean
- MeanAbsoluteError
- MeanAbsolutePercentageError
- MeanIoU
- MeanRelativeError
- MeanSquaredError
- MeanSquaredLogarithmicError
- MeanTensor
- OneHotIoU
- OneHotMeanIoU
- Poisson
- Precision
- PrecisionAtRecall
- Recall
- RecallAtPrecision
- RootMeanSquaredError
- SensitivityAtSpecificity
- SparseCategoricalAccuracy
- SparseCategoricalCrossentropy
- SparseTopKCategoricalAccuracy
- Specificity
- SpecificityAtSensitivity
- SquaredHinge
- Sum
- TopKCategoricalAccuracy
- TrueNegatives
- TruePositives
- Resume
- Show All Articles (27) Collapse Articles
 
 
- 
						 
		 			
 
-                 
		Accelerator
-                 
		Constant
-                 
		Generator
-                 
		Full Train Step
-                 
		Eval Step
-                 
		Train Step
BatchNormalization
Description
Carries out batch normalization as described in the paperΒ https://arxiv.org/abs/1502.03167.

Depending on the mode it is being run, There are five required inputs βXβ, βscaleβ, βBβ, βinput_meanβ and βinput_varβ. Note that βinput_meanβ and βinput_varβ are expected to be the estimated statistics in inference mode (training_mode=False, default), and the running statistics in training mode (training_mode=True). There are multiple cases for the number of outputs, which we list below:
- Output case #1: Y, running_mean, running_var (training_mode=True)
- Output case #2: Y (training_mode=False)
When training_mode=False, extra outputs are invalid. The outputs are updated as follows when training_mode=True:
running_mean = input_mean * momentum + current_mean * (1 - momentum)
running_var = input_var * momentum + current_var * (1 - momentum)
Y = (X - current_mean) / sqrt(current_var + epsilon) * scale + B
where:
current_mean = ReduceMean(X, axis=all_except_channel_index)
current_var =  ReduceVar(X, axis=all_except_channel_index)
Notice thatΒ ReduceVarΒ refers to the population variance, and it equals toΒ sum(sqrd(x_iΒ -Β x_avg))Β /Β NΒ whereΒ NΒ is the population size (this formula does not use sample sizeΒ NΒ -Β 1).
The computation of ReduceMean and ReduceVar uses float to avoid overflow for float16 inputs.
When training_mode=False:
Y = (X - input_mean) / sqrt(input_var + epsilon) * scale + B
For previous (depreciated) non-spatial cases, implementors are suggested to flatten the input shape to (N x C * D1 * D2 * β¦ * Dn) before a BatchNormalization Op. This operator hasΒ optionalΒ inputs/outputs. SeeΒ ONNX IRΒ for more details about the representation of optional arguments. An empty string may be used in the place of an actual argumentβs name to indicate a missing argument. Trailing optional arguments (those not followed by an argument that is present) may also be simply omitted.
Input parameters
 specified_outputs_name :Β array, this parameter lets you manually assign custom names to the output tensors of a node.
 specified_outputs_name :Β array, this parameter lets you manually assign custom names to the output tensors of a node.
 Β Graphs in :Β cluster, ONNX model architecture.
Β Graphs in :Β cluster, ONNX model architecture.
 X (heterogeneous) – T : object, input data tensor from the previous operator; dimensions are in the form of (N x C x D1 x D2 β¦ Dn), where N is the batch size, C is the number of channels. Statistics are computed for every channel of C over N and D1 to Dn dimensions. For image data, input dimensions become (N x C x H x W). The op also accepts single dimension input of size N in which case C is assumed to be 1.
 X (heterogeneous) – T : object, input data tensor from the previous operator; dimensions are in the form of (N x C x D1 x D2 β¦ Dn), where N is the batch size, C is the number of channels. Statistics are computed for every channel of C over N and D1 to Dn dimensions. For image data, input dimensions become (N x C x H x W). The op also accepts single dimension input of size N in which case C is assumed to be 1. scale (heterogeneous) – T1 : object, scale tensor of shape Β©.
 scale (heterogeneous) – T1 : object, scale tensor of shape Β©. B (heterogeneous) – T1 : object, bias tensor of shape Β©.
 B (heterogeneous) – T1 : object, bias tensor of shape Β©. input_mean (heterogeneous) – T2 : object, running (training) or estimated (testing) mean tensor of shape Β©.
 input_mean (heterogeneous) – T2 : object, running (training) or estimated (testing) mean tensor of shape Β©. input_var (heterogeneous) – T2 : object, running (training) or estimated (testing) variance tensor of shape Β©.
 input_var (heterogeneous) – T2 : object, running (training) or estimated (testing) variance tensor of shape Β©.
 
			 Β Parameters :Β cluster,
Β Parameters :Β cluster,
 epsilon : float, the epsilon value to use to avoid division by zero.
 epsilon : float, the epsilon value to use to avoid division by zero.
Default value β1e-05β. momentum :Β float, factor used in computing the running mean and variance.e.g., running_mean = running_mean * momentum + mean * (1 – momentum).
 momentum :Β float, factor used in computing the running mean and variance.e.g., running_mean = running_mean * momentum + mean * (1 – momentum).
Default value β1β. Β training_mode :Β boolean, whether the layer is in training mode (can store data for backward).
Β training_mode :Β boolean, whether the layer is in training mode (can store data for backward).
Default value βFalseβ. Β lda coeff :Β float, defines the coefficient by which the loss derivative will be multiplied before being sent to the previous layer (since during the backward run we go backwards).
Β lda coeff :Β float, defines the coefficient by which the loss derivative will be multiplied before being sent to the previous layer (since during the backward run we go backwards).
Default value β1β.
 Β name (optional) :Β string,Β name of the node.
Β name (optional) :Β string,Β name of the node.
 
			Output parameters
 Β Graphs out :Β cluster, ONNX model architecture.
Β Graphs out :Β cluster, ONNX model architecture.
 Y (heterogeneous) – T : object, the output tensor of the same shape as X.
 Y (heterogeneous) – T : object, the output tensor of the same shape as X. running_mean (optional, heterogeneous) – T2 : object, the running mean after the BatchNormalization operator.
 running_mean (optional, heterogeneous) – T2 : object, the running mean after the BatchNormalization operator. running_var (optional, heterogeneous) – T2 : object, the running variance after the BatchNormalization operator. This op uses the population size (N) for calculating variance, and not the sample size N-1.
 running_var (optional, heterogeneous) – T2 : object, the running variance after the BatchNormalization operator. This op uses the population size (N) for calculating variance, and not the sample size N-1.
 
			Type Constraints
T in (tensor(bfloat16),Β tensor(double),Β tensor(float),Β tensor(float16)) : Constrain input and output types to float tensors.
T1 in (tensor(bfloat16),Β tensor(double),Β tensor(float),Β tensor(float16)) : Constrain scale and bias types to float tensors.
T2 in (tensor(bfloat16),Β tensor(double),Β tensor(float),Β tensor(float16)) : Constrain mean and variance types to float tensors.
