Preallocated CUDA

Description

Executes the model using a single CUDA input pointer and a single output pointer. Type : polymorphic.
Warning : this function can only be executed with CUDA or TensorRT.
Both input and output buffers must be pre-allocated on the GPU and managed by the user (no automatic allocation or release).


Input parameters

Inference in : object, inference session.

Inputs Info : cluster

inputs_ptr : integer, represents a pre-allocated device memory address (for example, a CUDA device pointer) where the input tensor data is already stored.

inputs_shapes : array, specifies the shape of the input tensor. Since the data is read from a pre-allocated device buffer, this shape allows the runtime to interpret the memory layout correctly.

inputs_ranks : integer, indicates the rank of the tensor, i.e. the number of dimensions (Scalar = 0, 1D = 1, 2D = 2, etc.).

inputs_types : enum, defines the ONNX tensor type as an enumerated value (e.g. FLOAT, INT64, STRING).

Outputs Info : cluster

outputs_ptr : integer, represents a pre-allocated device memory address (for example, a CUDA device pointer) where the output tensor data will be written.

outputs_shapes : array, specifies the shape of the output tensor. Since the data is written into a pre-allocated device buffer, this shape allows the runtime to interpret the memory layout correctly.

outputs_ranks : integer, indicates the rank of the tensor, i.e. the number of dimensions (Scalar = 0, 1D = 1, 2D = 2, etc.).

outputs_types : enum, defines the ONNX tensor type as an enumerated value (e.g. FLOAT, INT64, STRING).
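Once inference has run, outputs_ptr, outputs_shapes and outputs_types let the caller reinterpret the raw buffer as a tensor. Here is a hedged Python sketch of that step (`read_output` is a hypothetical helper, and a ctypes host buffer stands in for device memory that has already been copied back to the host):

```python
import ctypes
from math import prod

def read_output(ptr, shape, onnx_type="FLOAT"):
    """Interpret a raw buffer address as a flattened tensor.

    The shape tells us how many elements to read; the ONNX type
    selects the element width and interpretation.
    """
    ctype = {"FLOAT": ctypes.c_float, "INT64": ctypes.c_int64}[onnx_type]
    count = prod(shape)
    arr = (ctype * count).from_address(ptr)
    return list(arr)

# Demo: a host buffer standing in for the copied-back device memory
# of a 2x3 FLOAT output tensor.
src = (ctypes.c_float * 6)(0.5, 1.5, 2.5, 3.5, 4.5, 5.5)
vals = read_output(ctypes.addressof(src), [2, 3])
# vals is the flattened 2x3 tensor: [0.5, 1.5, 2.5, 3.5, 4.5, 5.5]
```

With a real CUDA device pointer, the data would first have to be copied from device to host (e.g. with cudaMemcpy) before it can be read this way.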


Output parameters

Inference out : object, inference session.

Example

All these examples are provided as PNG snippets: you can drop a snippet onto the block diagram to have the depicted code added to your VI (do not forget to install the Accelerator library to run it).
