With the rapid adoption of artificial intelligence (AI) in embedded and IoT devices, the need for efficient deployment frameworks has grown. One such solution is the RKNN (Rockchip Neural Network) Toolkit, developed by Rockchip to simplify AI model deployment on edge devices powered by Rockchip's processors. While Python-based tools are common, the RKNN C++ SDK offers greater control, speed, and efficiency, especially for developers building AI applications in C++ environments. This article explores RKNN C++: what it is, how it works, and how it can be integrated to optimize AI solutions on Rockchip devices.
What is RKNN C++?
RKNN is an inference framework designed to accelerate deep learning models on Rockchip's NPU-equipped chipsets, such as the RK3399Pro, RK3568, and RK3588. While the RKNN Toolkit supports Python as a convenient option for prototyping and quick deployment, RKNN C++ is crucial for developers who need lower-level control for embedded systems or production-grade applications.
By utilizing the RKNN C++ SDK, developers can convert, deploy, and optimize neural networks for inference on CPUs, GPUs, and NPUs (Neural Processing Units) integrated into Rockchip SoCs (System-on-Chips). The C++ SDK allows for fast execution and offers flexibility in managing memory, hardware resources, and performance tuning—critical for applications such as computer vision, speech recognition, and robotics.
Key Features of RKNN C++ SDK
- Multi-Hardware Inference Support
RKNN C++ allows developers to offload AI workloads to multiple hardware resources, including:
  - CPU: Useful for simple inference or fallback processing.
  - GPU: Leverages parallel computation for real-time inference.
  - NPU: Specializes in accelerating deep learning models at ultra-low power consumption.
- Model Conversion and Optimization
The RKNN Toolkit provides tools to convert popular AI models (from TensorFlow, PyTorch, ONNX, etc.) into the RKNN format. With the C++ SDK, these models can be optimized further for faster inference on specific hardware configurations.
- On-Device Inference
The C++ SDK is optimized for real-time inference on resource-constrained devices, making it ideal for edge AI solutions that need low latency without relying on cloud connectivity.
- Memory Management and Fine-Tuning
With C++, developers can manage memory allocations manually to avoid bottlenecks and optimize model execution. This is crucial when deploying models on embedded systems with limited RAM.
- Cross-Platform Support
RKNN C++ SDK supports multiple Rockchip devices, making it easier to develop once and deploy across various platforms, such as mobile robots, surveillance cameras, or smart home appliances. (A short sketch of the underlying C API follows this list.)
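To give a feel for the C API that exposes these features, here is a minimal sketch (assuming a context has already been created with rknn_init()) that uses rknn_query() with RKNN_QUERY_SDK_VERSION to print the API and NPU driver versions of the device it runs on:
#include <iostream>
#include "rknn_api.h"

// Print the RKNN API and NPU driver versions for an already-initialized context.
void print_sdk_version(rknn_context ctx) {
    rknn_sdk_version version;
    int ret = rknn_query(ctx, RKNN_QUERY_SDK_VERSION, &version, sizeof(version));
    if (ret == RKNN_SUCC) {
        std::cout << "API version:    " << version.api_version << std::endl;
        std::cout << "Driver version: " << version.drv_version << std::endl;
    }
}
Such a check is a quick way to confirm that the runtime library on the target board matches the toolkit version used for model conversion.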
How to Set Up and Use RKNN C++ SDK
To effectively use the RKNN C++ SDK, developers need to install the right tools and follow a step-by-step workflow, including model conversion, compiling the SDK, and running inference on the target device.
1. Prerequisites
Before diving into development, ensure the following are installed:
- Rockchip SDK and Drivers
- RKNN Toolkit (for model conversion, available from Rockchip’s official site)
- GCC / CMake (for compiling C++ projects)
- Python (for initial model conversion and testing with RKNN Toolkit)
Also, ensure your target device (like an RK3399Pro-based board) has all necessary libraries and drivers pre-installed.
2. Model Conversion Using RKNN Toolkit
AI models from frameworks such as TensorFlow, ONNX, or PyTorch must first be converted into the RKNN format to run on Rockchip devices. Below is an example of converting a TensorFlow model:
# Install RKNN Toolkit and convert a TensorFlow model to RKNN format
pip install rknn-toolkit
python model_conversion.py
Example: TensorFlow to RKNN Conversion Script (Python):
from rknn.api import RKNN
# Initialize RKNN object
rknn = RKNN()
# Load the TensorFlow frozen graph (tf_pb is the argument name used by the RKNN Toolkit)
rknn.load_tensorflow(tf_pb='model.pb',
                     inputs=['input_node'],
                     outputs=['output_node'],
                     input_size_list=[[224, 224, 3]])
# Build the model; quantization requires a dataset file listing calibration images
rknn.build(do_quantization=True, dataset='./dataset.txt')
# Export the converted model
rknn.export_rknn('model.rknn')
# Release resources
rknn.release()
3. Writing C++ Code for Inference
Once the model is converted, the RKNN C++ SDK allows you to integrate it into your application. Below is an example of a minimal C++ code snippet for loading and running inference on an RKNN model.
Example: RKNN C++ Inference Code
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <iostream>
#include "rknn_api.h"

int main() {
    rknn_context ctx;
    const char* model_path = "model.rknn";
    // Read the RKNN model file into memory
    FILE* fp = fopen(model_path, "rb");
    if (!fp) {
        std::cerr << "Failed to open " << model_path << std::endl;
        return -1;
    }
    fseek(fp, 0, SEEK_END);
    long model_size = ftell(fp);
    fseek(fp, 0, SEEK_SET);
    void* model_data = malloc(model_size);
    fread(model_data, 1, model_size, fp);
    fclose(fp);
    // Load RKNN model (legacy librknn_api signature shown; the rknpu2 runtime on
    // RK3566/RK3568/RK3588 takes an extra trailing rknn_init_extend* argument, pass NULL)
    int ret = rknn_init(&ctx, model_data, model_size, 0);
    if (ret != RKNN_SUCC) {
        std::cerr << "Failed to initialize RKNN context!" << std::endl;
        free(model_data);
        return -1;
    }
    // Prepare the input buffer: 224x224 RGB image, uint8, NHWC layout
    rknn_input input;
    memset(&input, 0, sizeof(input));
    input.index = 0;
    input.size = 224 * 224 * 3;
    input.buf = malloc(input.size); // fill with real image data in a real application
    input.pass_through = 0;
    input.type = RKNN_TENSOR_UINT8;
    input.fmt = RKNN_TENSOR_NHWC;
    // Prepare the output descriptor; request results as float
    rknn_output output;
    memset(&output, 0, sizeof(output));
    output.want_float = 1;
    // Run inference
    ret = rknn_inputs_set(ctx, 1, &input);
    ret = rknn_run(ctx, nullptr);
    ret = rknn_outputs_get(ctx, 1, &output, nullptr);
    // Process results
    std::cout << "Inference completed. Output value: "
              << static_cast<float*>(output.buf)[0] << std::endl;
    // Clean up
    rknn_outputs_release(ctx, 1, &output);
    rknn_destroy(ctx);
    free(input.buf);
    free(model_data);
    return 0;
}
In this example:
- rknn_init() initializes the RKNN context with the pre-converted model.
- rknn_inputs_set() and rknn_outputs_get() manage input and output buffers.
- rknn_run() executes the inference on the specified hardware.
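The hard-coded input size in the example above can also be avoided: after rknn_init(), the runtime can be asked for the model's tensor layout. The following sketch (a hypothetical helper, shown for illustration) uses rknn_query() with RKNN_QUERY_IN_OUT_NUM and RKNN_QUERY_INPUT_ATTR to print how many inputs and outputs the model has and the size of each input tensor:
#include <cstdint>
#include <cstring>
#include <iostream>
#include "rknn_api.h"

// Print the model's input/output counts and basic input tensor attributes.
void print_model_io(rknn_context ctx) {
    rknn_input_output_num io_num;
    if (rknn_query(ctx, RKNN_QUERY_IN_OUT_NUM, &io_num, sizeof(io_num)) != RKNN_SUCC)
        return;
    std::cout << "inputs: " << io_num.n_input
              << ", outputs: " << io_num.n_output << std::endl;
    for (uint32_t i = 0; i < io_num.n_input; i++) {
        rknn_tensor_attr attr;
        memset(&attr, 0, sizeof(attr));
        attr.index = i;
        if (rknn_query(ctx, RKNN_QUERY_INPUT_ATTR, &attr, sizeof(attr)) != RKNN_SUCC)
            continue;
        std::cout << "input " << i << " (" << attr.name << "): "
                  << attr.n_elems << " elements, " << attr.size << " bytes" << std::endl;
    }
}
Sizing the input buffer from attr.size instead of a constant keeps the same code working when the model's input resolution changes.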
Compile the code using GCC or CMake:
g++ -o rknn_inference rknn_example.cpp -lrknn_api
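Note that on boards using the newer rknpu2 runtime (RK3566, RK3568, RK3588), the runtime library is typically named librknnrt rather than librknn_api, so the link flag becomes -lrknnrt.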
4. Optimizing Performance with C++
With the C++ SDK, developers can fine-tune performance by optimizing memory usage and balancing workloads across hardware components. Here are some tips:
- Quantization: Use 8-bit quantization to reduce model size and improve speed.
- Batch Processing: Run multiple inferences in batches to increase throughput.
- NPU Offloading: Use the NPU where possible for efficient deep learning inference; on SoCs with a multi-core NPU, core selection is sketched below.
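For the NPU offloading tip, the rknpu2 runtime used on RK3588-class SoCs (which have a multi-core NPU) exposes rknn_set_core_mask() for choosing which NPU cores a context runs on. The minimal sketch below assumes that runtime; the legacy librknn_api does not provide this call:
#include "rknn_api.h"

// Pin an initialized context to all three NPU cores of an RK3588.
// (RKNN_NPU_CORE_AUTO would instead let the driver schedule the work.)
int use_all_npu_cores(rknn_context ctx) {
    return rknn_set_core_mask(ctx, RKNN_NPU_CORE_0_1_2);
}
Spreading one context across cores can help latency and throughput for large models; for many small models, running one context per core is often the better trade-off.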
Use Cases of RKNN C++ SDK
- Smart Surveillance: Cameras with Rockchip processors can run real-time object detection models using the RKNN C++ SDK for fast inference with low latency.
- Autonomous Robots: Robots can use the SDK to run vision-based models locally, eliminating the need for cloud processing.
- Smart Home Devices: Devices like smart speakers and home assistants can leverage C++ SDK for voice recognition models.
Conclusion
The RKNN C++ SDK offers a robust framework for deploying AI models on Rockchip-powered devices, providing speed, control, and flexibility to developers working in C++ environments. From smart cameras to autonomous robots, the SDK empowers edge devices to perform real-time inference with minimal latency. With the ability to offload AI tasks to CPUs, GPUs, or NPUs, RKNN C++ ensures efficient resource utilization, making it a preferred solution for embedded AI applications.