Tools
The tools/ directory contains scripts for exporting RF-DETR models from PyTorch checkpoints to ONNX, converting ONNX models to FP16, and downloading the ONNX Runtime C++ library.
export_roboflow.py
Export a Roboflow fine-tuned RF-DETR checkpoint (.pth) to ONNX.
Requirements
Install the export extra:
Usage
uv run python tools/export_roboflow.py \
--weights path/to/checkpoint.pth \
--model-type nano \
--output-dir models/ \
--opset 17
Parameters
| Argument | Type | Default | Description |
|---|---|---|---|
| `--weights` | `str` | Required | Path to the .pth checkpoint file |
| `--model-type` | `str` | `nano` | Architecture: nano, small, base, medium, large |
| `--output-dir` | `str` | `models/` | Directory to save the exported ONNX model |
| `--opset` | `int` | `17` | ONNX opset version |
| `--no-simplify` | flag | off | Disable onnxsim simplification step |
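The arguments above could be declared with `argparse` roughly as follows. This is only a sketch built from the documented names, types, and defaults; the actual parser in export_roboflow.py may be organized differently, and `build_parser` is a hypothetical helper name.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented arguments (illustrative only)."""
    parser = argparse.ArgumentParser(description="Export an RF-DETR checkpoint to ONNX")
    parser.add_argument("--weights", type=str, required=True,
                        help="Path to the .pth checkpoint file")
    parser.add_argument("--model-type", type=str, default="nano",
                        choices=["nano", "small", "base", "medium", "large"],
                        help="Architecture to instantiate")
    parser.add_argument("--output-dir", type=str, default="models/",
                        help="Directory to save the exported ONNX model")
    parser.add_argument("--opset", type=int, default=17, help="ONNX opset version")
    # A store_true flag stays False unless it is passed on the command line.
    parser.add_argument("--no-simplify", action="store_true",
                        help="Disable the onnxsim simplification step")
    return parser


# Only --weights is required; everything else falls back to its default.
args = build_parser().parse_args(["--weights", "path/to/checkpoint.pth"])
print(args.model_type, args.output_dir, args.opset, args.no_simplify)
```

Modeling `--no-simplify` as a `store_true` flag matches the "off" default in the table: simplification runs unless the flag is given.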
Output
The script produces a simplified ONNX model under a folder named after the checkpoint file:
For example, if --weights rf-detr-nano.pth and --output-dir models/, the output is:
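As a minimal sketch of that naming rule, assuming the folder is simply the checkpoint's file name without its extension, placed under --output-dir (the name of the ONNX file inside that folder is not specified here, and `export_folder` is a hypothetical helper):

```python
from pathlib import Path


def export_folder(weights: str, output_dir: str) -> Path:
    """Return the folder an export would land in, under the stated assumption."""
    # Path.stem drops the final suffix: "rf-detr-nano.pth" -> "rf-detr-nano".
    return Path(output_dir) / Path(weights).stem


print(export_folder("rf-detr-nano.pth", "models/").as_posix())
```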
Model Simplification
By default, onnxsim is applied to the exported graph. This folds constants and removes redundant nodes, resulting in faster and more portable models.
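The simplification step can also be reproduced by hand on an already exported model. The sketch below uses the public `onnxsim.simplify` function and assumes the `onnx` and `onnxsim` packages are installed; `simplify_onnx` is a hypothetical helper, not part of the tools, and the script's own simplification code may differ.

```python
from pathlib import Path


def simplify_onnx(src: str, dst: str) -> None:
    """Simplify the ONNX model at src and write the result to dst."""
    if not Path(src).is_file():
        raise FileNotFoundError(src)

    # Imported lazily so the path check above runs even without these packages.
    import onnx
    from onnxsim import simplify

    model = onnx.load(src)
    # simplify() returns the simplified model and a flag saying whether
    # the simplified graph still matched the original during validation.
    simplified, ok = simplify(model)
    if not ok:
        raise RuntimeError("onnxsim could not validate the simplified model")
    onnx.save(simplified, dst)
```

Checking the validation flag before saving avoids silently overwriting a working model with a graph that no longer matches it.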
export.py
Low-level ONNX export utility. Auto-detects the model architecture from checkpoint weights and exports to ONNX. Supports all model variants including segmentation and XLarge models.
Usage
Parameters
| Argument | Default | Description |
|---|---|---|
| `--checkpoint` | Required | Path to the .pth or .pt checkpoint file |
| `--model-name` | (derived from checkpoint name) | Output ONNX filename |
| `--no-simplify` | off | Disable onnxsim simplification |
Note
export.py auto-detects the model class by counting backbone parameters. Use export_roboflow.py instead when you want to explicitly specify the architecture type.
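To illustrate what counting backbone parameters means, here is a minimal sketch. It assumes backbone weights are stored under keys that start with `backbone`, which is an assumption about the checkpoint layout rather than something stated here. The thresholds export.py uses to map a count to a model class are not documented in this section, so they are left out.

```python
import math
from types import SimpleNamespace


def count_backbone_params(state_dict, prefix="backbone"):
    """Sum the element counts of all tensors whose key starts with prefix."""
    total = 0
    for name, tensor in state_dict.items():
        if name.startswith(prefix):
            # The product of a tensor's dimensions is its number of elements.
            total += math.prod(tensor.shape)
    return total


# Stand-in for a loaded checkpoint: objects exposing only .shape, like tensors.
fake_state = {
    "backbone.layer0.weight": SimpleNamespace(shape=(64, 3, 7, 7)),
    "backbone.layer0.bias": SimpleNamespace(shape=(64,)),
    "head.weight": SimpleNamespace(shape=(10, 64)),
}
print(count_backbone_params(fake_state))
```

With a real checkpoint, the same function would work on the dictionary returned by `torch.load`, because a tensor's `shape` behaves like a tuple of integers.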
download_onnx.sh
Downloads the ONNX Runtime C++ library (not models) for Linux x64. Use it when the C++ build needs a manually installed copy of ONNX Runtime.
# Download CPU version (default)
bash tools/download_onnx.sh
# Download GPU version
bash tools/download_onnx.sh -d gpu
# Specify a version and output directory
bash tools/download_onnx.sh -v 1.21.0 -d gpu -o libs/onnx
Options
| Option | Default | Description |
|---|---|---|
| `-v <version>` | `1.21.0` | ONNX Runtime version to download |
| `-d <device>` | `cpu` | Device type: cpu or gpu |
| `-o <dir>` | `libs/onnx` | Output directory |
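For reference, the version and device options map naturally onto the archive names used for the official ONNX Runtime releases on GitHub. The sketch below assumes the script fetches those release archives; it may use a different source or naming, and `onnxruntime_archive_url` is a hypothetical helper.

```python
def onnxruntime_archive_url(version: str = "1.21.0", device: str = "cpu") -> str:
    """Build the GitHub release URL for a Linux x64 ONNX Runtime archive."""
    if device not in ("cpu", "gpu"):
        raise ValueError(f"device must be 'cpu' or 'gpu', got {device!r}")
    # GPU archives carry a "-gpu" marker in the name; CPU archives have none.
    suffix = "-gpu" if device == "gpu" else ""
    name = f"onnxruntime-linux-x64{suffix}-{version}.tgz"
    return f"https://github.com/microsoft/onnxruntime/releases/download/v{version}/{name}"


print(onnxruntime_archive_url())
print(onnxruntime_archive_url(device="gpu"))
```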
After downloading, pass the extracted directory to CMake:
Pre-converted ONNX models
To get pre-converted RF-DETR ONNX models (not the runtime), run the benchmark or test scripts, which auto-download the models from the GitHub release, or download them manually from Hugging Face.
export_fp16.py
Convert an existing ONNX model to FP16 (float16) or mixed precision. Recommended for GPU inference: FP16 weights take half the storage of FP32 weights and typically improve throughput.
Requirements
Install the export extra:
Usage
Full FP16 Conversion:
uv run python tools/export_fp16.py \
--input models/rf-detr-nano.onnx \
--output models/rf-detr-nano_fp16.onnx \
--keep-io-types
Mixed-Precision (GPU only):
uv run python tools/export_fp16.py \
--input models/rf-detr-nano.onnx \
--mixed-precision \
--sample-input sample_input.npy \
--keep-io-types
Parameters
| Argument | Default | Description |
|---|---|---|
| `--input` | Required | Path to the input .onnx model |
| `--output` | (input_stem)_fp16.onnx | Path for the output ONNX model |
| `--keep-io-types` | `False` | Keep model inputs/outputs as float32 (recommended) |
| `--mixed-precision` | `False` | Use auto mixed-precision (requires GPU + --sample-input) |
| `--sample-input` | `None` | Path to a .npy file with sample input (for mixed-precision) |
| `--op-block-list` | `DEFAULT` | List of op types to leave as float32 |
| `--node-block-list` | `None` | List of node names to leave as float32 |
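A full FP16 conversion can be sketched in a few lines with the `onnxconverter-common` package, whose `convert_float_to_float16` function accepts `keep_io_types`, `op_block_list`, and `node_block_list` arguments that resemble the flags above. Whether export_fp16.py is built on that package is an assumption. `default_fp16_path` and `convert_to_fp16` are hypothetical helpers, and placing the default output next to the input file is also assumed.

```python
from pathlib import Path


def default_fp16_path(input_path: str) -> Path:
    """Derive <input_stem>_fp16.onnx next to the input file (assumed location)."""
    src = Path(input_path)
    return src.with_name(f"{src.stem}_fp16.onnx")


def convert_to_fp16(input_path: str, output_path: str = "",
                    keep_io_types: bool = True) -> Path:
    """Convert an ONNX model to float16 and return the output path."""
    # Imported lazily so default_fp16_path() works without these packages.
    import onnx
    from onnxconverter_common import float16

    dst = Path(output_path) if output_path else default_fp16_path(input_path)
    model = onnx.load(input_path)
    # keep_io_types=True leaves graph inputs and outputs as float32.
    model_fp16 = float16.convert_float_to_float16(model, keep_io_types=keep_io_types)
    onnx.save(model_fp16, str(dst))
    return dst


print(default_fp16_path("models/rf-detr-nano.onnx").as_posix())
```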
Why use FP16?
FP16 models are about half the size of their FP32 counterparts and usually run significantly faster on modern GPUs (for example through TensorRT or CUDA). Use --keep-io-types to keep the model's inputs and outputs as float32, so existing preprocessing pipelines work unchanged.
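The size claim follows directly from the storage width of each type: a float32 value takes 4 bytes and a float16 value takes 2. The sketch below works through the arithmetic for a model with roughly 6M parameters, counting weights only (graph metadata is ignored), and shows the rounding that half precision introduces.

```python
import struct

BYTES_FP32 = struct.calcsize("f")  # IEEE 754 single precision
BYTES_FP16 = struct.calcsize("e")  # IEEE 754 half precision


def weight_megabytes(num_params: int, bytes_per_param: int) -> float:
    """Approximate storage for the weights alone, in megabytes (10**6 bytes)."""
    return num_params * bytes_per_param / 1e6


params = 6_000_000  # roughly the nano size listed in this document
print(weight_megabytes(params, BYTES_FP32))  # 24.0
print(weight_megabytes(params, BYTES_FP16))  # 12.0

# Packing a value as half precision and reading it back shows the rounding:
# the result is close to 0.1 but not equal to it.
roundtrip = struct.unpack("e", struct.pack("e", 0.1))[0]
print(roundtrip)
```

That rounding is why some operators or nodes may need to stay in float32, which is what --op-block-list and --node-block-list are for.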
Supported Model Types
| Type | Parameters | Input Size | Notes |
|---|---|---|---|
| `nano` | ~6M | 384×384 | Fastest, recommended for edge/CPU |
| `small` | ~12M | 384×384 | Good speed/accuracy balance |
| `base` | ~32M | 560×560 | Strong accuracy |
| `medium` | ~50M | 560×560 | High accuracy |
| `large` | ~100M+ | 560×560 | Highest accuracy, GPU recommended |