Quick Start

This page shows the most common usage patterns to get you running inference quickly.

1. Prepare a Model

Download a pre-converted ONNX model from Hugging Face and place it in models/:

models/
└── rf-detr-nano/
    └── rf-detr-nano.sim.onnx

GPU users: convert to FP16 for best performance

Converting the model to FP16 (float16) roughly halves its size and typically increases GPU throughput. Run the conversion once after downloading or exporting the model:

uv run python tools/export_fp16.py \
    --input models/rf-detr-nano/rf-detr-nano.sim.onnx \
    --keep-io-types

See Tools → export_fp16.py for all options.
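
The size reduction follows from storage width: a float32 value occupies 4 bytes and a float16 value occupies 2. The sketch below illustrates this with the standard library only. The parameter count is a made-up number chosen for illustration, not the actual size of rf-detr-nano, and a real file usually shrinks by slightly less than half because graph metadata and any tensors kept in FP32 are unchanged.

```python
import struct

# Bytes per element for IEEE 754 single ("f") and half ("e") precision.
FP32_BYTES = struct.calcsize("<f")
FP16_BYTES = struct.calcsize("<e")


def weight_megabytes(num_params: int, bytes_per_param: int) -> float:
    """Approximate storage for the weights alone, in megabytes (10**6 bytes)."""
    return num_params * bytes_per_param / 1e6


# Hypothetical parameter count, used only for illustration.
num_params = 30_000_000

fp32_mb = weight_megabytes(num_params, FP32_BYTES)
fp16_mb = weight_megabytes(num_params, FP16_BYTES)

print(f"FP32 weights: {fp32_mb:.1f} MB")
print(f"FP16 weights: {fp16_mb:.1f} MB")
print(f"Ratio:        {fp32_mb / fp16_mb:.1f}x")
```

Comparing the on-disk sizes of the original and converted files is a quick sanity check that the conversion did what you expected.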

2. CLI Inference

Run inference on a single image from the command line:

CPU:

uv run python python/inference.py \
    --model models/rf-detr-nano/rf-detr-nano.sim.onnx \
    --image assets/drone.jpg \
    --device cpu

GPU:

uv run python python/inference.py \
    --model models/rf-detr-nano/rf-detr-nano.sim.onnx \
    --image assets/drone.jpg \
    --device gpu

Example output (exact timings depend on your hardware):

--- Inference Results ---
Preprocessing:  3.12 ms
ORT Run:        12.45 ms
Postprocessing: 1.87 ms
---------------------------------
Processing (Pre+ORT+Post): 17.44 ms
Processing FPS:           57.34
---------------------------------
Total Latency (inc. I/O):  52.11 ms
Total FPS:                19.19
---------------------------------
Detections saved to: output/output.jpg
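
The figures in this sample are consistent with two simple relationships: the processing time is the sum of the three stage timings, and each FPS value is 1000 divided by the corresponding latency in milliseconds. The sketch below reproduces the sample numbers under that assumption. It is an illustration of the arithmetic, not the script's actual reporting code.

```python
# Stage timings in milliseconds, copied from the sample output above.
preprocess_ms = 3.12
ort_run_ms = 12.45
postprocess_ms = 1.87
total_latency_ms = 52.11  # reported as including I/O


def fps_from_ms(latency_ms: float) -> float:
    """Frames per second for a given per-frame latency in milliseconds."""
    return 1000.0 / latency_ms


processing_ms = preprocess_ms + ort_run_ms + postprocess_ms

print(f"Processing: {processing_ms:.2f} ms -> {fps_from_ms(processing_ms):.2f} FPS")
print(f"Total:      {total_latency_ms:.2f} ms -> {fps_from_ms(total_latency_ms):.2f} FPS")
```

The gap between the two FPS values comes from the time outside the three stages, which the sample labels as I/O.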

3. Python API

Basic Usage

from modules.model import RFDETRModel

# Initialize (auto-selects best provider: TensorRT > CUDA > CPU)
model = RFDETRModel("models/rf-detr-nano/rf-detr-nano.sim.onnx", device="gpu")

# Run inference from a file path or URL
detections, timings = model.predict("assets/drone.jpg")

# Print results
for det in detections:
    print(f"Label: {det.label}, Score: {det.score:.2f}, Box: {det.unnormalized_box}")
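
A common next step is to keep only confident detections and count them per label. The sketch below runs without a model: `Detection` is a hypothetical stand-in that exposes only the three attributes used in the loop above, the sample values are made up, and the box layout (x_min, y_min, x_max, y_max) is an assumption. The real objects returned by predict() may carry more fields.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Detection:
    """Hypothetical stand-in with the attributes used in the loop above."""
    label: str
    score: float
    unnormalized_box: tuple  # assumed (x_min, y_min, x_max, y_max) in pixels


def filter_by_score(detections, min_score):
    """Keep detections at or above min_score, highest score first."""
    kept = [d for d in detections if d.score >= min_score]
    return sorted(kept, key=lambda d: d.score, reverse=True)


# Made-up sample data for illustration only.
detections = [
    Detection("drone", 0.91, (120, 80, 340, 260)),
    Detection("bird", 0.35, (400, 50, 450, 90)),
    Detection("drone", 0.62, (10, 300, 90, 380)),
]

confident = filter_by_score(detections, min_score=0.5)
counts = Counter(d.label for d in confident)

print([d.score for d in confident])
print(dict(counts))
```

Because `filter_by_score` reads only `score`, it should work unchanged on any detection objects that expose that attribute.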

Using an OpenCV Image

import cv2
from modules.model import RFDETRModel

model = RFDETRModel("models/rf-detr-nano/rf-detr-nano.sim.onnx", device="cpu")

image = cv2.imread("assets/drone.jpg")  # BGR format
detections, timings = model.predict(image, confidence_threshold=0.4)

# Save annotated image
model.save_detections(image, detections, "output/result.jpg")

Timing Breakdown

Every predict() call returns timing information in milliseconds:

detections, timings = model.predict(image)

print(f"Preprocess:  {timings['preprocess']:.2f} ms")
print(f"ORT Run:     {timings['ort_run']:.2f} ms")
print(f"Postprocess: {timings['postprocess']:.2f} ms")
print(f"Total:       {timings['total']:.2f} ms")
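
A single call can be noisy, and the first call on a GPU is often slower than later ones while the runtime warms up, so averaging over several runs tends to give a steadier picture. The sketch below averages timing dictionaries that use the four keys shown above. `average_timings` is a hypothetical helper, not part of the library, and the sample values are made up.

```python
from statistics import mean

TIMING_KEYS = ("preprocess", "ort_run", "postprocess", "total")


def average_timings(runs, warmup=1):
    """Average each timing key over runs, skipping the first `warmup` entries."""
    measured = runs[warmup:]
    if not measured:
        raise ValueError("need more runs than warm-up iterations")
    return {key: mean(run[key] for run in measured) for key in TIMING_KEYS}


# Made-up timing dictionaries; the first entry plays the role of a warm-up run.
runs = [
    {"preprocess": 9.0, "ort_run": 250.0, "postprocess": 3.0, "total": 262.0},
    {"preprocess": 3.0, "ort_run": 12.0, "postprocess": 2.0, "total": 17.0},
    {"preprocess": 3.5, "ort_run": 13.0, "postprocess": 1.5, "total": 18.0},
]

averages = average_timings(runs, warmup=1)
for key in TIMING_KEYS:
    print(f"{key:<12} {averages[key]:.2f} ms")
```

In practice, `runs` would be built by appending the `timings` value returned from repeated predict() calls on the same image.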

4. C++ Quick Start

After building (see Installation):

# Image inference
./rfdetr_image_inference \
    --model ../../models/rf-detr-nano/rf-detr-nano.sim.onnx \
    --image ../../assets/drone.jpg \
    --device gpu

# Video inference
./rfdetr_video_inference \
    --model ../../models/rf-detr-nano/rf-detr-nano.sim.onnx \
    --video ../../assets/sample.mp4 \
    --device gpu

5. Validate Against PyTorch

Verify that the ONNX model produces results matching the original PyTorch model:

# Run tests on GPU (default)
bash tests/run_tests.sh

# Run tests on CPU
bash tests/run_tests.sh -d cpu

# Pass a custom ONNX Runtime root dir (e.g. when using a manually installed ORT)
bash tests/run_tests.sh -d gpu -o /path/to/onnxruntime
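
The comparison criteria used by the test script are not shown on this page. As a general illustration of what "matching" can mean for numerical outputs, the sketch below checks two flat lists of scores element by element within a tolerance. The function name, tolerances, and sample values are all assumptions made for this example.

```python
import math


def outputs_match(reference, candidate, rel_tol=1e-3, abs_tol=1e-4):
    """Return True if both sequences have equal length and close values."""
    if len(reference) != len(candidate):
        return False
    return all(
        math.isclose(r, c, rel_tol=rel_tol, abs_tol=abs_tol)
        for r, c in zip(reference, candidate)
    )


# Made-up scores standing in for PyTorch and ONNX outputs.
pytorch_scores = [0.9132, 0.6210, 0.0501]
onnx_scores = [0.9131, 0.6212, 0.0501]
drifted_scores = [0.9132, 0.5800, 0.0501]

print(outputs_match(pytorch_scores, onnx_scores))
print(outputs_match(pytorch_scores, drifted_scores))
```

Exact equality is rarely a useful criterion here, since differences in operator implementations and precision (especially after FP16 conversion) introduce small numerical drift.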

Next Steps

Python API Reference — Full documentation of all classes and functions
C++ API Reference — C++ library documentation
Benchmarks — Measure and compare performance
Tools — Export your own fine-tuned model