Train and Deploy YOLOv8n on STM32N6

  1. Install the Ultralytics environment
    Follow the official instructions. The example below uses Docker:

    sudo docker pull ultralytics/ultralytics:latest-export
  2. Launch the container (GPU mode)

    sudo docker run -it --ipc=host --runtime=nvidia --gpus all \
    -v /your/host/path:/ultralytics/output \
    ultralytics/ultralytics:latest-export /bin/bash
  3. Verify the environment

    yolo detect predict model=yolov8n.pt source='https://ultralytics.com/images/bus.jpg' device=0
    Downloading https://github.com/ultralytics/assets/releases/download/v8.3.0/yolov8n.pt to 'yolov8n.pt': 100% ━━━━━━━━━━━━ 6.2MB 1.6MB/s 4.0s
    Ultralytics 8.3.213 🚀 Python-3.11.13 torch-2.8.0+cu128 CUDA:0 (NVIDIA A800 80GB PCIe, 81051MiB)
    YOLOv8n summary (fused): 72 layers, 3,151,904 parameters, 0 gradients, 8.7 GFLOPs

    Downloading https://ultralytics.com/images/bus.jpg to 'bus.jpg': 100% ━━━━━━━━━━━━ 134.2KB 1.2MB/s 0.1s
    image 1/1 /ultralytics/bus.jpg: 640x480 4 persons, 1 bus, 1 stop sign, 64.5ms
    Speed: 4.9ms preprocess, 64.5ms inference, 117.6ms postprocess per image at shape (1, 3, 640, 480)
    Results saved to /ultralytics/runs/detect/predict
    💡 Learn more at https://docs.ultralytics.com/modes/predict
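
    The same check can be run from Python with the Ultralytics API (a minimal sketch mirroring the CLI call above):

    from ultralytics import YOLO

    # Load the pretrained detection weights (downloaded on first use)
    model = YOLO("yolov8n.pt")
    # Run one prediction on the sample image, on GPU 0
    results = model.predict("https://ultralytics.com/images/bus.jpg", device=0)
    print(results[0].boxes)  # detected boxes, classes, and confidences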
  4. Train and export

    a. Train (optional)

    # Fine-tune from the COCO-pretrained weights
    yolo detect train data=data.yaml model=yolov8n.pt epochs=100 imgsz=256 device=0
    # Or train from scratch
    yolo detect train data=data.yaml model=yolov8n.yaml epochs=100 imgsz=256 device=0
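
    The Python API equivalent of the fine-tuning command (a sketch assuming the same data.yaml and hyperparameters as above):

    from ultralytics import YOLO

    model = YOLO("yolov8n.pt")  # start from the COCO-pretrained weights
    model.train(data="data.yaml", epochs=100, imgsz=256, device=0)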

    b. Export to TFLite

    yolo export model=yolov8n.pt format=tflite imgsz=256 int8=True data=data.yaml fraction=0.2
    TensorFlow SavedModel: export success ✅ 35.3s, saved as 'yolov8n_saved_model' (40.1 MB)

    TensorFlow Lite: starting export with tensorflow 2.19.0...
    TensorFlow Lite: export success ✅ 0.0s, saved as 'yolov8n_saved_model/yolov8n_int8.tflite' (3.2 MB)

    Export complete (35.4s)
    Results saved to /ultralytics
    Predict: yolo predict task=detect model=yolov8n_saved_model/yolov8n_int8.tflite imgsz=256 int8
    Validate: yolo val task=detect model=yolov8n_saved_model/yolov8n_int8.tflite imgsz=256 data=coco.yaml int8
    Visualize: https://netron.app
    💡 Learn more at https://docs.ultralytics.com/modes/export
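
    The export step can likewise be scripted (a sketch; fraction=0.2 tells Ultralytics to use 20% of the dataset for INT8 calibration, matching the CLI call above):

    from ultralytics import YOLO

    model = YOLO("yolov8n.pt")
    # Produces yolov8n_saved_model/ with the int8 .tflite inside it
    model.export(format="tflite", imgsz=256, int8=True, data="data.yaml", fraction=0.2)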
  5. Quantize the model

    a. Download the quantization script and config file: tflite_quant.py and user_config_quant.yaml.

    b. Download the calibration dataset coco8.

    c. Edit user_config_quant.yaml:

    model:
      name: yolov8n_256
      uc: od_coco
      model_path: ./yolov8n_saved_model
      input_shape: [256, 256, 3]
    quantization:
      fake: False
      quantization_type: per_channel
      quantization_input_type: uint8 # float
      quantization_output_type: int8 # float
      calib_dataset_path: ./coco8/images/val # Calibration set (can reuse part of the training set)
      export_path: ./quantized_models
    pre_processing:
      rescaling: {scale: 255, offset: 0}
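
    For reference, this config drives TensorFlow's standard post-training quantization path. The sketch below is illustrative only; the actual tflite_quant.py adds its own config parsing and preprocessing on top of it:

    import glob
    import os

    import numpy as np
    import tensorflow as tf

    def representative_dataset():
        # Feed calibration images rescaled to [0, 1], matching scale=255 above
        for path in glob.glob("./coco8/images/val/*.jpg"):
            img = tf.io.decode_image(tf.io.read_file(path), channels=3)
            img = tf.image.resize(img, (256, 256)) / 255.0
            yield [np.expand_dims(img.numpy().astype(np.float32), axis=0)]

    converter = tf.lite.TFLiteConverter.from_saved_model("./yolov8n_saved_model")
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    converter.representative_dataset = representative_dataset
    converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
    converter.inference_input_type = tf.uint8   # quantization_input_type: uint8
    converter.inference_output_type = tf.int8   # quantization_output_type: int8

    os.makedirs("./quantized_models", exist_ok=True)
    with open("./quantized_models/yolov8n_256_quant_pc_ui_od_coco.tflite", "wb") as f:
        f.write(converter.convert())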

    d. Run quantization

    # Install dependencies
    pip install hydra-core munch
    # Quantize
    python tflite_quant.py --config-name user_config_quant.yaml
    WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
    W0000 00:00:1760435045.538259 834 tf_tfl_flatbuffer_helpers.cc:365] Ignored output_format.
    W0000 00:00:1760435045.538286 834 tf_tfl_flatbuffer_helpers.cc:368] Ignored drop_control_dependency.
    I0000 00:00:1760435045.567572 834 mlir_graph_optimization_pass.cc:425] MLIR V1 optimization pass is not enabled
    100%|████████████████████████████████████████| 32/32 [00:05<00:00, 5.46it/s]
    fully_quantize: 0, inference_type: 6, input_inference_type: UINT8, output_inference_type: INT8
    Quantized model generated: yolov8n_256_quant_pc_ui_od_coco.tflite
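
    Before deploying, you can sanity-check the generated file's tensor types with the TFLite interpreter (a quick local check, not part of the official flow):

    import tensorflow as tf

    interp = tf.lite.Interpreter(
        model_path="./quantized_models/yolov8n_256_quant_pc_ui_od_coco.tflite")
    interp.allocate_tensors()

    inp = interp.get_input_details()[0]
    out = interp.get_output_details()[0]
    print(inp["dtype"], inp["shape"])        # expect uint8, [1 256 256 3]
    # int8 outputs map back to float as: real = scale * (q - zero_point)
    print(out["dtype"], out["quantization"])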

    e. Optional: Evaluate the quantized model

    yolo val model=./quantized_models/yolov8n_256_quant_pc_ui_od_coco.tflite data=coco.yaml imgsz=256
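
    The same evaluation is available from Python, since Ultralytics loads TFLite models directly (a sketch):

    from ultralytics import YOLO

    model = YOLO("./quantized_models/yolov8n_256_quant_pc_ui_od_coco.tflite")
    metrics = model.val(data="coco.yaml", imgsz=256)
    print(metrics.box.map50)  # mAP@0.5 of the quantized model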
  6. Deploy the model

    a. Clone the AICam project

    git clone https://github.com/camthink/aicam

    b. Generate the model firmware package (requires the ST Edge AI Core CLI, stedgeai, to be installed first)

    cd aicam/Model
    cp /your/path/quantized_models/yolov8n_256_quant_pc_ui_od_coco.tflite weights/

    # Create the relocatable model
    ./generate-reloc-model.sh -m weights/yolov8n_256_quant_pc_ui_od_coco.tflite \
    -f yolov8_od@neural_art_reloc.json \
    -o network_rel_yolov8_od.bin

    # Build the package
    python model_packager.py create \
    --model network_rel_yolov8_od.bin \
    --config weights/yolov8n_256_quant_pc_ui_od_coco.json \
    --output ../bin/model_yolov8_od_coco.bin

    c. Deploy to the device (choose one)

    • ST-LINK flashing

      ../Script/maker.sh flash ../bin/model_yolov8_od_coco.bin 0x70900000
    • Web OTA

      Upload model_yolov8_od_coco.bin through the NE301 Web UI to update the model in place.