
Custom YOLO model not showing bounding box predictions on ONNX-HailoRT

Completed

6 comments

  • Ayoub Assis
    • Network Optix team

    Hi Ael, 

    Thanks for the detailed report of the issue you're encountering.

    Firstly, let me answer the general questions, then I'll address the issues you're running into with your POC.

    1. What makes the Nx Demo model compatible with the Hailo Runtime while our custom model fails?
      Currently, the Nx AI Cloud doesn't have an automated process for converting an arbitrary uploaded ONNX model to Hailo's format, mainly because the Hailo Dataflow Compiler requires inputs that vary from model to model. At the moment it can only compile Teachable Machine models for Hailo chips. We're still working on ways to extend the set of model architectures that can be automatically exported to Hailo. 
      Given this limitation, the workaround for deploying models on Hailo chips is to upload the Hailo-compiled model directly to the Nx AI Cloud instead of the regular ONNX (or to compress the two together in a zip file and upload that, so the model can be deployed on CPU/GPU as well as Hailo; see the first sketch after this list).
      You can find a tutorial with code that was used to compile one of the Nx demo models for Hailo chips here.
    2. Are there specific ONNX model optimizations or configurations required for Hailo compatibility?
      The Nx demo models that we compile for Hailo don't require any optimizations to be compatible with Hailo; I can't speak for other models.
      Regarding the configuration, when compiling a model for Hailo you will probably need to specify the start_node_names and end_node_names. Please refer to the Hailo Dataflow Compiler user guide for more details about the compilation configuration; the first sketch after this list shows where these fit into the flow.
    3. Which specific versions of ONNX, Python, and ONNX runtime should we use when preparing models for Hailo?
      If you're referring to the Python and ONNX versions required to export the trained YOLOv10 model from PyTorch to ONNX as detailed here: I've just updated the requirements.txt file and tested it with Python 3.10, so let me know if things don't work as expected. There's also a minimal export sketch after this list.
      But if you're referring to compiling (or preparing, as you put it) the model for Hailo, you can check out the Hailo Developer Zone.
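
    To make points 1 and 2 more concrete, here's a rough sketch of the compile-and-bundle flow, based on the Hailo Dataflow Compiler tutorials. The node names, input shape, file names, and random calibration data are placeholders you'd replace with your model's actual values, and hailo_sdk_client ships with Hailo's DFC suite, so treat this as a starting point rather than a drop-in script:

        from hailo_sdk_client import ClientRunner  # part of Hailo's Dataflow Compiler
        import numpy as np
        import zipfile

        # Parse the ONNX into Hailo's internal representation. The start/end
        # node names are model-specific placeholders (see the DFC user guide).
        runner = ClientRunner(hw_arch="hailo8")
        runner.translate_onnx_model(
            "model.onnx",
            "my_model",
            start_node_names=["images"],
            end_node_names=["output0"],
            net_input_shapes={"images": [1, 3, 640, 640]},
        )

        # Quantize with calibration data. Random data is only a placeholder;
        # use real preprocessed frames for a usable model.
        calib = np.random.rand(64, 640, 640, 3).astype(np.float32)
        runner.optimize(calib)

        # Compile to a HEF and write it out.
        hef = runner.compile()
        with open("model.hef", "wb") as f:
            f.write(hef)

        # Workaround from point 1: bundle the regular ONNX and the compiled
        # HEF so the same upload can be deployed on CPU/GPU and on Hailo.
        with zipfile.ZipFile("model_bundle.zip", "w") as zf:
            zf.write("model.onnx")
            zf.write("model.hef")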
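
    And for the export side of point 3, the usual Ultralytics call looks roughly like the sketch below; the weights file name and opset are assumptions on my part, and the linked tutorial's requirements.txt remains the authoritative list of versions:

        from ultralytics import YOLO

        # Export a trained YOLOv10 checkpoint to ONNX. Replace the weights
        # path with your own; the opset here is only an example value.
        model = YOLO("yolov10n.pt")
        model.export(format="onnx", opset=13)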

    With that said, to help us figure out what's going wrong, could you please run the script on this page and send us the archive it generates? Also, could you share the UUID of the YOLOv10 model so we can look into why the boxes don't show up when you run it on CPU? Once the latter is resolved, you can compile the model for Hailo, upload it, and deploy it on the Hailo chip.
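
    As a quick local check while we look into the UUID, you can also confirm on your side whether the ONNX produces any detections at all on CPU, outside the Nx pipeline. Here's a minimal sketch with onnxruntime; the file name and the (x1, y1, x2, y2, score, class) output layout are assumptions based on typical YOLOv10 exports:

        import numpy as np
        import onnxruntime as ort

        # Run the exported model on CPU only.
        sess = ort.InferenceSession("yolov10.onnx", providers=["CPUExecutionProvider"])
        inp = sess.get_inputs()[0]
        print(inp.name, inp.shape)  # e.g. images, [1, 3, 640, 640]

        # Placeholder input; substitute a real preprocessed frame to get
        # meaningful scores.
        x = np.random.rand(1, 3, 640, 640).astype(np.float32)
        out = sess.run(None, {inp.name: x})[0]

        # Typical YOLOv10 exports emit [1, N, 6] = (x1, y1, x2, y2, score, class).
        print(out.shape)
        print("detections above 0.25:", int((out[0, :, 4] > 0.25).sum()))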

  • Ael Lee

    Here is the content of the ai_manager_run.txt file. There are also many other files that were generated, but I am unable to attach them to the comments:

    MODULE: 1741121280670 000000202: Notice: retrieving possible post-processing options.
    MODULE: 1741121280670 000000134: Created new SHM for 0 : 524350
    MODULE: 1741121280670 000000179: Starting inference engine with: 1 1
    MODULE: 1741121280671 000000526: Notice: Starting inference engine with pid: 134256.
    MODULE: 1741121280671 000000009: Notice: Module is waiting for ready signal from 0
    MODULE: 1741121280776 000104566: Informational: Received ready signal from 0
    MODULE: 1741121280776 000000010: Inference engine 0 ready.
    MODULE: 1741121280776 000000117: Notice: Starting thread 0
    MODULE: 1741121280776 000000021: Informational: Module socket path [/tmp/nxai_manager.sock]
    MODULE: 1741121280776 000000031: Notice: Waiting for thread 0 to finish.
    MODULE: 1741121280776 000000009: Notice: Thread 0 started.
    MODULE: 1741121280776 000000004: Notice: Sclbl module start inference.
    MODULE: 1741121280776 000000004: Informational: Checking for input message.
    MODULE: 1741121280776 000000004: Informational: Waiting for input message.
    MODULE: 1741121280809 000033699: Informational: Input received 0.
    MODULE: 1741121280810 000000277: Notice: sclbl_module_video_run_single_inference.
    Error! Failed to send signal to 0. Possibly inference engine crashed.
    MODULE: 1741121280810 000000074: Notice: Preprocessing.
    MODULE: 1741121280814 000003966: Notice: start inference.
    MODULE: 1741121280814 000000089: Notice: New SHM ID for 0 is 524355 with capacity: 1228878
    MODULE: 1741121280815 000001422: Notice: Sending signal to 0
    MODULE: 1741121280815 000000083: Error! Failed to send signal to 0. Possibly inference engine crashed.
    MODULE: 1741121280815 000000080: Notice: after inference.
    MODULE: 1741121280815 000000003: Notice: after sclbl_set_ts_model_end.
    MODULE: 1741121280815 000000003: Warning: Failed to run sclbl_module_client_inference.
    MODULE: 1741121280816 000000127: Input object released
    MODULE: 1741121280816 000000048: Informational: Waiting for socket listener to exit.
    MODULE: 1741121280838 000022295: Notice: Socket listener exited.
    MODULE: 1741121280838 000000008: Informational: Sclbl module has been stopped.
    MODULE: 1741121280838 000000150: Notice: Waiting for inference engine 134256.
    MODULE: 1741121280838 000000036: Inference engine exited 0
    MODULE: 1741121280838 000000009: Finalizing logging.
    
    
  • Nick Bedbury

    Hello, I am working with Ael on this.

    Is there any update on this request? Can we provide any more detail to move this troubleshooting forward?

  • Ayoub Assis
    • Network Optix team

    Hi Nick,

    We've asked Ael for more information privately. I'll loop you in.

  • Norman
    • Network Optix team

    This topic was converted into a support ticket. 

  • Ichiro
    • Network Optix team

    Nick came up with a solution.

    He ended up getting this working by starting with an ONNX file from the Scailable Model Zoo rather than Ultralytics. With that starting point, he got the YOLOv8s-640x640 model compiled, uploaded, and running on target hardware with a Hailo8 chip.
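
    One plausible explanation, though it wasn't confirmed in this thread, is that Ultralytics exports bake post-processing into the ONNX graph, while model zoo exports end at nodes the Hailo parser can handle. A quick way to compare what two exports actually end with, using the onnx package (the file names here are hypothetical):

        import onnx

        # Print each graph's declared outputs and its last few ops.
        for path in ("yolov8s_ultralytics.onnx", "yolov8s_model_zoo.onnx"):
            model = onnx.load(path)
            print(path)
            print("  outputs:", [o.name for o in model.graph.output])
            print("  last ops:", [n.op_type for n in model.graph.node][-5:])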


Post is closed for comments.