A convolutional neural network trained on a desktop GPU using full floating-point precision was deployed to an ESP32 microcontroller running INT8-quantized inference. This project validates functional correctness and decision equivalence after quantization and embedded deployment.
Floating-Point Training
Desktop GPU baseline model developed in full precision
INT8 Quantization
Post-training quantization applied for embedded deployment
Equivalence Validation
Verification that quantized model preserves decision boundaries across GPU and embedded hardware
Goal: Confirm functional correctness, not performance optimization.
ESP32 microcontrollers operate under significant hardware limitations: restricted program memory, no floating-point acceleration, and limited computational bandwidth. Large-scale image-based evaluation on-device is impractical, so the evaluation strategy reflects real-world embedded ML constraints.
Hardware Reality
• Limited program memory
• No hardware floating-point acceleration
• Single-threaded execution
• Fixed-point arithmetic only
Evaluation Strategy
• Full statistical evaluation on GPU
• Representative subset on both platforms
• Decision equivalence validation
• Quantization behavior inspection
This approach prioritizes correctness verification over scale. Embedded deployment success is measured by decision consistency, not dataset coverage.
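The "representative subset" idea above can be sketched as stratified sampling: keep every class present so the cross-platform check exercises all decision regions. This is a minimal sketch, assuming a labeled test set; the `labels` array and the selection procedure are illustrative, not the project's actual tooling.

```python
import numpy as np

def stratified_subset(labels, n_total, rng):
    """Pick n_total indices with (near-)equal counts per class."""
    labels = np.asarray(labels)
    classes = np.unique(labels)
    base, extra = divmod(n_total, len(classes))
    idx = []
    for i, c in enumerate(classes):
        pool = np.flatnonzero(labels == c)          # all indices of class c
        take = base + (1 if i < extra else 0)       # spread the remainder
        idx.extend(rng.choice(pool, size=take, replace=False))
    return np.sort(np.array(idx))

rng = np.random.default_rng(42)
labels = rng.integers(0, 3, size=427)   # stand-in for the 427 test labels
subset = stratified_subset(labels, 50, rng)
print(len(subset))                      # 50 images, all three classes present
```

The project's actual subset (13/16/21 per class, shown later) is not perfectly balanced, so the exact selection rule evidently differed; the point is only that every class must appear in the comparison set.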
Training: Floating-Point CNN
The convolutional neural network is trained on a desktop GPU using standard backpropagation, with full 32-bit floating-point precision maintained throughout training and validation.
• Input: 32x32 colour images
• Architecture: Conv layers → Pooling → Dense → Output (3 classes)
• Precision: FP32 throughout
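The layer layout above determines the tensor shapes end to end. As a sanity check, here is the shape arithmetic for one plausible instantiation; the kernel size (3×3), pooling size (2×2), padding, and channel count are assumptions, since the source gives only the Conv → Pooling → Dense → Output layout.

```python
def conv2d_shape(h, w, k=3, stride=1, pad=0):
    """Output spatial size of a convolution (valid padding by default)."""
    return ((h + 2 * pad - k) // stride + 1,
            (w + 2 * pad - k) // stride + 1)

def pool_shape(h, w, k=2):
    """Output spatial size of non-overlapping pooling."""
    return h // k, w // k

h, w = 32, 32                 # input: 32x32 colour image
h, w = conv2d_shape(h, w)     # 3x3 conv, valid padding -> 30x30
h, w = pool_shape(h, w)       # 2x2 pool -> 15x15
h, w = conv2d_shape(h, w)     # 3x3 conv -> 13x13
h, w = pool_shape(h, w)       # 2x2 pool -> 6x6
flat = h * w * 32             # assuming 32 channels in the final conv
print(h, w, flat)             # 6 6 1152 -> flattened into Dense -> 3 classes
```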
Post-Training Quantization: INT8
After training, model weights and activations are quantized to 8-bit integers. Scaling factors are computed per-layer to minimize information loss. No retraining applied.
• Quantization scheme: Linear (affine mapping)
• Bit-width: 8-bit (int8 range: -128 to 127)
• Model size reduction: ~72% (3.56× compression)
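The affine mapping described above can be sketched in a few lines. This is a minimal per-tensor illustration; real toolchains compute per-layer (or per-channel) scales from calibration data, and the 3.56× figure above includes model overhead beyond raw weights.

```python
import numpy as np

def quantize_int8(x):
    """Map a float tensor onto int8 [-128, 127] via scale and zero-point."""
    lo, hi = float(x.min()), float(x.max())
    scale = (hi - lo) / 255.0 or 1.0            # guard against constant tensors
    zero_point = int(round(-128 - lo / scale))  # aligns lo with -128
    q = np.clip(np.round(x / scale) + zero_point, -128, 127).astype(np.int8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    return (q.astype(np.float32) - zero_point) * scale

w = np.random.default_rng(0).normal(size=(64, 64)).astype(np.float32)
q, s, zp = quantize_int8(w)
err = np.abs(dequantize(q, s, zp) - w).max()
print(f"max abs error: {err:.4f}")   # bounded by ~scale/2
print(w.nbytes / q.nbytes)           # 4.0 -> 4x compression on raw weights
```

The round-trip error is bounded by half the scale step, which is why quantization noise is small relative to the weight range and the decision boundaries can survive intact.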
Evaluation: Quantized Model on Both Platforms
The same INT8-quantized model is evaluated on the GPU (for comprehensive statistics) and on the ESP32 (for embedded correctness validation).
GPU Evaluation (INT8)
✓ Full test set: 427 images
✓ Establishes performance baseline
✓ Confusion matrix: ground truth
✓ Execution: parallel, optimized
ESP32 Evaluation (INT8)
✓ Representative subset: 50 images
✓ Validates embedded deployment
✓ Checks decision equivalence
✓ Execution: serial, hardware-native
Validation: Decision Equivalence
Predictions on the representative subset match image-for-image across GPU and ESP32. Identical confusion matrices confirm successful deployment; quantization artifacts do not break the decision logic.
Result: Embedded INT8 deployment validated. Model decisions are consistent and reliable on ESP32.
GPU Full Test Set Evaluation
Complete test set (100% of evaluation data) processed on GPU with INT8 quantization applied. This establishes the ground truth performance baseline and statistical accuracy across the full problem domain.
True model performance: measured on complete dataset
Representative Subset Evaluation
A carefully selected subset of the full test set is evaluated on both GPU (INT8) and ESP32 (INT8). This enables direct comparison of predictions across platforms without overwhelming embedded memory constraints.
Deployment validation: identical predictions confirm equivalence
Key Validation Principle
Identical prediction counts across GPU (INT8) and ESP32 (INT8) on the representative subset confirm successful quantized deployment. A mismatch would indicate hardware-specific inference behavior requiring investigation.
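The principle above reduces to a simple check: argmax decisions from the two INT8 runs on the same subset must agree image-for-image. A minimal sketch, with the output arrays synthesized for illustration (a real run would load the logged GPU and ESP32 outputs):

```python
import numpy as np

def decisions(int8_outputs):
    """Classification decision = argmax over the INT8 output scores."""
    return np.argmax(int8_outputs, axis=1)

rng = np.random.default_rng(0)
gpu_out = rng.integers(-128, 128, size=(50, 3), dtype=np.int8)
esp_out = gpu_out.copy()       # a successful deployment reproduces these exactly

gpu_pred = decisions(gpu_out)
esp_pred = decisions(esp_out)
matches = int((gpu_pred == esp_pred).sum())
print(f"{matches}/50 decisions identical")
assert matches == 50, "hardware-specific inference behavior: investigate"
```

Note the check compares the two platforms to each other, not to ground truth: a shared misclassification still counts as equivalence.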
Full Test Set
| True \ Pred | Rock | Paper | Scissors |
|---|---|---|---|
| Rock | 124 | 1 | 3 |
| Paper | 0 | 146 | 0 |
| Scissors | 3 | 0 | 150 |
This matrix reflects full statistical performance across 427 test samples. Counts represent prediction distribution on GPU.
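Reading the matrix above, the overall accuracy is the trace divided by the total count:

```python
import numpy as np

# Full-test-set confusion matrix (rows: true label, cols: predicted).
cm = np.array([[124,   1,   3],    # Rock
               [  0, 146,   0],    # Paper
               [  3,   0, 150]])   # Scissors

total = int(cm.sum())       # 427 test images
correct = int(np.trace(cm)) # 420 correct predictions
print(total, correct, round(correct / total, 4))   # 427 420 0.9836
```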
Subset Equivalence
| True \ Pred | Rock | Paper | Scissors |
|---|---|---|---|
| Rock | 13 | 0 | 0 |
| Paper | 0 | 16 | 0 |
| Scissors | 1 | 0 | 20 |
Interpretation: This confusion matrix represents 50 samples from the representative subset evaluated identically on GPU and ESP32 with INT8 quantization applied to both. Identical prediction counts confirm functional equivalence after quantization and embedded deployment.
Validation Status: Predictions match across platforms, validating that the quantized model maintains decision integrity in embedded execution.
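The same arithmetic applied to the subset matrix above: the counts sum to the 50 evaluated images, with one scissors image misclassified as rock on both platforms, which is consistent with equivalence (same decisions, including the same mistake).

```python
import numpy as np

# Representative-subset confusion matrix, identical on GPU and ESP32.
cm = np.array([[13,  0,  0],    # Rock
               [ 0, 16,  0],    # Paper
               [ 1,  0, 20]])   # Scissors

print(int(cm.sum()), int(np.trace(cm)))   # 50 images, 49 correct (98%)
```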
Representative subset of ESP32 inference logs (shown: 20 of 50). Each line logs per-image predictions with confidence, latency, and match status (✓ correct, ✗ mismatch). Confidence values are shown for inspection only and are not used in confusion matrix metrics; classification decisions are determined by argmax of the INT8 quantized outputs.
Quantization Works—With Validation
Quantized CNNs can preserve decision boundaries when deployed correctly. Post-training INT8 quantization reduces model size dramatically while maintaining classification integrity across platforms.
Embedded ML Prioritizes Correctness
Embedded evaluation focuses on decision equivalence and correctness, not scale or speed. Representative subset validation is sufficient to confirm deployment success—statistical completeness is secondary to functional verification.
Deployment Discipline Matters
This project demonstrates end-to-end deployment discipline: from training to quantization to hardware validation. Success is not measured by achieving near-research performance on limited hardware, but by systematically proving correctness through careful measurement and documentation.
Conclusion: Embedded machine learning is not a constrained form of research ML, but a separate engineering discipline defined by different assumptions, failure modes, and validation strategies. This project shows that, when treated as such, production-quality CNN inference on microcontrollers is not only feasible, but reproducible, reliable, and auditable.