From GPU to ESP32: Validating CNN Inference Under Embedded Constraints

An engineering case study in quantization, deployment validation, and embedded inference correctness.

Project Overview

A convolutional neural network trained on a desktop GPU using full floating-point precision was deployed to an ESP32 microcontroller running INT8-quantized inference. This project validates functional correctness and decision equivalence after quantization and embedded deployment.

Floating-Point Training

Desktop GPU baseline model developed in full precision

INT8 Quantization

Post-training quantization applied for embedded deployment

Equivalence Validation

Verification that quantized model preserves decision boundaries across GPU and embedded hardware

Goal: Confirm functional correctness, not performance optimization.

Embedded Deployment Constraints

ESP32 microcontrollers operate under significant hardware limitations: restricted program memory, no floating-point acceleration, and limited computational bandwidth. Large-scale image-based evaluation on-device is impractical, so the evaluation strategy reflects real-world embedded ML constraints.

Hardware Reality

  • Limited program memory
  • No hardware floating-point acceleration
  • Single-threaded execution
  • Fixed-point arithmetic only
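Because the target has no FPU, inference reduces to integer arithmetic. As a minimal illustration (not the actual firmware), an INT8 multiply-accumulate with a wide accumulator looks like:

```python
def int8_dot(weights, activations):
    """Integer-only dot product: int8 operands, wide accumulator.

    On integer-only hardware the accumulator would be int32; Python
    ints don't overflow, so the width is only documented here.
    Scale/zero-point handling is omitted for brevity.
    """
    acc = 0
    for w, a in zip(weights, activations):
        assert -128 <= w <= 127 and -128 <= a <= 127, "operands must fit int8"
        acc += w * a
    return acc
```

The wide accumulator matters: a worst-case int8 x int8 product is 16 bits, and summing across a layer's inputs would overflow an 8- or 16-bit register.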

Evaluation Strategy

  • Full statistical evaluation on GPU
  • Representative subset on both platforms
  • Decision equivalence validation
  • Quantization behavior inspection

This approach prioritizes correctness verification over scale. Embedded deployment success is measured by decision consistency, not dataset coverage.

Model & Quantization Pipeline

1. Training: Floating-Point CNN

Convolutional neural network trained on desktop GPU using standard backpropagation. Full 32-bit floating-point precision throughout training and validation.

• Input: 32x32 colour images

• Architecture: Conv layers → Pooling → Dense → Output (3 classes)

• Precision: FP32 throughout
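The write-up does not publish the exact layer configuration. As an illustration of the shape arithmetic for a 32x32 colour input, a hypothetical two-stage conv/pool stack (filter counts and kernel sizes are assumptions) flows like this:

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a square convolution layer."""
    return (size + 2 * padding - kernel) // stride + 1

def pool_out(size, window=2, stride=2):
    """Spatial output size of a pooling layer."""
    return (size - window) // stride + 1

# Hypothetical stack for a 32x32x3 input (layer sizes are assumptions):
# conv3x3(16 filters) -> pool2x2 -> conv3x3(32 filters) -> pool2x2 -> dense(3)
s = conv_out(32, 3)         # 30
s = pool_out(s)             # 15
s = conv_out(s, 3)          # 13
s = pool_out(s)             # 6
flat_features = s * s * 32  # 1152 inputs into the dense classifier
```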

2. Post-Training Quantization: INT8

After training, model weights and activations are quantized to 8-bit integers. Scaling factors are computed per-layer to minimize information loss. No retraining applied.

• Quantization scheme: Linear (affine mapping)

• Bit-width: 8-bit (int8 range: -128 to 127)

• Model size reduction: ~71% (3.56× compression)
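The quantizer itself is not shown in the write-up; a minimal NumPy sketch of affine (linear) INT8 quantization in the spirit described above might look like the following. Note it derives the scale from a single tensor's min/max, whereas the project computes scaling factors per layer:

```python
import numpy as np

def quantize_affine(x, num_bits=8):
    """Per-tensor affine quantization: map a float range onto [-128, 127]."""
    qmin, qmax = -(2 ** (num_bits - 1)), 2 ** (num_bits - 1) - 1
    scale = (x.max() - x.min()) / (qmax - qmin)
    zero_point = int(round(qmin - x.min() / scale))
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax).astype(np.int8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Recover approximate float values from quantized integers."""
    return (q.astype(np.float32) - zero_point) * scale
```

The round-trip error of each value is bounded by the scale, which is what makes decision (argmax) equivalence plausible even though individual activations shift slightly.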

3. Evaluation: Quantized Model on Both Platforms

The same INT8-quantized model is evaluated on the GPU (for comprehensive statistics) and on the ESP32 (for embedded correctness validation).

GPU Evaluation (INT8)

  ✓ Full test set: 427 images
  ✓ Establishes performance baseline
  ✓ Confusion matrix: ground truth
  ✓ Execution: parallel, optimized

ESP32 Evaluation (INT8)

  ✓ Representative subset: 50 images
  ✓ Validates embedded deployment
  ✓ Checks decision equivalence
  ✓ Execution: serial, hardware-native

Validation: Decision Equivalence

Predictions on the representative subset match perfectly across GPU and ESP32. Identical confusion matrices confirm successful deployment; quantization artifacts do not break the decision logic.

Result: Embedded INT8 deployment validated. Model decisions are consistent and reliable on ESP32.

Evaluation Methodology

GPU Full Test Set Evaluation

Complete test set (100% of evaluation data) processed on GPU with INT8 quantization applied. This establishes the ground truth performance baseline and statistical accuracy across the full problem domain.

True model performance: measured on complete dataset

Representative Subset Evaluation

A carefully selected subset of the full test set is evaluated on both GPU (INT8) and ESP32 (INT8). This enables direct comparison of predictions across platforms without overwhelming embedded memory constraints.

Deployment validation: identical predictions confirm equivalence
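The write-up does not specify how the 50-image subset was drawn. One common approach, sketched here purely as an assumption, is class-proportional sampling from the labelled test set:

```python
import random
from collections import defaultdict

def stratified_subset(labels, n_total, seed=0):
    """Sample roughly class-proportional indices from a labelled test set."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    subset = []
    for y, idxs in by_class.items():
        k = max(1, round(n_total * len(idxs) / len(labels)))
        subset.extend(rng.sample(idxs, min(k, len(idxs))))
    return sorted(subset)
```

Fixing the seed keeps the subset reproducible, which matters when the same images must be replayed on both the GPU and the ESP32.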

Key Validation Principle

Identical prediction counts across GPU (INT8) and ESP32 (INT8) on the representative subset confirm successful quantized deployment. Mismatch would indicate hardware-specific inference behavior requiring investigation.
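The validation principle above can be sketched as a simple cross-platform comparison (illustrative only; labels and predictions are assumed to be integer class indices):

```python
import numpy as np

def confusion_matrix(true_labels, preds, n_classes):
    """Count (true, predicted) pairs into an n_classes x n_classes matrix."""
    m = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(true_labels, preds):
        m[t, p] += 1
    return m

def check_equivalence(gpu_preds, esp32_preds, true_labels, n_classes=3):
    """Return mismatching image indices and whether confusion matrices agree."""
    gpu = np.asarray(gpu_preds)
    esp = np.asarray(esp32_preds)
    mismatches = np.nonzero(gpu != esp)[0]
    same_cm = np.array_equal(
        confusion_matrix(true_labels, gpu, n_classes),
        confusion_matrix(true_labels, esp, n_classes),
    )
    return mismatches, same_cm
```

Returning the mismatch indices, not just a pass/fail flag, is what turns a failed run into something debuggable: the offending images can be re-run on-device in isolation.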

GPU Model — Full Test Dataset
Complete test set evaluation showing true model performance

Full Test Set

True \ Pred    Rock   Paper   Scissors
Rock            124       1          3
Paper             0     146          0
Scissors          3       0        150

This matrix reflects full statistical performance across 427 test samples (rows: true labels, columns: predicted labels). Counts represent the prediction distribution on GPU.

Subset Evaluation — GPU (INT8) & ESP32 (INT8)
Representative subset validation across both platforms

Subset Equivalence

True \ Pred    Rock   Paper   Scissors
Rock             13       0          0
Paper             0      16          0
Scissors          1       0         20

(Rows: true labels, columns: predicted labels.)

Interpretation: This confusion matrix represents 50 samples from the representative subset evaluated identically on GPU and ESP32 with INT8 quantization applied to both. Identical prediction counts confirm functional equivalence after quantization and embedded deployment.

Validation Status: Predictions match across platforms, validating that the quantized model maintains decision integrity in embedded execution.

Inference Snapshot
Example ESP32 inference logs and per-image predictions
Image 1/50 | True: paper | Pred: paper | Conf: 0.9961 | Time: 39.65ms ✓
Image 2/50 | True: rock | Pred: rock | Conf: 0.9961 | Time: 39.70ms ✓
Image 3/50 | True: paper | Pred: paper | Conf: 0.9961 | Time: 39.68ms ✓
Image 4/50 | True: scissors | Pred: scissors | Conf: 0.9961 | Time: 39.67ms ✓
Image 5/50 | True: scissors | Pred: scissors | Conf: 0.9961 | Time: 39.67ms ✓
Image 6/50 | True: scissors | Pred: scissors | Conf: 0.8125 | Time: 39.70ms ✓
Image 7/50 | True: rock | Pred: rock | Conf: 0.9961 | Time: 39.68ms ✓
Image 8/50 | True: paper | Pred: paper | Conf: 0.9961 | Time: 39.67ms ✓
Image 9/50 | True: paper | Pred: paper | Conf: 0.9961 | Time: 39.68ms ✓
Image 10/50 | True: scissors | Pred: scissors | Conf: 0.9961 | Time: 39.69ms ✓
Image 11/50 | True: scissors | Pred: paper | Conf: 0.9961 | Time: 39.68ms ✗
Image 12/50 | True: scissors | Pred: scissors | Conf: 0.9961 | Time: 39.67ms ✓
Image 13/50 | True: paper | Pred: paper | Conf: 0.9961 | Time: 39.68ms ✓
Image 14/50 | True: scissors | Pred: scissors | Conf: 0.9961 | Time: 39.70ms ✓
Image 15/50 | True: scissors | Pred: scissors | Conf: 0.9961 | Time: 39.68ms ✓
Image 16/50 | True: rock | Pred: rock | Conf: 0.9961 | Time: 39.67ms ✓
Image 17/50 | True: scissors | Pred: scissors | Conf: 0.9844 | Time: 39.68ms ✓
Image 18/50 | True: scissors | Pred: scissors | Conf: 0.9961 | Time: 39.69ms ✓
Image 19/50 | True: rock | Pred: rock | Conf: 0.9961 | Time: 39.68ms ✓
Image 20/50 | True: scissors | Pred: scissors | Conf: 0.9961 | Time: 39.67ms ✓

Representative subset of ESP32 inference logs (20 of 50 shown). Each line logs the per-image prediction with confidence, latency, and match status (✓ correct, ✗ mismatch). Confidence values are shown for inspection only and are not used in confusion-matrix metrics; classification decisions are determined by the argmax of the INT8-quantized outputs.
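Logs in this format can be aggregated off-device. A small parser (the pattern is inferred from the lines above and is an assumption about the firmware's exact format) might look like:

```python
import re

# Matches lines like:
# "Image 1/50 | True: paper | Pred: paper | Conf: 0.9961 | Time: 39.65ms"
LOG_LINE = re.compile(
    r"Image (\d+)/(\d+) \| True: (\w+) \| Pred: (\w+) \| "
    r"Conf: ([\d.]+) \| Time: ([\d.]+)ms"
)

def summarize_logs(lines):
    """Aggregate accuracy and mean latency from ESP32 inference log lines."""
    correct, times = 0, []
    for line in lines:
        m = LOG_LINE.search(line)
        if not m:
            continue
        _, _, true, pred, _conf, ms = m.groups()
        correct += (true == pred)
        times.append(float(ms))
    n = len(times)
    return {"images": n, "accuracy": correct / n, "mean_ms": sum(times) / n}
```

The near-constant ~39.7 ms per image in the logs is expected from single-threaded, fixed-point execution: every inference walks the same integer operations regardless of input.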

Key Insights & Learnings

Engineering Takeaways

Quantization Works—With Validation

Quantized CNNs can preserve decision boundaries when deployed correctly. Post-training INT8 quantization reduces model size dramatically while maintaining classification integrity across platforms.

Embedded ML Prioritizes Correctness

Embedded evaluation focuses on decision equivalence and correctness, not scale or speed. Representative subset validation is sufficient to confirm deployment success—statistical completeness is secondary to functional verification.

Deployment Discipline Matters

This project demonstrates end-to-end deployment discipline: from training to quantization to hardware validation. Success is not measured by achieving near-research performance on limited hardware, but by systematically proving correctness through careful measurement and documentation.

Conclusion: Embedded machine learning is not a constrained form of research ML, but a separate engineering discipline defined by different assumptions, failure modes, and validation strategies. This project shows that, when treated as such, production-quality CNN inference on microcontrollers is not only feasible, but reproducible, reliable, and auditable.
