While the results of this experiment may be valuable in informing a proper project, pursuing one is beyond my capacity at this time. The experiment is provided as is, without warranty or guarantee of reproducibility.
YOLOv12n (attention-centric object detector) → ZXing (C++/WASM barcode decoder) or rqrr (Rust/WASM QR decoder), tested and trained against Kipukas’ anti-cheat camouflaged QR codes on mobile browsers.
This directory contains the complete experiment exploring whether YOLOv12’s attention mechanism, trained on a custom QR code dataset, could improve QR detection speed and accuracy for Kipukas’ camouflaged QR codes on low-res front-facing cameras. Testing covered the Google Pixel 3a, Pixel 4a, and Pixel 9, the iPhone 6s and iPhone 14 Pro, the Samsung Galaxy S21, and the Samsung Galaxy Book 3.

YOLOv12n tracks well on all capable devices. General detection was best with ZXing compiled to WASM under good environmental conditions, and ZXing is also compatible with older devices and devices that do not yet support WebGPU. In poor conditions, YOLOv12n-driven cropping plus adaptive-threshold preprocessing yields positive detection results with both the ZXing and rqrr backends. While not as performant as ZXing, YOLOv12n + rqrr + at_21 decoded reliably and would be judged sufficient for more standard QR workflows. However, rqrr’s main benefit (its small size compared to ZXing) is diminished when paired with YOLOv12n (~10x larger than ZXing).
| Approach | Result |
|---|---|
| rqrr alone (28 preprocessing strategies on full frame) | ⚠️ very slow with adaptive threshold, fails without it — finder pattern detection can’t see through SVG camouflage |
| ZXing-only (std-WASM/CDN) (no YOLO, full-frame scan) | ⚠️ very slow/fails to detect on most devices |
| YOLOv12n + rqrr | ⚠️ slow but functional on powerful devices |
| YOLOv12n + ZXing (two-stage: detect → crop → decode) | ✅ Works — YOLO learns camouflage patterns, ZXing decodes clean crops; bogs down older devices |
| YOLO on WASM/CPU | ⚠️ Slow but functional on powerful devices (laptops) |
| YOLO on WebGPU | ✅ Fast on supported mobile GPUs |
| ZXing-only (GCC 17, compiled in-house) (no YOLO, full-frame scan) | ✅ Works very fast on all devices given close shots and a good environment |
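The adaptive-threshold preprocessing that rescues decoding in poor conditions compares each pixel against the mean of its local neighborhood rather than a single global cut. A minimal pure-NumPy sketch of the idea (an illustrative reimplementation, not the repo’s code — the `block=21, c=10` values are the `at21` parameters from the augmentation list below):

```python
import numpy as np

def adaptive_threshold(gray: np.ndarray, block: int = 21, c: int = 10) -> np.ndarray:
    """Binarize by comparing each pixel to the mean of its block x block
    neighborhood minus a constant c (the at21 variant: block=21, c=10)."""
    pad = block // 2
    padded = np.pad(gray.astype(np.float64), pad, mode="edge")
    # Summed-area table gives O(1) window sums regardless of block size.
    ii = np.pad(padded.cumsum(0).cumsum(1), ((1, 0), (1, 0)))
    h, w = gray.shape
    win = (ii[block:block + h, block:block + w] - ii[:h, block:block + w]
           - ii[block:block + h, :w] + ii[:h, :w])
    mean = win / (block * block)
    # Pixel survives (white) when brighter than its local mean minus c.
    return np.where(gray > mean - c, 255, 0).astype(np.uint8)
```

Because the threshold follows local brightness, a QR module in shadow and one under glare are binarized against their own neighborhoods, which is what lets the decoder see modules a global threshold would flatten.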
- **User-controlled CV toggle** — Auto-detecting WebGPU capability is unreliable. Some older devices report WebGPU but perform poorly; some without WebGPU have CPUs powerful enough for WASM inference. The chip icon (⬜ off / 🟩 on) in the scanner UI lets users opt in to YOLO. Default: ZXing-only.
- **ZXing replaced rqrr for decode** — rqrr is a Rust QR decoder, but ZXing (C++/WASM) with tryHarder mode proved more reliable at decoding crops and has a mature WASM distribution. rqrr remains in the repo for reference.
- **Square 640×640 capture** — The camera canvas matches YOLO’s native input resolution. No letterbox distortion: what the user sees is exactly what the decoder receives.
- **Otsu removed from augmentation** — Otsu’s global threshold washes out the reflective surfaces on physical cards, producing all-white training images. 9 transforms remain (adaptive_thresh ×3, CLAHE, blur+AT, contrast_stretch+AT, yellow-aware, gaussian noise, JPEG compression).
- **5-second eager preload** — The ONNX model + ZXing WASM load asynchronously 5s after page load, so the scanner feels instant when opened.
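To see why Otsu had to go: Otsu picks one global threshold by maximizing between-class variance over the histogram, so a bright glare region drags the cut upward and everything else collapses to one class. A minimal NumPy sketch of the classic algorithm (for illustration only; the project dropped this transform rather than shipping it):

```python
import numpy as np

def otsu_threshold(gray: np.ndarray) -> int:
    """Return the global threshold that maximizes between-class variance."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    total = hist.sum()
    w0 = hist.cumsum()                        # pixels at or below each level
    mu = (hist * np.arange(256)).cumsum()     # cumulative intensity mass
    w1 = total - w0
    valid = (w0 > 0) & (w1 > 0)               # both classes must be non-empty
    mu0 = np.where(valid, mu / np.maximum(w0, 1), 0)
    mu1 = np.where(valid, (mu[-1] - mu) / np.maximum(w1, 1), 0)
    between = np.where(valid, w0 * w1 * (mu0 - mu1) ** 2, 0)
    return int(between.argmax())
```

On a frame dominated by specular glare, the two "classes" Otsu separates are glare vs. everything else, so the card's dark modules end up on the same side of the cut as the background — the all-white training images mentioned above.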
```
Camera Frame (640×640 RGBA, 1:1 square)
│
├─── CV OFF (default) ──────────────────────┐
│                                           ▼
│                                  ┌─────────────────┐
│                                  │  ZXing Decode   │
│                                  │ Full-frame scan │
│                                  └────────┬────────┘
│                                           │
├─── CV ON (user toggle) ───┐               │
│                           ▼               │
│               ┌───────────────────────┐   │
│               │  Stage 1: YOLOv12n    │   │
│               │  ONNX Runtime Web     │   │
│               │  (WebGPU → WASM)      │   │
│               │  ~30-80ms (GPU)       │   │
│               │  ~1-4s (CPU)          │   │
│               └───────────┬───────────┘   │
│                           │ bbox crop     │
│                           ▼               │
│               ┌───────────────────────┐   │
│               │  Stage 2: ZXing       │   │
│               │  Decode cropped ROI   │   │
│               │  tryHarder mode       │   │
│               └───────────┬───────────┘   │
│                           │               │
└───────────────────────────┴───────────────┘
                            │
                            ▼
                decoded URL → WASM server
                  → validation → redirect
```
The single-stage approach (rqrr with preprocessing on full frames) fails because rqrr/ZXing finder pattern detection can’t locate QR codes through Kipukas’ cracked-lava SVG camouflage texture. No amount of image preprocessing fixes a decoder that can’t see the three position squares in a noisy full-resolution frame.
YOLOv12n learns what camouflaged QR codes look like. Its Area Attention mechanism gives it a global receptive field — it understands the whole region contextually, not just edges and corners. Once YOLO provides a tight bounding box, ZXing gets a clean, high-effective-resolution crop where decode becomes highly reliable.
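The hand-off between the two stages is just a bounding-box crop. A minimal sketch assuming NumPy frames; the fractional `margin` (to preserve some quiet zone around the code) is illustrative and not taken from the repo:

```python
import numpy as np

def crop_bbox(frame: np.ndarray, box, margin: float = 0.1) -> np.ndarray:
    """Crop a detector's (x1, y1, x2, y2) box from the frame, expanded by a
    fractional margin so quiet-zone pixels around the QR survive the crop."""
    x1, y1, x2, y2 = box
    mx, my = (x2 - x1) * margin, (y2 - y1) * margin
    h, w = frame.shape[:2]
    # Expand, then clamp to the frame bounds.
    x1 = max(int(x1 - mx), 0); y1 = max(int(y1 - my), 0)
    x2 = min(int(x2 + mx), w); y2 = min(int(y2 + my), h)
    return frame[y1:y2, x1:x2]
```

The crop is what makes the decode "high-effective-resolution": the same 640×640 pixel budget is now spent almost entirely on the code instead of on background texture.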
- `kipukas-qr-dataset-70imgs/` — annotated in the custom annotator (`annotator/index.html`), captured at 1280×720 from the scanner’s front-facing camera with real printed camouflaged cards
- `data/` — ~600 general QR code images (kolabit dataset) for diversity, fetched via `train/fetch_dataset.py`

Augmentation transforms applied by `train/augment.py`:

- `at15` — adaptive threshold (block=15, c=8)
- `at11` — adaptive threshold fine (block=11, c=6)
- `at21` — adaptive threshold coarse (block=21, c=10)
- `clahe` — CLAHE (4×4 tiles, clip=2.0)
- `blur_at` — Gaussian blur + adaptive threshold
- `stretch_at` — contrast stretch + adaptive threshold
- `yellow` — yellow-aware channel (max(R,G) − B) for anti-camouflage
- `noise15` — Gaussian noise (σ=15)
- `jpeg35` — JPEG compression artifacts (quality=35)

```sh
cd YOLO_rqrr

# 1. Augment local QR + merge with kolabit
uv run python train/augment.py

# 2. Train YOLOv12n (100 epochs, MPS on Apple Silicon)
uv run python train/train.py --epochs 100 --device mps

# 3. Export to ONNX (opset 12, WebGPU compatible)
uv run python train/export_onnx.py --weights /Users/lah-rb/Repos/lah-rb.github.io/runs/detect/runs/detect/train/weights/best.pt

# 4. Copy model to site assets
cp models/yolo12n-qr.onnx ../assets/js-wasm/yolo12n-qr.onnx

# 5. Validate the trained model
uv run python train/validate.py
```
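Of the transforms above, the yellow-aware channel is the camouflage-specific one. A minimal NumPy sketch of `max(R,G) − B` (illustrative; the repo’s `train/augment.py` is the authoritative implementation):

```python
import numpy as np

def yellow_aware(rgb: np.ndarray) -> np.ndarray:
    """max(R, G) - B channel: yellow content stays bright, while blue-ish
    and neutral (grey/white) camouflage pixels drop toward zero."""
    r = rgb[..., 0].astype(np.int16)
    g = rgb[..., 1].astype(np.int16)
    b = rgb[..., 2].astype(np.int16)
    # int16 avoids uint8 wrap-around before clipping back to [0, 255].
    return np.clip(np.maximum(r, g) - b, 0, 255).astype(np.uint8)
```

Yellow has high R and G but low B, so it maximizes this expression; anything with substantial blue, including neutral greys, is suppressed, which strips much of the camouflage texture before thresholding.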
| File | Purpose |
|---|---|
| `assets/js/yolo-inference.js` | ONNX Runtime Web session (WebGPU → WASM fallback) |
| `assets/js/postprocess.js` | YOLO output → bboxes (NMS, confidence threshold) |
| `assets/js/zxing-decode.js` | ZXing C++/WASM barcode decoder |
| `assets/js/kipukas-worker.js` | Web Worker orchestrating YOLO+ZXing or ZXing-only |
| `assets/js/kipukas-api.js` | 5s delayed PRELOAD_QR, CV preference relay |
| `assets/js/qr-camera.js` | Camera capture, frame relay, bbox overlay |
| `kipukas-server/src/routes/qr.rs` | Scanner UI HTML (Rust/WASM), CV toggle button |
| `assets/js-wasm/yolo12n-qr.onnx` | Exported YOLO model (~5MB) |
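`postprocess.js` turns raw YOLO output into usable boxes via confidence filtering and greedy non-maximum suppression. A NumPy sketch of that step (an illustration of the standard algorithm, not a port of the JS; the threshold values are assumptions):

```python
import numpy as np

def nms(boxes: np.ndarray, scores: np.ndarray,
        conf_thresh: float = 0.25, iou_thresh: float = 0.45):
    """Confidence-filter then greedily suppress overlapping (x1,y1,x2,y2) boxes."""
    mask = scores >= conf_thresh
    boxes, scores = boxes[mask], scores[mask]
    order = scores.argsort()[::-1]          # highest confidence first
    keep = []
    while order.size:
        i = order[0]
        keep.append(i)
        rest = order[1:]
        # Intersection of the kept box with every remaining candidate.
        xx1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        yy1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        xx2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        yy2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        areas = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_i + areas - inter)
        order = rest[iou < iou_thresh]      # drop heavy overlaps with the winner
    return boxes[keep], scores[keep]
```

For this application NMS usually collapses multiple detections of the same camouflaged card into the one crop handed to ZXing.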
```
User taps chip icon in scanner UI
  → Alpine toggles cvOn state
  → localStorage.setItem('kipukas-cv-enabled', true/false)
  → kipukasWorker.postMessage({ type: 'SET_CV_MODE', enabled })
  → Worker updates qrMode:
      ON:  resets qrReady, next frame triggers YOLO init
      OFF: switches to 'zxing-only' (YOLO session stays loaded but unused)
```
```
Page loads → kipukas-api.js spawns worker
  → 5s timeout fires
  → Reads localStorage('kipukas-cv-enabled')
  → Sends PRELOAD_QR { cvEnabled } to worker
  → Worker inits:
      cvEnabled=true:  YOLO (WebGPU → WASM) + ZXing in parallel
      cvEnabled=false: ZXing only
```
```
YOLO_rqrr/
├── README.md                      # This file
├── pyproject.toml                 # Python project config (uv)
├── .python-version
│
├── train/                         # Python — Training pipeline
│   ├── augment.py                 # Kipukas augmentation + dataset merge
│   ├── train.py                   # Fine-tune YOLOv12n
│   ├── export_onnx.py             # Export → ONNX (opset 12)
│   ├── validate.py                # Evaluate model performance
│   ├── fetch_dataset.py           # Download kolabit public dataset
│   ├── dataset.yaml               # Auto-generated dataset config
│   └── requirements.txt           # ultralytics, torch, onnx
│
├── annotator/                     # Browser-based bbox annotation tool
│   └── index.html                 # Capture + annotate QR bounding boxes
│
├── kipukas-qr-dataset-70imgs/     # Annotated Kipukas captures
│   ├── images/train/              # 70 JPG captures from scanner camera
│   └── labels/train/              # YOLO-format label files
│
├── data/                          # kolabit dataset (gitignored)
├── data-augmented/                # Merged augmented dataset (gitignored)
│
├── rqrr-wasm/                     # Rust QR decode WASM crate (reference)
│   ├── Cargo.toml
│   └── src/lib.rs
│
├── web/                           # Standalone test harness
│   ├── src/
│   │   ├── yolo-inference.js
│   │   ├── postprocess.js
│   │   └── yolo-rqrr-worker.js
│   └── index.html
│
├── models/                        # Exported models (gitignored)
├── scripts/
│   ├── build-rqrr-wasm.sh
│   └── integrate.sh
│
├── train_100epoch.log             # Training logs
├── train_320_fp16.log
└── train_augmented.log
```
Per AGPL requirements, the complete QR detection component is published in this public repository alongside the Kipukas production site.