iOS Vision API Demo: On-Device OCR, Poses, Barcodes

Clone this SwiftUI iOS demo to test Apple's Vision framework locally for text recognition, rectangle detection, body pose tracking, and barcode scanning, structured with MVVM architecture—no cloud services needed.

Implement Four Core Vision Features On-Device

Build privacy-focused computer vision apps by integrating Apple's Vision framework directly into iOS. The demo processes images from camera or photo library entirely on-device for speed and data security. Key implementations:

  • Text Recognition (OCR): Use VNRecognizeTextRequest to extract text with confidence scores, visualized in SwiftUI Charts via ConfidenceChart.swift.
  • Rectangle Detection: Configure VNDetectRectanglesRequest to identify rectangular shapes in real-time.
  • Human Body Pose Detection: Track joints with VNDetectHumanBodyPoseRequest, rendering poses on detected bodies.
  • Barcode Detection: Scan multiple formats using VNDetectBarcodesRequest.

All features handle live camera feeds or static images through CameraService.swift and VisionService.swift. Camera and photo-library access (declared via the NSCameraUsageDescription and NSPhotoLibraryUsageDescription Info.plist keys) is requested only when needed.
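All four requests follow the same Vision pattern: build a request, hand an image to a VNImageRequestHandler, and read typed observations back. A minimal OCR sketch is below; the helper name and callback shape are illustrative, not the repo's actual VisionService API:

```swift
import Vision
import UIKit

/// Extracts text lines and per-line confidence from an image.
/// Hypothetical helper; names do not match the repo's VisionService.swift.
func recognizeText(in image: UIImage,
                   completion: @escaping ([(text: String, confidence: Float)]) -> Void) {
    guard let cgImage = image.cgImage else { return completion([]) }

    let request = VNRecognizeTextRequest { request, _ in
        let observations = request.results as? [VNRecognizedTextObservation] ?? []
        // Keep the top candidate string and its confidence for each detected line;
        // these pairs are what a chart like ConfidenceChart.swift would plot.
        let lines = observations.compactMap { obs -> (String, Float)? in
            guard let candidate = obs.topCandidates(1).first else { return nil }
            return (candidate.string, candidate.confidence)
        }
        DispatchQueue.main.async { completion(lines) }
    }
    request.recognitionLevel = .accurate  // .fast trades accuracy for latency

    // Perform off the main thread; Vision requests are synchronous.
    DispatchQueue.global(qos: .userInitiated).async {
        let handler = VNImageRequestHandler(cgImage: cgImage, options: [:])
        try? handler.perform([request])
    }
}
```

Swapping VNRecognizeTextRequest for VNDetectRectanglesRequest, VNDetectHumanBodyPoseRequest, or VNDetectBarcodesRequest reuses the same handler flow with different observation types.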

MVVM Architecture for Scalable Vision Apps

Structure your Vision-powered iOS app with clean separation:

MyVisionAPI/
├── Models/VisionModels.swift  # Results data
├── Services/
│   ├── VisionService.swift    # Vision requests
│   └── CameraService.swift    # Input handling
├── Views/
│   ├── WelcomeView.swift
│   ├── ConfidenceChart.swift
│   ├── TextRecognitionView.swift
│   ├── RectangleDetectionView.swift
│   ├── BodyPoseView.swift
│   └── BarcodeDetectionView.swift
├── ContentView.swift          # Tab navigation
└── MyVisionAPIApp.swift       # Entry point

This setup isolates Vision logic in services, keeps views declarative with SwiftUI, and uses models for structured outputs. Configure code signing in Xcode (the target uses MyVisionAPI.entitlements) and build with Cmd+R.
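The service/model split can be sketched as follows. The type names mirror the layout above, but the exact signatures are assumptions rather than the repo's actual code:

```swift
import Foundation
import Vision

/// Hypothetical result model, in the spirit of Models/VisionModels.swift:
/// a plain value type the SwiftUI views can render without touching Vision.
struct DetectedRectangle: Identifiable {
    let id = UUID()
    let boundingBox: CGRect   // normalized Vision coordinates (origin bottom-left)
    let confidence: Float
}

/// Hypothetical service, in the spirit of Services/VisionService.swift:
/// all Vision calls live here, so views stay declarative.
final class VisionService {
    func detectRectangles(in cgImage: CGImage) throws -> [DetectedRectangle] {
        let request = VNDetectRectanglesRequest()
        request.maximumObservations = 10   // 0 would mean unlimited
        request.minimumConfidence = 0.6    // drop weak candidates

        let handler = VNImageRequestHandler(cgImage: cgImage, options: [:])
        try handler.perform([request])

        let observations = request.results ?? []
        return observations.map {
            DetectedRectangle(boundingBox: $0.boundingBox, confidence: $0.confidence)
        }
    }
}
```

Because the service returns plain model values, a view model can publish them directly and the views never import Vision.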

Quick Setup and Testing Workflow

Clone the repo, open it in Xcode, select a signing team for the MyVisionAPI target, then run. Test via the tabbed interface:

  1. Text: Pick image/camera, view extracted text and confidence chart.
  2. Rectangles: Detect and overlay bounding boxes.
  3. Poses: Overlay estimated joints on human figures.
  4. Barcodes: Decode payloads instantly.
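As a sketch of step 4, a standalone barcode helper might look like this; the function name and the restricted symbology list are illustrative assumptions, since VNDetectBarcodesRequest scans all supported formats by default:

```swift
import Vision

/// Decodes barcode payload strings from an image.
/// Hypothetical helper; not the repo's actual API.
func decodeBarcodes(in cgImage: CGImage) throws -> [String] {
    let request = VNDetectBarcodesRequest()
    // Optional: restrict to specific formats instead of scanning everything.
    request.symbologies = [.qr, .ean13, .code128]

    let handler = VNImageRequestHandler(cgImage: cgImage, options: [:])
    try handler.perform([request])

    // Each observation may carry a decoded string payload.
    let observations = request.results ?? []
    return observations.compactMap { $0.payloadStringValue }
}
```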

Troubleshoot builds with a clean (Cmd+Shift+K) and check the Xcode console for runtime errors. Because all processing runs on-device, performance is unaffected by network conditions. Contribute by branching (git checkout -b feature/name), committing, and pushing—the project is MIT licensed.

© 2026 Edge