Build iOS Vision API Demos: OCR, Pose, Barcodes in SwiftUI

Use Apple's on-device Vision API for fast, private text recognition, rectangle detection, body pose estimation, and barcode scanning. Clone the GitHub repo, follow the core request-handler pattern, and integrate with live camera feeds in SwiftUI for production-ready apps.

Core Vision Request Pattern Powers All Demos

Apple's Vision framework processes images on-device for speed and privacy, supporting OCR, rectangles, barcodes, body pose, and more. Every demo uses this reusable pattern: create a VNImageRequestHandler from a CGImage, perform a specialized VNRequest, and handle results in a completion block dispatched to the main queue.

import Vision
import UIKit

/// Runs any configured Vision request against a still image;
/// results arrive in the request's completion handler.
func performVision(_ cgImage: CGImage, request: VNRequest) throws {
    let handler = VNImageRequestHandler(cgImage: cgImage, options: [:])
    try handler.perform([request])
}

ViewModels conform to ObservableObject and hold a lazy VNRequest configured once: recognitionLevel = .accurate for OCR (or .fast for higher FPS); maximumObservations = 5, minimumAspectRatio = 0.3, and minimumSize = 0.2 for rectangles; or a confidence > 0.2 filter on pose keypoints. Parse results with compactMap: extract topCandidates(1).first?.string and confidence for OCR, payloadStringValue for barcodes, and recognizedPoint(jointName).location for pose. This keeps code DRY across features.
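
A minimal sketch of that ViewModel shape for the OCR case; TextConfidence, TextRecognitionViewModel, and recognize(from:) are illustrative names, not necessarily the repo's:

import Vision
import SwiftUI

/// Illustrative model pairing recognized text with its confidence.
struct TextConfidence: Identifiable {
    let id = UUID()
    let text: String
    let confidence: Float
}

final class TextRecognitionViewModel: ObservableObject {
    @Published var results: [TextConfidence] = []

    // Configure the request once; reuse it for every frame.
    private lazy var request: VNRecognizeTextRequest = {
        let r = VNRecognizeTextRequest { [weak self] request, _ in
            let observations = request.results as? [VNRecognizedTextObservation] ?? []
            // compactMap drops observations with no candidate text.
            let parsed = observations.compactMap { obs -> TextConfidence? in
                guard let top = obs.topCandidates(1).first else { return nil }
                return TextConfidence(text: top.string, confidence: top.confidence)
            }
            // Publish on the main queue for SwiftUI.
            DispatchQueue.main.async { self?.results = parsed }
        }
        r.recognitionLevel = .accurate
        r.usesLanguageCorrection = true
        r.automaticallyDetectsLanguage = true // iOS 16+
        return r
    }()

    /// Call off the main thread for live feeds.
    func recognize(from cgImage: CGImage) {
        let handler = VNImageRequestHandler(cgImage: cgImage, options: [:])
        try? handler.perform([request])
    }
}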

Throttle to every 3–5 frames for live camera stability, apply temporal filters (e.g., moving average on pose keypoints), and convert Vision's normalized boundingBox or location to SwiftUI Path overlays using view frame scaling.
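
Vision reports normalized coordinates with a bottom-left origin, while SwiftUI draws from the top-left, so flip the y-axis when scaling to the view. A sketch; FrameThrottle is an illustrative helper, not from the repo:

import Vision
import SwiftUI

/// Converts a normalized Vision boundingBox (bottom-left origin)
/// into a rect in a view's top-left coordinate space.
func overlayRect(for boundingBox: CGRect, in viewSize: CGSize) -> CGRect {
    CGRect(x: boundingBox.minX * viewSize.width,
           y: (1 - boundingBox.maxY) * viewSize.height, // flip the y-axis
           width: boundingBox.width * viewSize.width,
           height: boundingBox.height * viewSize.height)
}

/// Simple every-Nth-frame throttle for live capture callbacks.
final class FrameThrottle {
    private var count = 0
    private let interval: Int
    init(every interval: Int) { self.interval = interval }
    /// Returns true on every Nth call.
    func shouldProcess() -> Bool {
        count += 1
        return count % interval == 0
    }
}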

Key Feature Implementations with Configs and Parsing

Text Recognition (OCR): VNRecognizeTextRequest with automaticallyDetectsLanguage = true, usesLanguageCorrection = true. Results: array of (text: String, confidence: Float). Visualize with Swift Charts BarMark on confidence scores via [TextConfidence] model.
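
A minimal sketch of the confidence chart, reusing the illustrative TextConfidence model from above:

import SwiftUI
import Charts

/// Bar chart of per-string recognition confidence (iOS 16+).
struct ConfidenceChartView: View {
    let results: [TextConfidence]

    var body: some View {
        Chart(results) { item in
            BarMark(
                x: .value("Text", item.text),
                y: .value("Confidence", Double(item.confidence))
            )
        }
        .chartYScale(domain: 0.0...1.0) // confidence is always 0–1
    }
}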

Rectangle Detection: VNDetectRectanglesRequest limits to 5 observations, min aspect 0.3, size 0.2. Results: [VNRectangleObservation] for document scanning overlays.
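
A sketch of that configuration; the print stands in for the overlay drawing:

import Vision

// Constrain rectangle detection to document-like candidates.
let rectRequest = VNDetectRectanglesRequest { request, _ in
    let rects = request.results as? [VNRectangleObservation] ?? []
    // boundingBox and corner points drive the scanning overlay.
    print(rects.map(\.boundingBox))
}
rectRequest.maximumObservations = 5
rectRequest.minimumAspectRatio = 0.3
rectRequest.minimumSize = 0.2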

Body Pose: VNDetectHumanBodyPoseRequest extracts the first observation's keypoints for every joint above 0.2 confidence. Best on live back-camera feeds with good lighting and distance; use for fitness or gesture features.
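
A sketch of the keypoint extraction, using recognizedPoints(.all) and the 0.2 confidence cutoff:

import Vision

// Extract confident keypoints from the first detected body.
let poseRequest = VNDetectHumanBodyPoseRequest { request, _ in
    guard let body = (request.results as? [VNHumanBodyPoseObservation])?.first,
          let points = try? body.recognizedPoints(.all) else { return }
    // Keep only joints the model is reasonably sure about.
    let keypoints = points
        .filter { $0.value.confidence > 0.2 }
        .mapValues(\.location) // normalized CGPoint per joint
    print(keypoints)
}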

Barcode/QR: VNDetectBarcodesRequest yields [String] payloads. Works on supported types; optimize by closing distance and improving focus/contrast.
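
A sketch of the payload extraction:

import Vision

// Collect string payloads from any detected barcodes or QR codes.
let barcodeRequest = VNDetectBarcodesRequest { request, _ in
    let observations = request.results as? [VNBarcodeObservation] ?? []
    let payloads = observations.compactMap(\.payloadStringValue)
    print(payloads)
}
// Optionally restrict symbologies, e.g. barcodeRequest.symbologies = [.qr]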

Target iOS 16+ and add an NSCameraUsageDescription entry for camera permission. The Simulator handles static images; a physical device is required for live capture.

Live Camera Integration and SwiftUI Structure

CameraSession wraps AVCaptureSession (high preset, back wide-angle camera) and forwards frames from its AVCaptureVideoDataOutput delegate to an onBuffer: (CVPixelBuffer) -> Void callback. Convert buffers to CGImage via CIContext.createCGImage(CIImage(cvPixelBuffer:), from: extent), as sketched below.
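
A sketch of that conversion; sharing one CIContext avoids the cost of creating a context per frame:

import CoreImage
import CoreVideo

// Reuse one CIContext; creating a context per frame is expensive.
let ciContext = CIContext()

/// Converts a camera pixel buffer into a CGImage for Vision.
func cgImage(from pixelBuffer: CVPixelBuffer) -> CGImage? {
    let ciImage = CIImage(cvPixelBuffer: pixelBuffer)
    return ciContext.createCGImage(ciImage, from: ciImage.extent)
}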

Hook ViewModels to the feed: camera.onBuffer = { pb in if let cg = cgImage(from: pb) { vm.recognize(from: cg) } }. Preview with a CameraPreview UIViewRepresentable hosting an AVCaptureVideoPreviewLayer with .resizeAspectFill, as sketched below.
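
A minimal CameraPreview sketch, assuming the wrapped AVCaptureSession is exposed:

import SwiftUI
import AVFoundation

/// Hosts an AVCaptureVideoPreviewLayer inside SwiftUI.
struct CameraPreview: UIViewRepresentable {
    let session: AVCaptureSession

    func makeUIView(context: Context) -> UIView {
        let view = UIView()
        let layer = AVCaptureVideoPreviewLayer(session: session)
        layer.videoGravity = .resizeAspectFill
        view.layer.addSublayer(layer)
        return view
    }

    func updateUIView(_ uiView: UIView, context: Context) {
        // Keep the preview layer sized to the SwiftUI frame.
        uiView.layer.sublayers?.first?.frame = uiView.bounds
    }
}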

App structure: HomeMenuView NavigationStack with List links to feature views (e.g., TextRecognitionView with ImagePicker sheet or live camera). Each view binds @StateObject var vm, lists results with confidence, and overlays paths.
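
The menu wiring in sketch form, with Text placeholders standing in for the repo's feature views:

import SwiftUI

/// Entry menu linking to each Vision demo.
struct HomeMenuView: View {
    var body: some View {
        NavigationStack {
            List {
                // Placeholders stand in for the repo's feature views.
                NavigationLink("Text Recognition") { Text("TextRecognitionView") }
                NavigationLink("Rectangles") { Text("RectangleDetectionView") }
                NavigationLink("Body Pose") { Text("BodyPoseView") }
                NavigationLink("Barcodes") { Text("BarcodeView") }
            }
            .navigationTitle("Vision Demos")
        }
    }
}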

Troubleshooting: add preview layers on the main thread, pre-construct requests instead of rebuilding them per frame, and test under varied lighting. The repo at https://github.com/sanjaynela/visionApiProject provides the full Xcode project (iOS 16+) with Sources/Camera/Vision/UI/Charts folders for immediate forking.
