On-Device Vision: Swift Code for OCR, Poses, Barcodes

Apple's Vision framework enables fast, private computer vision on iOS—text recognition, rectangle detection, body pose tracking, and barcode scanning—with reusable Swift request handlers and SwiftUI Charts for visualization.

Reusable Pattern for All Vision Requests

Apple's Vision framework processes images on-device using VNImageRequestHandler and specific request types. Start with a UIImage, convert it to a CGImage, then perform requests, ideally off the main thread since perform(_:) runs synchronously:

import Vision
import UIKit

func processImage(from image: UIImage, with request: VNRequest) {
    guard let cgImage = image.cgImage else { return }
    let handler = VNImageRequestHandler(cgImage: cgImage, options: [:])
    do {
        try handler.perform([request])
    } catch {
        print("Vision request failed: \(error)")
    }
}

Handle results in the request's completion block by casting to the observation type (e.g., VNRecognizedTextObservation). This pattern supports live camera feeds for real-time detection, avoiding cloud latency.

Text, Rectangle, Barcode, and Pose Detection Examples

Text Recognition (OCR): Detects printed/handwritten text. Use VNRecognizeTextRequest; extract topCandidate.string and confidence from VNRecognizedTextObservation.

let request = VNRecognizeTextRequest { request, error in
    guard error == nil,
          let observations = request.results as? [VNRecognizedTextObservation] else { return }
    for observation in observations {
        if let topCandidate = observation.topCandidates(1).first {
            print("Detected text: \(topCandidate.string) (confidence: \(topCandidate.confidence))")
        }
    }
}
request.recognitionLevel = .accurate  // favor accuracy over speed

Rectangle Detection: For document scanning. VNDetectRectanglesRequest yields VNRectangleObservation.boundingBox.
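A minimal sketch of the rectangle request, mirroring the OCR pattern above; the minimumConfidence threshold is an optional tuning knob, not a required setting:

```swift
import Vision

let request = VNDetectRectanglesRequest { request, error in
    guard error == nil,
          let observations = request.results as? [VNRectangleObservation] else { return }
    for observation in observations {
        // boundingBox is normalized (0–1) with a bottom-left origin.
        print("Rectangle at \(observation.boundingBox), confidence \(observation.confidence)")
    }
}
request.minimumConfidence = 0.8  // drop weak candidates
```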

Barcode Detection: Supports QR, UPC, EAN. VNDetectBarcodesRequest provides VNBarcodeObservation.payloadStringValue.
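A corresponding barcode sketch; restricting symbologies is optional but narrows the search to the formats you expect (the symbology case names assume a recent SDK):

```swift
import Vision

let request = VNDetectBarcodesRequest { request, error in
    guard error == nil,
          let observations = request.results as? [VNBarcodeObservation] else { return }
    for observation in observations {
        if let payload = observation.payloadStringValue {
            print("\(observation.symbology.rawValue): \(payload)")
        }
    }
}
request.symbologies = [.qr, .ean13]  // limit to expected formats
```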

Human Body Pose: Tracks more than a dozen joints for fitness/AR. VNDetectHumanBodyPoseRequest returns VNHumanBodyPoseObservation; query individual joints via recognizedPoint(.leftWrist), which yields a VNRecognizedPoint with a normalized location and a confidence.
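A pose sketch along the same lines; the 0.3 confidence cutoff is an arbitrary example threshold:

```swift
import Vision

let request = VNDetectHumanBodyPoseRequest { request, error in
    guard error == nil,
          let observations = request.results as? [VNHumanBodyPoseObservation] else { return }
    for observation in observations {
        // recognizedPoint(_:) throws if the joint name is unavailable.
        if let wrist = try? observation.recognizedPoint(.leftWrist),
           wrist.confidence > 0.3 {
            print("Left wrist at \(wrist.location)")  // normalized coordinates
        }
    }
}
```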

Each request prints its results (coordinates, payload strings); in a real app, convert the normalized coordinates to view space and draw overlays in SwiftUI or UIKit.
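For overlays, Vision's normalized, bottom-left-origin rects must be mapped into view space. A minimal sketch using VNImageRectForNormalizedRect, with a y-flip for UIKit's top-left origin (the helper name is illustrative):

```swift
import Vision
import UIKit

// Converts a Vision boundingBox into a rect suitable for drawing in UIKit.
func overlayRect(for normalized: CGRect, in imageSize: CGSize) -> CGRect {
    // Scale the normalized rect up to pixel coordinates.
    let rect = VNImageRectForNormalizedRect(normalized,
                                            Int(imageSize.width),
                                            Int(imageSize.height))
    // Flip the y-axis: Vision's origin is bottom-left, UIKit's is top-left.
    return CGRect(x: rect.minX,
                  y: imageSize.height - rect.maxY,
                  width: rect.width,
                  height: rect.height)
}
```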

Visualize Detection Confidence with Swift Charts

Pair Vision outputs with SwiftUI Charts for dashboards. Model data:

struct TextConfidence: Identifiable {
    let id = UUID()
    let text: String
    let confidence: Double
}

Chart view:

import SwiftUI
import Charts

struct ConfidenceChart: View {
    var data: [TextConfidence]

    var body: some View {
        Chart(data) { item in
            BarMark(
                x: .value("Text", item.text),
                y: .value("Confidence", item.confidence)
            )
        }
        .frame(height: 300)
        .padding()
    }
}

Populate from OCR: for each observation, append TextConfidence(text: candidate.string, confidence: Double(candidate.confidence)) — Vision reports confidence as a Float, so convert it for the Double property. This turns raw detections into actionable, visual insights without external libraries.
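Wired together, the OCR completion might feed the chart like this (OCRResults is a hypothetical view model; only the published array is assumed):

```swift
import SwiftUI
import Vision

// Hypothetical view model that collects OCR confidences for charting.
final class OCRResults: ObservableObject {
    @Published var confidences: [TextConfidence] = []

    func makeRequest() -> VNRecognizeTextRequest {
        VNRecognizeTextRequest { [weak self] request, error in
            guard error == nil,
                  let observations = request.results as? [VNRecognizedTextObservation] else { return }
            let items = observations.compactMap { observation -> TextConfidence? in
                guard let candidate = observation.topCandidates(1).first else { return nil }
                return TextConfidence(text: candidate.string,
                                      confidence: Double(candidate.confidence))
            }
            DispatchQueue.main.async {  // publish on the main thread for SwiftUI
                self?.confidences.append(contentsOf: items)
            }
        }
    }
}
```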

Trade-offs Favor Vision Over Cloud APIs

Vision excels in speed (no network round-trip), privacy (on-device), and cost (free) versus OpenAI or Google Cloud Vision. Drawbacks: you are limited to Apple's models (extendable via Core ML), and it runs only on Apple platforms (iOS, iPadOS, macOS, tvOS). Ideal for production apps needing reliability — no rate limits or API tokens. The GitHub repo at https://github.com/sanjaynela/visionApiProject.git expands these examples into a full project.


© 2026 Edge