On-Device Vision: Swift Code for OCR, Poses, Barcodes
Apple's Vision framework enables fast, private computer vision on iOS (text recognition, rectangle detection, body pose tracking, and barcode scanning) with reusable Swift request handlers and Swift Charts for visualization.
Reusable Pattern for All Vision Requests
Apple's Vision framework processes images entirely on-device using VNImageRequestHandler and specific request types. Start with a UIImage, convert it to a CGImage, then perform the request. Note that perform(_:) runs synchronously, so call it off the main thread:
import UIKit
import Vision

func processImage(from image: UIImage, with request: VNRequest) {
    guard let cgImage = image.cgImage else { return }
    let handler = VNImageRequestHandler(cgImage: cgImage, options: [:])
    // perform(_:) blocks until the request completes, so run this off the main thread
    try? handler.perform([request])
}
Handle results in the request's completion handler by casting results to the expected observation type (e.g., VNRecognizedTextObservation). The same handler pattern extends to live camera feeds for real-time detection with no cloud latency; a sketch follows.
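As a minimal sketch of the live-feed case, assuming an AVCaptureVideoDataOutputSampleBufferDelegate and a stored request property named visionRequest (hypothetical names), each camera frame can be handed to a pixel-buffer-based handler:

import AVFoundation
import Vision

final class FrameProcessor: NSObject, AVCaptureVideoDataOutputSampleBufferDelegate {
    // Hypothetical stored request, e.g. a VNRecognizeTextRequest configured elsewhere
    var visionRequest: VNRequest?

    func captureOutput(_ output: AVCaptureOutput,
                       didOutput sampleBuffer: CMSampleBuffer,
                       from connection: AVCaptureConnection) {
        guard let request = visionRequest,
              let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else { return }
        // Vision accepts camera frames directly as CVPixelBuffers
        let handler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer, options: [:])
        try? handler.perform([request])
    }
}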
Text, Rectangle, Barcode, and Pose Detection Examples
Text Recognition (OCR): Detects printed and handwritten text. Use VNRecognizeTextRequest; extract the top candidate's string and confidence from each VNRecognizedTextObservation.
let request = VNRecognizeTextRequest { request, error in
    guard let observations = request.results as? [VNRecognizedTextObservation] else { return }
    for observation in observations {
        // topCandidates(1) returns the most confident transcription
        if let topCandidate = observation.topCandidates(1).first {
            print("Detected text: \(topCandidate.string) (confidence: \(topCandidate.confidence))")
        }
    }
}
Rectangle Detection: Useful for document scanning. VNDetectRectanglesRequest yields VNRectangleObservation, whose boundingBox is normalized to the image; see the sketch below.
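A minimal sketch; the maximumObservations cap of 4 is an assumption for document-scan scenarios:

import Vision

let rectangleRequest = VNDetectRectanglesRequest { request, error in
    guard let observations = request.results as? [VNRectangleObservation] else { return }
    for observation in observations {
        // boundingBox uses normalized coordinates (0...1, origin at bottom-left)
        print("Rectangle at: \(observation.boundingBox)")
    }
}
rectangleRequest.maximumObservations = 4  // assumption: limit to a few candidates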
Barcode Detection: Supports QR, UPC, EAN, and more. VNDetectBarcodesRequest provides VNBarcodeObservation.payloadStringValue, as sketched below.
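A sketch along the same lines:

import Vision

let barcodeRequest = VNDetectBarcodesRequest { request, error in
    guard let observations = request.results as? [VNBarcodeObservation] else { return }
    for observation in observations {
        // payloadStringValue is nil for barcodes without a string payload
        print("\(observation.symbology.rawValue): \(observation.payloadStringValue ?? "no payload")")
    }
}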
Human Body Pose: Tracks 14+ joints for fitness and AR use cases. VNDetectHumanBodyPoseRequest returns VNHumanBodyPoseObservation; query individual joints such as the left wrist via recognizedPoint(.leftWrist), which throws and returns a normalized location plus confidence, as sketched below.
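A sketch that filters out low-confidence points (the 0.3 threshold is an assumption):

import Vision

let poseRequest = VNDetectHumanBodyPoseRequest { request, error in
    guard let observations = request.results as? [VNHumanBodyPoseObservation] else { return }
    for observation in observations {
        // recognizedPoint(_:) throws; location is normalized to the image
        if let wrist = try? observation.recognizedPoint(.leftWrist), wrist.confidence > 0.3 {
            print("Left wrist at: \(wrist.location)")
        }
    }
}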
Each handler prints its results (coordinates, payload strings); in an app, feed them into SwiftUI or UIKit views as overlays.
Visualize Detection Confidence with Swift Charts
Pair Vision outputs with Swift Charts for dashboards. Model the data:
struct TextConfidence: Identifiable {
    let id = UUID()
    let text: String
    let confidence: Double
}
Chart view:
import SwiftUI
import Charts

struct ConfidenceChart: View {
    var data: [TextConfidence]

    var body: some View {
        Chart(data) { item in
            BarMark(
                x: .value("Text", item.text),
                y: .value("Confidence", item.confidence)
            )
        }
        .frame(height: 300)
        .padding()
    }
}
Populate the chart from OCR by appending a TextConfidence per observation, as sketched below. This turns raw detections into actionable, visual insights without external libraries.
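A minimal sketch; chartData is a hypothetical array you would hand to ConfidenceChart. Note that VNConfidence is a Float, so it needs converting to Double for the chart model:

import Vision

var chartData: [TextConfidence] = []

let ocrRequest = VNRecognizeTextRequest { request, error in
    guard let observations = request.results as? [VNRecognizedTextObservation] else { return }
    for observation in observations {
        if let candidate = observation.topCandidates(1).first {
            // VNConfidence is a Float; the chart model stores a Double
            chartData.append(TextConfidence(text: candidate.string,
                                            confidence: Double(candidate.confidence)))
        }
    }
}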
Trade-offs Favor Vision Over Cloud APIs
Vision excels in speed (no network round trip), privacy (on-device), and cost (free) versus OpenAI or Google Cloud Vision. Drawbacks: it is limited to Apple's built-in models (extendable via Core ML) and runs only on Apple platforms (iOS, iPadOS, macOS, tvOS, visionOS). It is ideal for production apps needing reliability: no rate limits or tokens. The GitHub repo at https://github.com/sanjaynela/visionApiProject.git expands these examples into a full project.