On-Device Vision: Swift Code for OCR, Poses, Barcodes
Apple's Vision framework enables fast, private computer vision on iOS (text recognition, rectangle detection, body pose tracking, and barcode scanning) with reusable Swift request handlers and Swift Charts for visualization.
Reusable Pattern for All Vision Requests
Apple's Vision framework processes images entirely on-device using VNImageRequestHandler and specific request types. Start with a UIImage, convert it to a CGImage, then perform the request. Note that perform(_:) runs synchronously, so call it off the main thread:
import UIKit
import Vision

func processImage(from image: UIImage, with request: VNRequest) {
    guard let cgImage = image.cgImage else { return }
    let handler = VNImageRequestHandler(cgImage: cgImage, options: [:])
    // perform(_:) blocks until the request completes, so run this off the main thread
    try? handler.perform([request])
}
Handle results in the request's completion handler by casting results to the expected observation type (e.g., VNRecognizedTextObservation). The same handler pattern extends to live camera feeds for real-time detection with no cloud latency; a sketch follows.
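As a minimal sketch of the live-feed case, assuming an AVCaptureVideoDataOutputSampleBufferDelegate and a stored request property named visionRequest (hypothetical names), each camera frame can be handed to a pixel-buffer-based handler:

import AVFoundation
import Vision

final class FrameProcessor: NSObject, AVCaptureVideoDataOutputSampleBufferDelegate {
    // Hypothetical stored request, e.g. a VNRecognizeTextRequest configured elsewhere
    var visionRequest: VNRequest?

    func captureOutput(_ output: AVCaptureOutput,
                       didOutput sampleBuffer: CMSampleBuffer,
                       from connection: AVCaptureConnection) {
        guard let request = visionRequest,
              let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else { return }
        // Vision accepts camera frames directly as CVPixelBuffers
        let handler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer, options: [:])
        try? handler.perform([request])
    }
}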
Text, Rectangle, Barcode, and Pose Detection Examples
Text Recognition (OCR): Detects printed and handwritten text. Use VNRecognizeTextRequest; extract the top candidate's string and confidence from each VNRecognizedTextObservation.
let request = VNRecognizeTextRequest { request, error in
    guard let observations = request.results as? [VNRecognizedTextObservation] else { return }
    for observation in observations {
        // topCandidates(1) returns the most confident transcription
        if let topCandidate = observation.topCandidates(1).first {
            print("Detected text: \(topCandidate.string) (confidence: \(topCandidate.confidence))")
        }
    }
}
Rectangle Detection: Useful for document scanning. VNDetectRectanglesRequest yields VNRectangleObservation, whose boundingBox is normalized to the image; see the sketch below.
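A minimal sketch; the maximumObservations cap of 4 is an assumption for document-scan scenarios:

import Vision

let rectangleRequest = VNDetectRectanglesRequest { request, error in
    guard let observations = request.results as? [VNRectangleObservation] else { return }
    for observation in observations {
        // boundingBox uses normalized coordinates (0...1, origin at bottom-left)
        print("Rectangle at: \(observation.boundingBox)")
    }
}
rectangleRequest.maximumObservations = 4  // assumption: limit to a few candidates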
Barcode Detection: Supports QR, UPC, EAN, and more. VNDetectBarcodesRequest provides VNBarcodeObservation.payloadStringValue, as sketched below.
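A sketch along the same lines:

import Vision

let barcodeRequest = VNDetectBarcodesRequest { request, error in
    guard let observations = request.results as? [VNBarcodeObservation] else { return }
    for observation in observations {
        // payloadStringValue is nil for barcodes without a string payload
        print("\(observation.symbology.rawValue): \(observation.payloadStringValue ?? "no payload")")
    }
}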
Human Body Pose: Tracks 14+ joints for fitness and AR use cases. VNDetectHumanBodyPoseRequest returns VNHumanBodyPoseObservation; query individual joints such as the left wrist via recognizedPoint(.leftWrist), which throws and returns a normalized location plus confidence, as sketched below.
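A sketch that filters out low-confidence points (the 0.3 threshold is an assumption):

import Vision

let poseRequest = VNDetectHumanBodyPoseRequest { request, error in
    guard let observations = request.results as? [VNHumanBodyPoseObservation] else { return }
    for observation in observations {
        // recognizedPoint(_:) throws; location is normalized to the image
        if let wrist = try? observation.recognizedPoint(.leftWrist), wrist.confidence > 0.3 {
            print("Left wrist at: \(wrist.location)")
        }
    }
}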
Each handler prints its results (coordinates, payload strings); in an app, feed them into SwiftUI or UIKit views as overlays.
Visualize Detection Confidence with Swift Charts
Pair Vision outputs with Swift Charts for dashboards. Model the data:
struct TextConfidence: Identifiable {
    let id = UUID()
    let text: String
    let confidence: Double
}
Chart view:
import SwiftUI
import Charts

struct ConfidenceChart: View {
    var data: [TextConfidence]

    var body: some View {
        Chart(data) { item in
            BarMark(
                x: .value("Text", item.text),
                y: .value("Confidence", item.confidence)
            )
        }
        .frame(height: 300)
        .padding()
    }
}
Populate the chart from OCR by appending a TextConfidence per observation, as sketched below. This turns raw detections into actionable, visual insights without external libraries.
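A minimal sketch; chartData is a hypothetical array you would hand to ConfidenceChart. Note that VNConfidence is a Float, so it needs converting to Double for the chart model:

import Vision

var chartData: [TextConfidence] = []

let ocrRequest = VNRecognizeTextRequest { request, error in
    guard let observations = request.results as? [VNRecognizedTextObservation] else { return }
    for observation in observations {
        if let candidate = observation.topCandidates(1).first {
            // VNConfidence is a Float; the chart model stores a Double
            chartData.append(TextConfidence(text: candidate.string,
                                            confidence: Double(candidate.confidence)))
        }
    }
}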
Trade-offs Favor Vision Over Cloud APIs
Vision excels in speed (no network round trip), privacy (on-device), and cost (free) versus OpenAI or Google Cloud Vision. Drawbacks: it is limited to Apple's built-in models (extendable via Core ML) and runs only on Apple platforms (iOS, iPadOS, macOS, tvOS, visionOS). It is ideal for production apps needing reliability: no rate limits or tokens. The GitHub repo at https://github.com/sanjaynela/visionApiProject.git expands these examples into a full project.