Reusable Pattern for All Vision Requests
Apple's Vision framework processes images entirely on-device using VNImageRequestHandler and specific request types. Start with a UIImage, convert it to a CGImage, then perform the request — note that perform(_:) runs synchronously on the calling thread, so dispatch it to a background queue in production code:
import UIKit
import Vision

func processImage(from image: UIImage, with request: VNRequest) {
    guard let cgImage = image.cgImage else { return }
    let handler = VNImageRequestHandler(cgImage: cgImage, options: [:])
    do {
        try handler.perform([request])
    } catch {
        print("Vision request failed: \(error)")
    }
}
Handle results in the request's completion handler by casting results to the expected observation type (e.g., VNRecognizedTextObservation). The same pattern supports live camera feeds for real-time detection, with no cloud round-trip or latency.
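As a sketch of the live-feed variant (the capture-session setup is assumed to already deliver sample buffers to this delegate; the FrameProcessor class name is illustrative):

```swift
import AVFoundation
import Vision

// Sketch: feeding live camera frames to Vision. Assumes an AVCaptureSession
// is configured with an AVCaptureVideoDataOutput targeting this delegate.
final class FrameProcessor: NSObject, AVCaptureVideoDataOutputSampleBufferDelegate {
    let request = VNRecognizeTextRequest { request, error in
        guard let observations = request.results as? [VNRecognizedTextObservation] else { return }
        for observation in observations {
            print(observation.topCandidates(1).first?.string ?? "")
        }
    }

    func captureOutput(_ output: AVCaptureOutput,
                       didOutput sampleBuffer: CMSampleBuffer,
                       from connection: AVCaptureConnection) {
        guard let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else { return }
        // VNImageRequestHandler consumes CVPixelBuffers directly —
        // no UIImage/CGImage conversion is needed for live frames.
        let handler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer,
                                            orientation: .right,
                                            options: [:])
        try? handler.perform([request])
    }
}
```

This skips the UIImage step entirely, which matters at 30–60 frames per second.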
Text, Rectangle, Barcode, and Pose Detection Examples
Text Recognition (OCR): Detects printed and handwritten text. Use VNRecognizeTextRequest; extract the string and confidence from each observation's topCandidates(1).first.
let request = VNRecognizeTextRequest { request, error in
    guard let observations = request.results as? [VNRecognizedTextObservation] else { return }
    for observation in observations {
        if let topCandidate = observation.topCandidates(1).first {
            print("Detected text: \(topCandidate.string) (confidence: \(topCandidate.confidence))")
        }
    }
}
Rectangle Detection: For document scanning. VNDetectRectanglesRequest yields VNRectangleObservation.boundingBox.
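A minimal sketch for rectangle detection, following the same completion-handler pattern (the tuning values are illustrative, not prescribed by the source):

```swift
import Vision

// Sketch: detect rectangles (e.g., document edges) and print their bounding boxes.
let rectangleRequest = VNDetectRectanglesRequest { request, error in
    guard let observations = request.results as? [VNRectangleObservation] else { return }
    for observation in observations {
        // boundingBox is normalized (0–1, origin at the bottom-left).
        print("Rectangle at \(observation.boundingBox), confidence: \(observation.confidence)")
    }
}
rectangleRequest.maximumObservations = 4   // a single document yields few candidates
rectangleRequest.minimumConfidence = 0.8   // discard weak detections
```

Convert boundingBox from Vision's normalized coordinates to view coordinates before drawing an overlay.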
Barcode Detection: Supports QR, UPC, EAN. VNDetectBarcodesRequest provides VNBarcodeObservation.payloadStringValue.
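A corresponding sketch for barcodes (restricting symbologies is optional; the defaults cover all supported types):

```swift
import Vision

// Sketch: decode barcodes and print each payload string.
let barcodeRequest = VNDetectBarcodesRequest { request, error in
    guard let observations = request.results as? [VNBarcodeObservation] else { return }
    for observation in observations {
        if let payload = observation.payloadStringValue {
            print("\(observation.symbology.rawValue): \(payload)")
        }
    }
}
// Optional: scan only the types you expect, e.g. QR, EAN-13, and UPC-E.
barcodeRequest.symbologies = [.qr, .ean13, .upce]
```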
Human Body Pose: Tracks up to 19 body joints for fitness/AR. VNDetectHumanBodyPoseRequest returns VNHumanBodyPoseObservation; query individual joints with recognizedPoint(.leftWrist), which yields a VNRecognizedPoint carrying a location and a confidence.
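A sketch of reading a single joint from a pose observation (the 0.3 confidence cutoff is an illustrative threshold, not an API requirement):

```swift
import Vision

// Sketch: read the left-wrist location from each detected body pose.
let poseRequest = VNDetectHumanBodyPoseRequest { request, error in
    guard let observations = request.results as? [VNHumanBodyPoseObservation] else { return }
    for observation in observations {
        // recognizedPoint(_:) throws, so use try? for the sketch.
        if let wrist = try? observation.recognizedPoint(.leftWrist),
           wrist.confidence > 0.3 {
            // location is normalized (0–1, origin at the bottom-left).
            print("Left wrist at \(wrist.location)")
        }
    }
}
```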
Each request prints its results (e.g., coordinates, strings); integrate the observations into SwiftUI or UIKit views to draw overlays.
Visualize Detection Confidence with Swift Charts
Pair Vision outputs with SwiftUI Charts for dashboards. Model data:
struct TextConfidence: Identifiable {
    let id = UUID()
    let text: String
    let confidence: Double
}
Chart view:
import Charts
import SwiftUI

struct ConfidenceChart: View {
    var data: [TextConfidence]

    var body: some View {
        Chart(data) {
            BarMark(x: .value("Text", $0.text),
                    y: .value("Confidence", $0.confidence))
        }
        .frame(height: 300)
        .padding()
    }
}
Populate it from OCR: for each observation, append TextConfidence(text: candidate.string, confidence: Double(candidate.confidence)) — the cast is needed because candidate.confidence is a Float (VNConfidence). This turns raw detections into actionable, visual insights without external libraries.
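Putting the two pieces together — a sketch of an observable model that feeds OCR results into the chart (the OCRViewModel class name is an assumption; TextConfidence and ConfidenceChart are defined above):

```swift
import SwiftUI
import Vision

// Sketch: collect OCR results into chart-ready models.
final class OCRViewModel: ObservableObject {
    @Published var results: [TextConfidence] = []

    func makeRequest() -> VNRecognizeTextRequest {
        VNRecognizeTextRequest { [weak self] request, error in
            guard let observations = request.results as? [VNRecognizedTextObservation] else { return }
            let data = observations.compactMap { observation -> TextConfidence? in
                guard let candidate = observation.topCandidates(1).first else { return nil }
                // VNConfidence is a Float; convert for the Double-based chart model.
                return TextConfidence(text: candidate.string,
                                      confidence: Double(candidate.confidence))
            }
            // Publish on the main queue since the handler may run off-main.
            DispatchQueue.main.async { self?.results = data }
        }
    }
}
```

Rendering is then one line: ConfidenceChart(data: viewModel.results) draws one bar per detected string.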
Trade-offs Favor Vision Over Cloud APIs
Vision excels in speed (no network), privacy (on-device), and cost (free) versus OpenAI or Google Cloud Vision. Drawbacks: you are limited to Apple's built-in models (extendable via Core ML) and to Apple platforms (iOS, iPadOS, macOS, tvOS, visionOS). That makes it ideal for production apps needing reliability — no rate limits or API tokens. The GitHub repo at https://github.com/sanjaynela/visionApiProject.git expands these snippets into a full project.