Hi Apple Developer Community,
I'm developing an eye-tracking application in Xcode that uses ARKit's ARFaceTrackingConfiguration and ARFaceAnchor.blendShapes for gaze detection. I'm experiencing several calibration and accuracy issues and would appreciate insights from the community.
Current Implementation
- Using ARFaceAnchor.blendShapes (.eyeLookUpLeft, .eyeLookDownLeft, .eyeLookInLeft, .eyeLookOutLeft, etc.)
- Implementing custom sensitivity curves and smoothing algorithms
- Applying baseline correction and coordinate mapping
- Using quadratic regression for calibration point mapping (a simplified sketch of the overall mapping follows below)
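To make this concrete, here is a stripped-down sketch of the mapping. The real code layers smoothing, baseline correction, and sensitivity curves on top, and the sign convention below is an assumption I'm still second-guessing because the front camera feed is mirrored:

```swift
// Simplified sketch of my blend-shape -> gaze-offset mapping. The real code adds
// smoothing, baseline correction, and sensitivity curves; the sign convention
// below is an assumption and may need flipping (the front camera is mirrored).
import ARKit
import UIKit

func gazeOffset(from anchor: ARFaceAnchor) -> CGPoint {
    func value(_ key: ARFaceAnchor.BlendShapeLocation) -> CGFloat {
        CGFloat(anchor.blendShapes[key]?.doubleValue ?? 0)
    }

    // Horizontal: positive = toward the user's right.
    let x = (value(.eyeLookInLeft)  + value(.eyeLookOutRight)
           - value(.eyeLookOutLeft) - value(.eyeLookInRight)) / 2

    // Vertical: positive = up.
    let y = (value(.eyeLookUpLeft)   + value(.eyeLookUpRight)
           - value(.eyeLookDownLeft) - value(.eyeLookDownRight)) / 2

    return CGPoint(x: x, y: y)   // roughly within [-1, 1] per axis
}

// Per-axis quadratic calibration: screen = a*v^2 + b*v + c, with a/b/c
// fitted by least squares from the calibration-point samples.
struct QuadraticCalibration {
    var a: CGFloat, b: CGFloat, c: CGFloat
    func map(_ v: CGFloat) -> CGFloat { a * v * v + b * v + c }
}
```

The calibration screen collects samples while the user looks at known target points and fits a, b, c per axis from those samples.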
Issues I'm Facing
1. Calibration Mismatch
- Red dot position doesn't align with where I'm actually looking
- Significant offset between intended gaze point and actual cursor position
- Calibration seems to drift or become inaccurate over time
2. Extreme Eye Movement Requirements
- Need to make exaggerated eye movements to reach screen edges/corners
- Natural eye movements don't translate to proportional cursor movement
- Difficulty reaching certain screen regions even with calibration
3. Sensitivity and Stability Issues
- Cursor jitters or jumps around when looking at center
- Too much sensitivity to micro-movements, even with the smoothing sketched after this list
- Inconsistent behavior between calibration and normal operation
4. Head Movement Dependence
- Tracking on both the calibration screen and the reading screen works noticeably better when there is head movement
- I want to minimize head movement: the goal is tracking from natural eye movement alone while reading an ebook
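Regarding issue 3, this is roughly the smoothing I'm applying today: a basic exponential low-pass plus a dead zone. The alpha and dead-zone values are just numbers I'm experimenting with, not recommendations:

```swift
// Roughly my current smoothing: an exponential low-pass plus a dead zone on
// the mapped screen point. alpha and deadZone are values I'm still tuning.
import UIKit

final class GazeSmoother {
    private var last: CGPoint?
    var alpha: CGFloat = 0.15   // lower = smoother but laggier
    var deadZone: CGFloat = 8   // ignore movements smaller than this (points)

    func smooth(_ point: CGPoint) -> CGPoint {
        guard let previous = last else {
            last = point
            return point
        }
        let dx = point.x - previous.x
        let dy = point.y - previous.y
        // Swallow micro-movements entirely; this kills some jitter but also
        // makes small, legitimate saccades feel sticky.
        if (dx * dx + dy * dy).squareRoot() < deadZone {
            return previous
        }
        let next = CGPoint(x: previous.x + alpha * dx,
                           y: previous.y + alpha * dy)
        last = next
        return next
    }
}
```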
Primary Question: Word-Level Eye Tracking Feasibility
Is word-level eye tracking (tracking gaze as users read through individual words in an ebook) technically feasible with current iPhone/iPad hardware?
I understand that Apple's built-in eye tracking is primarily an accessibility feature for UI navigation. However, I'm wondering if the TrueDepth camera and ARKit's eye tracking capabilities are sufficient for:
- Tracking natural reading patterns (left-to-right, line-by-line progression)
- Detecting which specific words a user is looking at (mapping a screen point to a word is sketched after this list; gaze precision is the open question)
- Maintaining accuracy for sustained reading sessions (15-30 minutes)
- Working reliably across different users and lighting conditions
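For context, the word lookup itself isn't the hard part. Assuming the ebook page is rendered in a UITextView (a hypothetical setup; a real reader might use a different text stack), something like the following resolves the word under a gaze point. The uncertainty is entirely in whether the gaze point can be accurate to within one word:

```swift
// Hypothetical helper, assuming the ebook page is a UITextView: map a gaze
// point (in the text view's coordinate space) to the word under it. A real
// reader would cache word ranges instead of enumerating on every lookup.
import UIKit

func word(at point: CGPoint, in textView: UITextView) -> String? {
    // Convert into text-container coordinates.
    var location = point
    location.x -= textView.textContainerInset.left
    location.y -= textView.textContainerInset.top

    let index = textView.layoutManager.characterIndex(
        for: location,
        in: textView.textContainer,
        fractionOfDistanceBetweenInsertionPoints: nil)

    let text = textView.text as NSString
    guard index < text.length else { return nil }

    // Find the word whose range contains the hit character index.
    var result: String?
    text.enumerateSubstrings(in: NSRange(location: 0, length: text.length),
                             options: .byWords) { substring, range, _, stop in
        if NSLocationInRange(index, range) {
            result = substring
            stop.pointee = true
        }
    }
    return result
}
```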
Questions for the Community
- Hardware Limitations: Are iPhone/iPad TrueDepth cameras capable of the precision needed for word-level tracking, or is this beyond current hardware capabilities?
- Calibration Best Practices: What calibration strategies have worked best for accurate gaze mapping? How many calibration points are typically needed?
- Reading-Specific Challenges: Are there particular challenges when tracking reading behavior vs. general gaze tracking?
- Alternative Approaches: Are there better approaches than ARKit blend shapes for this use case?
Current Setup
- Device: iPhone 14 Pro
- iOS Version: iOS 18.3
- ARKit Version: Latest available
Any insights, experiences, or technical guidance would be greatly appreciated. I'm particularly interested in hearing from developers who have worked on similar eye tracking applications or have experience with the limitations and capabilities of ARKit's eye tracking features.
Thank you for your time and expertise!