I built a real-time AR plane spotter: here's the math that makes it work

I've been building an Android app that identifies aircraft overhead when you point your phone at the sky. The app fetches live ADS-B data and overlays aircraft labels on the camera feed, but getting the math right took much longer than I expected, so I wrote it all up.

The problem sounds simple: you have a GPS coordinate in the sky and a GPS coordinate in your hand, and you want a pixel. But there are four coordinate transformations between those two things, and their sign conventions fail silently: you get wrong output with no error.

The pipeline:

  Geodetic (lat, lon, alt)
    ↓  flat-earth approximation (valid < 100 km; error < 2 px at 50 NM range)
  ENU — East, North, Up (metres)
    ↓  R⊤, where R comes from Android's TYPE_ROTATION_VECTOR sensor
  Device frame (dX, dY, dZ)
    ↓  one sign flip: Cz = −dZ
  Camera frame (Cx, Cy, Cz)
    ↓  perspective divide + FOV normalisation
  Screen pixels (Xpx, Ypx)

Why each transition is non-obvious:

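Geodetic → ENU. The East component needs a cosine factor that is easy to miss: E = Δλ × (π·RE/180) × cos(φ_user), with Δλ in degrees and RE the Earth's radius. North needs no such factor (N = Δφ × (π·RE/180)), and U is simply the altitude difference. Meridians converge toward the poles, so one degree of longitude spans fewer metres at latitude 25° (about 9% fewer, since cos 25° ≈ 0.906) than at the equator. Without the factor, East-West positions look correct near the equator and quietly diverge as latitude increases.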
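A minimal sketch of that conversion, in Python purely as a language-neutral illustration (the app itself is Android). It assumes a spherical Earth with RE = 6,371 km, since the post does not say which radius it uses; the function name geodetic_to_enu is hypothetical:

```python
import math

# Assumption: spherical Earth with mean radius 6,371 km (the post does not state RE).
EARTH_RADIUS_M = 6_371_000.0
# Metres per degree of arc along a great circle: π·RE/180.
METRES_PER_DEGREE = math.pi * EARTH_RADIUS_M / 180.0


def geodetic_to_enu(user_lat, user_lon, user_alt_m, ac_lat, ac_lon, ac_alt_m):
    """Flat-earth (local tangent plane) approximation.

    Latitudes and longitudes are in degrees, altitudes in metres.
    Returns (east, north, up) in metres relative to the user.
    """
    d_lat = ac_lat - user_lat
    d_lon = ac_lon - user_lon
    # Meridians converge toward the poles, so a degree of longitude
    # shrinks by cos(latitude). This is the easy-to-miss factor.
    east = d_lon * METRES_PER_DEGREE * math.cos(math.radians(user_lat))
    # On a sphere, a degree of latitude has the same length everywhere.
    north = d_lat * METRES_PER_DEGREE
    up = ac_alt_m - user_alt_m
    return east, north, up


if __name__ == "__main__":
    # Captured example from the post; a user altitude of 0 m is assumed.
    e, n, u = geodetic_to_enu(24.8600, 80.9813, 0.0, 24.9321, 81.0353, 18_000 * 0.3048)
    print(f"E={e:,.0f} m  N={n:,.0f} m  U={u:,.0f} m")
```

Dropping the cos(φ_user) term from this function is exactly the silent failure described above: at the captured latitude of 24.86° it inflates E by roughly 10%.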

ENU → Device frame. Android's rotation matrix R maps device axes to ENU world axes. To go the other direction you use R⊤, which equals R⁻¹ because R is a rotation. In Android's row-major FloatArray(9), that means reading R down its columns rather than along its rows:

  R  (wrong):   dX = R[0]·E + R[1]·N + R[2]·U
  R⊤ (correct): dX = R[0]·E + R[3]·N + R[6]·U

These produce completely different results. Both compile without complaint.
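A sketch of both products side by side, assuming R is the 9-element row-major array that SensorManager.getRotationMatrixFromVector fills from a TYPE_ROTATION_VECTOR reading (the post does not name the exact call). The function names and the example pose are hypothetical:

```python
def enu_to_device(r, east, north, up):
    """World (ENU) -> device frame using R-transpose.

    `r` is a row-major 3x3 rotation matrix stored as 9 floats. R maps device
    axes to ENU, so the inverse direction uses R^T, i.e. R read column-wise.
    """
    d_x = r[0] * east + r[3] * north + r[6] * up
    d_y = r[1] * east + r[4] * north + r[7] * up
    d_z = r[2] * east + r[5] * north + r[8] * up
    return d_x, d_y, d_z


def enu_to_device_wrong(r, east, north, up):
    """The silent bug: multiplies by R (row-wise) instead of R^T."""
    d_x = r[0] * east + r[1] * north + r[2] * up
    d_y = r[3] * east + r[4] * north + r[5] * up
    d_z = r[6] * east + r[7] * north + r[8] * up
    return d_x, d_y, d_z


if __name__ == "__main__":
    # Hypothetical pose: phone flat, screen up, top edge pointing east.
    # Columns of R are the device axes in ENU: X = south, Y = east, Z = up.
    r = [0.0, 1.0, 0.0,
         -1.0, 0.0, 0.0,
         0.0, 0.0, 1.0]
    # A target due east should lie along device +Y (out of the top edge).
    print(enu_to_device(r, 1.0, 0.0, 0.0))        # (0.0, 1.0, 0.0)
    print(enu_to_device_wrong(r, 1.0, 0.0, 0.0))  # (0.0, -1.0, 0.0)
```

In this pose the R version puts a due-east target along −Y, the opposite direction, and nothing raises an error.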

Device → Camera frame. Android's sensor frame defines +Zd as pointing out of the screen toward your face, but the rear camera looks the other way, and the camera convention requires +Cz to point into the scene. So Cz = −dZ, always. In portrait mode Cx = dX and Cy = dY carry over unchanged, so this is the only correction needed.
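A sketch of that step, plus the visibility check it makes possible. The helper names are hypothetical, and the culling rule is an addition not described in the post: once +Cz points into the scene, anything with Cz ≤ 0 is behind the camera, and letting it through the perspective divide would draw it mirrored on screen.

```python
def device_to_camera(d_x, d_y, d_z):
    """Device frame -> camera frame for the rear camera in portrait mode.

    Android's +Zd points out of the screen toward the user; the camera
    frame wants +Cz pointing into the scene. X and Y are shared.
    """
    return d_x, d_y, -d_z


def is_in_front(c_z):
    """Hypothetical culling rule: only points with Cz > 0 are in front of
    the lens. Dividing by a negative Cz would flip the signs of x and y and
    draw a point behind the camera mirrored through the screen centre."""
    return c_z > 0.0


if __name__ == "__main__":
    # Behind the screen (dZ < 0) means in front of the rear camera.
    print(device_to_camera(1.0, 2.0, -3.0))  # (1.0, 2.0, 3.0)
    print(is_in_front(-3.0))                 # False
```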

Camera → Screen. After the perspective divide and FOV normalisation, the Y axis flips: Ypx = (1 − NDCy) × H/2, because camera +Cy points up while screen y = 0 is at the top. Miss this and an aircraft above the centre of the view is drawn below screen centre.
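Reading "perspective divide + FOV normalisation" as a standard pinhole projection gives NDCx = (Cx/Cz) / tan(θH/2), NDCy = (Cy/Cz) / tan(θV/2), Xpx = (1 + NDCx) × W/2, plus the Ypx formula above. That reading is an assumption, but it reproduces the (600 px, 1 px) of the worked example below. A sketch, with a hypothetical function name:

```python
import math


def camera_to_screen(c_x, c_y, c_z, width_px, height_px, fov_h_deg, fov_v_deg):
    """Pinhole projection with FOV normalisation (a sketch, not the app's code).

    Returns (x_px, y_px), or None when the point is not in front of the
    camera. Points outside the frustum get |NDC| > 1, i.e. off-screen pixels.
    """
    if c_z <= 0.0:
        return None
    # Perspective divide, then scale so the frustum edges land at NDC = ±1.
    ndc_x = (c_x / c_z) / math.tan(math.radians(fov_h_deg) / 2.0)
    ndc_y = (c_y / c_z) / math.tan(math.radians(fov_v_deg) / 2.0)
    # Screen x grows rightward like camera x; screen y grows downward while
    # camera y grows upward, hence the (1 - ndc_y) flip.
    x_px = (1.0 + ndc_x) * width_px / 2.0
    y_px = (1.0 - ndc_y) * height_px / 2.0
    return x_px, y_px


if __name__ == "__main__":
    # Camera-frame vector and screen parameters from the worked example.
    x, y = camera_to_screen(729.0, 4692.0, 10077.0, 1080, 1997, 66.0, 50.0)
    print(f"({x:.0f} px, {y:.0f} px)")  # (600 px, 1 px)
```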

Real captured values (ATR 72, 18,000 ft):

  User:     24.8600°N, 80.9813°E
  Aircraft: 24.9321°N, 81.0353°E

  ENU:  E≈5,450 m  N=8,014 m  U=5,486 m
  Bearing 34.2° (NE),  Elevation 29.5°,  Range 11.1 km

  Camera frame (after R⊤ + sign fix): (729, 4692, 10077) m
  Magnitude: 11,140 m ≈ 11,138 m (ENU range); the rotation and the sign flip both preserve length, so these must agree

  Screen (1080×1997, θH=66°, θV=50°): (600 px, 1 px)

Phone azimuth 33.0°, aircraft bearing 34.2° → 1.2° right of centre. Phone pitched −4.3°, elevation 29.5° → net ≈25° above the optical axis (atan(4692/10077) ≈ 25.0°), just inside the top edge of the frustum at θV/2 = 25°, which is why Ypx ≈ 1. Physically consistent throughout.
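These sanity checks are easy to automate. This sketch recomputes bearing, elevation and range from the captured coordinates (using the flat-earth formula with RE = 6,371 km and a user altitude of 0 m, both assumptions), then the off-axis angles and length of the camera-frame vector. The function names are hypothetical:

```python
import math

# Assumption: spherical Earth, mean radius 6,371 km.
METRES_PER_DEGREE = math.pi * 6_371_000.0 / 180.0


def look_angles(east, north, up):
    """Bearing (degrees clockwise from true north), elevation (degrees above
    the local horizontal) and slant range (metres) of an ENU vector."""
    bearing = math.degrees(math.atan2(east, north)) % 360.0
    elevation = math.degrees(math.atan2(up, math.hypot(east, north)))
    slant_range = math.sqrt(east * east + north * north + up * up)
    return bearing, elevation, slant_range


def off_axis_angles(c_x, c_y, c_z):
    """Horizontal and vertical angles (degrees) of a camera-frame vector away
    from the optical axis (+Cz)."""
    return math.degrees(math.atan2(c_x, c_z)), math.degrees(math.atan2(c_y, c_z))


if __name__ == "__main__":
    # Captured coordinates from the post; user altitude 0 m is assumed.
    east = (81.0353 - 80.9813) * METRES_PER_DEGREE * math.cos(math.radians(24.8600))
    north = (24.9321 - 24.8600) * METRES_PER_DEGREE
    up = 18_000 * 0.3048
    b, el, rng = look_angles(east, north, up)
    print(f"bearing {b:.1f}°, elevation {el:.1f}°, range {rng:,.0f} m")
    # bearing 34.2°, elevation 29.5°, range 11,138 m

    c = (729.0, 4692.0, 10077.0)
    h, v = off_axis_angles(*c)
    print(f"{h:.1f}° right, {v:.1f}° up, |C| = {math.sqrt(sum(x * x for x in c)):,.0f} m")
    # 4.1° right, 25.0° up, |C| = 11,140 m
```

The vertical angle is what puts the label at Ypx ≈ 1: it sits just below the 25° half-angle of the vertical FOV.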

Happy to answer questions about any stage of the pipeline, or anything else.

by ananddhruv29