While humans can identify physically implausible events within milliseconds, machine learning approaches addressing the same problem are extremely slow and expensive. They either rely on external multimodal-LLM judges or require ad-hoc modifications to the training procedure. In this work, we argue that indicators of physical plausibility are implicitly captured by five geometric properties of the per-frame embeddings produced by frozen image encoders. In aggregate, we call them GeoPhys.
TL;DR: Physical plausibility is a geometric property of feature trajectories through frozen image encoders. No video pretraining, no physics supervision, no learned ranker. GeoPhys sets a new detection SoTA and steers video generation as a cheap verifier.
A frozen backbone maps each frame to a pooled feature; stacking them across time gives a trajectory. Plausible videos trace smooth, locally linear paths. Violations bend.
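As a rough illustration of the trajectory construction, the sketch below mean-pools per-frame tokens and stacks them over time. The `toy_encoder` is a hypothetical stand-in for a frozen backbone, and mean pooling is an assumption; the page does not specify the pooling operator.

```python
import numpy as np

def frames_to_trajectory(frames, encode_frame):
    """Stack one pooled feature per frame into a (T, D) trajectory.

    encode_frame is any callable returning (num_tokens, D) features for a frame.
    Mean pooling over tokens is an assumption made for this sketch.
    """
    feats = [np.asarray(encode_frame(f), dtype=float).mean(axis=0) for f in frames]
    return np.stack(feats, axis=0)

def toy_encoder(frame, patch=4):
    """Hypothetical stand-in for a frozen backbone: one flattened patch per token."""
    h, w = frame.shape
    frame = frame[: h - h % patch, : w - w % patch]          # crop to a patch multiple
    tokens = frame.reshape(h // patch, patch, w // patch, patch).swapaxes(1, 2)
    return tokens.reshape(-1, patch * patch)                   # (num_patches, patch*patch)

# Five synthetic 8x8 grayscale frames -> a trajectory with T=5 points in D=16 dimensions.
frames = [np.full((8, 8), float(i)) for i in range(5)]
trajectory = frames_to_trajectory(frames, toy_encoder)
```

In practice `encode_frame` would wrap a real frozen encoder at the chosen readout layer; only the stacking logic is meant to carry over.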
The five signals. Each measures one way a trajectory can betray a physics violation, computed directly on the per-frame features with no learned parameters.
How far the feature point travels between consecutive frames. Steady motion keeps these steps even; an object that teleports, vanishes, or stutters makes the step sizes spike.
How sharply the path turns at each step. Good representations keep natural motion locally straight; a wall-pass or sudden reversal bends the trajectory.
Whether those turns are uniform or erratic. A consistent path bends by similar amounts each step; violations scatter the turning angles.
The change in the step vector itself, the second difference. It catches abrupt shifts in speed or direction that smooth dynamics avoid.
How far the next point strays from a linear predictor fit on the recent past. A violation breaks the local low-dimensional structure and the residual jumps.
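On the trajectory $z_1, \dots, z_T$, with step $v_t = z_{t+1} - z_t$, turning angle $\theta_t = \arccos\big(\langle v_{t-1}, v_t \rangle / (\lVert v_{t-1} \rVert \, \lVert v_t \rVert)\big)$, acceleration $a_t = v_t - v_{t-1}$, and residual $r_t$ of a linear predictor on the recent past $z_{t-k}, \dots, z_{t-1}$. All statistics run over $t$; larger values mean less regular dynamics.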
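The five signals described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function name, the `window` length, the use of variance for turning-angle variability, and the choice of a straight-line (linear-in-time) extrapolation as the linear predictor are all assumptions.

```python
import numpy as np

def geophys_signals(z, window=4, eps=1e-8):
    """Per-step geometric signals for a feature trajectory z of shape (T, D).

    Assumes T > window. All names and defaults here are illustrative.
    """
    z = np.asarray(z, dtype=float)
    v = np.diff(z, axis=0)                           # step vectors, shape (T-1, D)
    speed = np.linalg.norm(v, axis=1)                # distance travelled per step

    # Turning angle between consecutive step vectors.
    denom = np.maximum(speed[:-1] * speed[1:], eps)  # guard against zero-length steps
    cos = np.sum(v[:-1] * v[1:], axis=1) / denom
    angle = np.arccos(np.clip(cos, -1.0, 1.0))       # shape (T-2,)

    # Second difference of the trajectory: change in the step vector.
    accel = np.linalg.norm(np.diff(v, axis=0), axis=1)

    # Residual of a line fit to the previous `window` points, extrapolated one step.
    A = np.stack([np.ones(window), np.arange(window)], axis=1)
    resid = []
    for t in range(window, len(z)):
        coef, *_ = np.linalg.lstsq(A, z[t - window:t], rcond=None)
        pred = coef[0] + coef[1] * window            # predicted position at time t
        resid.append(np.linalg.norm(z[t] - pred))

    return {
        "speed": speed,
        "curvature": angle,
        "curvature_var": float(np.var(angle)),       # how erratic the turns are
        "acceleration": accel,
        "residual": np.array(resid),
    }
```

On a straight, evenly paced trajectory every signal is (numerically) zero except speed, which is constant; a single teleport-like jump raises speed, turning angle, acceleration, and residual around the affected frames.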
Four frozen backbones. The score runs on self-supervised transformers and explicit models of primate visual cortex. None is trained on video or physics. For each backbone we pick the readout layer that maximises the plausible-versus-violated curvature gap on a held-out split. Backbones are complementary: each specialises in a different signal and wins a different physics domain.
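The readout-layer selection can be sketched as follows, under the assumption that the "curvature gap" is the difference in mean turning angle between violated and plausible held-out videos. The function names and the dictionary layout are hypothetical.

```python
import numpy as np

def mean_turning_angle(z, eps=1e-8):
    """Average turning angle (radians) along a (T, D) trajectory."""
    v = np.diff(np.asarray(z, dtype=float), axis=0)
    n = np.linalg.norm(v, axis=1)
    cos = np.sum(v[:-1] * v[1:], axis=1) / np.maximum(n[:-1] * n[1:], eps)
    return float(np.mean(np.arccos(np.clip(cos, -1.0, 1.0))))

def pick_readout_layer(plausible, violated):
    """Pick the layer with the largest violated-minus-plausible curvature gap.

    plausible / violated: dict mapping layer name -> list of (T, D) trajectories
    from a held-out split (assumed layout for this sketch).
    """
    gaps = {}
    for layer in plausible:
        p = np.mean([mean_turning_angle(z) for z in plausible[layer]])
        q = np.mean([mean_turning_angle(z) for z in violated[layer]])
        gaps[layer] = float(q - p)
    best = max(gaps, key=gaps.get)
    return best, gaps
```

Selecting the layer on a held-out split, as the text describes, keeps the choice from being tuned on the evaluation videos themselves.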
A useful signal should reflect a real phenomenon, not an artefact of pretraining statistics. We compare per-frame GeoPhys signals against human EEG on matched violation-of-expectation stimuli.
On two object-permanence scenarios, Create (one object enters an occluder, two emerge) and Vanish (two enter, one emerges), CORnet-S IT speed follows the contralateral delay activity, an EEG marker of visual working memory. Both rise and stay elevated after the violation, and the GeoPhys signal scales with the number of tracked objects.
GeoPhys signals align with human EEG responses to object-permanence violations and scale with object number, consistent with physical-violation perception.
| Verifier | LikePhys | IntPhys2 |
|---|---|---|
| Best video diffusion model (Hunyuan) | 56.4 | – |
| GPT-4o | – | 53.8 |
| Gemini 2.5 Flash | – | 55.6 |
| V-JEPA 2 (1.1B, video-pretrained) | – | 57.5 |
| DINOv2 (L12) | 78.6 | 58.8 |
| DINOv3 (L18) | 80.8 | 60.5 |
| CORnet-S (IT / V1) | 78.2 | 61.1 |
| VOneNet (V1) | 77.6 | 61.7 |
| GeoPhys · Majority vote | 90.9 | 77.5 |
| GeoPhys · OR ensemble | 98.3 | 93.3 |
| Human | – | 96.4 |
A leave-one-scene-out linear probe on the same DINOv3 features reaches only 62.4% on LikePhys, so the gain comes from trajectory geometry, not the backbone alone.
GeoPhys unlocks physics-plausibility detection: every backbone surpasses all state-of-the-art baselines on both benchmarks, and the OR ensemble reaches 98.3% on LikePhys and 93.3% on IntPhys2.
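The two ensemble rows in the table combine per-backbone decisions. The page does not define how each backbone's decision is formed, so the sketch below assumes each backbone emits a boolean "violation" flag per video; the function names and the tie-handling rule are illustrative choices.

```python
import numpy as np

def majority_vote(flags):
    """Flag a video when strictly more than half of the backbones flag it.

    flags: array-like of shape (n_backbones, n_videos) with boolean entries.
    With an even number of backbones, ties count as "not flagged" (an assumption).
    """
    flags = np.asarray(flags, dtype=bool)
    return flags.sum(axis=0) * 2 > flags.shape[0]

def or_ensemble(flags):
    """Flag a video when at least one backbone flags it."""
    return np.asarray(flags, dtype=bool).any(axis=0)

# Four hypothetical backbones voting on three videos.
flags = [
    [1, 0, 0],
    [1, 1, 0],
    [0, 0, 0],
    [1, 0, 0],
]
majority = majority_vote(flags)   # video 0 only
any_flag = or_ensemble(flags)     # videos 0 and 1
```

An OR rule benefits from backbones that specialise in different signals, since a violation only needs to be caught by one of them, at the cost of being more permissive than a majority rule.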
Beyond passive measurement, GeoPhys transfers to active control. As a best-of-N verifier it reranks a generator's candidates by plausibility, with no learned ranker and no PhysicsIQ-specific tuning.
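Best-of-N selection with a training-free score can be sketched as below. The `irregularity` score (mean second-difference norm) is a simplified stand-in for the full GeoPhys score, and both function names are hypothetical.

```python
import numpy as np

def irregularity(z):
    """Mean norm of the second difference of a (T, D) trajectory (lower = smoother)."""
    z = np.asarray(z, dtype=float)
    return float(np.linalg.norm(np.diff(z, n=2, axis=0), axis=1).mean())

def best_of_n(candidates, score=irregularity):
    """Return the index of the most plausible candidate and all scores.

    candidates: list of (T, D) feature trajectories, one per generated video.
    The candidate with the lowest irregularity score is selected.
    """
    scores = [score(z) for z in candidates]
    return int(np.argmin(scores)), scores
```

Because the score has no learned parameters, reranking N candidates costs N encoder passes plus a few array operations, and the selection can never be better than the best candidate in the pool.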
On PhysicsIQ, GeoPhys outscores every inference-time verifier and closes more of the gap to the oracle ceiling than a billion-parameter world model.
PhysicsIQ score (%, higher is better). I2V settings score lower than the V2V setting; GeoPhys leads the real verifiers in every column.
Test-time scaling. More candidates, better physics. As the best-of-N budget grows, GeoPhys keeps climbing toward the oracle ceiling while cheaper verifiers and the no-verifier baseline plateau, so spending compute at inference pays off only with the right plausibility signal.
Results in motion. For each physics family, the no-verifier baseline is compared against the GeoPhys best-of-N selection.
Same scenario, same candidate pool. The baseline takes a random sample; GeoPhys reranks by plausibility. Δ is the per-scenario PhysicsIQ gain over baseline.
Cheaper. Frozen image encoders cost a fraction of a video-pretrained world model.
1.5× lower wall-clock and 4.65× lower memory than the world-model verifier. At a fixed budget that is roughly 5× more candidates.
Test-time compute scales physically plausible generation, but only with the right verifier. GeoPhys closes the gap to the oracle faster than world-model and other verifiers, at 1.5× lower wall-clock time and 4.65× lower memory, using only frozen image encoders.
We do not claim frozen backbones represent or simulate physics. GeoPhys is a correlate of plausibility, not an implementation. The signals are time-symmetric, and the best-of-N result is bounded by candidate diversity. We see GeoPhys as a diagnostic for current generators, not a substitute for a genuine world model.
If you find this work useful, please cite it.
@misc{interno2026geophys,
title = {GeoPhys: The Geometry of Physical Plausibility},
author = {Intern\`{o}, Christian and Pondaven, Alexander and Issa, Habon
and Pizzati, Fabio and Pinto, Francesco and Olhofer, Markus
and Laptev, Ivan and Torr, Philip and Simoncelli, Eero P.
and Hammer, Barbara and Klindt, David},
year = {2026},
url = {https://christianinterno.github.io/GeoPhys/}
}