arXiv:2409.18300v2 Announce Type: replace-cross
Abstract: We introduce FALCON, a unified self-supervised video pretraining approach for UAV action recognition from raw RGB aerial footage, requiring no additional preprocessing at inference. UAV videos exhibit severe spatial imbalance: large, cluttered backgrounds dominate the field of view, causing reconstruction-based pretraining to waste capacity on uninformative regions and under-learn action-relevant human/object cues. FALCON addresses this by integrating object-aware masked autoencoding with object-centric dual-horizon future reconstruction. Using detections only during pretraining, we construct objectness priors that (i) enforce balanced token visibility during masking and (ii) concentrate reconstruction supervision on action-relevant regions, preventing learning from being dominated by background appearance. To promote temporal dynamics learning, we further reconstruct short- and long-horizon future content within an object-centric supervision region, injecting anticipatory temporal supervision that is robust to noisy aerial context. Across UAV benchmarks, FALCON improves top-1 accuracy by 2.9% on NEC-Drone and 5.8% on UAV-Human with a ViT-B backbone, while achieving 2$times$–5$times$ faster inference than supervised approaches that rely on heavy test-time augmentation.
Generative AI-assisted Participatory Modeling in Socio-Environmental Planning under Deep Uncertainty
arXiv:2603.17021v1 Announce Type: new Abstract: Socio-environmental planning under deep uncertainty requires researchers to identify and conceptualize problems before exploring policies and deploying plans. In practice



