AMEGO: Active Memory from long EGOcentric videos
Gabriele Goletto*, Tushar Nagarajan, Giuseppe Averta, Dima Damen
;
Abstract
"Egocentric videos provide a unique perspective into individuals’ daily experiences, yet their unstructured nature presents challenges for perception. In this paper, we introduce , a novel approach aimed at enhancing the comprehension of very-long egocentric videos. Inspired by the human’s ability to maintain information from a single watching, focuses on constructing a self-contained representations from one egocentric video, capturing key locations and object interactions. This representation is semantic-free and facilitates multiple queries without the need to reprocess the entire visual content. Additionally, to evaluate our understanding of very-long egocentric videos, we introduce the new (), composed of more than 20K of highly challenging visual queries from EPIC-KITCHENS. These queries cover different levels of video reasoning (sequencing, concurrency and temporal grounding) to assess detailed video understanding capabilities. We showcase improved performance of on , surpassing other video QA baselines by a substantial margin."
Related Material
[pdf]
[supplementary material]
[DOI]