Ego4D: Around the World in 3,600 Hours of Egocentric Video
Publisher
IEEE
ORCID
0000-0001-6911-0302
0000-0002-4083-9463
0000-0001-5891-6044
0009-0002-8865-1934
0000-0001-5918-9029
0009-0002-3901-9342
0000-0003-3511-8466
0000-0003-4993-5416
0000-0002-3068-3338
0000-0001-9756-7238
0009-0008-5976-4095
0000-0003-2596-6293
0009-0007-9473-6703
0000-0002-1317-1293
0000-0003-0379-9834
0000-0002-7085-3813
0000-0002-6948-5689
0000-0001-5592-8218
0000-0002-8464-7500
0000-0003-2884-0290
0000-0003-4352-7999
0000-0002-6234-0831
0000-0002-3754-1156
0000-0003-4206-710X
0000-0003-0021-5661
0000-0002-4307-7222
0000-0003-4545-8069
0000-0001-5244-2407
0000-0001-8804-6238
0000-0002-6034-0432
0000-0002-5534-587X
0000-0001-6767-7057
0000-0002-9389-4060
0000-0001-9158-9401
0000-0002-6920-914X
0000-0003-1793-5462
0000-0003-2637-1929
0000-0002-7681-2166
0000-0001-6887-6146
0000-0002-2054-5986
0000-0003-3695-1580
Abstract
We introduce Ego4D, a massive-scale egocentric video dataset and benchmark suite. It offers 3,670 hours of daily-life activity video spanning hundreds of scenarios (household, outdoor, workplace, leisure, etc.) captured by 931 unique camera wearers from 74 worldwide locations and 9 different countries. The approach to collection is designed to uphold rigorous privacy and ethics standards, with consenting participants and robust de-identification procedures where relevant. Ego4D dramatically expands the volume of diverse egocentric video footage publicly available to the research community. Portions of the video are accompanied by audio, 3D meshes of the environment, eye gaze, stereo, and/or synchronized videos from multiple egocentric cameras at the same event. Furthermore, we present a host of new benchmark challenges centered around understanding the first-person visual experience in the past (querying an episodic memory), present (analyzing hand-object manipulation, audio-visual conversation, and social interactions), and future (forecasting activities). By publicly sharing this massive annotated dataset and benchmark suite, we aim to push the frontier of first-person perception.
Citation
IEEE Transactions on Pattern Analysis and Machine Intelligence. 2025, vol. 47, issue 11, p. 9468-9509.
https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10611736&utm_source=scopus&getft_integrator=scopus&tag=1
Document type
Peer-reviewed
Document version
Published version
Language of document
en
Creative Commons license
Except where otherwise noted, this item's license is described as Creative Commons Attribution 4.0 International

