V2X-Real
The first large-scale, real-world multi-modal dataset
for Vehicle-to-Everything (V2X) perception
What does V2X-Real provide?
V2X-Real is the first large-scale real-world dataset for Vehicle-to-Everything (V2X) cooperative perception.
It features:
- Multiple connected agents (two vehicles and two infrastructure units), providing multi-view, multi-modal sensor data streams.
- 1.2M annotated 3D bounding boxes across 10 object categories, covering 33K LiDAR frames and 171K multi-view camera images.
- 4 sub-datasets tailored for cooperative perception under different collaboration modes: vehicle-centric (VC), infrastructure-centric (IC), vehicle-to-vehicle (V2V), and infrastructure-to-infrastructure (I2I).
- Comprehensive benchmarks and open-source code for multi-class, multi-agent V2X cooperative perception.

Data Acquisition
Connected Autonomous Vehicle Hardware
LiDAR:
- Robosense 128-channel, 10 Hz
- 200 m capturing range
- −25° to +15° vertical FOV, ±3 cm error
Camera:
- 4x stereo RGBD cameras, 10 Hz
- 1920 × 1080 resolution
- 120° FOV
GNSS:
- Gongji GPS
- High-accuracy IMU


Smart Infrastructure Hardware
LiDAR:
- Ouster OS1-128/64 channel, 10 Hz
- < 40 m capturing range
- −16.6° to +16.6° (64-channel), −22.5° to +22.5° (128-channel) vertical FOV
Camera:
- 2x Axis-P14555 cameras, 30 Hz
- 1920 × 1080 resolution
- 76° horizontal FOV, 48° vertical FOV
GNSS:
- Garmin GPS
Data Annotation
Sensor Calibration
- CAV calibration: four camera-LiDAR calibration pairs for each vehicle.
- Smart infrastructure calibration: two camera-LiDAR calibration pairs for each infrastructure unit (a projection sketch follows below).
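Each camera-LiDAR pair yields a camera intrinsic matrix and a LiDAR-to-camera extrinsic transform. Below is a minimal sketch of how such a pair could be used to project LiDAR points into an image, assuming a 4x4 extrinsic `T_lidar_to_cam` and a 3x3 intrinsic `K`; these names are illustrative, not the dataset's actual calibration fields.

```python
import numpy as np

def project_lidar_to_image(points_lidar, T_lidar_to_cam, K):
    """Project (N, 3) LiDAR points into pixel coordinates.

    T_lidar_to_cam: 4x4 extrinsic (LiDAR frame -> camera frame).
    K: 3x3 pinhole intrinsic matrix.
    Both are placeholders for whatever the calibration files provide.
    """
    # Homogeneous LiDAR points, shape (N, 4)
    pts_h = np.hstack([points_lidar, np.ones((points_lidar.shape[0], 1))])
    # Transform into the camera frame
    pts_cam = (T_lidar_to_cam @ pts_h.T).T[:, :3]
    # Keep only points in front of the camera
    pts_cam = pts_cam[pts_cam[:, 2] > 0]
    # Pinhole projection: divide by depth
    uv = (K @ pts_cam.T).T
    return uv[:, :2] / uv[:, 2:3]
```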

3D Bounding Box Annotation:
- SusTechPoint is used to annotate 3D bounding boxes for the LiDAR data.
- Four groups of professional annotators label and refine 10 object classes.
- Each object is annotated with its 7-degree-of-freedom 3D bounding box (center, size, and heading), driving state, and ID (see the sketch after this list).
- Consistent IDs are assigned to the same object across different views.
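Concretely, a 7-DoF box is described by a center (x, y, z), dimensions (l, w, h), and a yaw (heading) angle. Below is a minimal sketch of converting such a box into its eight corner points; the parameter convention is assumed for illustration rather than taken from the dataset's annotation schema.

```python
import numpy as np

def box7dof_to_corners(x, y, z, l, w, h, yaw):
    """Return the (8, 3) corners of a 7-DoF box (center, size, heading).

    The parameter convention here is an assumption for illustration,
    not the dataset's documented annotation schema.
    """
    dx, dy, dz = l / 2.0, w / 2.0, h / 2.0
    # Axis-aligned corners around the origin
    corners = np.array([[ dx,  dy, -dz], [ dx, -dy, -dz],
                        [-dx, -dy, -dz], [-dx,  dy, -dz],
                        [ dx,  dy,  dz], [ dx, -dy,  dz],
                        [-dx, -dy,  dz], [-dx,  dy,  dz]])
    # Rotate around the vertical (z) axis by the heading angle
    c, s = np.cos(yaw), np.sin(yaw)
    R = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    # Rotate, then translate to the box center
    return corners @ R.T + np.array([x, y, z])
```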


Data Desensitization:
- We desensitized over 68K image frames.
- For vehicle license plates, we use blur level = 25 to block out all identifying information.
- For human faces, we use blur level = 10 to mask the critical information while maintaining detection photorealism (a blurring sketch follows below).
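As an illustration of the two blur levels, the sketch below applies a Gaussian blur to a rectangular region; mapping "blur level" to a kernel size is an assumption made here, not the parameterization of the tool actually used for desensitization.

```python
import cv2

def blur_region(image, box, blur_level):
    """Blur a rectangular region (x, y, w, h) of `image` in place.

    The blur_level -> kernel-size mapping is illustrative only; the
    dataset's actual desensitization tool is not specified here.
    """
    x, y, w, h = box
    k = 2 * blur_level + 1  # GaussianBlur requires an odd kernel size
    roi = image[y:y + h, x:x + w]
    image[y:y + h, x:x + w] = cv2.GaussianBlur(roi, (k, k), 0)
    return image

# blur_region(frame, plate_box, blur_level=25)  # license plates: fully blocked
# blur_region(frame, face_box, blur_level=10)   # faces: masked, still photorealistic
```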
Benchmark
VC Cooperative Perception Benchmark
Method | Car AP @IoU0.3 | Car AP @IoU0.5 | Ped AP @IoU0.3 | Ped AP @IoU0.5 | Truck AP @IoU0.3 | Truck AP @IoU0.5 | mAP @IoU0.3 | mAP @IoU0.5 |
No Fusion | 38.7 | 35.9 | 25.5 | 13.1 | 20.2 | 14.5 | 28.2 | 21.2 |
Late Fusion | 46.1 | 43.1 | 28.3 | 11.8 | 19.8 | 14.0 | 31.4 | 23.0 |
Early Fusion | 51.1 | 47.6 | 31.6 | 16.0 | 32.5 | 23.6 | 38.4 | 29.1 |
F-Cooper | 57.3 | 54.2 | 30.0 | 14.1 | 27.0 | 21.2 | 38.1 | 29.8 |
AttFuse | 62.6 | 59.4 | 32.2 | 15.5 | 32.6 | 26.6 | 42.5 | 33.8 |
V2X-ViT | 62.7 | 60.3 | 36.7 | 18.6 | 35.1 | 28.3 | 44.8 | 35.8 |
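In these tables, the reported mAP at a given IoU threshold is consistent with the simple mean of the three per-class APs (Car, Ped, Truck). A minimal sketch of that averaging, checked against the V2X-ViT row above:

```python
def mean_ap(per_class_ap):
    """Mean AP over the reported classes at one IoU threshold."""
    return sum(per_class_ap) / len(per_class_ap)

# V2X-ViT on the VC benchmark at IoU 0.3: Car 62.7, Ped 36.7, Truck 35.1
print(round(mean_ap([62.7, 36.7, 35.1]), 1))  # -> 44.8, matching the table
```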
IC Cooperative Perception Benchmark
Method | Car AP @IoU0.3 | Car AP @IoU0.5 | Ped AP @IoU0.3 | Ped AP @IoU0.5 | Truck AP @IoU0.3 | Truck AP @IoU0.5 | mAP @IoU0.3 | mAP @IoU0.5 |
No Fusion | 43.3 | 35.6 | 25.6 | 12.7 | 21.2 | 20.2 | 30.0 | 22.8 |
Late Fusion | 65.2 | 60.4 | 33.4 | 16.0 | 30.8 | 24.5 | 43.1 | 33.6 |
Early Fusion | 64.2 | 59.8 | 32.3 | 15.0 | 33.3 | 27.4 | 43.3 | 34.1 |
F-Cooper | 49.7 | 43.0 | 31.0 | 15.3 | 31.9 | 21.1 | 37.5 | 26.5 |
AttFuse | 70.4 | 67.3 | 37.5 | 17.0 | 42.9 | 31.3 | 50.3 | 38.5 |
V2X-ViT | 64.2 | 56.5 | 37.8 | 19.2 | 35.6 | 28.8 | 45.9 | 34.8 |
V2V Cooperative Perception Benchmark
Method | Car AP @IoU0.3 | Car AP @IoU0.5 | Ped AP @IoU0.3 | Ped AP @IoU0.5 | Truck AP @IoU0.3 | Truck AP @IoU0.5 | mAP @IoU0.3 | mAP @IoU0.5 |
No Fusion | 41.7 | 39.4 | 26.9 | 14.4 | 21.6 | 13.7 | 30.1 | 22.5 |
Late Fusion | 47.4 | 44.4 | 29.2 | 14.9 | 18.7 | 9.1 | 31.8 | 22.8 |
Early Fusion | 54.0 | 49.8 | 31.9 | 17.1 | 28.6 | 18.6 | 38.1 | 28.5 |
F-Cooper | 42.7 | 40.3 | 27.7 | 14.0 | 25.6 | 18.6 | 32.0 | 24.3 |
AttFuse | 58.6 | 55.3 | 30.1 | 15.4 | 28.9 | 21.7 | 39.2 | 30.8 |
V2X-ViT | 59.0 | 56.3 | 37.4 | 20.7 | 42.9 | 35.0 | 46.5 | 37.3 |
I2I Cooperative Perception Benchmark
Method | Car AP @IoU0.3 | Car AP @IoU0.5 | Ped AP @IoU0.3 | Ped AP @IoU0.5 | Truck AP @IoU0.3 | Truck AP @IoU0.5 | mAP @IoU0.3 | mAP @IoU0.5 |
No Fusion | 48.6 | 40.0 | 30.9 | 15.8 | 23.5 | 22.4 | 34.3 | 26.1 |
Late Fusion | 67.2 | 63.3 | 41.1 | 23.1 | 48.4 | 39.1 | 52.2 | 41.8 |
Early Fusion | 60.9 | 57.2 | 41.2 | 21.9 | 38.5 | 30.8 | 46.9 | 36.6 |
F-Cooper | 71.5 | 65.5 | 49.3 | 27.5 | 50.0 | 40.7 | 56.9 | 44.6 |
AttFuse | 73.5 | 69.5 | 42.8 | 20.5 | 53.2 | 39.4 | 56.5 | 43.1 |
V2X-ViT | 77.3 | 68.8 | 54.4 | 30.5 | 56.2 | 51.5 | 62.7 | 50.3 |
Paper + GitHub
Download
Note: All V2X-Real data are provided in the OPV2V format; please check [here] for data structure details. A loading sketch follows the download links below.
OPV2V format (Example)
split1 [link]
OPV2V format (V2X-Real-Lidar-64)
OPV2V format (V2X-Real-Lidar-128)
OPV2V format (V2X-Real-Lidar-Cameras)
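Since the data follow the OPV2V convention, a single agent's frame can typically be loaded from a scenario folder as a metadata YAML plus a LiDAR .pcd. The sketch below assumes the usual OPV2V layout of <scenario>/<agent_id>/<timestamp>.{yaml,pcd}; the linked data-structure page is the authoritative reference for file naming.

```python
import os

import open3d as o3d  # reads .pcd point clouds
import yaml

def load_frame(scenario_dir, agent_id, timestamp):
    """Load one agent's metadata and LiDAR sweep for a single timestamp.

    Directory layout is assumed to follow the OPV2V convention
    (<scenario>/<agent_id>/<timestamp>.yaml and .pcd).
    """
    agent_dir = os.path.join(scenario_dir, agent_id)
    with open(os.path.join(agent_dir, f"{timestamp}.yaml")) as f:
        meta = yaml.safe_load(f)  # pose, calibration, and object annotations
    cloud = o3d.io.read_point_cloud(os.path.join(agent_dir, f"{timestamp}.pcd"))
    return meta, cloud

# Example (hypothetical paths): list the connected agents in one scenario
# scenario = "V2X-Real/train/<scenario_name>"
# agents = [d for d in os.listdir(scenario)
#           if os.path.isdir(os.path.join(scenario, d))]
```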
BibTeX
@article{xiang2024v2x,
  title={V2X-Real: a large-scale dataset for vehicle-to-everything cooperative perception},
  author={Xiang, Hao and Zheng, Zhaoliang and Xia, Xin and Xu, Runsheng and Gao, Letian and Zhou, Zewei and Han, Xu and Ji, Xinkai and Li, Mingxi and Meng, Zonglin and others},
  journal={arXiv preprint arXiv:2403.16034},
  year={2024}
}
Copyright © 2023 UCLA Mobility Lab
All Rights Reserved
Contact Us: jiaqima@ucla.edu