V2X-Real
The first large-scale, real-world multi-modal dataset
for Vehicle-to-Everything (V2X) perception
What does V2X-Real provide?
V2X-Real is the first large-scale real-world dataset for Vehicle-to-Everything (V2X) cooperative perception.
It features:
- Multiple connected agents (two vehicles and two infrastructure units), providing multi-view, multi-modal sensor data streams.
- 1.2M annotated 3D bounding boxes across 10 object categories, covering 33K LiDAR frames and 171K multi-view camera images.
- 4 sub-datasets tailored for cooperative perception under different collaboration modes: vehicle-centric (VC), infrastructure-centric (IC), vehicle-to-vehicle (V2V), and infrastructure-to-infrastructure (I2I).
- Comprehensive benchmarks and open-source code for multi-class, multi-agent V2X cooperative perception.

Data Acquisition
Connected Autonomous Vehicle Hardware
LiDAR:
- Robosense 128-channel, 10 Hz
- 200 m capturing range
- −25° to +15° vertical FOV, ±3 cm error
Camera:
- 4x stereo RGBD cameras, 10 Hz
- 1920 × 1080 resolution
- 120° FOV
GNSS:
- Gongji GPS
- High-accuracy IMU


Smart Infrastructure Hardware
LiDAR:
- Ouster OS1-128/64 channel, 10 Hz
- < 40 m capturing range
- −16.6° to +16.6° (64-channel), −22.5° to +22.5° (128-channel) vertical FOV
Camera:
- 2x Axis-P14555 cameras, 30 Hz
- 1920 × 1080 resolution
- 76° horizontal FOV, 48° vertical FOV
GNSS:
- Garmin GPS
Data Annotation
Sensor Calibration
- CAV calibration: four camera-LiDAR calibration pairs for each vehicle.
- Smart infrastructure calibration: two camera-LiDAR calibration pairs for each infrastructure unit (a projection sketch follows below).
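Each camera-LiDAR pair yields a camera intrinsic matrix and a LiDAR-to-camera extrinsic transform. Below is a minimal sketch of how such a pair could be used to project LiDAR points into an image, assuming a 4x4 extrinsic `T_lidar_to_cam` and a 3x3 intrinsic `K`; these names are illustrative, not the dataset's actual calibration fields.

```python
import numpy as np

def project_lidar_to_image(points_lidar, T_lidar_to_cam, K):
    """Project (N, 3) LiDAR points into pixel coordinates.

    T_lidar_to_cam: 4x4 extrinsic (LiDAR frame -> camera frame).
    K: 3x3 pinhole intrinsic matrix.
    Both are placeholders for whatever the calibration files provide.
    """
    # Homogeneous LiDAR points, shape (N, 4)
    pts_h = np.hstack([points_lidar, np.ones((points_lidar.shape[0], 1))])
    # Transform into the camera frame
    pts_cam = (T_lidar_to_cam @ pts_h.T).T[:, :3]
    # Keep only points in front of the camera
    pts_cam = pts_cam[pts_cam[:, 2] > 0]
    # Pinhole projection: divide by depth
    uv = (K @ pts_cam.T).T
    return uv[:, :2] / uv[:, 2:3]
```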

3D Bounding Box Annotation:
- SusTechPoint is used to annotate 3D bounding boxes for the LiDAR data.
- Four groups of professional annotators label and refine 10 object classes.
- Each object is annotated with its 7-degree-of-freedom 3D bounding box (center, size, and heading), driving state, and ID (see the sketch after this list).
- Consistent IDs are assigned to the same object across different views.
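Concretely, a 7-DoF box is described by a center (x, y, z), dimensions (l, w, h), and a yaw (heading) angle. Below is a minimal sketch of converting such a box into its eight corner points; the parameter convention is assumed for illustration rather than taken from the dataset's annotation schema.

```python
import numpy as np

def box7dof_to_corners(x, y, z, l, w, h, yaw):
    """Return the (8, 3) corners of a 7-DoF box (center, size, heading).

    The parameter convention here is an assumption for illustration,
    not the dataset's documented annotation schema.
    """
    dx, dy, dz = l / 2.0, w / 2.0, h / 2.0
    # Axis-aligned corners around the origin
    corners = np.array([[ dx,  dy, -dz], [ dx, -dy, -dz],
                        [-dx, -dy, -dz], [-dx,  dy, -dz],
                        [ dx,  dy,  dz], [ dx, -dy,  dz],
                        [-dx, -dy,  dz], [-dx,  dy,  dz]])
    # Rotate around the vertical (z) axis by the heading angle
    c, s = np.cos(yaw), np.sin(yaw)
    R = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    # Rotate, then translate to the box center
    return corners @ R.T + np.array([x, y, z])
```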


Data Desensitization:
- We desensitized over 68K image frames.
- For vehicle license plates, we use blur level = 25 to block out all identifying information.
- For human faces, we use blur level = 10 to mask the critical information while maintaining detection photorealism (a blurring sketch follows below).
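As an illustration of the two blur levels, the sketch below applies a Gaussian blur to a rectangular region; mapping "blur level" to a kernel size is an assumption made here, not the parameterization of the tool actually used for desensitization.

```python
import cv2

def blur_region(image, box, blur_level):
    """Blur a rectangular region (x, y, w, h) of `image` in place.

    The blur_level -> kernel-size mapping is illustrative only; the
    dataset's actual desensitization tool is not specified here.
    """
    x, y, w, h = box
    k = 2 * blur_level + 1  # GaussianBlur requires an odd kernel size
    roi = image[y:y + h, x:x + w]
    image[y:y + h, x:x + w] = cv2.GaussianBlur(roi, (k, k), 0)
    return image

# blur_region(frame, plate_box, blur_level=25)  # license plates: fully blocked
# blur_region(frame, face_box, blur_level=10)   # faces: masked, still photorealistic
```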
Benchmark
VC Cooperative Perception Benchmark
Method | Car AP @IoU0.3 | Car AP @IoU0.5 | Ped AP @IoU0.3 | Ped AP @IoU0.5 | Truck AP @IoU0.3 | Truck AP @IoU0.5 | mAP @IoU0.3 | mAP @IoU0.5 |
No Fusion | 38.7 | 35.9 | 25.5 | 13.1 | 20.2 | 14.5 | 28.2 | 21.2 |
Late Fusion | 46.1 | 43.1 | 28.3 | 11.8 | 19.8 | 14.0 | 31.4 | 23.0 |
Early Fusion | 51.1 | 47.6 | 31.6 | 16.0 | 32.5 | 23.6 | 38.4 | 29.1 |
F-Cooper | 57.3 | 54.2 | 30.0 | 14.1 | 27.0 | 21.2 | 38.1 | 29.8 |
AttFuse | 62.6 | 59.4 | 32.2 | 15.5 | 32.6 | 26.6 | 42.5 | 33.8 |
V2X-ViT | 62.7 | 60.3 | 36.7 | 18.6 | 35.1 | 28.3 | 44.8 | 35.8 |
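In these tables, the reported mAP at a given IoU threshold is consistent with the simple mean of the three per-class APs (Car, Ped, Truck). A minimal sketch of that averaging, checked against the V2X-ViT row above:

```python
def mean_ap(per_class_ap):
    """Mean AP over the reported classes at one IoU threshold."""
    return sum(per_class_ap) / len(per_class_ap)

# V2X-ViT on the VC benchmark at IoU 0.3: Car 62.7, Ped 36.7, Truck 35.1
print(round(mean_ap([62.7, 36.7, 35.1]), 1))  # -> 44.8, matching the table
```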
IC Cooperative Perception Benchmark
Method | Car AP @IoU0.3 | Car AP @IoU0.5 | Ped AP @IoU0.3 | Ped AP @IoU0.5 | Truck AP @IoU0.3 | Truck AP @IoU0.5 | mAP @IoU0.3 | mAP @IoU0.5 |
No Fusion | 43.3 | 35.6 | 25.6 | 12.7 | 21.2 | 20.2 | 30.0 | 22.8 |
Late Fusion | 65.2 | 60.4 | 33.4 | 16.0 | 30.8 | 24.5 | 43.1 | 33.6 |
Early Fusion | 64.2 | 59.8 | 32.3 | 15.0 | 33.3 | 27.4 | 43.3 | 34.1 |
F-Cooper | 49.7 | 43.0 | 31.0 | 15.3 | 31.9 | 21.1 | 37.5 | 26.5 |
AttFuse | 70.4 | 67.3 | 37.5 | 17.0 | 42.9 | 31.3 | 50.3 | 38.5 |
V2X-ViT | 64.2 | 56.5 | 37.8 | 19.2 | 35.6 | 28.8 | 45.9 | 34.8 |
V2V Cooperative Perception Benchmark
Method | Car AP @IoU0.3 | Car AP @IoU0.5 | Ped AP @IoU0.3 | Ped AP @IoU0.5 | Truck AP @IoU0.3 | Truck AP @IoU0.5 | mAP @IoU0.3 | mAP @IoU0.5 |
No Fusion | 41.7 | 39.4 | 26.9 | 14.4 | 21.6 | 13.7 | 30.1 | 22.5 |
Late Fusion | 47.4 | 44.4 | 29.2 | 14.9 | 18.7 | 9.1 | 31.8 | 22.8 |
Early Fusion | 54.0 | 49.8 | 31.9 | 17.1 | 28.6 | 18.6 | 38.1 | 28.5 |
F-Cooper | 42.7 | 40.3 | 27.7 | 14.0 | 25.6 | 18.6 | 32.0 | 24.3 |
AttFuse | 58.6 | 55.3 | 30.1 | 15.4 | 28.9 | 21.7 | 39.2 | 30.8 |
V2X-ViT | 59.0 | 56.3 | 37.4 | 20.7 | 42.9 | 35.0 | 46.5 | 37.3 |
I2I Cooperative Perception Benchmark
Method | Car AP @IoU0.3 | Car AP @IoU0.5 | Ped AP @IoU0.3 | Ped AP @IoU0.5 | Truck AP @IoU0.3 | Truck AP @IoU0.5 | mAP @IoU0.3 | mAP @IoU0.5 |
No Fusion | 48.6 | 40.0 | 30.9 | 15.8 | 23.5 | 22.4 | 34.3 | 26.1 |
Late Fusion | 67.2 | 63.3 | 41.1 | 23.1 | 48.4 | 39.1 | 52.2 | 41.8 |
Early Fusion | 60.9 | 57.2 | 41.2 | 21.9 | 38.5 | 30.8 | 46.9 | 36.6 |
F-Cooper | 71.5 | 65.5 | 49.3 | 27.5 | 50.0 | 40.7 | 56.9 | 44.6 |
AttFuse | 73.5 | 69.5 | 42.8 | 20.5 | 53.2 | 39.4 | 56.5 | 43.1 |
V2X-ViT | 77.3 | 68.8 | 54.4 | 30.5 | 56.2 | 51.5 | 62.7 | 50.3 |
Paper + GitHub
Download
Note: All V2X-Real data are provided in the OPV2V format; please check [here] for data structure details. A loading sketch follows the download links below.
OPV2V format (Example)
split1 [link]
OPV2V format (V2X-Real-Lidar-64)
OPV2V format (V2X-Real-Lidar-128)
OPV2V format (V2X-Real-Lidar-Cameras)
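Since the data follow the OPV2V convention, a single agent's frame can typically be loaded from a scenario folder as a metadata YAML plus a LiDAR .pcd. The sketch below assumes the usual OPV2V layout of <scenario>/<agent_id>/<timestamp>.{yaml,pcd}; the linked data-structure page is the authoritative reference for file naming.

```python
import os

import open3d as o3d  # reads .pcd point clouds
import yaml

def load_frame(scenario_dir, agent_id, timestamp):
    """Load one agent's metadata and LiDAR sweep for a single timestamp.

    Directory layout is assumed to follow the OPV2V convention
    (<scenario>/<agent_id>/<timestamp>.yaml and .pcd).
    """
    agent_dir = os.path.join(scenario_dir, agent_id)
    with open(os.path.join(agent_dir, f"{timestamp}.yaml")) as f:
        meta = yaml.safe_load(f)  # pose, calibration, and object annotations
    cloud = o3d.io.read_point_cloud(os.path.join(agent_dir, f"{timestamp}.pcd"))
    return meta, cloud

# Example (hypothetical paths): list the connected agents in one scenario
# scenario = "V2X-Real/train/<scenario_name>"
# agents = [d for d in os.listdir(scenario)
#           if os.path.isdir(os.path.join(scenario, d))]
```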
BibTeX
@article{xiang2024v2x,
  title={V2X-Real: a large-scale dataset for vehicle-to-everything cooperative perception},
  author={Xiang, Hao and Zheng, Zhaoliang and Xia, Xin and Xu, Runsheng and Gao, Letian and Zhou, Zewei and Han, Xu and Ji, Xinkai and Li, Mingxi and Meng, Zonglin and others},
  journal={arXiv preprint arXiv:2403.16034},
  year={2024}
}
Copyright © 2023 UCLA Mobility Lab
All Rights Reserved
Contact Us: jiaqima@ucla.edu