V2X-Real 

 

The first large-scale, real-world multi-modal dataset for Vehicle-to-Everything (V2X) perception

 

 

 

What does V2X-Real provide?

V2X-Real is the first large-scale real-world dataset for Vehicle-to-Everything (V2X) cooperative perception.

It features:

  • Multiple connected agents: two vehicles and two infrastructure units, providing multi-view, multi-modal sensor data streams.
  • 1.2M annotated 3D bounding boxes across 10 object categories, with 33K LiDAR frames and 171K multi-view camera images.
  • 4 sub-datasets tailored for cooperative perception under different collaboration modes: vehicle-centric (VC), infrastructure-centric (IC), Vehicle-to-Vehicle (V2V), and Infrastructure-to-Infrastructure (I2I).
  • Comprehensive benchmarks and open-source code for multi-class, multi-agent V2X cooperative perception.

Data Acquisition

Connected Autonomous Vehicle Hardware

 

LiDAR:
  • Robosense 128-channel, 10 Hz
  • 200 m capturing range
  • −25° to +15° vertical FOV, ±3 cm error
Camera:
  • 4× stereo RGB-D cameras, 10 Hz
  • 1920 × 1080 resolution
  • 120° FOV
GNSS:
  • Gongji GPS
  • High-accuracy IMU

Smart Infrastructure Hardware

LiDAR:

  • Ouster OS1-128/64 channel, 10 Hz
  • < 40 m capturing range
  • −16.6° to +16.6° (64-channel), −22.5° to +22.5° (128-channel) vertical FOV
Camera:
  • 2× Axis-P14555 cameras, 30 Hz
  • 1920 × 1080 resolution
  • 76° horizontal FOV, 48° vertical FOV
GNSS:
  • Garmin GPS 

Data Annotation

Sensor Calibration

  • CAV calibration: four Cam-LiDAR calibration pairs for each vehicle.
  • Smart infrastructure calibration: two Cam-LiDAR calibration pairs for each infrastructure.
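
These Cam-LiDAR calibration pairs provide, for each camera, an extrinsic transform between the LiDAR and camera frames plus the camera intrinsics. As a minimal illustration of how such pairs are typically consumed, the sketch below projects LiDAR points into a camera image; the variable names and matrix conventions are assumptions for this example, not the dataset's exact calibration file schema.

```python
import numpy as np

def project_lidar_to_image(points_lidar, T_cam_lidar, K):
    """Project (N, 3) LiDAR points into pixel coordinates.

    points_lidar : (N, 3) points in the LiDAR frame.
    T_cam_lidar  : (4, 4) extrinsic transform from LiDAR frame to camera frame.
    K            : (3, 3) camera intrinsic matrix.
    Returns (M, 2) pixel coordinates of the points in front of the camera.
    """
    # Homogeneous coordinates: (N, 4)
    pts_h = np.hstack([points_lidar, np.ones((points_lidar.shape[0], 1))])
    # Transform into the camera frame and keep points with positive depth.
    pts_cam = (T_cam_lidar @ pts_h.T).T[:, :3]
    pts_cam = pts_cam[pts_cam[:, 2] > 0]
    # Perspective projection with the intrinsics, then normalize by depth.
    uvw = (K @ pts_cam.T).T
    return uvw[:, :2] / uvw[:, 2:3]
```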

3D Bounding Box Annotation

  • SusTechPoint is used to annotate 3D bounding boxes for the LiDAR data.
  • Four groups of professional annotators label and refine 10 object classes.
  • Each object is annotated with its 7-degree-of-freedom 3D bounding box (see the sketch after this list), driving state, size, and ID.
  • Consistent IDs are assigned to the same object across different views.
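
For reference, a 7-degree-of-freedom label consists of the box center (x, y, z), size (l, w, h), and heading (yaw). Below is a minimal sketch of converting such a label into its 8 corner points, assuming the usual convention that yaw rotates about the vertical axis.

```python
import numpy as np

def box7dof_to_corners(x, y, z, l, w, h, yaw):
    """Convert a 7-DoF box (center, size, heading) into its 8 corner points.

    Assumes (x, y, z) is the box center, (l, w, h) are the full extents along
    the box axes, and yaw is the rotation about the vertical (z) axis.
    """
    # Corner offsets in the box's local frame: shape (8, 3).
    dx, dy, dz = l / 2.0, w / 2.0, h / 2.0
    corners = np.array([[sx * dx, sy * dy, sz * dz]
                        for sx in (-1, 1) for sy in (-1, 1) for sz in (-1, 1)])
    # Rotate about z by the heading angle, then translate to the center.
    c, s = np.cos(yaw), np.sin(yaw)
    rot = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    return corners @ rot.T + np.array([x, y, z])
```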
Data Desensitization

  • We desensitized over 68K image frames.
  • For vehicle license plates, we use blur level = 25 to block out all information.
  • For human faces, we use blur level = 10 to mask critical information while maintaining photorealism for detection (a rough illustration follows this list).
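
The exact meaning of "blur level" is internal to our anonymization pipeline; as a rough illustration only, the OpenCV-based sketch below blurs a rectangular image region with a Gaussian kernel whose size is derived from such a level. The region coordinates and the level-to-kernel mapping are assumptions for this example.

```python
import cv2

def blur_region(image, x, y, w, h, blur_level):
    """Blur a rectangular region of an image in place with a Gaussian kernel.

    The mapping from "blur level" to kernel size is an assumption made for
    illustration only; cv2.GaussianBlur requires an odd kernel size.
    """
    k = 2 * blur_level + 1  # odd kernel size derived from the level
    roi = image[y:y + h, x:x + w]
    image[y:y + h, x:x + w] = cv2.GaussianBlur(roi, (k, k), 0)
    return image
```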

Benchmark

VC Cooperative Perception Benchmark

| Method | Car AP @IoU0.3 | Car AP @IoU0.5 | Ped AP @IoU0.3 | Ped AP @IoU0.5 | Truck AP @IoU0.3 | Truck AP @IoU0.5 | mAP @IoU0.3 | mAP @IoU0.5 |
|---|---|---|---|---|---|---|---|---|
| No Fusion | 38.7 | 35.9 | 25.5 | 13.1 | 20.2 | 14.5 | 28.2 | 21.2 |
| Late Fusion | 46.1 | 43.1 | 28.3 | 11.8 | 19.8 | 14.0 | 31.4 | 23.0 |
| Early Fusion | 51.1 | 47.6 | 31.6 | 16.0 | 32.5 | 23.6 | 38.4 | 29.1 |
| F-Cooper | 57.3 | 54.2 | 30.0 | 14.1 | 27.0 | 21.2 | 38.1 | 29.8 |
| AttFuse | 62.6 | 59.4 | 32.2 | 15.5 | 32.6 | 26.6 | 42.5 | 33.8 |
| V2X-ViT | 62.7 | 60.3 | 36.7 | 18.6 | 35.1 | 28.3 | 44.8 | 35.8 |
IC Cooperative Perception Benchmark

| Method | Car AP @IoU0.3 | Car AP @IoU0.5 | Ped AP @IoU0.3 | Ped AP @IoU0.5 | Truck AP @IoU0.3 | Truck AP @IoU0.5 | mAP @IoU0.3 | mAP @IoU0.5 |
|---|---|---|---|---|---|---|---|---|
| No Fusion | 43.3 | 35.6 | 25.6 | 12.7 | 21.2 | 20.2 | 30.0 | 22.8 |
| Late Fusion | 65.2 | 60.4 | 33.4 | 16.0 | 30.8 | 24.5 | 43.1 | 33.6 |
| Early Fusion | 64.2 | 59.8 | 32.3 | 15.0 | 33.3 | 27.4 | 43.3 | 34.1 |
| F-Cooper | 49.7 | 43.0 | 31.0 | 15.3 | 31.9 | 21.1 | 37.5 | 26.5 |
| AttFuse | 70.4 | 67.3 | 37.5 | 17.0 | 42.9 | 31.3 | 50.3 | 38.5 |
| V2X-ViT | 64.2 | 56.5 | 37.8 | 19.2 | 35.6 | 28.8 | 45.9 | 34.8 |
V2V Cooperative Perception Benchmark

| Method | Car AP @IoU0.3 | Car AP @IoU0.5 | Ped AP @IoU0.3 | Ped AP @IoU0.5 | Truck AP @IoU0.3 | Truck AP @IoU0.5 | mAP @IoU0.3 | mAP @IoU0.5 |
|---|---|---|---|---|---|---|---|---|
| No Fusion | 41.7 | 39.4 | 26.9 | 14.4 | 21.6 | 13.7 | 30.1 | 22.5 |
| Late Fusion | 47.4 | 44.4 | 29.2 | 14.9 | 18.7 | 9.1 | 31.8 | 22.8 |
| Early Fusion | 54.0 | 49.8 | 31.9 | 17.1 | 28.6 | 18.6 | 38.1 | 28.5 |
| F-Cooper | 42.7 | 40.3 | 27.7 | 14.0 | 25.6 | 18.6 | 32.0 | 24.3 |
| AttFuse | 58.6 | 55.3 | 30.1 | 15.4 | 28.9 | 21.7 | 39.2 | 30.8 |
| V2X-ViT | 59.0 | 56.3 | 37.4 | 20.7 | 42.9 | 35.0 | 46.5 | 37.3 |
I2I Cooperative Perception Benchmark

| Method | Car AP @IoU0.3 | Car AP @IoU0.5 | Ped AP @IoU0.3 | Ped AP @IoU0.5 | Truck AP @IoU0.3 | Truck AP @IoU0.5 | mAP @IoU0.3 | mAP @IoU0.5 |
|---|---|---|---|---|---|---|---|---|
| No Fusion | 48.6 | 40.0 | 30.9 | 15.8 | 23.5 | 22.4 | 34.3 | 26.1 |
| Late Fusion | 67.2 | 63.3 | 41.1 | 23.1 | 48.4 | 39.1 | 52.2 | 41.8 |
| Early Fusion | 60.9 | 57.2 | 41.2 | 21.9 | 38.5 | 30.8 | 46.9 | 36.6 |
| F-Cooper | 71.5 | 65.5 | 49.3 | 27.5 | 50.0 | 40.7 | 56.9 | 44.6 |
| AttFuse | 73.5 | 69.5 | 42.8 | 20.5 | 53.2 | 39.4 | 56.5 | 43.1 |
| V2X-ViT | 77.3 | 68.8 | 54.4 | 30.5 | 56.2 | 51.5 | 62.7 | 50.3 |
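
In all four tables, AP is reported per class at IoU thresholds 0.3 and 0.5, and mAP is simply the mean of the three per-class APs at the same threshold. For example, the V2X-ViT row of the VC benchmark at IoU 0.3 averages 62.7 (Car), 36.7 (Pedestrian), and 35.1 (Truck) to 44.8:

```python
# mAP at a fixed IoU threshold is the mean of the per-class APs.
# Values below are the V2X-ViT row of the VC benchmark at IoU 0.3.
per_class_ap = {"Car": 62.7, "Pedestrian": 36.7, "Truck": 35.1}
map_at_03 = sum(per_class_ap.values()) / len(per_class_ap)
print(round(map_at_03, 1))  # 44.8
```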

Download

Note: All V2X-Real data are provided in the OPV2V format; please check [here] for data structure details.
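
For orientation, OPV2V-style data is typically organized as scenario folders, each containing one sub-folder per agent, with per-timestamp point clouds, camera images, and YAML metadata. The sketch below iterates such a layout; the folder and file naming here is an assumption based on the common OPV2V convention, so please refer to the linked format description for the authoritative structure.

```python
from pathlib import Path

import yaml  # pip install pyyaml

def iter_opv2v_frames(root):
    """Yield (scenario, agent_id, timestamp, metadata, pcd_path) tuples.

    Assumes the typical OPV2V layout root/scenario/agent_id/timestamp.yaml
    paired with timestamp.pcd; see the linked format page for details.
    """
    for scenario in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        for agent in sorted(p for p in scenario.iterdir() if p.is_dir()):
            for meta_file in sorted(agent.glob("*.yaml")):
                with open(meta_file) as f:
                    metadata = yaml.safe_load(f)
                yield (scenario.name, agent.name, meta_file.stem,
                       metadata, meta_file.with_suffix(".pcd"))
```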

OPV2V format (Example)

split1 [link]

OPV2V format (V2X-Real-Lidar-64)

Note: In this version, the vehicle-side LiDAR has 64 lines, while the infrastructure-side LiDARs remain unchanged with 128 and 64 lines.
Test_set [link] | Validation_set [link] | Training_set [link1] [link2] [link3] [link4]

OPV2V format (V2X-Real-Lidar-128)

Note: In this version, the vehicle-side LiDAR has 128 lines, and the infrastructure-side LiDARs have 128 and 64 lines.

Test_set [link] | Validation_set [link] | Training_set [link1] [link2] [link3] [link4]

       

OPV2V format (V2X-Real-Lidar-Cameras)

Note: In this version, all LiDAR-128 and camera data are included. This is our most complete version.
Test_set [link] | Validation_set [link] | Training_set [link1] [link2] [link3] [link4]

BibTeX

@article{xiang2024v2x,
  title={V2X-Real: a large-scale dataset for vehicle-to-everything cooperative perception},
  author={Xiang, Hao and Zheng, Zhaoliang and Xia, Xin and Xu, Runsheng and Gao, Letian and Zhou, Zewei and Han, Xu and Ji, Xinkai and Li, Mingxi and Meng, Zonglin and others},
  journal={arXiv preprint arXiv:2403.16034},
  year={2024}
}
Copyright © 2023 UCLA Mobility Lab
All Rights Reserved
Contact Us: jiaqima@ucla.edu