UCLA Smart Intersection

The first real-world multi-agent multimodal Vehicle-to-Everything (V2X) cooperative perception application for autonomous driving

What does UCLA Smart Intersection provide?

UCLA Smart Intersection is the first real-world multi-agent multimodal Vehicle-to-Everything (V2X) cooperative perception application for autonomous driving.

It features:

Serves as a living lab for intelligent transportation system research.
Enables collecting data from infrastructure LiDAR, camera, and radar and combining it with sensor data from connected and automated vehicles (CAVs), including LiDAR, camera, and global navigation satellite system (GNSS) and inertial measurement unit (IMU) data, for cooperative perception research.
Allows collecting a comprehensive dataset from multimodal sensors mounted on multiple agents.
Provides a high-definition (HD) map of the smart intersection.
Comes with a systematic cooperative perception software platform that we developed.

Hardware Configuration

CAV Hardware

LiDAR:

Robosense 128-channel
Frequency: 10 Hz
200 m capturing range
−25° to 15° vertical FOV, ±3 cm error

Camera:

4x RGB cameras with 1920 × 1080 resolution
120° FOV
Frequency: 10 Hz

GNSS:

Gongji GPS
With high-accuracy IMU

Smart Infrastructure Hardware

LiDAR:

Ouster 128-channel + 64-channel
Frequency: 10 Hz
100 m capturing range
−25° to 15° vertical FOV, ±3 cm error

Camera:

2x RGB cameras per infrastructure node, each with 1920 × 1080 resolution
110° FOV
Frequency: 30 Hz

GNSS:

Garmin GPS
For time synchronization

RSU:

Commsignia ITS_RS4-D
For V2X communication

Sensor Calibration

Sensor calibration enables accurate fusion of camera and LiDAR data. Aligning the two sensor modalities allows more reliable and comprehensive perception of the environment.
Camera intrinsic calibration estimates the internal parameters of the camera, such as focal length, principal point, and lens distortion.
Camera-LiDAR extrinsic calibration determines the relative pose (rotation and translation) between the camera and LiDAR sensors.
Smart Infrastructure Calibration: each smart infrastructure node has two camera-LiDAR calibration pairs, four pairs in total.
CAV Calibration: each CAV has four camera-LiDAR calibration pairs, eight pairs in total.
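To make the roles of the two calibrations concrete, here is a minimal sketch of projecting a LiDAR point into a camera image: the extrinsic rotation `R` and translation `t` move the point into the camera frame, and the intrinsic matrix `K` maps it to pixels. It assumes a pinhole model with lens distortion already removed (undistorted images); the function and parameter names are illustrative, not taken from the project's calibration tools.

```python
from typing import List, Optional, Sequence, Tuple

Matrix3 = Sequence[Sequence[float]]
Vector3 = Sequence[float]


def mat_vec(m: Matrix3, v: Vector3) -> List[float]:
    """Multiply a 3x3 matrix by a 3-vector."""
    return [sum(m[i][j] * v[j] for j in range(3)) for i in range(3)]


def project_lidar_point(
    point_lidar: Vector3,
    rotation: Matrix3,     # extrinsic R: LiDAR frame -> camera frame
    translation: Vector3,  # extrinsic t: LiDAR frame -> camera frame (meters)
    intrinsics: Matrix3,   # intrinsic K: [[fx, 0, cx], [0, fy, cy], [0, 0, 1]]
    width: int,
    height: int,
) -> Optional[Tuple[float, float]]:
    """Project one LiDAR point to pixel coordinates, or None if it is not visible.

    Assumes an undistorted pinhole camera; lens distortion is not modeled here.
    """
    # Extrinsic step: p_cam = R @ p_lidar + t
    rotated = mat_vec(rotation, point_lidar)
    p_cam = [rotated[i] + translation[i] for i in range(3)]
    if p_cam[2] <= 0.0:  # behind the camera (the camera looks along +z)
        return None
    # Intrinsic step: homogeneous pixel = K @ p_cam, then divide by depth
    uvw = mat_vec(intrinsics, p_cam)
    u, v = uvw[0] / uvw[2], uvw[1] / uvw[2]
    if 0.0 <= u < width and 0.0 <= v < height:
        return (u, v)
    return None  # in front of the camera but outside the image
```

For example, with hypothetical values for a LiDAR frame of x forward, y left, z up and a camera frame of x right, y down, z forward (`R = [[0, -1, 0], [0, 0, -1], [1, 0, 0]]`, `t = [0, 0, 0]`) and `K = [[1000, 0, 960], [0, 1000, 540], [0, 0, 1]]`, a point 10 m straight ahead projects to the image center (960, 540). Overlaying projected points on the image in this way is one common form of calibration cross-check.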

In the left figure, the top image is the calibration cross-check from our smart infrastructure camera and LiDAR.

The bottom image is the calibration cross-check from the CAV front camera and LiDAR.

Data Collection

V2X Data Collection

Two distinct V2X scenarios were designed to comprehensively capture intersection-related events.
Together, these two scenarios cover all ways of traversing the intersection.
Passing Intersection Scenario: four routes through the intersection: (1) North-to-South, (2) East-to-West, (3) South-to-North, (4) West-to-East.
Turning Scenario: two routes: (1) the CAV repeatedly turns left at the intersection, (2) the CAV repeatedly turns right at the intersection.
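The following hypothetical helper, assuming a four-arm intersection whose arms face the compass directions, shows how a traversal given by its entry and exit arms falls into one of the two scenarios: a heading change of 0° is a passing route, and a change of ±90° is a right or left turn. U-turns (same entry and exit arm) are rejected rather than modeled.

```python
from enum import Enum


class Maneuver(Enum):
    STRAIGHT = "Passing Intersection Scenario"
    LEFT = "Turning Scenario: left turn"
    RIGHT = "Turning Scenario: right turn"


# Hypothetical: compass bearing (degrees clockwise from north) of each intersection arm.
ARM_BEARING = {"North": 0, "East": 90, "South": 180, "West": 270}


def classify_route(entry_arm: str, exit_arm: str) -> Maneuver:
    """Classify a traversal by its clockwise heading change through the intersection."""
    entry_heading = (ARM_BEARING[entry_arm] + 180) % 360  # driving away from the entry arm
    exit_heading = ARM_BEARING[exit_arm]                  # driving toward the exit arm
    change = (exit_heading - entry_heading) % 360
    if change == 180:
        raise ValueError("entry and exit arms must differ (U-turns are not modeled)")
    # 0 = straight, 90 = clockwise (right turn), 270 = counter-clockwise (left turn)
    return {0: Maneuver.STRAIGHT, 90: Maneuver.RIGHT, 270: Maneuver.LEFT}[change]


# The four passing routes listed above all classify as STRAIGHT.
for route in [("North", "South"), ("East", "West"), ("South", "North"), ("West", "East")]:
    assert classify_route(*route) is Maneuver.STRAIGHT
```

Measuring the clockwise heading change keeps the rule independent of which arm the CAV enters from.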

V2V Data Collection

For V2V data collection, we constructed two distinct scenarios tailored to urban environments.
The first scenario covers lane-change maneuvers.
The second scenario covers car-following situations.

Data collection map and arrangement. The figure shows the one V2X data collection at the UCLA smart intersection and the three V2V data collections along three different routes.

Data Annotation

3D Bounding Boxes Annotation:

SUSTechPOINTS is used to annotate 3D bounding boxes for LiDAR data.
Both student annotators and annotators from a commercial company participated in labeling and refining five object classes.
Each object is annotated with its 7-degree-of-freedom (7-DoF) 3D bounding box, its driving state, and a consistent ID and size.
The same object keeps the same ID and size across timestamps.
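A 7-DoF box is a center (x, y, z), a size (length, width, height), and a yaw angle. The sketch below uses a hypothetical class and field names and an assumed convention (yaw counter-clockwise about the vertical axis, z at the box center) to compute the eight box corners and to check the consistent-size rule across frames. It does not reproduce the SUSTechPOINTS label format, and the driving-state attribute is omitted.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Box7DoF:
    """Hypothetical 7-DoF box: center, size, and yaw about the vertical axis."""
    x: float
    y: float
    z: float       # assumed: geometric center of the box, not its bottom face
    length: float  # extent along the heading direction
    width: float
    height: float
    yaw: float     # radians, counter-clockwise from the +x axis

    def corners(self) -> List[Tuple[float, float, float]]:
        """Eight corners: bottom face first, then top face."""
        c, s = math.cos(self.yaw), math.sin(self.yaw)
        pts = []
        for dz in (-self.height / 2, self.height / 2):
            for sx, sy in ((1, 1), (1, -1), (-1, -1), (-1, 1)):
                lx, ly = sx * self.length / 2, sy * self.width / 2
                # Rotate the local offset by yaw, then translate to the box center.
                pts.append((self.x + c * lx - s * ly, self.y + s * lx + c * ly, self.z + dz))
        return pts


def sizes_consistent(frames: List[Dict[str, Box7DoF]], tol: float = 1e-6) -> bool:
    """True if every object ID keeps the same (length, width, height) in all frames."""
    first_size: Dict[str, Tuple[float, float, float]] = {}
    for frame in frames:
        for obj_id, box in frame.items():
            size = (box.length, box.width, box.height)
            ref = first_size.setdefault(obj_id, size)  # size from the first appearance
            if any(abs(a - b) > tol for a, b in zip(size, ref)):
                return False
    return True
```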

Annotation Statistics:

Our dataset has 10 labeling classes: vehicle, bus, van, pickup truck, semi-truck, pedestrian, scooter rider, bicycle rider, motorcycle rider, and trash can.
We annotated 17,326 frames of vehicle-side data and 15,366 frames of infrastructure-side data, for a total of 32,692 annotated frames.

Cooperative Perception

Localization

For cooperative perception, the precise positioning of each agent is crucial for integrating sensor data from all agents, particularly when dealing with moving vehicles.
We developed an INS/GNSS/map-matching-based multi-sensor fusion localization system that provides continuous, robust, and accurate pose information.
Multi-sensor fusion localization results in various scenarios:
Scenario 1: the GNSS signals are heavily affected by the driving environment.
Scenario 2: the map point cloud in this area was manually and heavily down-sampled to reduce the number of environmental features.
Scenario 3: an experiment segment in which both the GNSS and the NDT map-matching nodes are in good status.
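The internals of the fusion system are not described here. As a rough illustration only, the sketch below uses a toy 2D position Kalman filter with an isotropic variance: INS velocity drives the prediction, and GNSS and NDT map-matching fixes are fused as independent position measurements whenever they are available. The class name, noise values, and position-only state are assumptions; a full INS/GNSS/map-matching system would estimate the complete pose.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class PositionFilter:
    """Toy 2D position filter with isotropic variance (hypothetical, not the authors' system)."""
    x: float
    y: float
    var: float  # position variance in m^2, assumed equal on both axes

    def predict(self, vx: float, vy: float, dt: float, process_var: float) -> None:
        """INS step: dead-reckon with the IMU-derived velocity and grow the uncertainty."""
        self.x += vx * dt
        self.y += vy * dt
        self.var += process_var

    def update(self, z: Optional[Tuple[float, float]], meas_var: float) -> None:
        """Fuse one absolute position fix (GNSS or NDT map matching); skip if unavailable."""
        if z is None:
            return
        gain = self.var / (self.var + meas_var)  # scalar Kalman gain for the isotropic state
        self.x += gain * (z[0] - self.x)
        self.y += gain * (z[1] - self.y)
        self.var *= 1.0 - gain
```

Degrading one source (GNSS as in Scenario 1, map matching as in Scenario 2) can be modeled by passing `None` or a larger `meas_var` for that source, while the other source and the INS prediction keep the estimate continuous.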

Online Object Detection + Tracking

Before cooperative object detection, an online time synchronization module temporally aligns the data from multiple agents, such as CAVs and smart infrastructure.
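One simple way to pair frames across agents, assuming all agents' timestamps are already on a common GNSS-disciplined clock, is nearest-timestamp matching with a tolerance. The function below is a hypothetical sketch of that idea, not the project's online synchronization module; `max_offset` is an assumed tuning parameter.

```python
import bisect
from typing import List, Optional, Sequence, Tuple


def match_nearest(
    ref_stamps: Sequence[float],
    other_stamps: Sequence[float],
    max_offset: float,
) -> List[Tuple[float, Optional[float]]]:
    """Pair each reference timestamp with the closest timestamp from another agent.

    other_stamps must be sorted ascending. Pairs farther apart than max_offset
    (seconds) are left unmatched (None) instead of being fused with stale data.
    """
    pairs: List[Tuple[float, Optional[float]]] = []
    for t in ref_stamps:
        i = bisect.bisect_left(other_stamps, t)
        # Only the neighbors just before and just after t can be the closest.
        candidates = [other_stamps[j] for j in (i - 1, i) if 0 <= j < len(other_stamps)]
        best = min(candidates, key=lambda s: abs(s - t), default=None)
        if best is not None and abs(best - t) > max_offset:
            best = None
        pairs.append((t, best))
    return pairs
```

For example, pairing the 10 Hz CAV LiDAR stamps with infrastructure stamps using `max_offset=0.05` keeps each pair within half a LiDAR frame period.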
The off-the-shelf deep learning detection pipeline has two steps.
First step: the Apollo CNN segmentation algorithm segments the point cloud from the multiple agents into clusters belonging to different objects.
Second step: 3D bounding boxes are fitted based on the convex hull of each cluster's points. The principle of this segmentation algorithm is introduced below.
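Apollo's own box builder is not reproduced here. As a sketch of the second step under stated assumptions, the code below computes the bird's-eye-view convex hull of one cluster with Andrew's monotone chain, fits the minimum-area enclosing rectangle by trying each hull edge direction, and takes the box height from the cluster's z range. Function names and the output tuple layout are illustrative.

```python
import math
from typing import List, Sequence, Tuple

Point2 = Tuple[float, float]
Point3 = Tuple[float, float, float]


def convex_hull(points: Sequence[Point2]) -> List[Point2]:
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o: Point2, a: Point2, b: Point2) -> float:
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower: List[Point2] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point2] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def fit_box(cluster: Sequence[Point3]) -> Tuple[float, float, float, float, float, float, float]:
    """Fit (cx, cy, cz, length, width, height, yaw) to one segmented cluster."""
    hull = convex_hull([(x, y) for x, y, _ in cluster])
    if len(hull) < 3:
        raise ValueError("cluster is degenerate in bird's-eye view")
    best = None
    # The minimum-area enclosing rectangle has a side collinear with a hull edge,
    # so trying every edge direction is sufficient.
    for i in range(len(hull)):
        (x0, y0), (x1, y1) = hull[i], hull[(i + 1) % len(hull)]
        yaw = math.atan2(y1 - y0, x1 - x0)
        c, s = math.cos(yaw), math.sin(yaw)
        us = [c * x + s * y for x, y in hull]   # coordinate along the edge
        vs = [-s * x + c * y for x, y in hull]  # coordinate across the edge
        area = (max(us) - min(us)) * (max(vs) - min(vs))
        if best is None or area < best[0]:
            best = (area, yaw, min(us), max(us), min(vs), max(vs))
    _, yaw, u0, u1, v0, v1 = best
    uc, vc = (u0 + u1) / 2, (v0 + v1) / 2
    c, s = math.cos(yaw), math.sin(yaw)
    cx, cy = c * uc - s * vc, s * uc + c * vc  # rotate the center back to the sensor frame
    length, width = u1 - u0, v1 - v0
    if width > length:  # convention: length is the longer side and yaw points along it
        length, width, yaw = width, length, yaw + math.pi / 2
    yaw = math.atan2(math.sin(yaw), math.cos(yaw))  # wrap to (-pi, pi]
    zs = [z for _, _, z in cluster]
    return (cx, cy, (min(zs) + max(zs)) / 2, length, width, max(zs) - min(zs), yaw)
```

Restricting the search to hull edges turns the rectangle fit into a loop over a handful of candidate orientations instead of a continuous search over yaw.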
We explored and implemented several bounding box fusion algorithms: (1) non-maximum suppression (NMS), (2) weighted averaging, and (3) object-tracking-based fusion.
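As an illustration of the weighted-average option, the sketch below groups detections from all agents that overlap the current highest-scoring box and replaces each group with the score-weighted mean of its box coordinates. For brevity it assumes axis-aligned bird's-eye-view boxes already in a shared frame and positive scores; rotated 3D boxes would also need yaw averaging and rotated IoU. The function names and the `iou_thresh` default are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # axis-aligned BEV box: (x_min, y_min, x_max, y_max)
Detection = Tuple[Box, float]            # (box, confidence score)


def bev_iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def weighted_average_fusion(dets: List[Detection], iou_thresh: float = 0.5) -> List[Detection]:
    """Merge overlapping detections into score-weighted average boxes (scores must be > 0)."""
    remaining = sorted(dets, key=lambda d: d[1], reverse=True)
    fused: List[Detection] = []
    while remaining:
        anchor, rest = remaining[0], remaining[1:]
        # The anchor always joins its own group, so the remaining list always shrinks.
        group = [anchor] + [d for d in rest if bev_iou(anchor[0], d[0]) >= iou_thresh]
        remaining = [d for d in rest if bev_iou(anchor[0], d[0]) < iou_thresh]
        total = sum(score for _, score in group)
        box = tuple(sum(b[k] * score for b, score in group) / total for k in range(4))
        fused.append((box, max(score for _, score in group)))  # keep the best score
    return fused
```

Unlike NMS, which keeps one box per group and discards the rest, weighted averaging lets every agent's observation contribute to the fused box geometry.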

Deep Learning Object Detection

Cooperative object detection based on a customized deep learning algorithm has two steps.
Step 1: each agent first uses PointPillars to predict bounding box proposals separately from its own LiDAR point cloud.
Step 2: the predicted bounding boxes are then shared among agents within communication range, and non-maximum suppression (NMS) is applied to merge the proposals into the final detection results.
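Step 2 can be sketched as greedy NMS over the union of all agents' proposals after they have been transformed into a common frame. This is a generic NMS sketch with axis-aligned bird's-eye-view IoU and an assumed threshold, not OpenCOOD's own implementation; the names are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # axis-aligned BEV box: (x_min, y_min, x_max, y_max)
Proposal = Tuple[Box, float]             # (box, confidence score)


def bev_iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(proposals: List[Proposal], iou_thresh: float = 0.3) -> List[Proposal]:
    """Greedy NMS: keep the highest-scoring box and drop proposals overlapping a kept box."""
    kept: List[Proposal] = []
    for box, score in sorted(proposals, key=lambda p: p[1], reverse=True):
        if all(bev_iou(box, k) < iou_thresh for k, _ in kept):
            kept.append((box, score))
    return kept
```

Usage: after transforming each agent's proposals into the shared frame (assumed done upstream), `nms(cav_proposals + infra_proposals)` keeps one box per object, taken from whichever agent was most confident.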
The whole framework is built on our open-source OpenCOOD framework, wrapped with ROS.
Because the default model in OpenCOOD supports the Velodyne VLP-32C LiDAR, while this project uses LiDARs from Ouster and Robosense, the PointPillars model has to be retrained on the collected and annotated datasets.