Abstract
Autonomous vehicles rely on various sensors to perceive their environment, and precise, reliable perception is essential for their safety and performance. This thesis focuses on advanced perception capabilities for autonomous vehicles and is based on four research articles dedicated to sensor hardware, dataset collection, sensor fusion, and AI-based scene interpretation.

The research begins by analyzing the deployment of multiple range sensors on autonomous shuttles, using the iseAuto shuttle operating in real traffic on the TalTech campus. Given the body shape of shuttle buses and LiDAR sensor characteristics such as a full horizontal but limited vertical field of view, the choice of sensor models and installation locations is critical for minimizing sensor interference and covering as many blind zones as possible.

The thesis then presents an end-to-end, generic dataset collection framework that includes hardware deployment, multi-sensor calibration and synchronization solutions, dataset transfer and sharing protocols, and signal-level sensor fusion algorithms. The framework generalizes the implementation of multi-modal perception systems on various robotic and autonomous platforms. Camera, LiDAR, radar, and GNSS sensors are included in the framework, and the merits of all sensors are fused in a manner useful for object detection and tracking.

The dataset collection framework was deployed on different autonomous platforms. Initial validation was carried out with all sensors integrated on a car roof rack, and the validation tests cover various transportation scenes such as highway, urban, and neighborhood environments. The practical implementation of the framework is the iseAuto shuttle. Relying on the tools and algorithms proposed in the framework, the iseAuto dataset provides camera and LiDAR data for object detection and segmentation tasks and features the harsh weather and illumination conditions of Estonia. The iseAuto dataset was used to train a fully convolutional network (FCN) in deep-learning experiments. The experimental results demonstrate two things: i) with the help of camera-LiDAR fusion, robust multi-class segmentation can be achieved on a dataset with only a few annotations; ii) the proposed FCN-based network performs reasonably well in poor weather and illumination scenarios.

The thesis concludes by proposing a novel vision-transformer-based network that carries out camera-LiDAR fusion for semantic segmentation. The network applies a progressive-assembly strategy in a double-direction architecture to process camera and LiDAR data in parallel. Moreover, it is the first transformer-based proposal to project LiDAR point clouds as camera-plane maps for semantic segmentation. The evaluation experiments report robust performance in all scenarios and demonstrate the value of combining the attention mechanism with multi-sensor fusion.

In summary, this thesis constitutes a comprehensive research journey through all aspects of deep-learning-based AV perception: from sensor deployment to the multi-modal perception system, to real-world dataset collection, and finally to deep-model training for scene interpretation. This research facilitates advanced perception capabilities for a safe and reliable autonomous transportation system.
Publication
TalTech Press