Ref.

Pioneer Alliance

Number of positions

1

Call for applications (EN)

Generative artificial intelligence (AI) holds great promise, but it also raises questions about its usefulness and performance across its many applications. This thesis focuses on the contribution of generative AI to road safety, with particular emphasis on the safety of electric two-wheelers. In France, the cycling population continues to grow, with a marked increase in trips (+13% in 2022 compared with 2021, and +41% compared with 2018). However, this growth has been accompanied by a rise in the number of cyclists killed (+31% between 2019 and 2022). Electrically assisted bicycles (EABs) are increasingly involved: fatalities in this category rose by 72% between 2019 (25 fatalities) and 2022 (43 fatalities). As for mopeds, the growing number of motorized two-wheelers is leading to more injuries among their users, particularly young people (aged 14-17), who do not sufficiently protect themselves against the risks inherent in riding this type of vehicle (ONISR, 2020). Despite this growth in the number of users, scientific data on these practices and this population remain limited. A review of the literature reveals a lack of in-depth knowledge about these riders' behaviors and their interactions with other road users in real-life situations, particularly vulnerable road users.
The main aim of this thesis is to develop a methodology that allows transport researchers to study the behavior of electric two-wheeler riders (EABs and electric scooters) in high-risk situations. These situations often result from inappropriate actions influenced by various factors: riding experience, emotional state (fatigue, frustration), vehicle dynamics, road infrastructure and environmental conditions (weather, road surface condition, traffic density).
Detecting these high-risk situations is a major challenge, particularly for two-wheelers: the vehicle is inherently unstable and the rider is physically vulnerable, so changes in the riding environment must be constantly anticipated. This thesis builds on earlier work on the detection of critical events in two-wheeler riding. Previous theses, such as those by Attal (2015) and Diop (2022), explored approaches based on unsupervised machine learning and anomaly detection to identify critical events from vehicle dynamics data (accelerometers, gyroscopes) and contextual data (GPS). As part of the 2RLS project, funded by the DSR, a study was carried out to improve knowledge of users of self-service electric bikes and scooters, their practices and the associated risks. An experiment conducted in Paris with 19 participants on a 10 km course collected heterogeneous data: 360° videos, audio recordings, accelerometer and gyroscope measurements, and GPS data. The main objective of this thesis is to exploit this large dataset by drawing on recent advances in artificial intelligence, particularly generative AI, to detect critical events while riding electric two-wheelers. A methodology integrating large language models (LLMs) and vision-language models (VLMs) will be developed to analyze the riding environment captured by the 360° cameras. It will combine advanced image and video processing techniques, such as video token sparsification, to process the large quantities of data generated efficiently. VLMs will enable accurate interpretation of visual scenes and object recognition, while LLMs will analyze context and the complex interactions between different elements of the road environment. The information extracted by the VLMs, combined with vehicle dynamics data, will make it possible to assess the risks associated with interactions between road users.
For example, heavy traffic and wet pavement increase the risk of sudden braking or a crash, and the presence of an obstacle on the road increases the risk of collision to a degree that depends on traffic density.
References:
Radford, A., et al. (2021). Learning transferable visual models from natural language supervision (CLIP). Proposes a learning method that links images to textual descriptions, making it possible to recognize complex objects and scenes.
Li, J., Li, D., Xiong, C., & Hoi, S. (2022, June). BLIP: Bootstrapping language-image pre-training for unified vision-language understanding and generation. In International Conference on Machine Learning (pp. 12888-12900). PMLR.
Dosovitskiy, A., et al. (2021). An image is worth 16x16 words: Transformers for image recognition at scale. Uses Transformers to interpret complex visual scenes; applicable to the analysis of road scenes.
Khan, S., Naseer, M., Hayat, M., Zamir, S. W., Khan, F. S., & Shah, M. (2022). Transformers in vision: A survey. ACM Computing Surveys (CSUR), 54(10s), 1-41.
L. H., Zhang, P., Zhang, H., Yang, J., Li, C., Zhong, Y., ... & Gao, J. (2022). Grounded language-image pre-training. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 10965-10975).
Zhang, J., Huang, J., Jin, S., & Lu, S. (2024). Vision-language models for vision tasks: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence.
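The description mentions a video token sparsification method for reducing the amount of 360° video data passed to the models, without specifying which one. As a minimal sketch of one generic form of the idea (temporal-redundancy pruning, assumed here for illustration and not necessarily the method the thesis will adopt), the Python/NumPy function below drops a patch token when it is nearly identical, by cosine similarity, to the last token kept at the same patch position. The function name `temporal_token_mask`, the `(T, N, D)` token layout (frames x patches x embedding dimension, e.g. from a ViT-style patch encoder) and the 0.95 threshold are all hypothetical choices.

```python
import numpy as np


def temporal_token_mask(tokens: np.ndarray, threshold: float = 0.95) -> np.ndarray:
    """Hypothetical temporal-redundancy token pruning for video (illustrative only).

    tokens: array of shape (T, N, D) = (frames, patch tokens per frame, embedding dim).
    Returns a boolean mask of shape (T, N); True marks a token to keep.
    A token is dropped when its cosine similarity to the last kept token at the
    same patch position is >= threshold (i.e. that patch has barely changed).
    """
    if tokens.ndim != 3:
        raise ValueError("tokens must have shape (T, N, D)")
    num_frames, num_patches, _ = tokens.shape

    # L2-normalise every token so that a dot product equals cosine similarity.
    norms = np.linalg.norm(tokens, axis=-1, keepdims=True)
    unit = tokens / np.maximum(norms, 1e-12)

    keep = np.zeros((num_frames, num_patches), dtype=bool)
    keep[0] = True                  # the first frame is always kept in full
    reference = unit[0].copy()      # last kept token at each patch position

    for t in range(1, num_frames):
        # Cosine similarity of each patch with its reference token, shape (N,).
        sim = np.einsum("nd,nd->n", unit[t], reference)
        changed = sim < threshold
        keep[t] = changed
        # Update the reference only where a new token was kept.
        reference[changed] = unit[t][changed]
    return keep


if __name__ == "__main__":
    # Toy example: 4 frames, 3 patches per frame, 2-dimensional embeddings.
    tokens = np.array(
        [
            [[1, 0], [0, 1], [1, 1]],    # frame 0
            [[1, 0], [0, 1], [1, -1]],   # frame 1: patch 2 changes
            [[1, 0], [1, 0], [1, -1]],   # frame 2: patch 1 changes
            [[1, 0], [1, 0], [1, -1]],   # frame 3: nothing changes
        ],
        dtype=float,
    )
    mask = temporal_token_mask(tokens)
    print(mask.sum(), "of", mask.size, "tokens kept")  # 5 of 12 tokens kept
    print(tokens[mask].shape)                         # (5, 2)
```

Only the kept tokens (`tokens[mask]`) and their positions (`np.nonzero(mask)`) would then be forwarded. Comparing against the last *kept* token rather than the immediately preceding frame means that slow, cumulative changes eventually produce a new token instead of being ignored indefinitely.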

General requirements

Special requirements

PIONEER (Paris + 12 months in Lisbon) PhD scholarship competition