Title: Context-aware pedestrian motion prediction
Authors: Haddad, Sirin
Keywords: Engineering::Computer science and engineering::Computing methodologies::Image processing and computer vision
Engineering::Computer science and engineering::Computing methodologies::Pattern recognition
Issue Date: 2021
Publisher: Nanyang Technological University
Source: Haddad, S. (2021). Context-aware pedestrian motion prediction. Doctoral thesis, Nanyang Technological University, Singapore.
Abstract: Pedestrian motion prediction can enhance the effectiveness of Advanced Driver-Assistance Systems (ADAS), autonomous driving, and robotic navigation in maintaining pedestrian safety. Pedestrian motion is usually guided by an intention to reach a target place, and pedestrians navigate by making their motion decisions with respect to the surrounding space. Numerous approaches have captured pedestrian motion by observing the walking trajectory as an essential feature for future motion prediction. However, predicting a pedestrian trajectory in crowded environments is non-trivial due to the uncertainty of pedestrian intentions. This uncertainty is influenced by the pedestrians' interaction with static structures and other dynamic objects present in the scene. As such, an accurate and plausible method to predict pedestrian motion in urban environments is still an unsolved problem. The objective of this Ph.D. research is to develop a robust and scalable vision-based framework for predicting pedestrian motion in urban environments. The proposed framework relies on graph-based deep prediction models that learn from pedestrians' past motion, their surrounding context, and interactions to estimate their future trajectories. The framework models the surrounding environment by taking into account the contextual information consisting of other pedestrians and any fixed objects present in the navigable area. This includes the social interactions among pedestrians and their interaction with the static settings, which form a dynamic context and play a significant role in determining pedestrian movement. In addition, other pedestrian cues and body features are collected to provide a stronger indication of pedestrian motion and increase the estimation confidence. 
Finally, in order to achieve high-speed prediction on embedded platforms with tight computational resources, low-complexity methods are considered. The thesis proposes four approaches that are based on spatio-temporal graphs and deploy Long Short-Term Memory (LSTM) networks for predicting pedestrian trajectories in crowded environments. Chapter 3 presents a graph-based modeling approach that considers the pedestrians' contextual interaction with the static scene structure and other obstacles (physical objects), and social interaction with dynamic elements (other pedestrians) in the scene. The proposed spatio-temporal graphs capture interactions at several spatial scopes, ranging from locally-spatial contextual interactions to global interactions of all pedestrians. In addition, a spatio-temporal attention mechanism is incorporated to quantify the pedestrians' mutual influence on each other and assign importance to each interaction. However, this approach yields a large static graph, and the underlying graph structure needs to be set a priori. To improve the scalability of spatio-temporal graphs, Chapter 4 presents SGTV, an adaptive spatio-temporal graph structure termed a "Self-Growing Graph". A centralized model is considered for encoding the entire graph in a single step and predicting trajectories simultaneously. As such, the contextual and interaction modeling becomes dynamic and adaptive to the temporal changes in the environment. The dynamic spatio-temporal graph addresses the scalability problem, and experimental results demonstrate that SGTV can cater to crowds of up to 70 pedestrians with a running time of 0.75 seconds, while other baselines take 7x longer. Chapter 5 presents G2K, which improves the robustness of the self-growing graph approach by enriching the contextual modeling with more social features from pedestrians. The social cues are encoded simultaneously over time using a multi-dimensional encoder cell. 
The experimental results show that incorporating the pedestrian head pose into the contextual modeling leads to more accurate predictions across benchmark datasets, while maintaining reasonable scalability under a lightweight graph structure. Finally, Chapter 6 presents STR-GGRNN, a scalable and robust pedestrian trajectory framework that integrates the designs in the previous chapters. This method deploys the multi-dimensional encoder with a variational sampling concept to achieve better results. Experiments on widely-used datasets show that the proposed framework outperforms the state-of-the-art methods. In particular, it yields a significant reduction in the Average Displacement Error (ADE) and the Final Displacement Error (FDE) of about 12 cm and 15 cm, respectively, on the ETH-UCY datasets. For the Stanford Drone Dataset, it achieves 0.05 ADE and 0.07 FDE, measured in meters. The proposed STR-GGRNN framework takes only about 2.30 seconds to predict the trajectories of 20 pedestrians.
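Note on metrics: the abstract reports results as ADE and FDE. The sketch below shows how these two errors are commonly computed for a single predicted trajectory: ADE as the mean Euclidean distance over all predicted time steps, FDE as the distance at the final step. It is an illustration only, not the thesis's evaluation code. The function name `displacement_errors` and the sample coordinates are hypothetical, and the thesis may average over pedestrians or sampled predictions differently.

```python
import math


def displacement_errors(pred, truth):
    """Return (ADE, FDE) for one trajectory.

    pred and truth are equal-length lists of (x, y) positions in metres,
    one entry per predicted time step. Hypothetical helper for illustration.
    """
    if not pred or len(pred) != len(truth):
        raise ValueError("trajectories must be non-empty and of equal length")
    # Euclidean distance between prediction and ground truth at each step.
    dists = [math.hypot(px - tx, py - ty)
             for (px, py), (tx, ty) in zip(pred, truth)]
    ade = sum(dists) / len(dists)  # average over all steps
    fde = dists[-1]                # error at the last step only
    return ade, fde


# Made-up sample: a prediction that drifts away from the true path.
pred = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
truth = [(0.0, 0.0), (1.0, 0.3), (2.0, 0.4)]
ade, fde = displacement_errors(pred, truth)
```

Because FDE looks only at the endpoint, a model can have a low ADE but a noticeably higher FDE when its error accumulates over the prediction horizon.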
DOI: 10.32657/10356/151544
Rights: This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0).
Fulltext Permission: open
Fulltext Availability: With Fulltext
Appears in Collections: SCSE Theses

Files in This Item:
File: thesis(6).pdf
Size: 57.48 MB
Format: Adobe PDF


Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.