Please use this identifier to cite or link to this item:
https://hdl.handle.net/10356/183986
Title: Dynamic query routing for distributed edge chatbots on NVIDIA Jetson platforms
Authors: Ng, Mu Rong
Keywords: Computer and Information Science
Issue Date: 2025
Publisher: Nanyang Technological University
Source: Ng, M. R. (2025). Dynamic query routing for distributed edge chatbots on NVIDIA Jetson platforms. Final Year Project (FYP), Nanyang Technological University, Singapore. https://hdl.handle.net/10356/183986
Abstract: In the current age of cloud-based computing and the widespread adoption of large language models (LLMs), concerns regarding data privacy, security and efficiency have become increasingly prominent. Many widely used LLM solutions rely on centralized cloud infrastructures, which is particularly concerning for industries handling sensitive information, such as healthcare, finance, and government services, where strict data protection regulations apply. Deploying LLMs on local edge devices offers a compelling alternative, providing greater control over data while reducing reliance on external cloud services. However, running LLMs on edge devices presents challenges in terms of computational efficiency, power consumption, and latency. A potential solution lies in distributed LLM systems, where multiple edge devices work together to process queries based on their complexity, optimizing performance while maintaining security. This project explores a distributed LLM-based chatbot deployed across multiple Nvidia Jetson devices. The system is designed to dynamically route queries based on their complexity, ensuring that high-performance devices handle resource-intensive tasks while lightweight devices manage simpler queries. The goal is to strike a balance between latency and energy efficiency while maintaining local inference capabilities that prioritise privacy and security. The system utilizes the Nvidia Jetson AGX Orin for high-performance tasks and the Jetson Nano Developer Kit for simpler tasks, operating in a client-server model where each device acts as an independent server. Queries are assigned to the respective devices for processing based on a complexity threshold. Experiments confirmed the expected theoretical outcomes: lower latency but higher power usage on the Orin, and the opposite on the Nano. Further investigation and testing were then conducted to determine the optimal threshold point that balances efficiency and performance for different use cases.
URI: https://hdl.handle.net/10356/183986
Schools: College of Computing and Data Science
Fulltext Permission: restricted
Fulltext Availability: With Fulltext
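As a rough illustration of the complexity-threshold routing described in the abstract, the sketch below shows one way a dispatcher could assign incoming queries to the two Jetson servers. The endpoint URLs, the `estimate_complexity` heuristic, the JSON request/response shape, and the threshold value are all assumptions made for illustration; they are not taken from the report itself.

```python
"""Minimal sketch of threshold-based query routing between two edge servers.
All names, URLs, and the complexity heuristic are illustrative assumptions."""
import requests  # assumed HTTP client; each Jetson is presumed to expose an HTTP server

# Hypothetical endpoints for the two devices described in the abstract.
ORIN_URL = "http://jetson-agx-orin.local:8000/generate"  # high-performance device
NANO_URL = "http://jetson-nano.local:8000/generate"      # lightweight device

COMPLEXITY_THRESHOLD = 32  # tunable; the report investigates where this optimum lies


def estimate_complexity(query: str) -> int:
    """Toy complexity measure: whitespace token count.
    The report's actual complexity metric is not specified here."""
    return len(query.split())


def route_query(query: str) -> str:
    """Send simple queries to the Nano and complex ones to the AGX Orin,
    then return the generated reply from the chosen server."""
    url = NANO_URL if estimate_complexity(query) < COMPLEXITY_THRESHOLD else ORIN_URL
    resp = requests.post(url, json={"prompt": query}, timeout=120)
    resp.raise_for_status()
    return resp.json().get("text", "")


if __name__ == "__main__":
    print(route_query("What time is it?"))  # short query, routed to the Nano
    print(route_query(
        "Summarise the trade-offs between latency and energy efficiency "
        "for on-device LLM inference across heterogeneous edge hardware."
    ))  # longer query, routed to the AGX Orin
```

In this kind of design, the threshold is the main tuning knob: lowering it shifts more traffic to the high-performance device (lower latency, higher power draw), while raising it keeps more work on the lightweight device, which matches the latency/energy trade-off reported in the experiments.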
Appears in Collections: CCDS Student Reports (FYP/IA/PA/PI)
Files in This Item:
File | Description | Size | Format
---|---|---|---
Ng Mu Rong_FYP Report_Final.pdf | Restricted Access | 4.42 MB | Adobe PDF
Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.