Efficient query processing on large graphs
Date of Issue2014
School of Computer Engineering
Centre for Advanced Information Systems
Graphs provide a natural way to represent real-world objects and their relationships, and hence they are utilized to model concepts and data in various domains, such as social networks, the Web, biological informatics and spatial databases. In the big data era, most graphs are extremely large and growing rapidly in size, which brings challenges to processing graph queries e ciently. In this thesis, we investigate three important problems in graph query processing, namely, the point-to-point shortest path and distance queries on road networks, the single source shortest path and distance queries on diskresident graphs, and the point-to-point reachability queries on dynamic graphs. These problems are not only interesting in themselves when the graph size becomes large, but also fundamental building blocks of numerous real-world applications. In particular, given two locations s and t in a road network, a point-to-point distance query returns the minimum network distance from s to t, while a point-to-point shortest path query computes the actual route that achieves the minimum distance. To solve this problem, we propose Arterial Hierarchy (AH), an index structure that narrows the gap between theory and practice. On the theoretical side, we show that, based on a realistic characteristic of real-world road networks, AH answers any distance query in constant time. In addition, any shortest path query can be answered with k more steps, where k is the number of vertices on the shortest path. On the practical side, AH outperforms the state-of-the-art methods in terms of query time and its space and pre-computation overheads are moderate. Then, given a vertex s in a graph G, a single source shortest distance query from s asks for the distance from s to every other vertex in G, while a single source shortest path query retrieves the shortest path from s to any other vertex. To handle these two types of queries on disk-resident graphs, we propose Highways-on-disk (HoD), a disk-based index that supports both shortest path and distance queries on directed and weighted graphs. Di erent from existing methods, HoD does not assume the graph ts into memory, does not limit the graph type as undirected graph, and does not restrict that the edges have unit weights or even integer weights. Although being more general and exible, HoD is still able to outperform the state-of-the-art methods in terms of query processing time by up to two orders of magnitude. In addition, HoD has a smaller space consumption and pre-computation time in most cases. Finally, given two vertices s and t in G, a reachability query from s to t asks whether there exists a path from s to t in G. To solve the problem on dynamic graphs, we rst i propose the Total Order Labeling (TOL) framework, which summarizes three most advanced methods [22,51,94] for reachability queries on static graphs. Then we investigate novel algorithms to handle updates of TOL, regarding both insertions and deletions on dynamic graphs. To our best knowledge, this is the rst practically e cient and scalable approach that handles the point-to-point reachability queries on dynamic graphs. Finally we propose a new reachability index Butter y, which o ers reduced preprocessing, space, and query costs than any existing indices under TOL [22,51,94]. We experimentally evaluate TOL using a large variety of benchmark datasets with up to twenty million vertices, and we demonstrate the superiority of TOL against alternative solutions for static and dynamic graphs.
DRNTU::Engineering::Computer science and engineering::Information systems::Database management