Please use this identifier to cite or link to this item:
|Title:||Querying and mining complex graphs : uncertainty and multi-relations||Authors:||Ke, Xiangyu||Keywords:||Engineering::Computer science and engineering::Computer applications
Engineering::Computer science and engineering::Data
|Issue Date:||2019||Publisher:||Nanyang Technological University||Source:||Ke, X. (2019). Querying and mining complex graphs : uncertainty and multi-relations. Doctoral thesis, Nanyang Technological University, Singapore.||Abstract:||Graphs are important data representations that can ubiquitously model objects and their relations in a wide diversity of real-word applications. Effective and efficient graph analytics provide users with a deeper understanding of what is behind the data, thus can bene t a lot of useful applications such as node classification, node recommendation, link prediction, etc. Nowadays, besides the rapid growth in the volume of graphs, the inherent complexity on graphs has attracted great attention from data management community. This thesis will therefore focus on querying and mining uncertain and multi-relation graphs. Uncertain graphs can characterize the inherent uncertainty in the data due to many reasons, including noisy measurements, inference and prediction models, and explicit manipulations. For example, in a road network, each road junction can be represented as a node, and each road segment can serve as an edge with a blocking (e.g., traffic jam) probability. Meanwhile, the multi-relation graphs, where multiple edges may exist between a same pair of nodes, are even more expressive. Using the same example of a road network, the probability of traffic jam occurrence may vary with time. Other examples include interaction conditions and transformation probabilities from a protein to many others in a protein-protein interaction network, the topics and the corresponding probabilities that a user may re-tweet her friends' posts, etc. The uncertainty and the multi-relations over graphs bring about additional complexities to graph querying and mining. In this thesis, I investigated the following research problems. First, I conducted an experimental analyses about six state-of-the-art s-t reliability algorithms. The s-t reliability measures the probability that a target node t is reachable from a source node s. We introduced the algorithms and provided several corrections or adaptions. Then I implemented all of them on a same code base, and conducted the evaluation with uni ed metrics, medium to large real-world datasets, and same query workloads. We summarized their trade-offs, and provided guidelines for future researchers and practitioners. Then, I examined a network modification problem, aiming at maximizing the network reliability through adding a small budget of new edges. We designed solutions to the basic two-terminal reliability case, then generalized them to the multiple-source-target case, and further studied a restricted version of only considering most reliable paths. The experimental results and the real-world applications well validated the effectiveness and the efficiency of our solutions. Next, I studied a novel variant of well-known influence maximization problem by finding influential seed users and relevant topics simultaneously. Instead of using a fixed information diffusion probability between each pair of users, variations of diffusion probability in different topics were allowed. We formally characterized the problem and provided an iterative solution. The efficiency of the seed finding part was improved by up to 30%, while the accuracy of the topic finding part was improved by up to 30%. Furthermore, I applied the uncertain graph to model the crowd errors in the solution to the crowdsourcing entity resolution problem. With the help of our novel metric, the "reliability" of a clustering, the next crowdsourcing questions could be selected wisely, which thus improved the accuracy by 15%, and reduced the crowdsourcing cost by 50%. I also developed a demo software for the proposed PERC system. Finally, I presented our multi-relation graph summarization techniques. In our twostep baselines, a final summary was generated by properly aggregating the summary of each relation. I demonstrated their drawbacks and designed holistic solutions. Among them, the k-median clustering approach was able to return a summary with a provable bounded size.||URI:||https://hdl.handle.net/10356/137863||DOI:||10.32657/10356/137863||Rights:||This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0).||Fulltext Permission:||open||Fulltext Availability:||With Fulltext|
|Appears in Collections:||SCSE Theses|
Updated on Jan 29, 2023
Updated on Jan 29, 2023
Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.