Algorithms for influence maximization and seed minimization
Date of Issue2019-08-01
School of Computer Science and Engineering
Graph is a basic mathematical tool that models information about identities as well as their complex relationships from various real world problems. It has been found important applications in analysis on social networks, route planning, telecommunication, etc. In recent years, the complexity and scale of real world graphs have increased dramatically. In particular, international social networks can comprise of hundreds of millions of users and up to billions of relationships. Thus, even algorithms with decent time or space complexities meet challenges dealing with large-scale networks for the queries like influence maximization and seed minimization on social networks. In this thesis, we investigate two problems on large scale networks for aforementioned applications, i.e., influence maximization and seed minimization. Given G a social network, M a probabilistic propagation model, and a small number k > 0, the influence maximization problem aims to find the largest expectation of the number of influenced nodes that k nodes can trigger under this pre-defined model M. This problem is derived from viral marketing, where a company gives away free samples to a small number of influential individuals in order to create a cascade of adoption via word-of-mouth effect. This study proposes a two-phase approach Influence Maximization via Martingales (IMM) that meets both practical efficiency and theoretical guarantees. In particular, IMM returns an (1 − 1/e − ε)-approximate solution with at least 1 − n ^(−ℓ) probability in an O((k+ℓ)(n+m)logn/ε^2 ) running time. IMM is further extended to fit in triggering model and time-continuous model. We experimentally evaluate IMM with the state-of-the-art benchmarks under several diffusion models and parameter settings, using large networks with up to 1.4 billion edges. The experimental results show that our approach consistently outperforms the states of the art in terms of efficiency. The seed minimization problem is a variant problem of the influence maximization with the same origin from advertising. Given a social network G and a covering threshold t, the seed minimization problem is aimed to find a seed set S that has an expected influence nodes not less than t·n and minimizes the size of S. Compared to the influence maximization that maximizes the influence given a certain budget, the seed minimization problem hopes to retrench the expense to the minimum number while keeping the influence above a predefined threshold. To solve the problem, we propose GSM, a greedy algorithm with tight approximations, high generalization and easy implementations. In particular, it yields a ⌈(1 + ϕ)log(tn)⌉-approximate solution with at least 1 − n ^(−ℓ) probability, where ℓ and ϕ are both tunable. We experimentally evaluate GSM in several settings of both t and β, and it is often orders of magnitude faster compared to the traditional greedy benchmark MINTSS. GSM also gives an impressive performance on a large graph Twitter with more than a billion edges.
Engineering::Computer science and engineering::Information systems::Database management