Please use this identifier to cite or link to this item:
Title: Identifying correlated heavy-hitters
Authors: Zhou, Ziqi
Keywords: Science::Mathematics
Issue Date: 2022
Publisher: Nanyang Technological University
Source: Zhou, Z. (2022). Identifying correlated heavy-hitters. Final Year Project (FYP), Nanyang Technological University, Singapore.
Abstract: The heavy hitter problem asks to find the top k most frequent elements in a data stream. This problem has been used in many applications across network data analysis, event mining, etc. Many classical algorithms can only handle one-dimensional data such as Count-Sketch and Count-Min. But in this study, we focus on heavy hitters in two-dimensional data. Our goal is to identify the location of heavy hitters and estimate their value. In the first part, we use a two-sided Count-Sketch to estimate the value of heavy hitters. In the second part, we use error-correcting codes and hashing matrices to identify the location of heavy hitters. A two-sided Count-Sketch means applying Count-Sketch twice. First we apply Count-Sketch on the rows, hashing n rows into Θ(k poly(log n)) different buckets. With a large probability the heavy rows are isolated in different buckets and therefore their l2 norms are preserved. Next we apply Count-Sketch on the columns, allowing us to estimate the heavy entries in each row-bucket. The resulting matrix will have a much smaller dimension than the original matrix. Identification of heavy hitters is built upon Count-Sketch matrices and bit-testing matrices. We further incorporate error-correcting codes to reduce the failure probability. We also use a Johnson-Lindenstrauss matrix to estimate the l2 norms of the rows for identification of the heavy rows.
Fulltext Permission: restricted
Fulltext Availability: With Fulltext
Appears in Collections:SPMS Student Reports (FYP/IA/PA/PI)

Files in This Item:
File Description SizeFormat 
FYP final thesis.pdf
  Restricted Access
478.79 kBAdobe PDFView/Open

Page view(s)

Updated on May 17, 2022


Updated on May 17, 2022

Google ScholarTM


Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.