Title: Efficient sampling procedure for small storage devices
Authors: Agrawal, Rohit
Keywords: DRNTU::Engineering::Electrical and electronic engineering::Computer hardware, software and systems
Issue Date: 2013
Abstract: Sampling is the selection of a subset of individuals from a statistical population in order to estimate characteristics of the whole population. For large, multidimensional databases, data analytics algorithms may require multiple passes over the entire database, which can be very expensive in terms of time. In many applications, however, approximate (rather than exact) answers to queries are more than satisfactory. For such applications, drilling down to a sample of members allows one to quickly analyze a large multidimensional database, focusing on data trends or approximate information in the initial stage. In this project, a distance-based sampling algorithm, DSSC (Distance based Sampling for Streaming data with Continuous attributes), is proposed. DSSC can be used in applications that require a high-quality sample but are limited in memory and processing power, such as those running on mobile devices. Preliminary results on data sets show that DSSC is robust to noise and requires little memory space. We prove that the cost of processing an incoming transaction is at most O(n·|T|).
Schools: School of Electrical and Electronic Engineering 
Rights: Nanyang Technological University
Fulltext Permission: restricted
Fulltext Availability: With Fulltext
Appears in Collections: EEE Student Reports (FYP/IA/PA/PI)

Files in This Item:
Description: Final Year Project Report (Restricted Access)
Size: 2.18 MB
Format: Adobe PDF

Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.