dc.contributor.author: Quach, Vinh Thanh
dc.date.accessioned: 2015-02-25T08:47:31Z
dc.date.accessioned: 2017-07-23T08:30:18Z
dc.date.available: 2015-02-25T08:47:31Z
dc.date.available: 2017-07-23T08:30:18Z
dc.date.copyright: 2014
dc.date.issued: 2014
dc.identifier.citation: Quach, V. T. (2014). Distributed classification with variable distributions. Master's thesis, Nanyang Technological University, Singapore.
dc.identifier.uri: http://hdl.handle.net/10356/62213
dc.description.abstract: When the data at a location is insufficient, a naive solution is to gather data from other (remote) places and classify it with a centralized algorithm. Although this approach performs well, it is often infeasible due to high communication overheads and the lack of scalability of centralized solutions. These concerns have led to the emergence of distributed classification. The promise of distributed classification is to improve the classification accuracy of a learning agent (called a party) on its local data, using the knowledge of other parties in the distributed network. However, current work implicitly assumes that all parties receive data from exactly the same distribution. We show that this scenario is too simple: in reality, data may differ across parties in the distribution of the inputs (observations), the outputs (labels), or both. We remove this simplifying assumption by allowing parties to draw data from arbitrary distributions, thus formalizing a new and challenging problem: distributed classification with variable data distributions. We show that this problem is difficult because it does not admit state-of-the-art solutions from (conventional) distributed classification. After posing the problem and illustrating its difficulty, we present a list of notable research challenges (or sub-problems) that should be addressed in this field, and for each challenge we suggest potential research directions. Finally, as a first attempt at this new problem, we present a straightforward yet effective algorithm called VarDist that efficiently solves the problem when the data distribution varies over the participating parties.
Although VarDist is not a complete, sophisticated solution, it incurs low communication costs while providing a more accurate classifier than local learning by benefiting from the auxiliary classifiers of the other parties.
dc.format.extent: 72 p.
dc.language.iso: en
dc.subject: DRNTU::Engineering::Computer science and engineering::Computing methodologies::Pattern recognition
dc.title: Distributed classification with variable distributions
dc.type: Thesis
dc.contributor.research: Centre for Advanced Information Systems
dc.contributor.school: School of Computer Engineering
dc.contributor.supervisor: Ng Wee Keong (SCE)
dc.contributor.supervisor: Vivekanand Gopalkrishnan
dc.description.degree: MASTER OF ENGINEERING (SCE)
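
The abstract's central idea, that a party with scarce local data can do better than local learning by reusing auxiliary classifiers received from other parties, can be illustrated with a small sketch. This is a hypothetical accuracy-weighted voting scheme, not the thesis's actual VarDist procedure; all names, models, and data below are invented for illustration.

```python
import numpy as np

def weighted_vote(models, weights, X):
    """Predict by accuracy-weighted vote over binary classifiers in {-1, +1}."""
    votes = np.zeros(len(X))
    for model, w in zip(models, weights):
        votes += w * model(X)  # each model maps inputs to {-1, +1}
    return np.sign(votes)

# Toy 1-D setting: the true label is the sign of x, but the local
# classifier was trained on scarce data and learned a biased threshold.
rng = np.random.default_rng(0)
X_val = rng.uniform(-1, 1, size=50)  # local validation inputs
y_val = np.sign(X_val)               # true labels

local_model = lambda X: np.sign(X - 0.4)   # biased local classifier
aux_model_1 = lambda X: np.sign(X - 0.05)  # received from a similar party
aux_model_2 = lambda X: -np.sign(X)        # received from a dissimilar party

models = [local_model, aux_model_1, aux_model_2]
# Weight each classifier by its accuracy on the LOCAL validation set,
# so auxiliary models from dissimilar distributions are down-weighted.
weights = [np.mean(m(X_val) == y_val) for m in models]

X_test = np.array([-0.5, 0.2, 0.7])
preds = weighted_vote(models, weights, X_test)
# The good auxiliary classifier corrects the local bias at x = 0.2.
print(preds)
```

Only the (small) trained models cross the network, never the raw data, which is consistent with the low communication cost claimed for VarDist, although the real algorithm in the thesis may combine party knowledge differently.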


Files in this item

File: Final.pdf (605.0 Kb, application/pdf)
