Please use this identifier to cite or link to this item:
https://hdl.handle.net/10356/86031
Title: | Interactive labelling of a multivariate dataset for supervised machine learning using linked visualisations, clustering, and active learning |
Authors: | Chegini, Mohammad; Bernard, Jürgen; Berger, Philip; Sourin, Alexei; Andrews, Keith; Schreck, Tobias |
Keywords: | Engineering::Computer science and engineering; Labelling; Clustering |
Issue Date: | 2019 |
Source: | Chegini, M., Bernard, J., Berger, P., Sourin, A., Andrews, K., & Schreck, T. (2019). Interactive labelling of a multivariate dataset for supervised machine learning using linked visualisations, clustering, and active learning. Visual Informatics, 3(1), 9-17. doi:10.1016/j.visinf.2019.03.002 |
Series/Report no.: | Visual Informatics |
Abstract: | Supervised machine learning techniques require labelled multivariate training datasets. Many approaches address the issue of unlabelled datasets by tightly coupling machine learning algorithms with interactive visualisations. Using appropriate techniques, analysts can play an active role in a highly interactive and iterative machine learning process to label the dataset and create meaningful partitions. While this principle has been implemented either for unsupervised, semi-supervised, or supervised machine learning tasks, the combination of all three methodologies remains challenging. In this paper, a visual analytics approach is presented, combining a variety of machine learning capabilities with four linked visualisation views, all integrated within the mVis (multivariate Visualiser) system. The available palette of techniques allows an analyst to perform exploratory data analysis on a multivariate dataset and divide it into meaningful labelled partitions, from which a classifier can be built. In the workflow, the analyst can label interesting patterns or outliers in a semi-supervised process supported by active learning. Once a dataset has been interactively labelled, the analyst can continue the workflow with supervised machine learning to assess to what degree the subsequent classifier has effectively learned the concepts expressed in the labelled training dataset. Using a novel technique called automatic dimension selection, interactions the analyst had with dimensions of the multivariate dataset are used to steer the machine learning algorithms. A real-world football dataset is used to show the utility of mVis for a series of analysis and labelling tasks, from initial labelling through iterations of data exploration, clustering, classification, and active learning to refine the named partitions, to finally producing a high-quality labelled training dataset suitable for training a classifier. The tool empowers the analyst with interactive visualisations including scatterplots, parallel coordinates, similarity maps for records, and a new similarity map for partitions. |
URI: | https://hdl.handle.net/10356/86031; http://hdl.handle.net/10220/49845 |
DOI: | 10.1016/j.visinf.2019.03.002 |
Schools: | School of Computer Science and Engineering |
Rights: | © 2019 Zhejiang University and Zhejiang University Press. Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/). |
Fulltext Permission: | open |
Fulltext Availability: | With Fulltext |
Appears in Collections: | SCSE Journal Articles |
Files in This Item:
File | Description | Size | Format |
---|---|---|---|
Interactive labelling of a multivariate dataset for supervised machine learning using linked visualisations, clustering, and active learning.pdf | | 2.25 MB | Adobe PDF |
Scopus™ citations: 55 (updated on Mar 14, 2025)
Web of Science™ citations: 29 (updated on Oct 28, 2023)
Page view(s): 373 (updated on Mar 20, 2025)
Download(s): 124 (updated on Mar 20, 2025)
Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.