Please use this identifier to cite or link to this item:
https://hdl.handle.net/10356/103255
Title: How to find a perfect data scientist: a distance-metric learning approach
Authors: Hu, Han; Luo, Yong; Wen, Yonggang; Ong, Yew-Soon; Zhang, Xinwen
Keywords: Natural Language Processing; DRNTU::Engineering::Computer science and engineering; Data Scientist
Issue Date: 2018
Source: Hu, H., Luo, Y., Wen, Y., Ong, Y.-S., & Zhang, X. (2018). How to find a perfect data scientist: a distance-metric learning approach. IEEE Access, 6, 60380-60395. doi:10.1109/ACCESS.2018.2870535
Series/Report no.: IEEE Access
Abstract: The title of data scientist has been described as one of the sexiest jobs of the 21st century. Numerous efforts have been made to define the job of a data scientist in a qualitative manner by, for example, listing the job functions and required skill sets of data scientists. However, to the best of our knowledge, no attempt has been made to define the term data scientist in a scientific manner. In this paper, we address this issue by using a data-driven approach to answer three questions: 1) What is a proper definition of the term data scientist from a market-demand perspective? 2) Do self-described data scientists meet the market demand? and 3) Finally, how can companies efficiently recruit data scientists that match their openings? To answer these questions, we crawl two data sets for the supply and demand sides. For the former, we collect a set of data scientist user profiles from LinkedIn; for the latter, we collect a set of data scientist job descriptions from Monster. We first parse the set of data scientist job descriptions via natural language processing techniques and derive a scientific definition of the job of a data scientist via a clustering algorithm. Second, we use the same approach to determine that, under the aforementioned definition, self-claimed data scientists on the market would meet the market demand with a high probability. Finally, we introduce a distance-metric learning approach that can be used by companies to find data scientist candidates that match their openings. We achieve an average precision of 12.31%; i.e., one in ten candidates with matching qualifications would accept a given offer. The application of this quantitative approach could significantly reduce the human-resource costs incurred by companies in recruiting matching data scientists.
URI: https://hdl.handle.net/10356/103255, http://hdl.handle.net/10220/47276
DOI: 10.1109/ACCESS.2018.2870535
Schools: School of Computer Science and Engineering
Research Centres: Data Science and Artificial Intelligence Research Centre
Rights: © 2018 IEEE. Translations and content mining are permitted for academic research only. Personal use is also permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
Fulltext Permission: open
Fulltext Availability: With Fulltext
Appears in Collections: SCSE Journal Articles
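The abstract describes the matching step only at a high level: candidates are ranked for an opening under a distance metric, and the quality of that ranking is reported as average precision. The sketch below illustrates those two ideas under stated assumptions; it is not the authors' method. It assumes job descriptions and candidate profiles have already been encoded as numeric skill vectors, uses a fixed, hand-chosen matrix `L` in place of a learned one (the metric is M = LᵀL, which is positive semidefinite for any real `L`), and uses the standard information-retrieval definition of average precision. All function names and data are hypothetical.

```python
import math
from typing import List, Sequence, Set

Vector = Sequence[float]
Matrix = Sequence[Sequence[float]]


def mahalanobis(x: Vector, y: Vector, L: Matrix) -> float:
    """Distance under the metric M = L^T L, computed as ||L (x - y)||_2.

    Hypothetical helper: in metric learning, L (and hence M) would be
    learned from data; here it is simply passed in.
    """
    diff = [a - b for a, b in zip(x, y)]
    projected = [sum(row[j] * diff[j] for j in range(len(diff))) for row in L]
    return math.sqrt(sum(v * v for v in projected))


def rank_candidates(job: Vector, candidates: List[Vector], L: Matrix) -> List[int]:
    """Return candidate indices sorted from closest to farthest from the job."""
    return sorted(range(len(candidates)), key=lambda i: mahalanobis(job, candidates[i], L))


def average_precision(ranking: List[int], relevant: Set[int]) -> float:
    """Mean of precision@k over the ranks k at which a relevant candidate appears."""
    hits, total = 0, 0.0
    for k, idx in enumerate(ranking, start=1):
        if idx in relevant:
            hits += 1
            total += hits / k
    return total / len(relevant) if relevant else 0.0


# Toy data (hypothetical): three binary skill indicators per vector.
job = [1.0, 0.0, 1.0]
candidates = [[1.0, 0.0, 1.0], [0.0, 1.0, 0.0], [1.0, 1.0, 1.0], [0.0, 0.0, 1.0]]
# Hand-chosen L that weights the first skill twice as heavily as the others.
L = [[2.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

ranking = rank_candidates(job, candidates, L)
print(ranking)                              # [0, 2, 3, 1]
print(average_precision(ranking, {0, 1}))   # 0.75
```

Here the set `{0, 1}` stands in for candidates who would actually accept an offer; the two hits at ranks 1 and 4 give precisions of 1/1 and 2/4, whose mean is 0.75. Swapping in a learned `L` changes only the ranking, not the evaluation code.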
Files in This Item:
File | Description | Size | Format
---|---|---|---
How to Find a Perfect Data Scientist_ A Distance-Metric Learning Approach.pdf | | 2.31 MB | Adobe PDF
Scopus™ citations: 3 (updated on Mar 8, 2025)
Web of Science™ citations: 2 (updated on Oct 24, 2023)
Page views: 654 (updated on Mar 24, 2025)
Downloads: 192 (updated on Mar 24, 2025)
Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.