Please use this identifier to cite or link to this item: https://hdl.handle.net/10356/151787
Title: FC² : cloud-based cluster provisioning for distributed machine learning
Authors: Ta, Nguyen Binh Duong
Keywords: Engineering::Computer science and engineering
Issue Date: 2019
Source: Ta, N. B. D. (2019). FC² : cloud-based cluster provisioning for distributed machine learning. Cluster Computing, 22(4), 1299-1315. https://dx.doi.org/10.1007/s10586-019-02912-6
Project: RG121/15
Journal: Cluster Computing
Abstract: Training large, complex machine learning models such as deep neural networks with big data requires powerful computing clusters, which are costly to acquire, use and maintain. As a result, many machine learning researchers turn to cloud computing services for on-demand and elastic resource provisioning capabilities. Two issues have arisen from this trend: (1) if not configured properly, training models on cloud-based clusters could incur significant cost and time, and (2) many researchers in machine learning tend to focus more on model and algorithm development, so they may not have the time or skills to deal with system setup, resource selection and configuration. In this work, we propose and implement FC²: a system for fast, convenient and cost-effective distributed machine learning over public cloud resources. Central to the effectiveness of FC² is the ability to recommend an appropriate resource configuration in terms of cost and execution time for a given model training task. Our approach differs from previous work in that it does not need to manually analyze the code and dataset of the training task in advance. The recommended resource configuration can then be deployed and managed automatically by FC² until the training task is completed. We have conducted extensive experiments with an implementation of FC², using real-world deep neural network models and datasets. The results demonstrate the effectiveness of our approach, which could produce cost savings of up to 80% while maintaining similar training performance compared to much more expensive resource configurations.
URI: https://hdl.handle.net/10356/151787
ISSN: 1386-7857
DOI: 10.1007/s10586-019-02912-6
Schools: School of Computer Science and Engineering
Rights: © 2019 Springer Science+Business Media, LLC, part of Springer Nature. All rights reserved.
Fulltext Permission: none
Fulltext Availability: No Fulltext
Appears in Collections: SCSE Journal Articles
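As a rough illustration of the cost/time trade-off described in the abstract (this is not FC²'s recommendation method, which this record does not describe), the sketch below estimates the total cost of a configuration as nodes × hourly price per node × estimated training hours, then picks the cheapest configuration that finishes within a deadline. The `Config` type, the `recommend` function, and all prices and time estimates are hypothetical. Here the training-time estimates are simply assumed to be given, whereas FC² is described as producing its recommendation without manually analyzing the task's code and dataset.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Config:
    """A hypothetical cloud cluster configuration for one training task."""
    name: str                   # illustrative label, not a real instance type
    nodes: int                  # number of worker nodes
    price_per_node_hour: float  # assumed on-demand price per node (USD/hour)
    est_hours: float            # assumed estimate of training time (hours)

    @property
    def cost(self) -> float:
        # Total cost = nodes * hourly price per node * estimated hours
        return self.nodes * self.price_per_node_hour * self.est_hours


def recommend(configs: List[Config], deadline_hours: float) -> Optional[Config]:
    """Return the cheapest configuration whose estimated time meets the deadline.

    Returns None when no configuration is estimated to finish in time.
    """
    feasible = [c for c in configs if c.est_hours <= deadline_hours]
    if not feasible:
        return None
    # Cheapest first; break cost ties by the shorter estimated training time
    return min(feasible, key=lambda c: (c.cost, c.est_hours))


if __name__ == "__main__":
    # Hypothetical candidates: a large expensive cluster vs. smaller ones
    candidates = [
        Config("large", nodes=8, price_per_node_hour=3.00, est_hours=2.0),
        Config("small", nodes=2, price_per_node_hour=0.90, est_hours=2.6),
        Config("single", nodes=1, price_per_node_hour=0.90, est_hours=6.0),
    ]
    best = recommend(candidates, deadline_hours=3.0)
    if best is not None:
        print(f"{best.name}: about ${best.cost:.2f}")
```

With these made-up numbers, the "large" cluster costs 8 × $3.00 × 2 h = $48, while the "small" one costs about $4.68 and still meets the 3-hour deadline, so it is selected. This shows how a modestly slower but much cheaper configuration can dominate on cost, which is the kind of trade-off the abstract refers to.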

Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.