Please use this identifier to cite or link to this item: https://hdl.handle.net/10356/155298
Title: XGBoost, mordred and RDKit for the prediction of glass transition temperature of polymers
Authors: Goh, Kai Leong
Keywords: Science::Chemistry
Issue Date: 2021
Publisher: Nanyang Technological University
Source: Goh, K. L. (2021). XGBoost, mordred and RDKit for the prediction of glass transition temperature of polymers. Student Research Paper, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/155298
Project: SPMS20062 
Abstract: Glass transition temperature (Tg) is the temperature at which a polymer changes from a glassy state to a rubbery state. The change in properties across Tg is important in food science and the pharmaceutical industry. In recent decades, machine learning (ML) has been increasingly used to develop quantitative structure–property relationship (QSPR) models. QSPR uses molecular descriptors and molecular fingerprints as features to predict the properties of chemical compounds, and numerous works have been dedicated to creating a good QSPR model to predict Tg. However, to the best of our knowledge, no previous research work has used the Mordred molecular descriptors library or the Extreme Gradient Boosting (XGBoost) regression algorithm to predict Tg. Therefore, this project employed Mordred and XGBoost, together with the RDKit cheminformatics library, to predict the Tg of 640 polymers. A total of 12 sets of features were generated by RDKit and Mordred as inputs for XGBoost. The scoring metrics from the scikit-learn and NumPy libraries showed that the 2D molecular descriptors of Mordred (Mordred-2D) and the Extended-Connectivity Fingerprint with a diameter of 4 bonds (ECFP4) performed best. The results improved further when Mordred-2D and ECFP4 were combined to form a new set of features. Future work aims to increase the number of polymer data points and to explore better methods of representing the polymer repeating units for the calculation of descriptors and fingerprints.
URI: https://hdl.handle.net/10356/155298
Fulltext Permission: open
Fulltext Availability: With Fulltext
Appears in Collections: URECA Papers
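
The abstract reports that combining Mordred-2D descriptors with ECFP4 fingerprints improved the scores, but it does not say how the two sets were combined or which metrics were used. The sketch below is an illustration only and is not taken from the paper. It assumes that "combined" means column-wise concatenation of each polymer's two feature vectors, and it assumes RMSE and R² as the scoring metrics. The function names and all numbers are hypothetical. It uses only the Python standard library so that it runs without RDKit, Mordred or XGBoost installed; in a real workflow the vectors would come from those libraries and the predictions from a fitted regressor.

```python
import math


def combine_features(descriptor_rows, fingerprint_rows):
    """Concatenate per-polymer descriptor and fingerprint vectors column-wise.

    Assumption: this is one plausible reading of "combined"; the abstract
    does not state the method actually used.
    """
    if len(descriptor_rows) != len(fingerprint_rows):
        raise ValueError("both feature sets must describe the same polymers")
    return [list(d) + list(f) for d, f in zip(descriptor_rows, fingerprint_rows)]


def rmse(y_true, y_pred):
    """Root-mean-square error between measured and predicted values."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def r2(y_true, y_pred):
    """Coefficient of determination (1 - residual SS / total SS)."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up toy values for illustration only (not data from the paper).
descriptors = [[1.5, 0.2], [2.1, 0.7]]       # stand-in for Mordred-2D columns
fingerprints = [[0, 1, 1, 0], [1, 0, 0, 1]]  # stand-in for ECFP4 bits
combined = combine_features(descriptors, fingerprints)

tg_measured = [350.0, 400.0, 450.0]   # hypothetical Tg values in kelvin
tg_predicted = [360.0, 390.0, 450.0]  # hypothetical model output

print(combined)
print(rmse(tg_measured, tg_predicted), r2(tg_measured, tg_predicted))
```

Concatenation keeps every original column, so a tree-based learner such as XGBoost is free to split on either descriptor values or fingerprint bits.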

Files in This Item:
File: URECA_2020-2021_Goh Kai Leong.pdf
Size: 436.09 kB
Format: Adobe PDF

Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.