Please use this identifier to cite or link to this item: https://hdl.handle.net/10356/74057
Full metadata record
DC Field | Value | Language
dc.contributor.author | Pentium, Gede Bagus Bayu | -
dc.date.accessioned | 2018-04-24T04:32:08Z | -
dc.date.available | 2018-04-24T04:32:08Z | -
dc.date.issued | 2018 | -
dc.identifier.uri | http://hdl.handle.net/10356/74057 | -
dc.description.abstract | With the large amount of data now available, a great deal of security-related information can be extracted from it. The main problem is that a large portion of this data (80%-90%) is stored in an unstructured form. One well-known form of unstructured data is text. Textual data can hold much information in a small amount of space, but it is mainly written in human language, which makes it hard for machines to extract information from it. Many natural language processing techniques have been developed to extract information from text. In information extraction, data representation plays a major role, and one of the most popular representations for textual data is the knowledge graph. Constructing a knowledge graph from unstructured textual data can help a machine understand the information the data contains. This project aims to extract a knowledge graph from Linux kernel commit messages. With more than 700,000 commit messages, this is a large amount of data to process. If the information is successfully extracted, it could be of considerable benefit to computer security. The knowledge graph extraction consists of four processes: data cleaning, entity extraction, relation extraction, and knowledge graph construction. Entity extraction recognizes named entities in the text and classifies them into pre-defined categories. For entity extraction, a combination of automated labeling and machine learning (a CRF classifier) is used. Relation extraction detects and classifies semantic relationships between the previously extracted entities in the text. For relation extraction, both schema-based and schema-free relations are extracted. In total, 1,247,864 entities and 1,747,009 relations were extracted. With an F-measure of 74.29%, the knowledge extraction is considered to perform well under the given circumstances. | en_US
dc.format.extent | 62 p. | en_US
dc.language.iso | en | en_US
dc.rights | Nanyang Technological University | -
dc.subject | DRNTU::Engineering::Computer science and engineering | en_US
dc.title | Constructing knowledge graph from linux kernel commit message | en_US
dc.type | Final Year Project (FYP) | en_US
dc.contributor.supervisor | Liu Yang | en_US
dc.contributor.school | School of Computer Science and Engineering | en_US
dc.description.degree | Bachelor of Engineering (Computer Science) | en_US
dc.contributor.supervisor2 | Chen Chunyang | en_US
item.fulltext | With Fulltext | -
item.grantfulltext | restricted | -
Appears in Collections: SCSE Student Reports (FYP/IA/PA/PI)
Files in This Item:
File | Description | Size | Format
FYP_Report_-_Gede_Bagus_Bayu_Pentium.pdf (Restricted Access) | FYP Report | 2.1 MB | Adobe PDF

Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.