Please use this identifier to cite or link to this item: https://hdl.handle.net/10356/59053
Title: Mining HIV-1 information from literature
Authors: Lim, Clarence Jia Xian
Keywords: DRNTU::Engineering::Computer science and engineering::Information systems::Information storage and retrieval
Issue Date: 2014
Abstract: The HIV-1 virus frequently mutates, increasing its resistance to certain drugs. The mutations are partly due to histone modifications in the patient’s genome. Information on histone modifications is not easily accessible. Online databases contain a large number of documents about histone modifications, but retrieving them manually is very time-consuming for biologists. This project therefore attempts to automate the retrieval of this information from the databases and integrate it into a single source for ease of access. The program consists of several components that together build the information source. The Document Collection System is the first component; it collects documents and abstracts from the online databases and cleans them for the next stage. TEES is the next component; it takes the cleaned documents and extracts proteins and histone modification events from them. The TEEStoCSV Convertor program takes the output of TEES and converts each file's data into CSV format. The Histone Events Compilation program combines the individual CSV files into one overall CSV file and filters out invalid histones. The Sampling Program takes the overall CSV file and randomly selects 100 samples for the verification process. The Normalization Program takes the overall CSV file and normalizes the terms for the visualization program, Graphviz. The GeneToUniprot program takes the overall CSV file and converts the gene names to Swiss-Prot IDs. Lastly, the XML Constructor program combines the output of the GeneToUniprot program with an extracted histone file to construct the XML file. The overall architecture follows a pipe-and-filter style to allow extensibility and ease of modification of individual components. The verification results were satisfactory overall, as more than half of the samples were correct, and some of the error types found could be resolved. The final result of the program is an XML file that allows the information to be easily distributed and accessed. The project also recommends improving the TEES system’s event detection to increase the quality of the results.
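The pipe-and-filter arrangement described in the abstract, applied to the compilation and sampling stages, might look roughly like the Python sketch below. It is an illustration only, not the report's implementation: the column name `histone`, the `VALID_HISTONES` set, and all function names are assumptions, since the abstract does not state the CSV layout or the rule used to reject invalid histones.

```python
import csv
import io
import random
from typing import Callable, Dict, Iterable, Iterator, List, Optional

Row = Dict[str, str]
Filter = Callable[[Iterable[Row]], Iterator[Row]]

# Hypothetical validity rule: the report's actual criterion is not given.
VALID_HISTONES = {"H1", "H2A", "H2B", "H3", "H4"}


def read_rows(csv_texts: Iterable[str]) -> Iterator[Row]:
    """Source: stream the rows of several per-document CSV texts as one sequence."""
    for text in csv_texts:
        yield from csv.DictReader(io.StringIO(text))


def keep_valid_histones(rows: Iterable[Row]) -> Iterator[Row]:
    """Filter: drop rows whose (assumed) 'histone' column is not a known histone."""
    for row in rows:
        if row.get("histone", "").upper() in VALID_HISTONES:
            yield row


def pipeline(source: Iterable[Row], *filters: Filter) -> Iterable[Row]:
    """Pipe: feed the output of each filter into the next one."""
    stream = source
    for apply_filter in filters:
        stream = apply_filter(stream)
    return stream


def sample_rows(rows: Iterable[Row], k: int = 100,
                seed: Optional[int] = None) -> List[Row]:
    """Sink: pick up to k rows uniformly at random for manual verification."""
    materialized = list(rows)
    rng = random.Random(seed)
    return rng.sample(materialized, min(k, len(materialized)))
```

Because each stage only consumes and yields rows, a further stage (for example, term normalization) could be added to the `pipeline(...)` call without touching the others, which is the extensibility the pipe-and-filter style is meant to provide.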
URI: http://hdl.handle.net/10356/59053
Rights: Nanyang Technological University
Fulltext Permission: restricted
Fulltext Availability: With Fulltext
Appears in Collections: SCSE Student Reports (FYP/IA/PA/PI)

Files in This Item:
File: FYP_final.pdf
Description: Restricted Access
Size: 1.78 MB
Format: Adobe PDF

Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.