Please use this identifier to cite or link to this item: https://hdl.handle.net/10356/157550
Title: The old newspaper project
Authors: Mao, Junke
Keywords: Engineering::Electrical and electronic engineering
Issue Date: 2022
Publisher: Nanyang Technological University
Source: Mao, J. (2022). The old newspaper project. Final Year Project (FYP), Nanyang Technological University, Singapore. https://hdl.handle.net/10356/157550
Abstract: Optical Character Recognition (OCR) is now commonly used to convert printouts and documents in sociology, communication and education studies. Traditional OCR models extract text sequentially across the whole page. In newspapers, however, text is arranged in columns grouped by article, with images embedded. As a result, complex layouts containing multi-column text, headlines, embedded figures, etc., can impair the OCR results. To improve the efficiency of converting images of newspapers, we built a specialized model for newspaper recognition. The integrated model performs object segmentation to extract the relevant components of the image, i.e., the headlines, embedded figures, etc., and then performs OCR on these components accordingly. The output is a text document logically arranged with headlines, the text body in a single column, and embedded images appended at the end.
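The output arrangement described in the abstract (headlines, then body text in a single column, then embedded images at the end) can be illustrated with a minimal sketch. This is not the project's implementation, which is in the restricted report. The `Region` type, the labels `"headline"`, `"body"` and `"figure"`, the `column_tolerance` parameter, and the `assemble` function are all hypothetical names introduced here. The sketch assumes that segmentation and OCR have already produced labelled bounding boxes with their text, and that the page holds a single article.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Region:
    """One segmented component of a page (hypothetical structure)."""
    kind: str     # assumed labels: "headline", "body", or "figure"
    x: int        # left edge of the bounding box, in pixels
    y: int        # top edge of the bounding box, in pixels
    content: str  # OCR text for text regions, a file name for figures


def assemble(regions: List[Region], column_tolerance: int = 40) -> str:
    """Arrange regions as: headlines, single-column body, figures last."""
    headlines = sorted((r for r in regions if r.kind == "headline"),
                       key=lambda r: (r.y, r.x))
    figures = sorted((r for r in regions if r.kind == "figure"),
                     key=lambda r: (r.y, r.x))
    body = [r for r in regions if r.kind == "body"]

    # Group body regions into columns: scanning left to right, a region
    # joins the current column if its left edge is within the tolerance
    # of that column's first region; otherwise it starts a new column.
    columns: List[List[Region]] = []
    for r in sorted(body, key=lambda r: r.x):
        if columns and abs(r.x - columns[-1][0].x) <= column_tolerance:
            columns[-1].append(r)
        else:
            columns.append([r])

    # Read each column top to bottom, columns left to right.
    ordered_body = [r for col in columns for r in sorted(col, key=lambda r: r.y)]

    lines = [r.content for r in headlines]
    lines += [r.content for r in ordered_body]
    lines += [f"[figure: {r.content}]" for r in figures]
    return "\n".join(lines)


if __name__ == "__main__":
    page = [
        Region("body", 300, 100, "column 2, top"),
        Region("figure", 300, 400, "fig1.png"),
        Region("headline", 0, 0, "HEADLINE"),
        Region("body", 5, 400, "column 1, bottom"),
        Region("body", 0, 100, "column 1, top"),
    ]
    print(assemble(page))
```

Grouping by left edge is only one simple heuristic; a page with several articles would first need its regions grouped per article, which this sketch does not attempt.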
URI: https://hdl.handle.net/10356/157550
Fulltext Permission: restricted
Fulltext Availability: With Fulltext
Appears in Collections: EEE Student Reports (FYP/IA/PA/PI)

Files in This Item:
File: FYP_Report_Mao Junke_final.pdf
Description: Restricted Access
Size: 7.74 MB
Format: Adobe PDF

Page view(s): 23 (updated on Dec 9, 2022)
Download(s): 3 (updated on Dec 9, 2022)

Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.