Please use this identifier to cite or link to this item: https://hdl.handle.net/10356/174534
Title: A byte sequence is worth an image: CNN for file fragment classification using bit shift and n-gram embeddings
Authors: Liu, Wenyang; Wang, Yi; Wu, Kejun; Yap, Kim-Hui; Chau, Lap-Pui
Keywords: Computer and Information Science
Issue Date: 2023
Source: Liu, W., Wang, Y., Wu, K., Yap, K. & Chau, L. (2023). A byte sequence is worth an image: CNN for file fragment classification using bit shift and n-gram embeddings. 2023 IEEE 5th International Conference on Artificial Intelligence Circuits and Systems (AICAS). https://dx.doi.org/10.1109/AICAS57966.2023.10168636
Project: NRF2018NCR-NCR009-0001 
Conference: 2023 IEEE 5th International Conference on Artificial Intelligence Circuits and Systems (AICAS)
Abstract: File fragment classification (FFC) on small chunks of memory is essential in memory forensics and Internet security. Existing methods mainly treat file fragments as 1D byte signals and use the captured inter-byte features for classification, while the bit information within bytes, i.e., intra-byte information, is seldom considered. This is inherently ill-suited to classifying variable-length coding files, whose symbols are represented by a variable number of bits. In contrast, we propose Byte2Image, a novel data augmentation technique, to introduce the neglected intra-byte information into file fragments and treat them as 2D gray-scale images, which allows us to capture both inter-byte and intra-byte correlations simultaneously through powerful convolutional neural networks (CNNs). Specifically, to convert file fragments to 2D images, we employ a sliding byte window to expose the neglected intra-byte information and stack their n-gram features row by row. We further propose a byte sequence and image fusion network as a classifier, which can jointly model the raw 1D byte sequence and the converted 2D image to perform FFC. Experiments on the FFT-75 dataset validate that our proposed method can achieve notable accuracy improvements over state-of-the-art methods in nearly all scenarios.
URI: https://hdl.handle.net/10356/174534
ISSN: 2834-9857
DOI: 10.1109/AICAS57966.2023.10168636
Schools: School of Electrical and Electronic Engineering 
Rights: © 2023 IEEE. All rights reserved. This article may be downloaded for personal use only. Any other use requires prior permission of the copyright holder. The Version of Record is available online at http://doi.org/10.1109/AICAS57966.2023.10168636.
Fulltext Permission: open
Fulltext Availability: With Fulltext
Appears in Collections: EEE Conference Papers
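
The abstract describes exposing intra-byte information with a sliding byte window and stacking the results row by row. The following is a minimal sketch of one possible reading of that bit-shift step, not the authors' implementation: the function name `byte2image_sketch`, the `n_shifts` parameter, and the choice to drop the last byte so that every shifted window fits are all assumptions made for illustration. The n-gram embedding and the fusion network mentioned in the abstract are not covered here.

```python
from typing import List


def byte2image_sketch(fragment: bytes, n_shifts: int = 8) -> List[List[int]]:
    """Hypothetical bit-shift conversion of a byte fragment into a 2D grid.

    Row `s` re-reads the fragment's bit stream starting `s` bits later, so
    each value in that row mixes the low bits of one byte with the high bits
    of the next. This is an illustrative assumption, not the published method.
    """
    if not 1 <= n_shifts <= 8:
        raise ValueError("n_shifts must be between 1 and 8")
    if len(fragment) < 2:
        raise ValueError("fragment must contain at least two bytes")

    width = len(fragment) - 1  # last byte only supplies carry-in bits
    rows = []
    for shift in range(n_shifts):
        row = [
            # Window of 8 bits starting `shift` bits into byte i (MSB first).
            ((fragment[i] << shift) | (fragment[i + 1] >> (8 - shift))) & 0xFF
            for i in range(width)
        ]
        rows.append(row)
    return rows


if __name__ == "__main__":
    image = byte2image_sketch(bytes([0x81, 0x40, 0xFF]))
    for row in image:
        print([f"{value:02x}" for value in row])
```

Under these assumptions, a fragment of N bytes yields a grid of `n_shifts` rows and N − 1 columns whose values can be read as gray-scale pixel intensities; row 0 is simply the original byte sequence without its last byte.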

Files in This Item:
aicas23.pdf (685.96 kB, Adobe PDF)
