TY - JOUR
T1 - Exploring the Impact of Synthetic Data Generation on Texture-based Image Classification Tasks
AU - Yordanov, Borislav
AU - Harvey, Carlo
AU - Williams, Ian
AU - Ashley, Craig
AU - Fairbrass, Paul
PY - 2023/12/31
Y1 - 2023/12/31
N2 - In this study, we introduce a novel pipeline for synthetic data generation of textured surfaces, motivated by the limitations of conventional methods such as Generative Adversarial Networks (GANs) and Computer-Aided Design (CAD) models in our specific context. We also investigate the pipeline's role in an image classification task. The primary objective is to determine the impact of synthetic data generated by our pipeline on classification performance. Using EfficientNetV2-S as our image classifier and a dataset of three texture classes, we find that synthetic data can significantly enhance classification performance when the amount of real data is scarce, corroborating previous research. However, we also observe that the balance between synthetic and real data is crucial, as excessive synthetic data can negatively impact performance when sufficient real data is available. We theorize that this might stem from imperfections in the synthetic data generation process that distort fine details essential for accurate classification, and propose possible improvements to the synthetic data generation pipeline. Furthermore, we acknowledge the potential limitations of our study and provide several promising avenues for future research. This work illuminates the advantages and potential drawbacks of synthetic data in image classification tasks, emphasizing the importance of high-quality, realistic synthetic data that complements, rather than undermines, the use of real data.
KW - Synthetic Data
KW - Image Classification
KW - Textured Surface
UR - http://www.open-access.bcu.ac.uk/14876/
U2 - 10.2312/imet.20231264
DO - 10.2312/imet.20231264
M3 - Article
JO - 3rd International Conference on Interactive Media, Smart Systems and Emerging Technologies
JF - 3rd International Conference on Interactive Media, Smart Systems and Emerging Technologies
ER -