An Improved CTGAN for Data Processing Method of Imbalanced Disk Failure


Jingbo Jia1, Peng Wu1 and Hussain Dawood2, 1University of Jinan, China, 2National Skills University Islamabad, China


To address the problem of insufficient failure data generated by disks and the imbalance between the number of normal and failure data. The existing Conditional Tabular Generative Adversarial Networks(CTGAN) deep learning methods have been proven to be effective in solving imbalance disk failure data. But CTGAN cannot learn the internal information of disk failure data very well. In this paper, a fault diagnosis method based on improved CTGAN, a classifier for specific category discrimination is added and a discriminator generate adversarial network based on residual network is proposed. We named it Residual Conditional Tabular Generative Adversarial Networks (RCTGAN). Firstly, to enhance the stability of system a residual network is utilized. RCTGAN uses a small amount of real failure data to synthesize fake fault data; Then, the synthesized data is mixed with the real data to balance the amount of normal and failure data; Finally, four classifier (multilayer perceptron, support vector machine, decision tree, random forest) models are trained using the balanced data set, and the performance of the models is evaluated using G-mean. The experimental results show that the data synthesized by the RCTGAN can further improve the fault diagnosis accuracy of the classifier.


Imbalanced dataset, CTGAN, Synthesis data, Disk failure data, Classification