VT-MCNet: High-accuracy automatic modulation classification model based on vision transformer
<Abstract> The evolution of cognitive radio networks hinges significantly on automatic modulation classification (AMC). However, existing research reveals limitations in attaining high AMC accuracy because of ineffective feature extraction from signals. To counter this, we propose a vision-centric approach that employs diverse kernel sizes to enhance signal feature extraction. In addition, we refine the transformer architecture by incorporating a dual-branch multi-layer perceptron network, which enables diverse pattern learning and improves the model's running speed. Specifically, our architecture allows the system to focus on relevant portions of the input sequence, thereby improving classification accuracy in both high and low signal-to-noise regimes. On the widely recognized DeepSig dataset, our deep model, termed VT-MCNet, outperforms prior state-of-the-art deep networks in both classification accuracy and computational cost. Notably, VT-MCNet reaches a cumulative classification rate of up to 99.24%, whereas the state-of-the-art method, despite its higher computational complexity, achieves only 99.06%.
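
The two architectural ideas named in the abstract, feature extraction with several kernel sizes and a dual-branch multi-layer perceptron, can be illustrated with a minimal NumPy sketch. This is not the VT-MCNet implementation: the kernel sizes (3, 5, 7), channel counts, branch widths, the summed residual combination, the 1024-sample I/Q frame shape, and all function names are assumptions chosen for illustration, and the weights are random rather than trained.

```python
import numpy as np

rng = np.random.default_rng(0)  # random, untrained weights (illustration only)


def conv1d_same(x, w):
    """'Same'-padded 1D cross-correlation.

    x: (length, in_ch); w: (k, in_ch, out_ch) with odd k -> (length, out_ch).
    """
    k = w.shape[0]
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))
    # windows has shape (length, in_ch, k): one window of k samples per position
    windows = np.lib.stride_tricks.sliding_window_view(xp, k, axis=0)
    return np.einsum("lck,kco->lo", windows, w)


def gelu(x):
    """Tanh approximation of the GELU activation."""
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))


def multi_kernel_features(iq, kernel_sizes=(3, 5, 7), out_ch=8):
    """Apply one convolution per kernel size and concatenate along channels.

    The kernel sizes and out_ch are assumed values, not taken from the paper.
    """
    feats = []
    for k in kernel_sizes:
        w = rng.normal(scale=0.1, size=(k, iq.shape[1], out_ch))
        feats.append(gelu(conv1d_same(iq, w)))
    return np.concatenate(feats, axis=1)  # (length, out_ch * len(kernel_sizes))


def dual_branch_mlp(tokens, hidden=32):
    """Two parallel MLP branches of different widths, summed with a residual.

    The branch widths and the summation are assumptions made for this sketch.
    """
    d = tokens.shape[1]
    out = tokens.copy()
    for width in (hidden, hidden // 2):
        w1 = rng.normal(scale=0.1, size=(d, width))
        w2 = rng.normal(scale=0.1, size=(width, d))
        out = out + gelu(tokens @ w1) @ w2
    return out


# One hypothetical frame: 1024 time samples, 2 channels (in-phase, quadrature).
iq = rng.normal(size=(1024, 2))
tokens = multi_kernel_features(iq)
mixed = dual_branch_mlp(tokens)
print(tokens.shape, mixed.shape)
```

Each kernel size sees a different temporal span of the signal, so concatenating the outputs gives the later layers features at several scales; the two MLP branches then transform the same tokens in parallel at different widths. In a full model, a self-attention layer and a classification head would sit around these pieces.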