The Transformer-based approach has been widely adopted for time-series analysis, particularly in the fields of battery state estimation and diagnostics [[50], [51], [52], [53], [54]]. For example, one study extracted high-dimensional feature information using EIS and then used a Transformer model to estimate battery SOH [55]. The results showed that feature extraction significantly improved estimation accuracy across the entire lifecycle, highlighting the Transformer's strength in handling long-term data. Another study extracted 28 charge- and discharge-related features, applied the Pearson correlation coefficient (PCC) for feature selection, and employed both a standard Transformer and an encoder-only Transformer for SOH estimation, demonstrating strong performance [56]. A further study improved the long-term prediction accuracy and temperature adaptability of lithium-ion battery SOH estimation by combining incremental capacity analysis (ICA) with a Transformer network [57]. In the feature extraction stage, peak features of the incremental capacity curve were extracted with a dual-filtering method to ensure input data quality, and the Transformer's multi-head attention mechanism strengthened the capture of critical features, leading to more accurate SOH estimation. Thus, integrating feature extraction methods such as EIS, ICA, and PCC markedly enhances the effectiveness of the Transformer model for battery SOH estimation: better-conditioned input data allow the model to capture key features more accurately, which in turn improves prediction accuracy and adaptability. To further leverage the strengths of deep learning architectures, another study proposed an SOH estimation method based on data preprocessing and a CNN-Transformer framework [58]. Features were selected and processed using PCC, principal component analysis, and min-max feature scaling, and validation on the NASA battery dataset demonstrated the model's high accuracy and stability in estimating SOH.

However, the Transformer model suffers from high computational complexity and memory consumption when processing long sequences, which limits its efficiency in practical applications. To address this issue, one study introduced a multi-head probabilistic sparse self-attention mechanism and a “distillation” technique into the Transformer model, significantly reducing computational complexity and memory consumption while increasing prediction speed [59]. Another study proposed a method based on a probsparse self-attention Transformer and multi-scale temporal feature fusion for cell SOH estimation [60]. By incorporating a cross-stage partial probsparse self-attention mechanism, each key (K) in the scaled dot-product computation attends only to the dominant queries (Q), substantially reducing computation and memory usage. Furthermore, dilated causal convolution expands the receptive field without increasing the computational load, enabling the Transformer model to capture long-range dependencies more efficiently. In summary, these improvements significantly enhance the performance and applicability of the Transformer model in handling complex time-series data.
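As an illustration of the PCC-based feature selection and min-max scaling steps described above, the following minimal Python sketch shows a generic screening procedure; the function names, threshold value, and file and column names are hypothetical and do not reproduce the exact pipelines of [56] or [58].

```python
import pandas as pd

def select_features_by_pcc(features: pd.DataFrame, soh: pd.Series, threshold: float = 0.8):
    """Keep features whose absolute Pearson correlation with SOH exceeds a threshold."""
    pcc = features.corrwith(soh)                      # Pearson correlation of each column with SOH
    selected = pcc[pcc.abs() >= threshold].index.tolist()
    return features[selected], pcc

def min_max_scale(x: pd.DataFrame) -> pd.DataFrame:
    """Rescale each selected feature to [0, 1] before feeding it to the network."""
    return (x - x.min()) / (x.max() - x.min())

# Hypothetical usage: 'cycles' would hold per-cycle health features plus measured SOH labels.
# cycles = pd.read_csv("battery_features.csv")
# X, pcc = select_features_by_pcc(cycles.drop(columns="soh"), cycles["soh"])
# X_scaled = min_max_scale(X)
```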
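The probabilistic sparse (probsparse) self-attention referenced in [59] and [60] can be sketched as follows in PyTorch for a single head. For clarity, the full score matrix is formed here to compute the sparsity measure, whereas the original Informer-style formulation estimates it from a sampled subset of keys to keep complexity near O(L log L); the function name and the choice of u below are illustrative assumptions, not the authors' implementation.

```python
import math
import torch

def probsparse_attention(Q, K, V, u):
    """Sketch of probsparse attention: only the top-u "active" queries perform full
    scaled dot-product attention; the remaining queries are filled with the mean of V."""
    B, L, d = Q.shape
    scores = Q @ K.transpose(-2, -1) / math.sqrt(d)             # (B, L, L)
    # Sparsity measure per query: max minus mean of its score row.
    sparsity = scores.max(dim=-1).values - scores.mean(dim=-1)  # (B, L)
    top_idx = sparsity.topk(u, dim=-1).indices                  # (B, u) active queries

    out = V.mean(dim=1, keepdim=True).expand(B, L, d).clone()   # lazy queries -> mean(V)
    top_scores = torch.gather(scores, 1, top_idx.unsqueeze(-1).expand(B, u, L))
    out.scatter_(1, top_idx.unsqueeze(-1).expand(B, u, d),
                 torch.softmax(top_scores, dim=-1) @ V)         # active queries -> full attention
    return out

# Hypothetical usage: u is often set proportional to ln(L), e.g. u = int(5 * math.log(96)).
# Q = K = V = torch.randn(2, 96, 64); out = probsparse_attention(Q, K, V, u=22)
```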
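Similarly, the dilated causal convolution used in [60] to widen the receptive field can be illustrated with a minimal PyTorch module; the class name, channel count, and dilation schedule are assumptions chosen for illustration rather than the authors' exact architecture.

```python
import torch
import torch.nn as nn

class DilatedCausalConv1d(nn.Module):
    """Causal 1-D convolution: left-pad so the output at time t only sees inputs up to t."""
    def __init__(self, channels: int, kernel_size: int = 3, dilation: int = 2):
        super().__init__()
        self.pad = (kernel_size - 1) * dilation           # pad only on the left (the past)
        self.conv = nn.Conv1d(channels, channels, kernel_size, dilation=dilation)

    def forward(self, x: torch.Tensor) -> torch.Tensor:   # x: (batch, channels, time)
        x = nn.functional.pad(x, (self.pad, 0))           # no leakage from future steps
        return self.conv(x)

# Stacking layers with dilations 1, 2, 4, 8 grows the receptive field exponentially
# while the parameter count and per-step compute stay essentially fixed.
stack = nn.Sequential(*[DilatedCausalConv1d(16, 3, d) for d in (1, 2, 4, 8)])
x = torch.randn(1, 16, 128)                               # (batch, features, cycle steps)
y = stack(x)                                              # output keeps the length: (1, 16, 128)
```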