Novel Ransomware Detection by Deep Learning

Deep Learning Methods for Ransomware Detection Based on Digital DNA Sequencing 8/2/2024 D.C Members

Agenda
Introduction
Literature survey
Problem identification
Objectives
Phase I: Deep Ensemble Classifier for Ransomware Identification using Digitalized DNA Genotyping System
Phase II: Time-Series Sequence Generative Adversarial Network for Improving Deep Ensemble Classifier based Ransomware Detection
Phase III: Deep Ensemble Enhanced Ransomware Prediction for Spatial-Temporal Learning
Conclusion
Future work
Publications

Introduction: Network Security
Network security, which spans hardware, software and cloud services, is crucial for protecting customer data, ensuring reliable access and mitigating cyber-attack risks. A robust system employs a layered defence that combines multiple security measures to keep threats out and reduce operational costs. However, organizations may lack proactive measures, leading to weak security protocols and severe challenges in network protection. Malware such as Trojans, adware and ransomware can seriously damage network platforms, enabling criminal acts, deceit, fraud and tribal threats. It is therefore important to monitor and control abnormal host behaviors to prevent damage to organizations and individuals.

Introduction: Ransomware
Ransomware is a type of malicious software that encrypts computer systems or data and withholds access until a specified amount of money is paid. It primarily targets Windows platforms, but it has also begun targeting Apple and Android platforms and Linux servers. The initial ransomware attacks demanded a ransom in exchange for the key needed to regain access to the infected data or to use the infected device. Regular or continuous data backups can help organizations reduce the cost of ransomware attacks and potentially avoid paying the ransom. The ransomware lifecycle comprises the following stages: malware distribution, infection, command and control, discovery, malicious theft, file encryption, extortion and resolution.

Ransomware Types
Ransomware falls into three types: crypto, locker and scareware.
Crypto ransomware: Encrypts files on a victim's system, making them inaccessible without a decryption key. It targets user-generated files with specific extensions such as pdf, jpg and doc, which typically contain valuable personal data.
Locker ransomware: Locks the user out of the device or of certain functionality without encrypting files.
Scareware: Tricks victims into thinking that their system is infected with malware or has issues that need immediate attention.

Importance of Predicting Ransomware
Confidentiality: Ransomware can lead to unauthorized access to and disclosure of sensitive information, including personal data, financial records and intellectual property.
Integrity: It can corrupt or alter data, undermining its accuracy and reliability.
Availability: Ransomware often encrypts data, rendering it inaccessible until a ransom is paid, which can cause significant operational disruption.
Direct and indirect costs: Ransom payments, recovery costs, regulatory fines, productivity loss, business interruption and post-attack security measures are all potential consequences.
Recovery efforts: Time and resources are needed to restore systems and data from backups or to rebuild infrastructure.
Customer trust: Data breaches and service disruptions can erode customer confidence and loyalty.

DNA Sequencing Model for Ransomware Detection
A DNA sequencing model has been developed for ransomware detection. It combines techniques from biological DNA sequencing with software behavior analysis to identify malicious activity. The method identifies unique sequences and signatures within ransomware code, much as genetic sequencing identifies unique genetic markers. In biology, DNA sequencing decodes the order of nucleotides (A, T, C, G) in a DNA molecule. Analogously, this method decodes and analyzes the sequences of instructions or system calls made by software to identify potential ransomware behavior. The model uses software behavior patterns and integrates Artificial Intelligence based real-time monitoring to detect and mitigate threats in network security.

Limitations of the DNA Sequencing Model
The DNA sequencing model faces the following challenges:
The model must minimize both false positives (benign software misidentified as ransomware) and false negatives (ransomware that goes undetected).
Ransomware authors constantly evolve their techniques to evade detection, so the model requires continuous updates and improvements.
Some ransomware lies dormant for a period before activating, which makes it harder to detect from immediate behavior.
Identifying the key features that differentiate ransomware from benign software is challenging, because irrelevant or redundant features can degrade model performance.

Artificial Intelligence Models in Predicting Ransomware
To predict ransomware techniques, Artificial Intelligence (AI) models such as Machine Learning (ML) and Deep Learning (DL) methods are widely used to classify malicious behavior. ML effectively detects malware on Windows and Android systems, outperforming signature-based approaches. ML research aims at a thorough study of malware attacks, threats and the vulnerabilities of machine-powered security defence systems. However, ML cannot perform efficiently when digital DNA sequences change over time or are highly sequential. In addition, ML requires extensive, high-quality labeled data, particularly up-to-date ransomware samples and benign software examples, to train effectively.

Artificial Intelligence Models in Predicting Ransomware
DL methods show great potential for building cyber security applications that cover all types of malware. DL learns general patterns of ransomware behavior to directly distinguish the variety of malware attacks and their variants. DL is suitable for analyzing sequential data in network traffic and can capture temporal dependencies and patterns over time in string instances. DL models are applied to anomaly detection by learning a compressed representation of normal behavior and identifying deviations that indicate ransomware. However, DL models require large amounts of labeled training data, which can be difficult to obtain, especially for new ransomware variants.

Research Outline
Various ransomware prediction models based on DL and DNA sequencing data have been developed, but certain limitations hinder their performance. A new model is therefore needed to predict ransomware, prevent data loss and ensure the availability of critical information. This research examines the available ransomware prediction models, identifies the issues common to them and suggests fresh approaches for resolving those issues. A prediction system based on DL and DNA sequencing data is designed to facilitate the early detection of malware in network security. Predicting ransomware in this way helps avoid the significant cost of paying ransoms to recover encrypted data.

Literature Review
Author (Year): Moti Z., Hashemi S., Karimipour H., Dehghantanha A., Jahromi A. N., Abdi L., & Alavi F. (2021)
Title: Generative adversarial network to detect unseen internet of things malware
Methods: MalGan
Merits: It enhances the model's efficiency by increasing or decreasing attention for particular features.
Demerits: It uses limited training data.

Author (Year): Sharma S., Krishna C. R., & Kumar R. (2021)
Title: RansomDroid: Forensic analysis and detection of Android Ransomware using unsupervised machine learning technique
Methods: RansomDroid
Merits: It achieves better accuracy than the methods it was compared with.
Demerits: It may not work well if attackers display videos on the mobile screen to threaten victims into paying the ransom.

Literature Review
Author (Year): Hsu, C. M., Yang, C. C., Cheng, H. H., Setiasabda, P. E., & Leu, J. S. (2021)
Title: Enhancing file entropy analysis to improve machine learning detection rate of ransomware
Methods: Support Vector Machine (SVM) classifier
Merits: It can identify ransomware attacks in their early stages, before significant damage occurs.
Demerits: The accuracy was not satisfactory because it cannot learn more complex features such as spatial and temporal features.

Author (Year): Masum, M., Faruk, M. J. H., Shahriar, H., Qian, K., Lo, D., & Adnan, M. I. (2022)
Title: Ransomware classification and detection with machine learning algorithms
Methods: Decision Tree, Random Forest, Naïve Bayes, Logistic Regression
Merits: This system does not need any sandbox system to protect the derived features from ransomware.
Demerits: The accuracy was low because of the limited amount of data.

Literature Review
Author (Year): Yamany, B., Elsayed, M. S., Jurcut, A. D., Abdelbaki, N., & Azer, M. A. (2022)
Title: A new scheme for ransomware classification and clustering using static features
Methods: ML based ransomware indexing system
Merits: This model was accurate and fast at detection and recovered the data without any data loss.
Demerits: The use of static features can influence the detection performance.

Author (Year): Berrueta, E., Morato, D., Magaña, E., & Izal, M. (2022)
Title: Crypto-ransomware detection using machine learning models in file-sharing network scenarios with encrypted traffic
Methods: Random Forest
Merits: This method shortens the classification time and eliminates the need for storage space.
Demerits: Detection of new ransomware strains was not effective.

Literature Review
Author (Year): Ahmed U., Lin J. C. W., & Srivastava G. (2022)
Title: Mitigating adversarial evasion attacks of ransomware using ensemble learning
Methods: ML ensemble models
Merits: This model achieves good accuracy against adversarial evasion attacks.
Demerits: In the case of Android ransomware, the physical device cannot be reset to a clean state.

Author (Year): Zahoora U., Khan A., Rajarajan M., Khan S. H., Asam M., & Jamal T. (2022)
Title: Ransomware detection using deep learning based unsupervised feature extraction and a cost sensitive Pareto Ensemble classifier
Methods: Cost-Sensitive Pareto Ensemble strategy (CSPE-R)
Merits: It strikes a cost-aware balance between false positives and false negatives.
Demerits: The limited availability of ransomware instances can hinder the model's performance.

Literature Review
Author (Year): Kamboj A., Kumar P., Bairwa A. K., & Joshi S. (2023)
Title: Detection of malware in downloaded files using various machine learning models
Methods: Random Forest
Merits: It can identify ransomware attacks in their early stages, before significant damage occurs.
Demerits: Only a very small amount of malware is detected.

Author (Year): Alohali, M. A., Elsadig, M., Al-Wesabi, Al Duhayyim, M., Hilal, & Motwakel, A. (2023)
Title: Optimal deep learning based ransomware detection and classification in the internet of things environment
Methods: Sine Cosine Algorithm with DL based Ransomware Detection and Classification (SCADL-RWDC)
Merits: It is robust in detecting new and evolving forms of ransomware without needing extensive manual feature engineering.
Demerits: Balancing the model's ability to learn patterns without overfitting is a significant challenge.

Literature Review
Author (Year): Singh, A., Mushtaq, Z., Abosaq, H. A., Mursal, S. N. F., Irfan, M., & Nowakowski, G. (2023)
Title: Enhancing ransomware attack detection using transfer learning and deep learning ensemble models on cloud-encrypted data
Methods: RANSOMNET+ model, pre-trained CNN model
Merits: This model was suitable for collecting more ransomware traffic.
Demerits: Although it can learn local and global features, it did not consider spatiotemporal features, resulting in low recall and f-measure.

Author (Year): Almomani, I., Alkhayer, A., & El-Shafai, W. (2023)
Title: E2E-RDS: efficient end-to-end ransomware detection system based on static-based ML and vision-based DL approaches
Methods: End-to-End Ransomware Detection System (E2E-RDS), CNN
Merits: It has low hardware and skill requirements for processing this module.
Demerits: This model cannot learn the spatiotemporal features of the ransomware data, which lowers its efficiency.

Limitations of Existing Protocols
Several proposed solutions for detecting ransomware attacks are limited by inaccurate definitions of the pre-encryption stage, insufficient data and inadequate design of the detection model components. The assumption that models accurately represent attack patterns at detection time may not hold for early detection, because too few observed patterns can degrade detection accuracy. Feature selection on each data subset can remove irrelevant and noisy features, but it decreases the diversity of the feature subsets and thereby reduces the detection accuracy of the ransomware model. Addressing these limitations is therefore crucial for effective and accurate detection of ransomware attacks.

Existing (Base) Model
An ML-based ransomware detection technique called DNAact-Ran was developed to detect and classify ransomware using an ML-based digital DNA sequencing model. First, the Multi-Objective Grey-Wolf Optimization (MOGWO) and Binary Cuckoo Search (BCS) algorithms were used to extract the primary characteristics from the pre-processed database. Then, the digital DNA string was constructed for the selected characteristics based on the DNA string model restrictions and the k-mer frequency vector. Next, Linear Regression (LR) was applied to classify the digital DNA string as either goodware or ransomware. However, this technique did not perform efficiently when the digital DNA strings changed over time. The LR classifier could not be trained satisfactorily on temporally modified strings when the digital DNA strings were highly sequential.

Problem Statement for Existing Protocols
ML-based algorithms for ransomware detection cannot perform efficiently when digital DNA sequences change over time.
The active learning used in DNAact-Ran for temporally changed sequences cannot learn effectively when digital DNA sequences are highly sequential.
In ransomware attack detection, obtaining good-quality training data is one of the biggest problems for ML approaches, because data labelling is tedious and expensive.
The detection performance of existing state-of-the-art approaches such as CNN degrades when they are applied to spatially related sequences.
Digital DNA sequences of ransomware are spatially dependent. The spatial relationships between DNA sequences in ransomware features could be exploited to significantly improve detection accuracy, an approach that no existing technique has yet considered.

Objectives of this Research
To handle highly sequential sequences, a CNN is used, because its sliding window can extract salient features from sequential data.
To generate good-quality training data, a generative adversarial data augmentation method is proposed that produces new digital DNA sequences from the available DNA sequences.
To improve ransomware detection further, spatio-temporal relations are jointly explored with a two-dimensional convolution operator.

Dataset Description
This experiment uses the real-time database from the wiki database, which comprises 1524 instances and 30970 attributes; 582 instances are ransomware and 942 are goodware. The instances belong to the following classes: Goodware, Critroni, CryptLocker, CryptoWall, KOLLAH, Kovter, Locker, MATSNU, PGPCODER, Reveton, TeslaCrypt and Trojan-Ransom. The attributes extracted by this model are API invocations (API), extensions of the dropped files (DROP), registry key functions (REG), file functions (FILES), extensions of the files involved in file functions (FILES_EXT), file directory functions (DIR) and embedded strings (STR).
Source: https://github.com/PSJoshi/Notes/wiki/
Total dataset size: 90 MB
Dataset records: 300
Attributes: 30970
Features: 16383
Ransomware: 150
Goodware: 150

Performance Metrics
The following performance metrics are used to evaluate the proposed and existing models throughout the thesis.
(i) Accuracy: the fraction of correct classifications over the total number of samples analyzed.
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{1}$$
Here, TP is the number of ransomware samples that the classifier correctly classifies as ransomware, TN is the number of goodware samples correctly classified as goodware, FP is the number of goodware samples incorrectly classified as ransomware and FN is the number of ransomware samples incorrectly classified as goodware.

Performance Metrics
(ii) Precision: the fraction of samples labeled as ransomware that are actually ransomware.
$$\text{Precision} = \frac{TP}{TP + FP} \tag{2}$$
(iii) Recall: the fraction of ransomware samples that are correctly classified.
$$\text{Recall} = \frac{TP}{TP + FN} \tag{3}$$
(iv) F-measure: the harmonic mean of precision and recall.
$$\text{F-measure} = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}} \tag{4}$$
(v) Error rate: the fraction of incorrect classifications over the total number of samples analyzed.
$$\text{Error rate} = \frac{FP + FN}{TP + TN + FP + FN} \tag{5}$$
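The five metrics can be computed directly from the four confusion-matrix counts. The following is a minimal Python sketch, assuming ransomware is the positive class; the names `ConfusionCounts` and `classification_metrics` are hypothetical and are not part of the thesis implementation (which is reported to be in Java).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Confusion-matrix counts with ransomware as the positive class (assumption)."""
    tp: int  # ransomware correctly classified as ransomware
    tn: int  # goodware correctly classified as goodware
    fp: int  # goodware incorrectly classified as ransomware
    fn: int  # ransomware incorrectly classified as goodware


def classification_metrics(c: ConfusionCounts) -> dict:
    """Return accuracy, precision, recall, F-measure and error rate."""
    total = c.tp + c.tn + c.fp + c.fn
    if total == 0:
        raise ValueError("at least one sample is required")
    # Guard each ratio against an empty denominator.
    precision = c.tp / (c.tp + c.fp) if (c.tp + c.fp) else 0.0
    recall = c.tp / (c.tp + c.fn) if (c.tp + c.fn) else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": (c.tp + c.tn) / total,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "error_rate": (c.fp + c.fn) / total,
    }
```

For example, `classification_metrics(ConfusionCounts(tp=40, tn=45, fp=5, fn=10))` gives an accuracy of 0.85 and an error rate of 0.15; the counts are made up for illustration only.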

Overall Workflow of the Research

Phase I: Deep Ensemble Classifier for Ransomware Identification using Digitalized DNA Genotyping System

Phase 1: DeepERPred
An active learning-based digital DNA sequencing engine called DNAact-Ran was developed to predict and identify ransomware data. However, it was not suitable for temporally changed DNA sequences. To solve this, the Deep Ensemble Ransomware Prediction (DeepERPred) model is proposed to handle temporally changed DNA sequences and identify ransomware effectively. In this model, the raw data is first converted into the required form. The most relevant attributes are then chosen from the pre-processed data using optimization algorithms. The selected attributes are classified with an ensemble Convolutional Neural Network and Long Short-Term Memory (CNN-LSTM) model. This ensemble classification handles temporal changes in the DNA sequences by learning the dependencies between current and past events of a sequence in order to identify ransomware data.

Schematic Overview of DeepERPred Model

Preprocessing and Attribute Selection
First, the raw ransomware database is acquired and pre-processed to transform the data into a suitable form. Pre-processing eliminates missing values, imperfect records and outliers from the original database. Then, the data dimensionality is reduced by choosing the most relevant attributes with the MOGWO and BCS algorithms. MOGWO has two key elements: a grid and an archive. Similarly, BCS has a hunting space modelled as a d-cube. After that, the design restraints are determined and the k-mer frequency vector is computed to create the digital DNA sequence, which produces the training database. This database is then fed to the ensemble CNN-LSTM classifier to identify ransomware classes.
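To make the k-mer frequency vector concrete, the following Python sketch counts every overlapping substring of length k in a digital DNA string and normalizes by the number of windows. It is only an illustration under stated assumptions: the four-letter alphabet, the lexicographic ordering of k-mers and the function name `kmer_frequency_vector` are not taken from the thesis, and the design restraints are not modelled.

```python
from collections import Counter
from itertools import product

ALPHABET = "ACGT"  # assumed nucleotide alphabet of the digital DNA string


def kmer_frequency_vector(sequence: str, k: int) -> list:
    """Return the relative frequency of each of the 4**k possible k-mers.

    The vector is ordered lexicographically (AA, AC, AG, ... for k = 2).
    Windows containing symbols outside ALPHABET count toward the total
    number of windows but have no slot in the vector.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    n_windows = len(sequence) - k + 1
    if n_windows <= 0:
        return [0.0] * len(kmers)
    # Slide a window of width k over the sequence, one position at a time.
    counts = Counter(sequence[i:i + k] for i in range(n_windows))
    return [counts[m] / n_windows for m in kmers]
```

With `k = 2`, the string `ACGTAC` has five windows (AC, CG, GT, TA, AC), so the entry for AC is 2/5 and the entries for CG, GT and TA are 1/5 each.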

Ensemble CNN and LSTM Classification
This ensemble classification model has two major structures: CNN and LSTM. The input layer receives the most relevant attributes from the pre-processed database. The network contains five CNN layers, each followed by max-pooling. The first two CNN layers have 64 and 128 filters with filter size 3 and max-pooling with pooling size 2. The next two CNN layers have 256 and 512 filters with filter size 3 and max-pooling with pooling size 4. The final CNN layer has 1024 filters with filter size 3 and max-pooling with pooling size 6. The resulting attribute map is passed to the LSTM layer, which has 70 memory blocks, to learn the temporally changed attributes. Its output is given to the fully connected and softmax layers, which classify the input as either ransomware or goodware.
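The layer configuration above determines how long the sequence is when it reaches the LSTM. The Python sketch below traces the (length, channels) shape through the five convolution and pooling stages. Several points are assumptions, since the slides do not state them: one-dimensional convolutions with stride 1, pooling stride equal to the pooling size, and an input length of 16383 (the feature count listed in the dataset description). The names `LAYERS` and `trace_shapes` are hypothetical.

```python
# (filters, filter size, pooling size) for the five CNN stages described above.
LAYERS = [(64, 3, 2), (128, 3, 2), (256, 3, 4), (512, 3, 4), (1024, 3, 6)]


def trace_shapes(input_length: int, layers=LAYERS, padding: str = "same") -> list:
    """Return the (sequence length, channels) shape after each conv + pool stage.

    Assumes 1D convolutions with stride 1 and max-pooling whose stride equals
    its pooling size. padding="same" keeps the length through the convolution;
    padding="valid" shortens it by (filter size - 1).
    """
    if padding not in ("same", "valid"):
        raise ValueError("padding must be 'same' or 'valid'")
    shapes = []
    length = input_length
    for filters, kernel, pool in layers:
        if padding == "valid":
            length = length - kernel + 1
        length = length // pool  # non-overlapping max-pooling
        if length < 1:
            raise ValueError("input is too short for this layer stack")
        shapes.append((length, filters))
    return shapes
```

Under these assumptions, `trace_shapes(16383)` ends with a shape of (42, 1024), so the LSTM would receive 42 time steps of 1024 features each. A different padding or stride choice in the actual implementation would change these numbers.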

Ensemble CNN and LSTM Classification
In this classification, the LSTM uses a four-layer arrangement in which $c_{t-1}$ and $c_t$ are the preceding and present cell states, $x_t$ is the present cell input, and sigmoid ($\sigma$) and tanh are the activation functions. $c_t$ and $f_t$ are the neuron (cell) state and the forget gate at time $t$; the forget gate uses the sigmoid function to decide how much of the previous neuron state is retained. The tanh function then creates a fresh memory $\tilde{c}_t$, and the input gate $i_t$ controls how much of this fresh data is added to the neuron state. Likewise, $o_t$ is the output gate, which uses the sigmoid function to determine the output from the neuron state; tanh is applied to the neuron state to obtain the final result. The LSTM input layer has three parameters: data, period and attribute size, where the period is the sliding-window size. The period determines how many prior consecutive inputs influence the present input.

This configuration lets the LSTM learn long-term dependencies within time-sequence data, which increases prediction efficiency. The hidden layer is defined by its number of neurons. Each gate is implemented as an operation carried out by hidden-layer neurons that are fully connected to the input layer. The inputs are weighted by the weight coefficient vector, and the offset vector is then added to obtain the hidden layer's outcome. The output layer is defined by the number of hidden-layer neurons and the output size. The computation of the LSTM memory cell is defined as:
$$f_t = \sigma\left(W_f [h_{t-1}, x_t] + b_f\right) \tag{1}$$
$$i_t = \sigma\left(W_i [h_{t-1}, x_t] + b_i\right) \tag{2}$$
$$o_t = \sigma\left(W_o [h_{t-1}, x_t] + b_o\right) \tag{3}$$
$$\tilde{c}_t = \tanh\left(W_c [h_{t-1}, x_t] + b_c\right) \tag{4}$$

In Eqs. (1)-(4), $W_f$, $W_i$, $W_o$ and $W_c$ are the weight coefficient vectors of the forget gate, input gate, output gate and neuron state, and $b_f$, $b_i$, $b_o$ and $b_c$ are the respective offset constants. From these, $c_t$ and $h_t$ are computed as:
$$c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \tag{5}$$
$$h_t = o_t \odot \tanh(c_t) \tag{6}$$
Here, $c_t$ is the cell state and $h_t$ is the hidden state, which acts as the output of the unit at time $t$. The tensors $f_t$, $i_t$ and $o_t$ are the control gates. The products of the weights with $x_t$ and with $h_{t-1}$ are termed the input-to-hidden and hidden-to-hidden transitions. The output of the LSTM is passed to the fully connected and softmax layers to obtain the final result, i.e., whether the given attributes belong to the ransomware or goodware class.
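A single LSTM time step can be sketched in a few lines of plain Python. This is the standard LSTM cell formulation (forget, input and output gates plus a tanh candidate state), written without any deep learning library; the parameter names (`W_f`, `b_f` and so on) and the choice to apply each weight matrix to the concatenation of the previous hidden state and the current input are assumptions for illustration, not details confirmed by the thesis.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def affine(W, b, v):
    """Compute W v + b for a list-of-rows matrix W and vectors b, v."""
    return [sum(w * x for w, x in zip(row, v)) + bias for row, bias in zip(W, b)]


def lstm_step(x_t, h_prev, c_prev, params):
    """Run one LSTM cell step and return the new hidden and cell states.

    params maps "W_f", "W_i", "W_o", "W_c" to matrices of shape
    (hidden, hidden + input) and "b_f", "b_i", "b_o", "b_c" to bias vectors.
    """
    z = list(h_prev) + list(x_t)  # concatenation [h_{t-1}, x_t]
    f = [sigmoid(a) for a in affine(params["W_f"], params["b_f"], z)]  # forget gate
    i = [sigmoid(a) for a in affine(params["W_i"], params["b_i"], z)]  # input gate
    o = [sigmoid(a) for a in affine(params["W_o"], params["b_o"], z)]  # output gate
    g = [math.tanh(a) for a in affine(params["W_c"], params["b_c"], z)]  # candidate
    # Keep part of the old cell state and add the gated candidate memory.
    c = [ft * cp + it * gt for ft, cp, it, gt in zip(f, c_prev, i, g)]
    # The hidden state is the gated, squashed cell state.
    h = [ot * math.tanh(ct) for ot, ct in zip(o, c)]
    return h, c
```

Calling `lstm_step` repeatedly over the time steps of a sequence, feeding each returned `(h, c)` pair into the next call, yields the final hidden state that a classifier head would consume.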

Ensemble CNN and LSTM

Algorithm of DeepERPred
Input: Raw ransomware database
Output: Ransomware or goodware
Begin
  Preprocess the raw database to eliminate the missing values;
  Apply MOGWO and BCS to choose the most important attributes;
  Determine the design restraints and k-mer frequency vector to create the digital DNA sequence;
  Obtain the training database having the most relevant attributes;
  Apply the ensemble CNN-LSTM classifier;
  Get the trained classifier model and validate it using the test data;
  Identify and categorize ransomware families and goodware data;
  Evaluate the efficiency of the classifier;
End

Performance Evaluation
In this section, the efficiency of the DeepERPred model using the ensemble CNN-LSTM is analyzed by implementing it in Java version 1.8. Its efficiency is compared with that of the DNAact-Ran model. The comparison uses the classifier metrics accuracy, precision, recall, f-measure, error rate and Area Under Curve (AUC).

Comparison Chart of Accuracy
The comparison indicates that the accuracy of DeepERPred is 10.93% higher than that of the DNAact-Ran model. DeepERPred improves the accuracy of identifying and classifying ransomware tags by learning the relationship between current and past events in a sequence.

Comparison Chart of Precision
The comparison indicates that the precision of DeepERPred is 10.26% higher than that of the DNAact-Ran model. This is because the DeepERPred classifier learns temporal changes in DNA sequences effectively.

Comparison Chart of Recall
The comparison indicates that the recall of DeepERPred is 10.14% higher than that of the DNAact-Ran model. DeepERPred performs better because it learns temporal changes in sequences effectively when identifying and categorizing ransomware classes.

Comparison Chart of F-Measure
The comparison indicates that the f-measure of DeepERPred is 10.02% higher than that of the DNAact-Ran model. This is because the model identifies and classifies ransomware classes by properly learning the relationship between current and past events of sequences.

Comparison Chart of Error Rate
The comparison indicates that the error rate of DeepERPred is 48.93% lower than that of the DNAact-Ran model. DeepERPred therefore reduces the error rate of ransomware identification relative to the compared model.

Comparison Chart of Area Under Curve (AUC)
The comparison indicates that the AUC of DeepERPred is 8.40% higher than that of the DNAact-Ran model. This indicates that DeepERPred can capture temporal characteristics from digital DNA for predicting ransomware families.

Phase II: Time-Series Sequence Generative Adversarial Network for Improving Deep Ensemble Classifier Based Ransomware Detection

Phase 2: TSeqGAN-DeepERPred
Previously, the DeepERPred model was designed to handle temporally changed DNA chains using a Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) to recognize ransomware. However, manual data labeling is time-consuming and costly, which hampers the generation of high-quality training data. In this phase, a data augmentation strategy based on a Time-series Sequence Generative Adversarial Network (TSeqGAN) is proposed for generating new digitalized DNA chains from existing DNA chains. In this model, the generator differentiation problem of the GAN is bypassed by directly applying a policy-gradient update, with the data generator modelled as a stochastic policy in Reinforcement Learning (RL).

Phase 2: TSeqGAN-DeepERPred
The GAN discriminator produces the RL reward by judging the entire chain, and this reward is passed back to the intermediate state-action steps via Monte Carlo (MC) search. In addition, a stepwise supervised loss is combined with the unsupervised adversarial loss on both real and synthetic chains. With this model, the stepwise conditional distributions in the data are captured to learn the temporal relationships and to generate high-quality training datasets. The resulting dataset is then used to train the ensemble CNN-LSTM classifier to recognize ransomware.
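The way MC search turns a reward on complete chains into a value for a partial chain can be illustrated with a toy Python sketch. Everything here is a stand-in chosen for illustration: the uniform roll-out policy, the `toy_discriminator` that simply scores the fraction of `A` symbols, and the function names are hypothetical, whereas a real system would use the trained generator as the roll-out policy and the trained discriminator as the reward.

```python
import random

ALPHABET = "ACGT"  # assumed symbol set of the digital DNA chain


def toy_discriminator(chain: str) -> float:
    """Hypothetical stand-in for a trained discriminator; returns a score in [0, 1]."""
    return chain.count("A") / len(chain)


def rollout(prefix: str, length: int, rng: random.Random) -> str:
    """Complete a partial chain by sampling the remaining symbols uniformly."""
    missing = length - len(prefix)
    return prefix + "".join(rng.choice(ALPHABET) for _ in range(missing))


def mc_action_value(prefix, length, n_rollouts, rng, discriminator=toy_discriminator):
    """Estimate the value of a partial chain as the mean reward of its roll-outs.

    A complete chain is scored directly; a partial chain is completed
    n_rollouts times and the discriminator rewards are averaged.
    """
    if not 0 < len(prefix) <= length:
        raise ValueError("prefix must be non-empty and no longer than length")
    if len(prefix) == length:
        return discriminator(prefix)
    total = sum(discriminator(rollout(prefix, length, rng)) for _ in range(n_rollouts))
    return total / n_rollouts
```

The estimated value is what a policy-gradient update would use as the reward for the last symbol of the prefix; more roll-outs reduce the variance of the estimate at the cost of more discriminator evaluations.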

Overall Working Structure of TSeqGAN-DeepERPred

DNA Chain Generation
The method begins by collecting and preprocessing a raw ransomware database to convert the records into the necessary format. Missing, improper and noisy records are discarded from the original database. Next, the most significant characteristics are selected by the MOGWO and BCS algorithms to reduce the attribute dimensionality. Once the attributes are chosen, the digitalized DNA chain is produced by calculating the design constraints and the k-mer frequency map. This digitalized DNA chain forms the training set, which is then augmented by the TSeqGAN model.

Time-series Sequence Generative Adversarial Network
TSeqGAN has four main parts: the embedding function, the regeneration function, the sequence generator and the sequence discriminator.
The auto-encoder units and the adversarial units in TSeqGAN are trained concurrently, so that feature encoding, generation of latent representations and iteration across time are learned together.
The embedding network provides the latent space in which the adversarial network operates.
A supervised loss is also employed to synchronize the latent dynamics of the synthetic and real records.

Mutual Training
As a reversible mapping between the feature and latent spaces, the embedding and regeneration functions must allow accurate restoration of the actual records $x_{1:T}$ from their latent representations $h_{1:T}$. The first objective is therefore the restoration loss:
$$\mathcal{L}_R = \mathbb{E}_{x_{1:T} \sim p}\Big[\sum_t \lVert x_t - \tilde{x}_t \rVert_2\Big] \tag{1}$$
In TSeqGAN, the generator receives synthetic embeddings $\hat{h}_{1:t-1}$ to create the next synthetic vector $\hat{h}_t$ in open-loop mode. The unsupervised loss is the likelihood of correct classifications by the discriminator for both the training data and the synthetic output of the generator; the discriminator maximizes it and the generator minimizes it:
$$\mathcal{L}_U = \mathbb{E}_{x_{1:T} \sim p}\Big[\sum_t \log y_t\Big] + \mathbb{E}_{x_{1:T} \sim \hat{p}}\Big[\sum_t \log(1 - \hat{y}_t)\Big] \tag{2}$$

To increase efficiency, the generator is also trained in closed-loop mode, where it receives sequences of embeddings of real data, $h_{1:t-1}$, to create the next latent vector. Gradients are computed on a loss that captures the discrepancy between the distributions $p(H_t \mid H_{1:t-1})$ and $\hat{p}(H_t \mid H_{1:t-1})$. Applying maximum likelihood gives the supervised loss:
$$\mathcal{L}_S = \mathbb{E}_{x_{1:T} \sim p}\Big[\sum_t \lVert h_t - g(h_{t-1}, z_t) \rVert_2\Big] \tag{3}$$
In Eq. (3), $g(h_{t-1}, z_t)$ approximates $\mathbb{E}_{z_t}\big[\hat{p}(H_t \mid H_{1:t-1}, z_t)\big]$ with a single sample $z_t$. In summary, at every step of a training DNA chain, the difference between the original next-step latent vector and the synthetic next-step latent vector is evaluated. While $\mathcal{L}_U$ pushes the generator to produce realistic chains, $\mathcal{L}_S$ further ensures that it produces similar stepwise transitions.

Optimization: Let $\theta_e$, $\theta_r$, $\theta_g$ and $\theta_d$ denote the parameters of the embedding, regeneration, generator and discriminator units. The first two units are trained on the restoration and supervised losses:
$$\min_{\theta_e, \theta_r} \big(\lambda \mathcal{L}_S + \mathcal{L}_R\big) \tag{4}$$
In Eq. (4), $\lambda$ is a hyperparameter that balances the two losses. Notably, $\mathcal{L}_S$ is included to reduce the dimensions of the adversarial training space. The generator and discriminator units are trained adversarially:
$$\min_{\theta_g} \big(\eta \mathcal{L}_S + \max_{\theta_d} \mathcal{L}_U\big) \tag{5}$$
In Eq. (5), $\eta$ is another hyperparameter that balances the two losses. Besides playing the adversarial game, the generator also minimizes the supervised loss. In this way, TSeqGAN is concurrently trained to encode feature vectors, create latent representations and iterate across time.

Deep Ensemble Ransomware Prediction Model
After the synthetic DNA chains are generated, the training set is formed by combining them with the actual DNA chains. This set is fed into the ensemble CNN-LSTM of DeepERPred, which handles temporally changed DNA sequences to identify ransomware. In TSeqGAN-DeepERPred, DeepERPred learns the relationships between current and prior occurrences of strings in the augmented training set by integrating the CNN and LSTM structures. Finally, the selected features are labeled as either ransomware or goodware.

Structure of TSeqGAN
Left: the discriminator D is trained over the actual data and the data created by the generator G. Right: G is trained by policy gradient, where the final reward signal is provided by D and is fed back to the intermediate action value by MC search.

Algorithm of TSeqGAN-DeepERPred
Input: Raw ransomware database
Output: Ransomware or goodware
Begin
  Preprocess the raw ransomware database to handle missing and noisy data;
  Select the most relevant characteristics using the MOGWO and BCS algorithms;
  Compute the design constraints and k-mer frequency map to produce the training digital DNA chains;
  Run TSeqGAN to produce high-quality training databases from the real and synthetic DNA chains;
  Train the ensemble CNN-LSTM classifier and validate it using the test data;
  Recognize whether the given data is ransomware or goodware;
  If ransomware data is recognized, classify its label;
End

Performance Evaluation
In this section, the efficiency of the TSeqGAN-DeepERPred model using the ensemble CNN-LSTM is analyzed by implementing it in Java version 1.8. Its efficiency is compared with that of the DNAact-Ran and DeepERPred models. The comparison uses the classifier metrics accuracy, precision, recall, f-measure, error rate and AUC.

Comparison Chart of Accuracy
The accuracy of TSeqGAN-DeepERPred is 13.89% and 2.66% higher than that of the DNAact-Ran and DeepERPred models, respectively. This is because the training samples are augmented before the ransomware classes are categorized.

57 Comparison Chart of Precision The precision of TSeqGAN-DeepERPred is 14.61% and 3.93% higher than that of the DNAact-Ran and DeepERPred models, respectively. This is because TSeqGAN-DeepERPred labels the ransomware data automatically with the help of the TSeqGAN structure.

58 Comparison Chart of Recall The recall of TSeqGAN-DeepERPred is 14.13% and 3.73% higher than that of the DNAact-Ran and DeepERPred models, respectively. This is due to the effective data labeling provided by the TSeqGAN structure.

59 Comparison Chart of F-Measure The F-measure of TSeqGAN-DeepERPred is 14.36% and 3.83% higher than that of the DNAact-Ran and DeepERPred models, respectively. This is because the proposed model automatically recognizes the association between current and past instances of a string.

60 Comparison Chart of Error Rate The error rate of TSeqGAN-DeepERPred is 62.12% and 25.83% lower than that of the DNAact-Ran and DeepERPred models, respectively. The TSeqGAN-DeepERPred model thus minimizes the error rate for ransomware recognition compared with the other models.

61 Comparison Chart of AUC The AUC score of TSeqGAN-DeepERPred is 3.75% and 2.33% higher than that of the DNAact-Ran and DeepERPred models, respectively. The TSeqGAN-DeepERPred model identifies differences between string instances using the ensemble model.

62 PHASE-III: DEEP ENSEMBLE ENHANCED RANSOMWARE PREDICTION FOR SPATIAL-TEMPORAL LEARNING

63 Phase III: TSeqGAN-DeepEERPred The accuracy of the previously developed TSeqGAN-DeepERPred model degrades when it is applied to spatially related sequences, because it cannot learn spatial-temporal correlations between DNA sequences. To solve this, the Deep Ensemble Enhanced Ransomware Prediction (DeepEERPred) model is proposed; it uses a two-dimensional (2D) convolution operator that jointly learns spatial-temporal relations between DNA sequences for ransomware detection. First, it converts the k-mer frequency analysis of DNA sequences into a matrix-like analysis task. Then, leveraging the matrix-like data structure, the 2D convolution operator jointly learns spatial and temporal correlations. In addition, random subspace learning is introduced, in which multiple random subspaces are created for each DNA sequence and fed into multiple ensemble CNN-LSTM networks, respectively.
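To make the 2D convolution operator concrete, the following minimal Python sketch slides a small kernel over a matrix whose rows could represent networks (spatial axis) and whose columns could represent timeslots (temporal axis). The function name, the matrix layout and the kernel values are assumptions for illustration; as in typical CNN layers, the operation is a cross-correlation without kernel flipping, and no padding or stride is applied.

```python
def conv2d_valid(matrix, kernel):
    """Apply a 2D kernel to a matrix with 'valid' (no padding) boundaries."""
    rows, cols = len(matrix), len(matrix[0])
    k_rows, k_cols = len(kernel), len(kernel[0])
    output = []
    for i in range(rows - k_rows + 1):
        out_row = []
        for j in range(cols - k_cols + 1):
            acc = 0.0
            # Each output cell mixes neighbouring rows AND neighbouring
            # columns, so both axes are learned from jointly.
            for a in range(k_rows):
                for b in range(k_cols):
                    acc += matrix[i + a][j + b] * kernel[a][b]
            out_row.append(acc)
        output.append(out_row)
    return output


if __name__ == "__main__":
    features = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]  # illustrative values only
    kernel = [[1, 0], [0, 1]]
    print(conv2d_valid(features, kernel))
```

A one-dimensional convolution along the time axis alone would treat every row independently, which is why the 2D operator is the component that captures the spatial relation.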

64 Overall Working Structure of TSeqGAN-DeepEERPred

65 Problem Formulation This section formulates the considered problem. Ransomware recognition is treated as predicting a ransomware attack in a specific network environment from historical ransomware data collected across the global network. Each historical observation is the ransomware data covering one observation network at one timeslot, and the task is to predict the ransomware attack for that network at a future timeslot. Consecutive timeslots are separated by a fixed width. The ransomware data of a fixed number of past time indexes is used to estimate the succeeding ransomware data. To create an ordered dataset, the observations are arranged into pairs, each consisting of a vector of past values and the value to be predicted. The ransomware in the future timeslot is therefore estimated from the ransomware data in the previous timeslots.
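The construction of ordered (past values, next value) pairs described above can be sketched with a sliding window. This is a minimal Python illustration; the function name `make_windows`, the parameter name `lag` for the number of past timeslots, and the one-step-ahead target are assumptions, since the original symbols are not available here.

```python
def make_windows(series, lag):
    """Turn a per-network time series into (input vector, target) pairs.

    Each input vector holds `lag` consecutive past observations and the
    target is the observation in the timeslot that follows them.
    """
    if lag <= 0 or lag >= len(series):
        raise ValueError("lag must be between 1 and len(series) - 1")
    return [(series[i:i + lag], series[i + lag]) for i in range(len(series) - lag)]


if __name__ == "__main__":
    # Illustrative values for one network over five timeslots.
    for past, target in make_windows([1, 2, 3, 4, 5], lag=2):
        print(past, "->", target)
```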

66 Problem Formulation The spatial and temporal dimensions are combined into a spatiotemporal matrix that organizes the input data along both dimensions, as given in Eq. (1), which also specifies the dimensions of its terms. The raw data is first pre-processed by removing incomplete or missing records. The key features are then chosen by the Multi-objective Grey Wolf Optimizer (MOGWO) and Binary Cuckoo Search (BCS) algorithms to generate DNA sequences, which are augmented by the TSeqGAN model. The augmented dataset is split into training and test sets, and the training data is fed to the random subspace learning-based ensemble CNN-LSTM network to detect ransomware classes.

67 Random Subspace Learning A random subspace learning scheme is used to enhance the robustness of the ensemble CNN-LSTM model. It depends on finding spatial relationships among DNA sequences. For each network, its most correlated networks are used to create its candidate subspace. The random subspace learning-based ensemble CNN-LSTM network comprises a random subspace generation stage and an ensemble CNN-LSTM network. A random subspace is created by arbitrarily choosing correlated networks from the candidate subspace. If an arbitrarily chosen DNA sequence has a missing value, it is eliminated and replaced by another arbitrarily chosen sequence with actual observations. For every network, three random subspaces are then created and given to the three ensemble CNN-LSTM networks, respectively.

68 Random Subspace Learning Because each arbitrarily chosen subspace contains only complete sequence data and many sub-models are combined, the final model is better able to handle sequences with missing data. To generate a candidate subspace, the correlations among networks must be determined. The spatial correlation between two networks is expressed by Eq. (2), which uses the mean of the ransomware DNA sequence of each network. For a given network, the actual regularized DNA sequence sample is then defined over its time slots. The candidate subspace consists of the networks most correlated with the considered network, and their number is the candidate subspace dimensionality.
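A candidate subspace of this kind can be sketched in Python as below. The exact form of Eq. (2) is not available here, so the sketch assumes the Pearson correlation coefficient, which is one common mean-based correlation measure; the function names and the sample data are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if std_x == 0 or std_y == 0:
        return 0.0  # a constant sequence has no defined correlation
    return cov / (std_x * std_y)


def candidate_subspace(series_by_network, target, size):
    """Return the `size` networks most correlated with the target network."""
    scores = {
        name: pearson(series_by_network[target], series)
        for name, series in series_by_network.items()
        if name != target
    }
    return sorted(scores, key=scores.get, reverse=True)[:size]


if __name__ == "__main__":
    data = {  # illustrative sequences only
        "n0": [1, 2, 3, 4],
        "n1": [2, 4, 6, 8],
        "n2": [4, 3, 2, 1],
        "n3": [1, 3, 2, 4],
    }
    print(candidate_subspace(data, "n0", size=2))
```

Ranking by signed correlation is a design choice of this sketch; ranking by absolute value would also admit strongly anti-correlated networks.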

69 Random Subspace Learning Next, networks with complete data are arbitrarily chosen from the candidate subspace, and a random subspace containing the given network is generated. In this study, the resulting input matrix is treated as an image. The target output is defined for a single network model, and the final learning pairs combine the input matrix with this target. The ensemble CNN-LSTM is then trained to predict the target by minimizing the loss in Eq. (3). In this way, the random subspace learning-based ensemble CNN-LSTM network is trained to explore spatial-temporal relations among ransomware attacks on different networks.
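The random subspace generation step can be sketched as follows. This is a minimal Python illustration under stated assumptions: missing values are represented by `None`, incomplete sequences are filtered out before sampling (equivalent in effect to replacing them with another arbitrary choice), and the function name, data and seed are hypothetical.

```python
import random


def random_subspace(candidates, series_by_network, target, size, seed=None):
    """Draw `size` complete networks at random and stack them with the target.

    Returns the chosen network names and the input matrix, whose rows are
    networks (target first) and whose columns are timeslots.
    """
    rng = random.Random(seed)
    complete = [
        name for name in candidates
        if all(value is not None for value in series_by_network[name])
    ]
    if size > len(complete):
        raise ValueError("not enough complete sequences in the candidate subspace")
    names = [target] + rng.sample(complete, size)
    matrix = [series_by_network[name] for name in names]
    return names, matrix


if __name__ == "__main__":
    data = {  # illustrative sequences; None marks a missing observation
        "n0": [1, 2, 3],
        "n1": [2, 4, 6],
        "n2": [4, None, 2],
        "n3": [1, 3, 2],
        "n4": [0, 1, 0],
    }
    # Three subspaces, one per ensemble member.
    for seed in range(3):
        print(random_subspace(["n1", "n2", "n3", "n4"], data, "n0", 2, seed=seed))
```

Each resulting matrix can be treated as a small image and passed to one CNN-LSTM member of the ensemble.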

70 Structure of Random Subspace Learning-based Ensemble CNN-LSTM Network

71 Algorithm of TSeqGAN-DeepEERPred
Input: Training and testing datasets for ransomware
Output: Label of each test instance as a ransomware class or goodware
Begin
  Eliminate duplicate and extraneous data from the unprocessed ransomware database;
  Determine the most important features by applying the MOGWO and BCS algorithms;
  Find the k-mer frequency map and set the design constraints for the training digital DNA chains;
  Run TSeqGAN to produce a high-quality training database from the real and synthetic DNA chains;
  Construct an ensemble of CNN-LSTM networks with different configurations;
  Train each CNN-LSTM of the ensemble with random subspace learning;
  Obtain the fused ensemble CNN-LSTM classifier and validate it using the test data;
  Recognize whether the given data is ransomware or goodware;
  If ransomware is recognized, classify its label;
End
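The algorithm above fuses the ensemble members without stating the fusion rule. One simple possibility, shown here purely as an assumption, is majority voting over the labels predicted by the individual CNN-LSTM networks; the function name and labels are hypothetical.

```python
from collections import Counter


def majority_vote(predictions_per_model):
    """Fuse per-model label lists by taking the most common label per sample."""
    fused = []
    # zip(*...) groups the predictions of all models for the same sample.
    for votes in zip(*predictions_per_model):
        fused.append(Counter(votes).most_common(1)[0][0])
    return fused


if __name__ == "__main__":
    model_outputs = [  # illustrative predictions from three ensemble members
        ["ransomware", "goodware", "ransomware"],
        ["ransomware", "ransomware", "goodware"],
        ["goodware", "goodware", "ransomware"],
    ]
    print(majority_vote(model_outputs))
```

With three members and two labels a tie cannot occur; for multi-class ransomware labels a tie-breaking rule, or averaging of class probabilities, would be needed instead.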

72 Performance Evaluation In this section, the efficiency of the TSeqGAN-DeepEERPred model using the ensemble CNN-LSTM is analyzed by implementing it in Java 1.8. Its efficiency is also compared with the DNAact-Ran, DeepERPred and TSeqGAN-DeepERPred models. The comparison uses standard classifier metrics: accuracy, precision, recall, F-measure, error rate and AUC.

73 Comparison Chart of Accuracy The accuracy of TSeqGAN-DeepEERPred is 16.07%, 4.62% and 1.91% higher than that of the DNAact-Ran, DeepERPred and TSeqGAN-DeepERPred models, respectively. This is because the model classifies ransomware tags by learning the correlation between current and past string instances.

74 Comparison Chart of Precision The precision of TSeqGAN-DeepEERPred is 15.39%, 4.64% and 0.68% higher than that of the DNAact-Ran, DeepERPred and TSeqGAN-DeepERPred models, respectively. This is because the TSeqGAN-DeepEERPred model correctly identifies the correlation between a string's current and past occurrences.

75 Comparison Chart of Recall The recall of TSeqGAN-DeepEERPred is 14.88%, 4.42% and 0.66% higher than that of the DNAact-Ran, DeepERPred and TSeqGAN-DeepERPred models, respectively. This is because the TSeqGAN-DeepEERPred model classifies the ransomware tags by correctly training an ensemble deep learner-based classifier.

76 Comparison Chart of F-Measure The F-measure of TSeqGAN-DeepEERPred is 15.13%, 4.53% and 0.67% higher than that of the DNAact-Ran, DeepERPred and TSeqGAN-DeepERPred models, respectively. This is because the proposed model enhances ransomware prediction by automatically recognizing the association between current and past instances of a string.

77 Comparison Chart of Error Rate The error rate of TSeqGAN-DeepEERPred is 71.87%, 44.91% and 25.72% lower than that of the DNAact-Ran, DeepERPred and TSeqGAN-DeepERPred models, respectively. TSeqGAN-DeepEERPred strengthens the association between recent and historical occurrences, lowering the error rate in ransomware tag identification.

78 Comparison Chart of AUC The AUC score of TSeqGAN-DeepEERPred is 3.74%, 1.53% and 0.10% higher than that of the DNAact-Ran, DeepERPred and TSeqGAN-DeepERPred models, respectively. The TSeqGAN-DeepEERPred model uses random subspace learning and analyzes spatial-temporal correlations across network environments and time frames.

79 Thesis Organization
Chapter 1: Introduces network security and explains ransomware techniques and their categories, as well as genotype DNA sequencing and its uses. It also summarizes the motivation and contribution of the research work.
Chapter 2: Explores current research on methodological approaches for detecting ransomware using digital DNA sequencing, and compares the methodology, advantages and disadvantages of the existing work.
Chapter 3: Examines and clarifies the research methodology, research scope and performance measures used in this research work.
Chapter 4: Introduces the first contribution of the research, titled "Deep Ensemble Classifier for Ransomware Identification Using Digitalized DNA Genotyping System".

80 Thesis Organization
Chapter 5: Presents the second contribution, the Time-Series Sequence Generative Adversarial Network (TSeqGAN), and explains how it improves ransomware detection using deep ensemble classifiers.
Chapter 6: Explains the third contribution, "Deep Ensemble Enhanced Ransomware Prediction for Spatial-Temporal Learning".
Chapter 7: Compares the overall results of the existing work and the current work.
Chapter 8: Concludes the study with its results and describes possible future improvements.

81 Conclusion The research focuses on DL methods for ransomware detection based on digital DNA sequencing, developed in three phases and implemented in Java 1.8. First, the DeepERPred model was developed to detect ransomware by analyzing temporal DNA sequences, transforming unprocessed data and classifying pertinent attributes using an ensemble CNN-LSTM model. Then, the TSeqGAN-DeepERPred model was created to generate synthetic DNA chains from preexisting ones, using policy gradient updates and an unsupervised adversarial error. Finally, the TSeqGAN-DeepEERPred model was developed, which uses a 2D convolution operator to jointly learn spatial-temporal interactions between DNA sequences for ransomware detection.

82 Conclusion This research work develops effective models to increase the accuracy of classifying and detecting ransomware attacks. The DeepERPred, TSeqGAN and DeepEERPred models are designed by modeling the temporal correlation using the synthetic ransomware data. The proposed model is therefore helpful for classifying and detecting ransomware attacks and overcoming the threats they pose in many applications. Finally, the experimental findings show that the proposed DeepEERPred model achieves an accuracy of 98.18% and a lower error rate than the other models for ransomware recognition and categorization.

83 Future Enhancement The proposed research can be extended in several directions, including the following.
Explainable AI methods could be added to the ransomware detection models to improve transparency, interpretability and vulnerability identification, while reducing dimensionality and maintaining detection accuracy.
Big data analytics and distributed computing frameworks could be used to manage large amounts of data from diverse sources efficiently, enabling swift identification of ransomware attacks in complex network settings.
Cyber threat information feeds and real-time updates could be integrated to make ransomware detection systems more effective at identifying new ransomware strains and threat actors' techniques.


87 Publications
Yuvaraj S., & Robert L. (2021). A Survey on Ransomware Detection Using Machine Learning and Deep Learning. International Journal of Research and Analytical Reviews (IJRAR), Vol. 8, No. 3, pp. 559-565.
Yuvaraj S., & Robert L. (2022). Deep Ensemble Classifier for Ransomware Identification Using Digitalized DNA Genotyping System. International Journal of Intelligent Engineering & Systems (IJIES), Vol. 15, No. 6, pp. 503-510. (Scopus Indexed)
Yuvaraj S., & Robert L. (2023). Time-Series Sequence Generative Adversarial Network for Improving Deep Ensemble Classifier Based Ransomware Detection. International Journal of Engineering Research and Applications (IJERA), Vol. 13, No. 7, pp. 56-68.

88 Publications International Journal
Sukumar, P., Robert L., & Yuvaraj S. (2016). Review on Modern Data Preprocessing Techniques in Web Usage Mining (WUM). International Conference on Computation System and Information Technology for Sustainable Solutions (CSITSS), pp. 64-69.

89 Thank you 8/2/2024
