ANDROID MALWARE DETECTION PPT chap 2.pdf chap 3.pdf

heenask6 0 views 48 slides Oct 10, 2025




CHAPTER 3

DESIGN OF THE MODULAR MALWARE
DETECTION FRAMEWORK



3.1 INTRODUCTION

This chapter presents the architecture of the proposed Android
malware detection system. The main aim is to design a multi-model,
modular structure that draws on both static and dynamic attributes to
enable real-time, scalable, and accurate malware detection that can be
deployed on contemporary mobile and corporate systems. The modular form
is intended to maximise scalability, reusability, and flexibility. In
contrast to monolithic designs, which have heavy computational resource
demands and rigid operating structures, the proposed system divides the
detection pipeline into four loosely coupled stages: a lightweight DT-KNN
classifier for fast static detection, a Deep Belief Network (DBN) for deep
static feature abstraction, a Gated Recurrent Unit (GRU) for dynamic
temporal behavior analysis, and a DBN-GRU hybrid final classifier for
overall decision-making.
This modular design supports long-term flexibility and maintainability,
because individual components can be optimized, replaced, or enhanced
separately without affecting the overall system. The DT-KNN classifier
addresses the gap in low-energy, real-time detection that works adequately
in resource-scarce environments. The DBN module handles
obfuscation-resistant semantic learning of static code features. The GRU
module handles runtime behavioral sequence modeling, including evasion and
delayed-execution methods. Lastly, by jointly learning static and dynamic
representations, the DBN-GRU hybrid fusion layer bridges the integration
gap and promotes generalizability to context-aware and zero-day malware.
Together, these components address current problems and lay the
foundation for an efficient and deployable Android malware detection
system.
3.2 SYSTEM ARCHITECTURE OVERVIEW

To leverage the strengths of both traditional machine learning and
deep learning in an integrated four-stage pipeline, the proposed Android
malware detection system is built as a hierarchical, modular detection
framework. The motivation is that no single analytical approach, static or
dynamic, can fully capture the complexity of modern Android malware,
which often employs strategies such as obfuscation, delayed code
execution, polymorphism, and concealment of runtime behavior. To address
this, the system is split into independent modules that separately handle
opcode-level heuristics, static feature abstraction, temporal behavior
modeling, and final decision fusion.
The feature space is first screened quickly by the Decision
Tree–K-Nearest Neighbor (DT-KNN) hybrid classifier. This is followed by
Deep Belief Network (DBN)-based semantic interpretation of application
metadata, and then by dynamic runtime analysis using a Gated Recurrent
Unit (GRU) model capable of extracting sequential dependencies in system
call traces, IPC patterns, and network interactions. In the final fusion
layer, the static and dynamic representations are fused into a unified
latent space, and a softmax classifier yields the final malware
classification. This layering supports enhanced evasion resistance,
scalable deployment options (edge or cloud), and adaptive inference
pathways based on resource utilization. The design improves not only
detection efficiency but also explainability, computational cost, and
real-time performance, all of which are essential for today's mobile
security systems.
3.2.1 Hybrid DT-KNN Model

The algorithm integrates the Decision Tree (DT) with K-Nearest
Neighbors (KNN) to address overfitting in tree branches and improve
classification precision at decision boundaries. The DT provides a fast
hierarchical split of feature space, and KNN serves as a localized post-
refinement mechanism at underperforming leaf nodes.
3.2.1.1 Purpose: Fast, Low-Resource Early-Stage Filter

The Stage 1 DT-KNN static classifier is intended to be an early filter
in the malware detection process, especially where resources are
constrained, e.g., on embedded systems, mobile phones, and Internet of
Things devices. Its main purpose is to provide high detection rates at
minimal computational cost, so that suspicious apps are identified in real
time or near real time before undergoing more detailed analysis.
Lightweight, scalable, and flexible solutions are increasingly in
demand because of the fast-growing number of Android applications and the
dynamic nature of malware. Although their accuracy is good, traditional
machine learning and deep learning methods often perform poorly in
real-world environments because of their heavy processing overhead and
latency. To balance speed with accuracy, our hybrid static classifier
combines the local accuracy of K-Nearest Neighbors (KNN) with the low
complexity of Decision Trees (DT).


Figure 3.1 Hybrid DT-KNN Model


3.2.1.2 Design Rationale and Threshold Setting

The design of the lightweight DT-KNN classifier is driven by
several foundational principles:
1. Modularity: The hybrid model consists of two logically dependent
layers. The decision tree is responsible for fast hierarchical rule-based
classification, while KNN operates only at leaf nodes to refine
classification decisions in atypical cases.
2. Efficiency over Complexity: In Stage 1, we intentionally avoid
elaborate models such as deep CNNs in order to optimize response time and
resource efficiency, which are necessary for deployment on smartphones
and edge devices.
3. Error Minimization via Fusion: Single DT models are fast, but they
are prone to overfitting and do not generalize well to edge cases. KNN
models are accurate as instance-based learners, but they tend to be
computationally intensive. We alleviate both issues by invoking KNN only
at the most important decision points, the low-confidence leaf nodes.
Especially for borderline cases, this combination decreases the number of
false positives and false negatives.
4. Threshold Setting

● We specify a confidence threshold for each DT leaf node.
KNN reclassifies the instance when the Decision Tree
classification confidence (computed from entropy-based
probability distributions or Gini impurity) is less than 80%.
● The KNN layer uses k = 5, selected by cross-validation, to
balance speed against local decision accuracy.
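
The threshold rule above can be illustrated with a minimal sketch. It assumes that a leaf's confidence is the share of its majority class among the training samples routed to it; the function names `leaf_confidence` and `needs_knn` and the example leaves are hypothetical and are not taken from the original implementation.

```python
from collections import Counter


def leaf_confidence(labels):
    """Return (majority label, majority-class probability, Gini impurity) for one leaf."""
    counts = Counter(labels)
    total = sum(counts.values())
    probs = {label: n / total for label, n in counts.items()}
    majority = max(probs, key=probs.get)
    gini = 1.0 - sum(p * p for p in probs.values())
    return majority, probs[majority], gini


def needs_knn(labels, threshold=0.8):
    """True when the leaf's confidence is below the threshold (80% in the text)."""
    _, confidence, _ = leaf_confidence(labels)
    return confidence < threshold


# Hypothetical training labels that ended up in two different leaves.
pure_leaf = ["malware"] * 9 + ["benign"] * 1    # confidence 0.9
mixed_leaf = ["malware"] * 6 + ["benign"] * 4   # confidence 0.6

print(needs_knn(pure_leaf), needs_knn(mixed_leaf))
```

Only leaves like `mixed_leaf` would receive a local KNN, which is what keeps the second stage cheap.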
3.2.1.3 Feature Selection Criteria

To reduce computational expense and attain high accuracy, feature
selection is essential. Light, statically extractable Android app features are
included in the proposed set of features. Static code analysis tools like
dexParser can be utilized to access these features, which do not require app
execution.
Table 3.1 Feature Categories

Feature Category | Description
Permissions | E.g., ACCESS_FINE_LOCATION, READ_SMS; used to detect over-privileged apps.
API Calls | Presence of dangerous API patterns (e.g., reflection, socket APIs).
Manifest Attributes | E.g., services declared, exported components.
Intent Filters | Abnormal use of broadcast receivers.
DEX File Structure | DEX file size, method counts, string table entropy.
Package Name Features | Obfuscation patterns or suspicious naming conventions.

To optimize this feature set:

● To eliminate less informative attributes, we used Recursive
Feature Elimination (RFE) with the Decision Tree as the
estimator.
● To keep only the 25 most important features that contribute
to the decision boundary, Mutual Information and Chi-Square
scores were calculated.
● To avoid redundancy, features with high multicollinearity
(Pearson's |r| > 0.8) were also removed.
This reduced feature set not only improves detection but also ensures
that the model remains computationally lightweight.
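
As an illustration of the multicollinearity filter described above, the following sketch drops any feature whose absolute Pearson correlation with an already kept feature exceeds 0.8. The greedy keep-first strategy, the helper names, and the toy feature columns are assumptions made for this example; the source does not say which feature of a correlated pair is retained.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equally long numeric lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # constant column: treated as uncorrelated (assumption)
    return cov / math.sqrt(var_a * var_b)


def drop_correlated(columns, threshold=0.8):
    """Keep a feature only if |r| <= threshold against every feature kept so far."""
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical binary feature columns, one entry per app.
features = {
    "READ_SMS": [1, 0, 1, 1, 0, 0],
    "SEND_SMS": [1, 0, 1, 1, 0, 0],          # identical to READ_SMS, so r = 1
    "uses_reflection": [0, 1, 1, 0, 1, 0],
}
kept = drop_correlated(features)
print(kept)
```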
3.2.1.4 Training and Decision Fusion Strategy

The training of the hybrid DT-KNN model follows a two-phase strategy:

1. Phase 1 – Training the Decision Tree

● The first 60% of the dataset is used to build a standard Decision
Tree classifier using the CART algorithm (Classification and
Regression Trees).
● Hyperparameters such as max_depth, min_samples_split, and
criterion (Gini impurity) are tuned using grid search.
● The resulting tree is pruned to reduce overfitting and maintain
a small model size suitable for low-memory devices.
2. Phase 2 – Integrating KNN at Leaf Nodes

● After identifying the leaf nodes with lower confidence scores,
we collect the training samples routed to those nodes.
● For each such node, a local KNN model is trained using these
samples to serve as a “second opinion” classifier.
● During prediction, if a new sample is routed to a low-confidence
leaf node, the local KNN revalidates or overrides the DT decision.
3. The final prediction function is defined as

\[ \hat{y} = \begin{cases} y_{DT}, & P_{DT}(y_{DT} \mid x) \ge \theta \\ y_{KNN}, & \text{otherwise} \end{cases} \tag{1} \]

Where
y_DT: prediction from the Decision Tree.
y_KNN: prediction from the local KNN.
P_DT(y_DT | x): confidence of the Decision Tree in its prediction for sample x.
θ: confidence threshold (set to 0.8 empirically).
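
The decision rule of Equation (1) can be sketched in a few lines. This is an illustration only: the tree's label and confidence are passed in as plain values rather than computed by a trained tree, the KNN is a simple Euclidean majority vote, and all names and sample points are hypothetical.

```python
import math
from collections import Counter


def knn_predict(query, samples, k=5):
    """Majority vote among the k nearest (Euclidean) training samples of one leaf."""
    nearest = sorted(samples, key=lambda s: math.dist(query, s[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def hybrid_predict(query, y_dt, p_dt, leaf_samples, theta=0.8, k=5):
    """Equation (1): keep the tree's label when it is confident, else ask the local KNN."""
    if p_dt >= theta:
        return y_dt
    return knn_predict(query, leaf_samples, k)


# Hypothetical training samples routed to one low-confidence leaf.
leaf_samples = [
    ((0.0, 0.0), "benign"), ((0.1, 0.2), "benign"), ((0.2, 0.1), "benign"),
    ((0.9, 1.0), "malware"), ((1.0, 0.9), "malware"), ((1.0, 1.0), "malware"),
    ((0.8, 0.9), "malware"),
]

confident = hybrid_predict((0.1, 0.1), "malware", 0.95, leaf_samples)
uncertain = hybrid_predict((0.1, 0.1), "malware", 0.55, leaf_samples)
print(confident, uncertain)
```

In the second call the tree's confidence (0.55) is below θ, so the five nearest neighbours decide, and three of them are benign.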

Table 3.2 Benefits of the fusion of DT and KNN

Property | Standalone DT | Standalone KNN | Hybrid DT-KNN
Speed | High | Low | High
Accuracy | Moderate | High | Very High
Memory Efficiency | High | Low | Moderate
Generalization | Moderate | High | Very High
False Positive Rate | Moderate | Low | Very Low
Real-Time Suitability | Yes | No | Yes


The training time is reduced since KNNs are only trained for a subset
of the data, and prediction time is minimized due to DT’s fast traversal and
limited KNN invocation.


Table 3.3 Final Performance of DT-KNN Static Classifier

Metric Standalone DT KNN Hybrid DT-KNN
Precision 0.91 0.95 0.98
Accuracy 0.88 0.89 0.98
Recall 0.91 0.92 0.98
F1-Score 0.91 0.94 0.97
Time (s) 18.55 101.55 8.00

In resource-constrained settings, the Stage 1 lightweight DT-KNN
classifier satisfactorily addresses the need for rapid, resource-conscious
malware detection.

By combining the benefits of instance-based and hierarchical learning,
this hybrid framework reduces false alarms and improves detection of
sophisticated malware variants, which makes it well suited for operational
deployment.
3.2.2 Deep Belief Network (DBN) Model

The Deep Belief Network (DBN) module aims to capture
hierarchical representations of static features (permissions, APIs, intents) to
detect complex, evasive malware.

This model uses stacked Restricted Boltzmann Machines (RBMs)
with layer-wise unsupervised pretraining and a supervised fine-tuning phase
via a softmax classifier.





Figure 3.2 Deep Belief Network (DBN)
Step 1: Static Feature Extraction
1. Decompile APKs to obtain

● AndroidManifest.xml: to extract permissions and intents.
● smali/Java classes: to extract API call information.
2. Construct raw feature vectors

\[ X_{raw} = [P \,\|\, A \,\|\, I] \in \mathbb{R}^{N \times d} \tag{2} \]

● $P \in \{0,1\}^{|P|}$: Binary permission vector
● $A \in \{0,1\}^{|A|}$: API call vector
● $I \in \{0,1\}^{|I|}$: Intent filters
Step 2: Data Preprocessing and Normalization

1. One-hot encode categorical features (if any).

2. Normalize continuous features using min-max scaling

\[ x_i' = \frac{x_i - \min(x)}{\max(x) - \min(x)} \tag{3} \]

3. Correlation filtering: Remove features with Pearson correlation
above a threshold τ

\[ \rho_{i,j} = \frac{\mathrm{cov}(x_i, x_j)}{\sigma_{x_i} \sigma_{x_j}}, \quad \text{drop if } |\rho_{i,j}| > \tau \tag{4} \]
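
A minimal sketch of the min-max scaling in Equation (3) follows. The feature name `dex_size_kb`, its values, and the handling of a constant column (mapped to 0) are assumptions made for illustration.

```python
def min_max_scale(column):
    """Rescale one continuous feature column to the range [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]  # constant feature: mapped to 0 (assumption)
    return [(x - lo) / (hi - lo) for x in column]


# Hypothetical DEX file sizes in kilobytes for four apps.
dex_size_kb = [120.0, 480.0, 300.0, 1020.0]
scaled = min_max_scale(dex_size_kb)
print(scaled)
```

The smallest value maps to 0 and the largest to 1, so features measured in very different units become comparable before correlation filtering and PCA.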

Step 3: Dimensionality Reduction with PCA

Apply PCA to reduce dimensionality while preserving variance

1. Center the data

\[ \bar{X} = X - \mu, \quad \mu = \frac{1}{N} \sum_{i=1}^{N} x_i \tag{5} \]

2. Compute covariance matrix

\[ \Sigma = \frac{1}{N} \bar{X}^{T} \bar{X} \tag{6} \]

3. Compute eigenvectors and eigenvalues

\[ \Sigma = V \Lambda V^{T} \tag{7} \]

4. Project onto top-k components that explain 95% of variance

\[ X_{PCA} = \bar{X} V_k, \quad V_k \in \mathbb{R}^{d \times k} \tag{8} \]
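
The choice of k in the last PCA step can be sketched as follows: given the eigenvalues of the covariance matrix, take the smallest number of leading components whose cumulative share of the total variance reaches 95%. The eigenvalues below are made up for illustration, and the eigen-decomposition itself is assumed to come from a numerical library.

```python
def components_for_variance(eigenvalues, target=0.95):
    """Smallest k such that the top-k eigenvalues explain at least `target` of the variance."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    running = 0.0
    for k, value in enumerate(ordered, start=1):
        running += value
        if running / total >= target:
            return k
    return len(ordered)


# Hypothetical eigenvalues of a 5-feature covariance matrix.
eigenvalues = [5.0, 3.0, 1.6, 0.3, 0.1]
k = components_for_variance(eigenvalues)
print(k)
```

Here the first two components explain 80% of the variance and the first three explain 96%, so three components are kept.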

Step 4: DBN Construction (Stacked RBMs)

Each layer of the DBN is a binary Restricted Boltzmann Machine (RBM).

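RBM Energy Function

\[ E(v, h) = -\sum_{i} a_i v_i - \sum_{j} b_j h_j - \sum_{i} \sum_{j} v_i W_{ij} h_j \tag{9} \]

Where

● $W \in \mathbb{R}^{m \times n}$: weight matrix (m visible units, n hidden units)
● $a_i$, $b_j$: biases for visible and hidden units
Probability Distributions

\[ P(h_j = 1 \mid v) = \sigma\Big(b_j + \sum_{i} v_i W_{ij}\Big) \tag{10} \]
\[ P(v_i = 1 \mid h) = \sigma\Big(a_i + \sum_{j} W_{ij} h_j\Big) \tag{11} \]

Where $\sigma(x) = \frac{1}{1 + e^{-x}}$ is the sigmoid function.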

Step 5: RBM Training Using Contrastive Divergence (CD-1)

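For each layer

1. Sample hidden units from input

\[ h^{(0)} \sim P(h \mid v) \tag{12} \]

2. Reconstruct visible units

\[ v^{(1)} \sim P(v \mid h^{(0)}) \tag{13} \]

3. Resample hidden units

\[ h^{(1)} \sim P(h \mid v^{(1)}) \tag{14} \]

4. Update weights

\[ \Delta W = \eta \left( \langle v h^{T} \rangle_{data} - \langle v h^{T} \rangle_{recon} \right) \tag{15} \]

Repeat for $L$ RBM layers

\[ v^{(0)} = X_{PCA}, \quad v^{(l)} = h^{(l)} \quad \forall l = 1, 2, \ldots, L \tag{16} \]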
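
The CD-1 procedure of Step 5 can be sketched for a single binary RBM in plain Python. This is a toy illustration, not the training code behind the reported results: the layer sizes, learning rate, random seed, and training data are arbitrary, and the bias updates are an assumption, since the text gives only the weight update.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sample(probs, rng):
    """Draw a binary state for each unit from its activation probability."""
    return [1.0 if rng.random() < p else 0.0 for p in probs]


def hidden_probs(v, W, b):
    # P(h_j = 1 | v) = sigmoid(b_j + sum_i v_i W_ij)
    return [sigmoid(b[j] + sum(v[i] * W[i][j] for i in range(len(v))))
            for j in range(len(b))]


def visible_probs(h, W, a):
    # P(v_i = 1 | h) = sigmoid(a_i + sum_j W_ij h_j)
    return [sigmoid(a[i] + sum(W[i][j] * h[j] for j in range(len(h))))
            for i in range(len(a))]


def cd1_step(v0, W, a, b, lr, rng):
    """One CD-1 update: sample h0, reconstruct v1, resample h1, then update in place."""
    h0 = sample(hidden_probs(v0, W, b), rng)
    v1 = sample(visible_probs(h0, W, a), rng)
    h1 = sample(hidden_probs(v1, W, b), rng)
    for i in range(len(a)):
        for j in range(len(b)):
            W[i][j] += lr * (v0[i] * h0[j] - v1[i] * h1[j])
        a[i] += lr * (v0[i] - v1[i])   # visible bias update (assumption)
    for j in range(len(b)):
        b[j] += lr * (h0[j] - h1[j])   # hidden bias update (assumption)


rng = random.Random(0)
n_visible, n_hidden = 6, 3
W = [[rng.gauss(0.0, 0.01) for _ in range(n_hidden)] for _ in range(n_visible)]
a = [0.0] * n_visible
b = [0.0] * n_hidden

# Two hypothetical binary feature patterns, repeated to form a tiny dataset.
data = [[1, 1, 1, 0, 0, 0], [0, 0, 0, 1, 1, 1]] * 10
for epoch in range(20):
    for v in data:
        cd1_step([float(x) for x in v], W, a, b, lr=0.1, rng=rng)

# Hidden activations of a trained RBM would serve as input to the next RBM in the stack.
features = hidden_probs([1.0, 1.0, 1.0, 0.0, 0.0, 0.0], W, b)
print(features)
```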

Step 6: Fine-Tuning with Softmax Classifier

Add a softmax layer on top of the final RBM output $h_{DBN} \in \mathbb{R}^{n}$.

\[ P(y = c \mid h_{DBN}) = \frac{\exp(W_c^{T} h_{DBN} + b_c)}{\sum_{j=1}^{C} \exp(W_j^{T} h_{DBN} + b_j)} \tag{17} \]

Step 7: Training Objective and Backpropagation

Use cross-entropy loss

\[ \mathcal{L} = -\sum_{i=1}^{N} \sum_{c=1}^{C} y_{ic} \log P(y_i = c \mid h_{DBN}^{(i)}) \tag{18} \]

Optimize using stochastic gradient descent (SGD) or Adam

\[ \theta \leftarrow \theta - \eta \cdot \nabla_{\theta} \mathcal{L} \tag{19} \]

Where

● $\eta$: learning rate
● $\theta$: model parameters
Output

Final output for each application:

● Predicted class label: $\hat{y} \in \{0, 1\}$
● Class probability vector

\[ d^{(2)} = [P_{benign}, P_{malicious}] \tag{20} \]

● Latent feature vector: $h_{DBN} \in \mathbb{R}^{n}$ for fusion with GRU.
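
The softmax output of Step 6 and the cross-entropy loss of Step 7 can be illustrated with a short sketch. The logits stand in for the values of W_c^T h_DBN + b_c and are invented for this example; class index 0 is taken to mean benign and 1 malicious, which is an assumption.

```python
import math


def softmax(logits):
    """Convert class scores into probabilities that sum to 1."""
    shift = max(logits)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(batch_probs, batch_labels):
    """Summed cross-entropy; with one-hot targets only the true-class term remains."""
    return -sum(math.log(p[y]) for p, y in zip(batch_probs, batch_labels))


# Hypothetical class scores for two apps: [benign, malicious].
logits = [[2.0, 0.0], [0.5, 1.5]]
probs = [softmax(z) for z in logits]
loss = cross_entropy(probs, [0, 1])
predicted = [p.index(max(p)) for p in probs]
print(probs, loss, predicted)
```

Each row of `probs` corresponds to the class probability vector of Equation (20) for one application.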
3.2.2.1 DBN Architecture (Stacked RBMs)

A Deep Belief Network (DBN) is a generative probabilistic graphical
model consisting of multiple stacked Restricted Boltzmann Machines
(RBMs). In static Android malware detection, the DBN learns abstract,
latent patterns for classification from high-dimensional, sparse data such
as permissions, API calls, and intent filters.
Each RBM in the stack serves as a feature detector, transforming its
input into increasingly abstract features. The network typically consists of:
● Input Layer: encodes the extracted static features of APKs
(e.g., permissions, API calls, and intents) as normalized or
binary values.
● Hidden Layers (RBMs): Each hidden layer consists of stochastic
binary units that model feature dependencies. The hidden layer
of one RBM becomes the visible layer of the next RBM.
● Top Layer (Classifier): To perform supervised classification
into benign or malicious categories, a softmax classifier is
placed on top of the last hidden layer.
Table 3.4 Example DBN Architecture

Layer Type | Number of Units | Activation Function | Notes
Input | 300 (feature vector) | N/A | Derived from static features (PCA reduced)
Hidden Layer 1 (RBM) | 200 | Sigmoid | Learns low-level feature combinations
Hidden Layer 2 (RBM) | 100 | Sigmoid | Learns mid-level feature representations
Hidden Layer 3 (RBM) | 50 | Sigmoid | Captures abstract semantic patterns
Output Layer (Softmax) | 2 | Softmax | Classifies apps into benign or malware

This architecture enables learning of hierarchical abstractions,
enhancing robustness against code obfuscation and unknown malware
variants.


3.2.2.2 Layer-wise Pretraining and Fine-tuning

The DBN training involves two phases:

A. Unsupervised Layer-wise Pretraining

This phase uses Contrastive Divergence (CD) to train the RBMs one at
a time and to initialize the weights in a greedy, bottom-up fashion. This
captures the intrinsic structure of the input data before label
supervision is introduced.
● Step 1: The first RBM is trained on the raw input data, such as
binary-coded permission vectors.
● Step 2: The hidden layer of the first RBM serves as input to
the next RBM.
● Step 3: The same process is repeated for all stacked RBMs.

● Step 4: At the end of pretraining, the network is initialized
with learned weights that represent feature hierarchies.
Benefits

● Helps avoid poor local minima by starting from initial
weights near good optima.
● Allows abstraction of features without labels, layer by layer.

● Learns representations prior to classification, which helps
reduce overfitting.


B. Supervised Fine-tuning

Following pretraining, the entire network (RBM stack + classifier) is
fine-tuned by backpropagation on a labeled dataset.
● Loss Function: Cross-entropy loss is used for binary
classification.
● Optimization: Adam optimizer or stochastic gradient descent
(SGD).
● Regularization: To avoid overfitting, apply L2 regularization
or dropout.
Supervised fine-tuning improves predictive performance by aligning
the pre-learned latent features with the final classification task.
3.2.2.3 Handling Sparse and High-dimensional Data

Static features are usually sparse and high-dimensional, because
Android malware datasets contain thousands of binary features (such as
hundreds of permissions, API calls, and intent filters), many of which are
irrelevant or redundant.
Techniques to Handle Sparsity

● Feature Normalization: Feature vectors are normalized so that
all features are represented on a comparable scale.
● One-Hot Encoding: Categorical features (e.g., permission types)
are converted into sparse binary vectors.
● Correlation Filtering: To minimize noise, redundant features are
eliminated using Pearson correlation.
● Dimensionality Reduction via PCA: Principal Component Analysis
projects the data onto a lower-dimensional space while preserving
95% of the variance.
Table 3.5 Feature Dimensionality Reduction

Stage Feature Count Description
Raw Features ~3,000 All permissions, APIs, intents
After Filtering ~1,000 After correlation filtering
After PCA 300 Final reduced set fed to DBN

These preprocessing steps reduce overfitting risk, enhance
generalization, and speed up training without compromising classification
accuracy.
3.2.2.4 Static Feature Learning and Classification Process

The static feature learning pipeline for malware classification using
DBN proceeds as follows:
Step 1: Static Feature Extraction

Android APKs are reverse-engineered using tools like APKTool and
Androguard to extract:
● Permissions: e.g., android.permission.INTERNET

● API Calls: e.g., getSystemService(), sendTextMessage()

● Intent Filters: e.g.,
android.intent.action.BOOT_COMPLETED
Step 2: Feature Encoding
● All features are encoded into a fixed-length binary vector.

● Example: If a permission is present, its index is set to 1;
otherwise, 0.
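
The fixed-length binary encoding of Step 2 can be sketched as follows. The four-entry vocabulary and the example app are hypothetical; a real vocabulary would contain every permission, API call, and intent filter observed in the training set.

```python
def encode(app_features, vocabulary):
    """Return a binary vector with 1 at every vocabulary index the app exhibits."""
    present = set(app_features)
    return [1 if name in present else 0 for name in vocabulary]


# Hypothetical, very small feature vocabulary with a fixed ordering.
vocabulary = [
    "android.permission.INTERNET",
    "android.permission.READ_SMS",
    "getSystemService",
    "android.intent.action.BOOT_COMPLETED",
]

# Features extracted from one hypothetical app.
app = {"android.permission.INTERNET", "android.intent.action.BOOT_COMPLETED"}
vector = encode(app, vocabulary)
print(vector)
```

Because the vocabulary order is fixed, every app maps to a vector of the same length, which is what the later preprocessing and the DBN input layer require.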
Step 3: Preprocessing
● Cleaning: Removal of rare/noisy features.

● Filtering: Pearson correlation filtering to remove redundant features.

● PCA: Reduces the feature space from thousands of features to a
manageable number (e.g., 300).
Step 4: DBN Input and Learning

● The final preprocessed vector is fed into the DBN.

● RBMs learn increasingly abstract representations.

● The top layer, a softmax classifier, uses these features for
malware/benign classification.
Step 5: Evaluation
● Performance metrics include Precision, Recall, Accuracy, F1-
Score.
● Inference time and generalization to unseen malware samples
are also evaluated.


Table 3.6 DBN Performance Summary

Metric Value
Accuracy 97%
Precision 95%
Recall 95%
F1-Score 95%
Inference Time 6.8 s

Because it learns high-level semantic properties, the DBN is able to
detect sophisticated threats that escape traditional signature-based
methods. Static Android malware detection with a Deep Belief Network
built from stacked RBMs provides a robust way to process non-sequential,
high-dimensional, and sparse data. Through unsupervised layer-wise
pretraining followed by supervised fine-tuning, the DBN can discover
hierarchical patterns that traditional models often miss. By combining
deep feature learning with careful preprocessing (PCA, correlation
filtering), the model achieves high accuracy, low false positives, and
fast inference, and is thus well suited for real-time deployment in
malware detection systems.
3.2.3 GRU-Based Dynamic Behavior Modeling

The purpose of this stage is to model and interpret the temporal
behavior of Android applications captured during sandbox execution. It
leverages a Gated Recurrent Unit (GRU) neural architecture to process
sequences of system calls, network patterns, and IPC events. The GRU
excels at capturing long-term dependencies and hidden temporal correlations
which are crucial for identifying time-sensitive malware patterns.





Figure 3.3 GRU-Based Dynamic Behavior Modeling Framework


Step 1: Dynamic Feature Extraction

Each Android app is executed in a virtualized sandbox environment
(e.g., using Cuckoo or DroidBox). During execution, behavioral logs are
captured over regular time intervals.
At each time step $t$, the feature vector $x_t \in \mathbb{R}^{f}$ is constructed from

● System Call Profile ($s_t$): Binary or frequency vector over the
syscall types

● Network Activity Vector ($n_t$): IP entropy, packet counts,
protocol flags
● IPC Events Vector ($c_t$): Number of messages, destinations,
channel types

\[ x_t = [s_t \,\|\, n_t \,\|\, c_t] \in \mathbb{R}^{f} \tag{21} \]

Each feature is standardized

\[ \hat{x}_{t,j} = \frac{x_{t,j} - \mu_j}{\sigma_j} \tag{22} \]

Step 2: Temporal Vector Construction

The dynamic trace of an app is modeled as a time-ordered sequence

\[ X_{seq} = \{x_1, x_2, \ldots, x_T\}, \quad x_t \in \mathbb{R}^{f} \tag{36} \]

For uniformity across samples, time sequences are either truncated
(if $T > T_{max}$) or zero-padded (if $T < T_{max}$) to a fixed length $T_{max}$.
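
The truncation and zero-padding step can be sketched as below. The source does not state which end of a long trace is kept or where padding is placed; this example assumes that the first T_max steps are kept and that padding is appended at the end. The function name `fix_length` is hypothetical.

```python
def fix_length(sequence, t_max, n_features):
    """Truncate to the first t_max steps, or append all-zero steps up to t_max."""
    clipped = [list(step) for step in sequence[:t_max]]
    padding = [[0.0] * n_features for _ in range(t_max - len(clipped))]
    return clipped + padding


short_trace = [[1.0, 0.0], [0.0, 1.0]]              # T = 2
long_trace = [[float(t), 1.0] for t in range(6)]    # T = 6

padded = fix_length(short_trace, t_max=4, n_features=2)
truncated = fix_length(long_trace, t_max=4, n_features=2)
print(padded)
print(truncated)
```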


Step 3: GRU Network Architecture

A single-layer GRU processes the input sequence to produce a series
of hidden states $\{h_1, h_2, \ldots, h_T\}$. The GRU dynamically controls
memory updates through gating mechanisms. The notation is as follows:

● $W_z, W_r, W_h \in \mathbb{R}^{h \times f}$: input weights
● $U_z, U_r, U_h \in \mathbb{R}^{h \times h}$: recurrent weights
● $b_z, b_r, b_h \in \mathbb{R}^{h}$: bias vectors
● $\odot$: element-wise multiplication
● $\sigma(\cdot)$: sigmoid activation function
● $\tanh(\cdot)$: hyperbolic tangent activation
Then the GRU update rules at each time step $t$ are
1. Update Gate – controls how much past information to keep

\[ z_t = \sigma(W_z x_t + U_z h_{t-1} + b_z) \tag{23} \]

2. Reset Gate – determines how much past state to forget

\[ r_t = \sigma(W_r x_t + U_r h_{t-1} + b_r) \tag{24} \]

3. Candidate Hidden State – computes new memory content

\[ \tilde{h}_t = \tanh(W_h x_t + U_h (r_t \odot h_{t-1}) + b_h) \tag{25} \]

4. Final Hidden State – interpolation between new and old memory

\[ h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t \tag{26} \]

The recurrence starts with

\[ h_0 = \vec{0} \tag{27} \]

and proceeds iteratively for $t = 1$ to $T$.
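
The update rules of Equations (23) to (27) can be traced in a small, dependency-free sketch. The weights are random and untrained, the sizes (3 input features, 2 hidden units) are arbitrary, and the dictionary layout of the parameters is an assumption made for this example.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def affine(W, x, U, h, b):
    """Return W x + U h + b for list-based matrices and vectors."""
    return [
        sum(w * xi for w, xi in zip(W[k], x))
        + sum(u * hi for u, hi in zip(U[k], h))
        + b[k]
        for k in range(len(b))
    ]


def gru_step(x, h_prev, p):
    z = [sigmoid(v) for v in affine(p["Wz"], x, p["Uz"], h_prev, p["bz"])]  # update gate
    r = [sigmoid(v) for v in affine(p["Wr"], x, p["Ur"], h_prev, p["br"])]  # reset gate
    gated = [ri * hi for ri, hi in zip(r, h_prev)]                          # r_t * h_{t-1}
    cand = [math.tanh(v) for v in affine(p["Wh"], x, p["Uh"], gated, p["bh"])]
    # Interpolate between the previous state and the candidate state.
    return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h_prev, cand)]


def gru_forward(sequence, p, hidden_size):
    h = [0.0] * hidden_size  # the recurrence starts from the zero vector
    for x in sequence:
        h = gru_step(x, h, p)
    return h  # final hidden state h_T


rng = random.Random(0)
n_features, hidden = 3, 2


def matrix(rows, cols):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


params = {}
for gate in "zrh":
    params["W" + gate] = matrix(hidden, n_features)
    params["U" + gate] = matrix(hidden, hidden)
    params["b" + gate] = [0.0] * hidden

sequence = [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0], [0.0, 0.0, 0.0]]
h_T = gru_forward(sequence, params, hidden)
print(h_T)
```

The returned vector plays the role of the compressed sequence embedding that the classification layer consumes.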

Step 4: Final Classification Layer

The last hidden state $h_T \in \mathbb{R}^{h}$ serves as a compressed embedding of
the entire dynamic behavior sequence. This vector is passed to a fully
connected softmax classification layer

\[ P(y = c \mid h_T) = \frac{\exp(W_c^{T} h_T + b_c)}{\sum_{j=1}^{C} \exp(W_j^{T} h_T + b_j)}, \quad C = 2 \tag{28} \]

Loss Function

The model is trained using cross-entropy loss

\[ L = -\sum_{i=1}^{N} \sum_{c=1}^{C} y_{ic} \log P(y_i = c \mid h_T^{i}) \tag{29} \]

Where
● $y_{ic}$: One-hot encoded true label
● $P(y_i = c \mid h_T^{i})$: Softmax probability
Backpropagation Through Time (BPTT)

The parameters $\{W, U, b\}$ are updated using truncated BPTT, where
gradients are computed across the sequence and summed over time steps

\[ \frac{\partial L}{\partial W_z} = \sum_{t=1}^{T} \frac{\partial L}{\partial h_t} \frac{\partial h_t}{\partial z_t} \frac{\partial z_t}{\partial W_z} \tag{30} \]

Similar equations apply for $W_r$, $W_h$, $U_*$, $b_*$. Optimizer: Adam or
SGD with learning rate $\alpha$.

Model Output
● Prediction

\[ \hat{y} = \arg\max_{c} P(y = c \mid h_T) \tag{31} \]

● Confidence vector

\[ d^{(3)} = [P_{benign}, P_{malicious}] \tag{32} \]

● Final dynamic representation $h_T$ for fusion with DBN output in
Stage 4

3.2.3.1 Input Data: Time-Series Dynamic Logs

The objective of dynamic analysis is to monitor the behavior of
Android apps at runtime within controlled environments in order to detect
malicious intent that static analysis may miss. The main input to the
GRU-based behavior modeling module is a set of time-series dynamic logs,
which capture each app's behavior over time as it interacts with the
Android operating system, the network, and other applications.
To generate these logs, every Android APK is executed for a fixed
duration (e.g., 60 seconds) within a sandboxed environment, typically a
virtualized Android environment such as the Cuckoo Sandbox or DroidBox.
The execution is monitored and the runtime events are logged sequentially.
The logs are then transformed into an ordered sequence of feature vectors,
where each vector corresponds to a fixed time step.
Key types of extracted behaviors include

● System Calls (s): Entries logged with Strace reveal low-level
kernel operations such as memory operations, file accesses, and
inter-process communications.
● Network Activity (n): IP addresses, port numbers, protocols
(TCP/UDP/HTTP), DNS queries, and packet sizes are all
recorded by Wireshark.
● IPC Events (c): Records of how different app components, such
as services, activities, and broadcast receivers, communicate
with each other.
To generate a multivariate time-series input, each of these event types
is represented as binary or numerical values and combined into one vector
for each time interval (e.g., every 600 milliseconds).
Table 3.7 Example of Encoded Time-Step Features

Timestep | System Call Vector | Network Features | IPC Features | Composite Feature Vector
t1 | [1, 0, 1, 0] | [0, 1, 0] | [1, 0] | [1, 0, 1, 0, 0, 1, 0, 1, 0]
t2 | [0, 1, 1, 1] | [1, 0, 1] | [0, 1] | [0, 1, 1, 1, 1, 0, 1, 0, 1]
... | ... | ... | ... | ...

The final input to the GRU network is a 3D tensor of shape
(samples, timesteps, features), where
● samples: Number of apps

● timesteps: Maximum sequence length (e.g., 100)
● features: Number of dynamic behavior features per timestep
This structured representation preserves the temporal ordering of
behavior, which is crucial for detecting sequences that indicate malicious
activity.
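
The encoding in Table 3.7 and the tensor layout described above can be reproduced with a short sketch. It uses the two example timesteps from the table, nested lists in place of a tensor library, and a hypothetical helper name `composite`. It also checks that a 60-second run split into 100 timesteps gives the 600-millisecond interval mentioned earlier.

```python
def composite(syscalls, network, ipc):
    """Concatenate the three per-timestep feature groups into one vector."""
    return list(syscalls) + list(network) + list(ipc)


# The two example timesteps from Table 3.7.
t1 = composite([1, 0, 1, 0], [0, 1, 0], [1, 0])
t2 = composite([0, 1, 1, 1], [1, 0, 1], [0, 1])

app_trace = [t1, t2]   # (timesteps, features) for one app
batch = [app_trace]    # (samples, timesteps, features)
shape = (len(batch), len(batch[0]), len(batch[0][0]))

# A 60 s run divided into 100 timesteps gives the interval length in milliseconds.
interval_ms = 60 * 1000 / 100
print(t1, t2, shape, interval_ms)
```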
3.2.3.2 GRU Architecture and Temporal Modeling

A Gated Recurrent Unit (GRU) is a kind of Recurrent Neural Network
(RNN) that models sequential data effectively while mitigating problems
such as the vanishing gradients typical of vanilla RNNs. Because GRUs
have a lower computational cost than LSTMs and still capture long-term
dependencies, they are particularly well suited for modeling app behavior
over time.
The proposed GRU architecture consists of

● Input Layer: Accepts time-series sequences of feature
vectors.
● Single GRU Layer: Captures temporal dependencies. The
GRU cell updates its hidden state h_t using
  ● Update Gate: Decides how much past information to retain.
  ● Reset Gate: Determines how much of the past state to forget
  when new input is combined with past memory.
● Dropout Layer: Prevents overfitting.
● Dense Layer: Transforms GRU outputs.
● Softmax Classifier: Produces probabilities for "malicious" or
"benign" classes.
Table 3.8 GRU Architecture Configuration

Layer Output Shape Parameters
Input (None, 100, 30) 0
GRU (units=64) (None, 64) ~18,000
Dropout (0.2) (None, 64) 0
Dense (64) (None, 64) 4,160
Softmax (2) (None, 2) 130
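
The parameter counts in Table 3.8 can be checked with simple arithmetic. The sketch assumes the GRU formulation of Equations (23) to (26), with one bias vector per gate; frameworks that keep separate input and recurrent biases would report a slightly higher GRU count (18,432 for these sizes), so the table's "~18,000" is consistent with either convention.

```python
def gru_params(units, features):
    """Three blocks (update, reset, candidate), each with W, U, and one bias vector."""
    return 3 * (units * features + units * units + units)


def dense_params(inputs, outputs):
    """Weight matrix plus one bias per output unit."""
    return inputs * outputs + outputs


gru = gru_params(units=64, features=30)
dense = dense_params(64, 64)
softmax_layer = dense_params(64, 2)
print(gru, dense, softmax_layer)
```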

The GRU's final hidden state at the last timestep represents the
application's entire behavioral context. A softmax function is applied to
this state to calculate the probability that the application is malware.
3.2.3.3 Sequence Labeling and Anomaly Detection

Sequence labeling is applied to the time-series data to further
improve interpretability and detection power. Rather than labeling the
entire sequence as malicious or benign, sequence labeling assigns a label
to each timestep, identifying the exact times at which suspicious activity
occurs.
This facilitates:

● Granular analysis: determining the most suspicious
timestamps and activities.
● Anomaly detection: identifying rare or unexpected sequences
by comparing the behavior of malware with that of benign apps.
To make the GRU attend to both the end-to-end prediction and the
evolution of behaviors over time, we use sequence-level cross-entropy
loss for training.
The anomaly score measures how far a given sequence deviates from
normal, benign patterns.
Examples of anomalous behavior include:

● Hidden network activity that keeps recurring

● Hidden system calls that begin after a few seconds

● Excessive usage of background services in IPC communications

Table 3.9 Example Sequence Labeling Output

Timestep Network Activity System Call IPC Event Label
t1 Normal Normal Normal Benign
t2 Suspicious Port Normal Normal Suspicious
t3 Data Exfiltration Hidden Call Normal Malware

This approach aids in real-time monitoring, enabling security
analysts or intrusion prevention systems to take action mid-execution.


3.2.3.4 Handling Delayed and Obfuscated Execution Behavior

Modern malware often employs runtime evasion techniques to avoid
detection, including:
● Delayed execution: Malicious payloads are activated only a
few seconds after the app is launched.
● Conditional triggers: Malicious code may execute only
under certain conditions, e.g., the presence of a SIM card or
network availability.
● Code obfuscation: Benign-looking activities can hide
malicious behavior.
Because of its inherent ability to capture long-term dependencies, the
GRU can detect these types of activities by
● Tracing contextual evolution over long periods

● Identifying uncommon sequences that show up late in the
execution process
● Obtaining sequential fingerprints that are distinct from typical
patterns of activity
This avoids the limitations of models such as decision trees and
SVMs, which ignore temporal relationships and rely only on snapshot
data. The GRU architecture can optionally employ attention mechanisms
to focus on suspicious timesteps and further improve performance.


3.2.3.5 Performance Evaluation

The GRU-based dynamic behavior modeling module was evaluated
using a subset of the Drebin dataset, augmented with dynamic logs for
10,000 apps. The setup included:
● Sandboxing: Cuckoo Sandbox + DroidBox

● Tracing Tools: strace for system calls, Wireshark for
network logs
● Hardware: NVIDIA RTX 3060 GPU, Intel i7, 32GB RAM

● Sequence Length: 100 timesteps (60 seconds run time)


Table 3.10 GRU vs. Baseline Dynamic Models

Model        | Accuracy | Precision | Recall | F1-Score | AUC  | Avg Time (s)
NMLA-AMDCEF  | 96.3%    | 95.1%     | 95.6%  | 95.3%    | 0.97 | 25.4
MalVulDroid  | 95.7%    | 94.3%     | 95.2%  | 94.7%    | 0.96 | 31.2
LinRegDroid  | 94.2%    | 92.6%     | 93.1%  | 92.8%    | 0.95 | 18.9
Proposed GRU | 97.9%    | 97.2%     | 97.6%  | 97.4%    | 0.99 | 12.6
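
As a consistency check on Table 3.10, the F1-score can be recomputed from the reported precision and recall as their harmonic mean. The sketch below uses only values copied from the table; the helper name `f1_score` is illustrative.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (same units as the inputs)."""
    return 2 * precision * recall / (precision + recall)

# (precision %, recall %, reported F1 %) copied from Table 3.10.
rows = {
    "NMLA-AMDCEF": (95.1, 95.6, 95.3),
    "MalVulDroid": (94.3, 95.2, 94.7),
    "LinRegDroid": (92.6, 93.1, 92.8),
    "Proposed GRU": (97.2, 97.6, 97.4),
}

for name, (p, r, reported) in rows.items():
    computed = round(f1_score(p, r), 1)
    print(f"{name}: computed F1 = {computed}, reported F1 = {reported}")
```

For all four rows the recomputed value, rounded to one decimal place, matches the reported F1-score.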

The results demonstrate the effectiveness of the GRU module,
especially in detecting complex, time-evolving malicious activity. By
modeling the dynamic behavior of Android applications as time-series data,
the GRU-based dynamic behavior analyzer improves malware detection over
the baselines in Table 3.10. It captures complex patterns that static or
linear models may miss, including minor runtime deviations and
time-shifted execution patterns. With its high accuracy, flexibility, and
real-time capability, it is a valuable component of the hybrid DBN-GRU
malware detection framework. Once combined, these modules present a
comprehensive defense against emerging threats within the Android
ecosystem.
3.2.4 Hybrid DBN-GRU Classification Unit


Figure 3.4 Hybrid DBN-GRU Classification

The proposed system integrates both static and dynamic analysis of
Android applications using a two-path neural pipeline. The static path uses
a Deep Belief Network (DBN) built from stacked Restricted Boltzmann
Machines (RBMs) to extract semantic patterns from application metadata,
while the dynamic path uses a Gated Recurrent Unit (GRU) network to model
sequential runtime behavior. The outputs of both paths are fused and used
for final classification via a softmax-based neural classifier.
Static Feature Pipeline (DBN Path)
Input Features
● Permissions vector $p \in \{0,1\}^{|P|}$

● API calls vector $a \in \{0,1\}^{|A|}$

● Intent filters vector $i \in \{0,1\}^{|I|}$

These are concatenated into a raw static feature vector

$$x_{\mathrm{static}} = [\,p \,\|\, a \,\|\, i\,] \in \mathbb{R}^{d_s}$$ (33)
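
The construction of the raw static feature vector can be sketched as follows. The vocabularies are tiny and hypothetical (a real system would derive them from the training corpus), and the helper names are illustrative only.

```python
def binary_vector(present, vocabulary):
    """Return a 0/1 indicator vector over a fixed, ordered vocabulary."""
    present = set(present)
    return [1 if item in present else 0 for item in vocabulary]

# Hypothetical vocabularies, assumed for illustration only.
PERMISSIONS = ["INTERNET", "SEND_SMS", "READ_CONTACTS"]
API_CALLS = ["getDeviceId", "sendTextMessage"]
INTENTS = ["BOOT_COMPLETED", "SMS_RECEIVED"]

def static_features(permissions, api_calls, intents):
    p = binary_vector(permissions, PERMISSIONS)
    a = binary_vector(api_calls, API_CALLS)
    i = binary_vector(intents, INTENTS)
    return p + a + i  # list concatenation plays the role of [p || a || i]

x_static = static_features(
    ["INTERNET", "SEND_SMS"], ["sendTextMessage"], ["BOOT_COMPLETED"]
)
print(x_static)       # [1, 1, 0, 0, 1, 1, 0]
print(len(x_static))  # d_s = |P| + |A| + |I| = 7
```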

Preprocessing

● One-hot encoding

● Correlation-based feature selection

● PCA for dimensionality reduction

$$x_{\mathrm{pca}} = \mathrm{PCA}(x_{\mathrm{static}}) \in \mathbb{R}^{k}, \quad k \ll d_s$$ (34)

DBN Architecture

A DBN is formed by stacking multiple RBMs. Each RBM layer $l$
learns hidden features $h^{(l)}$ from the input $v^{(l)}$

$$E(v, h) = -\sum_{i} b_i v_i - \sum_{j} c_j h_j - \sum_{i,j} v_i W_{ij} h_j$$ (35)
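
The RBM energy function can be evaluated directly, as in this minimal sketch. It follows the standard RBM reading of the symbols, which the source does not spell out: $b$ and $c$ are the visible and hidden biases and $W$ is the weight matrix. The parameter values are hypothetical.

```python
def rbm_energy(v, h, W, b, c):
    """E(v, h) = -sum_i b_i v_i - sum_j c_j h_j - sum_{i,j} v_i W_ij h_j."""
    visible_term = sum(b_i * v_i for b_i, v_i in zip(b, v))
    hidden_term = sum(c_j * h_j for c_j, h_j in zip(c, h))
    interaction = sum(
        v[i] * W[i][j] * h[j] for i in range(len(v)) for j in range(len(h))
    )
    return -visible_term - hidden_term - interaction

# Hypothetical RBM with 3 visible and 2 hidden units.
W = [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.2]]
b = [0.1, 0.0, -0.1]   # visible biases
c = [0.2, -0.3]        # hidden biases

v = [1, 0, 1]
h = [1, 0]
print(rbm_energy(v, h, W, b, c))
```

Lower energy corresponds to a more probable joint configuration of visible and hidden units, which is what layer-wise pretraining exploits.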

After layer-wise pretraining, the top RBM layer is connected to a
softmax classifier. The output of the final RBM is

$$v_{\mathrm{DBN}}^{i} \in \mathbb{R}^{m}$$ (36)

This vector represents a hierarchical abstraction of the static behavior of
app $i$.
Dynamic Feature Pipeline (GRU Path)
Input Features (Time Step Specific)
● System calls $s_t \in \mathbb{R}^{|S|}$

● Network activity $n_t \in \mathbb{R}^{|N|}$

● IPC events $c_t \in \mathbb{R}^{|C|}$

Concatenated per time step

$$x_t = [\,s_t \,\|\, n_t \,\|\, c_t\,] \in \mathbb{R}^{f}$$ (37)

Sequence across time

$$X_{\mathrm{dyn}} = \{x_1, x_2, \ldots, x_T\}$$ (38)

GRU

1. Update gate

$$z_t = \sigma(W_z x_t + U_z h_{t-1} + b_z)$$ (39)

2. Reset gate

$$r_t = \sigma(W_r x_t + U_r h_{t-1} + b_r)$$ (40)

3. Candidate state

$$\tilde{h}_t = \tanh(W_h x_t + U_h (r_t \odot h_{t-1}) + b_h)$$ (41)

4. Final state

$$h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t$$ (42)

Final state summarizing behavior

$$h_{\mathrm{GRU}}^{i} = h_T \in \mathbb{R}^{h}$$ (43)
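
The GRU recurrence above can be traced with a small pure-Python sketch. It is meant only to make the update, reset, candidate, and final-state steps concrete. The parameter values are hypothetical and are shared across the three gates purely to keep the example short; a trained model would have separate matrices per gate.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matvec(M, v):
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]

def gru_step(x_t, h_prev, params):
    """One GRU update: update gate, reset gate, candidate state, final state."""
    Wz, Uz, bz, Wr, Ur, br, Wh, Uh, bh = params
    z = [sigmoid(a + b + c)
         for a, b, c in zip(matvec(Wz, x_t), matvec(Uz, h_prev), bz)]
    r = [sigmoid(a + b + c)
         for a, b, c in zip(matvec(Wr, x_t), matvec(Ur, h_prev), br)]
    gated = [r_k * h_k for r_k, h_k in zip(r, h_prev)]  # r_t * h_{t-1}, element-wise
    h_cand = [math.tanh(a + b + c)
              for a, b, c in zip(matvec(Wh, x_t), matvec(Uh, gated), bh)]
    return [(1 - z_k) * h_k + z_k * hc_k
            for z_k, h_k, hc_k in zip(z, h_prev, h_cand)]

def gru_final_state(X_dyn, params, hidden_size):
    h = [0.0] * hidden_size
    for x_t in X_dyn:
        h = gru_step(x_t, h, params)
    return h  # h_T, the summary vector used as h_GRU

# Hypothetical parameters: input size f = 3, hidden size 2.
W = [[0.5, -0.3, 0.1], [0.2, 0.4, -0.1]]
U = [[0.1, 0.0], [0.0, 0.1]]
bias = [0.0, 0.0]
params = (W, U, bias, W, U, bias, W, U, bias)

X_dyn = [[0.0, 0.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
h_gru = gru_final_state(X_dyn, params, hidden_size=2)
print(h_gru)
```

Because each new state is a convex combination of the previous state and a tanh output, every component of the hidden state stays strictly between -1 and 1.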

Feature Fusion and Classification

The outputs from the DBN and GRU modules are concatenated into
a unified feature vector

$$f_i = v_{\mathrm{DBN}}^{i} \oplus h_{\mathrm{GRU}}^{i} \in \mathbb{R}^{m+h}$$ (44)

This fused vector is passed to a softmax classifier

$$P(y_i = c \mid f_i) = \frac{\exp(W_c^{\top} f_i + b_c)}{\sum_{j=1}^{C} \exp(W_j^{\top} f_i + b_j)}$$ (45)

where

● $C = 2$: binary classification (benign or malicious)

System Output

For each application $i$, the final model outputs

● Predicted label

$$\hat{y}_i = \arg\max_{c} P(y_i = c \mid f_i)$$ (46)

● Confidence vector

$$d^{(4)} = [P_{\mathrm{benign}}, P_{\mathrm{malicious}}]$$ (47)
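
The fusion and classification steps can be summarized in a short sketch: concatenate the two embeddings, apply a linear layer with softmax, then read off the predicted label and the confidence vector. The embeddings and classifier weights below are hypothetical values chosen for illustration, not trained parameters.

```python
import math

def softmax(logits):
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify(v_dbn, h_gru, W, b):
    """Fuse embeddings, apply softmax, return (label, confidence vector)."""
    f = v_dbn + h_gru  # concatenation of the two embeddings
    logits = [sum(w * x for w, x in zip(W_c, f)) + b_c for W_c, b_c in zip(W, b)]
    d4 = softmax(logits)  # [P_benign, P_malicious]
    y_hat = max(range(len(d4)), key=d4.__getitem__)
    return y_hat, d4

# Hypothetical embeddings (m = 3, h = 2) and classifier weights (C = 2 rows).
v_dbn = [0.2, 0.9, 0.1]
h_gru = [0.7, -0.3]
W = [[-0.5, -1.0, 0.2, -0.8, 0.1],   # class 0: benign
     [0.5, 1.0, -0.2, 0.8, -0.1]]    # class 1: malicious
b = [0.0, 0.0]

label, confidence = classify(v_dbn, h_gru, W, b)
print(label, confidence)
```

With these values the malicious logit is larger, so the predicted label is 1 and the two confidences sum to 1.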

Table 3.11 Inter-module data flow and decision points

Stage | Module            | Input                                                        | Output                                                              | Purpose
1     | DT-KNN Classifier | n-gram vector $X_{\mathrm{opcode}}$                          | Score vector $d^{(1)} \in \mathbb{R}^{2}$                           | Lightweight pre-screening
2     | DBN Module        | Static features $X_{\mathrm{static}}$                        | Hierarchical representation $v_{\mathrm{DBN}} \in \mathbb{R}^{m}$   | Static code analysis
3     | GRU Module        | Temporal dynamic features $X_{\mathrm{dyn}}$                 | Hidden state $h_{\mathrm{GRU}} \in \mathbb{R}^{h}$                  | Behavioral sequence modeling
4     | Fusion + Softmax  | Concatenated vector $f = v_{\mathrm{DBN}} \oplus h_{\mathrm{GRU}}$ | Final prediction $\hat{y}$ and score $d^{(4)}$                | Final decision

3.2.4.1 Hybrid DBN-GRU Fusion Layer

The proposed hybrid DBN-GRU model for Android malware detection
introduces a fusion layer that combines the strengths of static and dynamic
behavioral analysis. This fourth stage of the architecture is significant
because it bridges the Gated Recurrent Unit (GRU) network, which models
time-dependent behavior sequences, and the Deep Belief Network (DBN),
which analyzes static attributes. By joining the learned embeddings of both
streams with a dedicated fusion method, the model aims for strong
classification performance and supports accurate malware detection even
when obfuscation or polymorphism is present.
3.2.4.2 Concatenation of Learned Static and Dynamic Embeddings

The two input branches of the hybrid model extract complementary
but distinct sets of features. The DBN-based static pipeline processes
manifest components such as intent filters together with higher-level
application features such as permission sets and API usage patterns.
Application semantics are thereby encoded in a dense hierarchical
embedding vector, $v_{\mathrm{DBN}}$.
The dynamic pipeline uses a GRU to capture the sequential
evolution of application behavior during execution. This includes network
interactions, IPC patterns, and system call traces, all of which are often
indicative of malicious payload execution or runtime anomalies. Temporal
context and dependencies across time steps are captured in the last hidden
state of the GRU, $h_{\mathrm{GRU}}$.
Concatenating these two representations yields a fused
high-dimensional embedding vector
$f_i = v_{\mathrm{DBN}}^{i} \oplus h_{\mathrm{GRU}}^{i}$, with "$\oplus$"
denoting vector concatenation. Because this step places static and
behavioral information in the same latent space, the downstream classifier
can base its inferences on a comprehensive view of the app's profile.
3.2.4.3 Fusion Strategy: Early vs. Late Fusion

Fusion in hybrid models can take place at different points in the
pipeline, at the feature level or at the decision level. The current design
uses mid-level concatenation fusion to combine the strengths of both. Early
fusion combines raw static and dynamic information at the input, which
tends to raise dimensionality and dilute feature quality. Late fusion
combines the outputs of classifiers trained in isolation, which can hamper
the model's capacity to learn cross-modal relationships. The mid-level
approach used here balances the two: instead of combining raw features or
final predictions, it combines the learned embeddings, which offers the
following advantages:
● Preservation of modality-specific learning: The DBN and GRU
pipelines are trained individually, so each learns abstractions
specific to its own modality.
● Inter-modality correlation learning: The concatenated
vector lets the subsequent dense layers learn correlations
between dynamic and static representations, which increases
sensitivity to evasive behavior.
This strategy improves model generalization and helps avoid issues
such as feature redundancy and overfitting that often affect early fusion
schemes.



3.2.4.4 Final Classification Using Softmax/Dense Layers

Once the unified embedding vector $f_i$ is obtained from the fusion
layer, it is passed through a fully connected dense layer followed by a
softmax activation function. This layer functions as a binary classifier,
outputting the posterior probability of the input belonging to either the
benign or malicious class.

$$P(y = c \mid f_i) = \frac{e^{W_c^{\top} f_i + b_c}}{\sum_{j=1}^{C} e^{W_j^{\top} f_i + b_j}}$$ (48)

where

● $P(y = c \mid f_i)$ is the probability of class $c$,

● $W_c$ and $b_c$ are the weights and bias for class $c$,

● $C = 2$ (benign, malware),

● $f_i$ is the fused feature vector for app $i$.

In addition to producing the output, the classification layer
propagates gradients back through the DBN and GRU branches during
training, so that both pipelines are optimized towards the end objective.
Dense layers preceding the softmax layer also provide non-linear feature
transformation and dimensionality reduction. They are especially important
for handling obfuscated or delayed behavior in modern malware, because
they learn to emphasize discriminative patterns and suppress noise in the
concatenated vector.



3.2.4.5 Confidence Scoring and Adaptive Response Logic

One distinguishing feature of this model is its confidence scoring,
which is essential in real-world deployments where false positives and
false negatives carry substantial operational costs.
The softmax output vector provides probability estimates that
serve as a confidence score for every prediction. The score maps to three
response tiers:
● High Confidence, High Risk (≥ 0.95): Immediate action
such as quarantine, alert generation, or app uninstallation.
● Medium Confidence (0.70–0.95): Flag for manual review or
sandbox re-execution.
● Low Confidence (< 0.70): Allow execution with real-time
monitoring and adaptive sandboxing.
This logic is implemented by a simple confidence-aware decision
module, which includes a supplementary lightweight anomaly detector for
cross-checking ambiguous cases flagged by the main classifier.
By lowering false-alarm overhead and supporting proactive defense
against zero-day threats, the adaptive response logic improves the
resilience and usability of the malware detection system.
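
The tiered response logic can be written as a small rule-based function. This sketch assumes the confidence score is the malicious-class probability from the softmax output and that the tiers are half-open intervals, so a score of exactly 0.95 falls in the top tier. The source leaves both points open, and the action names are illustrative.

```python
def response_action(p_malicious):
    """Map the malicious-class probability to a response tier."""
    if not 0.0 <= p_malicious <= 1.0:
        raise ValueError("probability must be in [0, 1]")
    if p_malicious >= 0.95:
        return "block"    # quarantine, alert generation, or uninstallation
    if p_malicious >= 0.70:
        return "review"   # manual review or sandbox re-execution
    return "monitor"      # allow execution with real-time monitoring

for p in (0.99, 0.95, 0.80, 0.70, 0.40):
    print(p, response_action(p))
```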



Table 3.12 Hybrid Fusion and Classification Pipeline

Component                | Description
Static Embedding (vDBN)  | Learned representation from stacked RBMs capturing code semantics
Dynamic Embedding (hGRU) | Final GRU hidden state capturing behavioral sequence dependencies
Fusion Strategy          | Mid-level fusion (embedding-level) via vector concatenation
Fusion Vector (fi)       | Concatenated feature vector passed to classifier
Classifier Type          | Fully Connected Dense Layer + Softmax Activation
Output                   | Malware/Benign classification with probability score
Confidence Thresholds    | ≥0.95 (Block), 0.70–0.95 (Review), <0.70 (Monitor)
Adaptive Response        | Rule-based escalation system based on classification confidence


The hybrid DBN-GRU fusion layer is the core building block of the
proposed Android malware detection framework. It deliberately combines
dynamic behavior profiling and static code analysis in a unified framework.
By applying an embedding-level fusion strategy, the system extracts rich
contextual and structural information and produces a fused representation
that is both behaviorally and semantically informative. This helps the
model detect subtle indications of malware, including delayed-activation or
camouflaged instances. Softmax-based classification yields probability
scores that aid interpretation, and the confidence-driven response module
enables a scalable, graded reaction strategy. This fusion approach is
intended to deliver high accuracy and robustness while adapting well to
deployment constraints in dynamic Android environments.
3.3 DESIGN JUSTIFICATIONS AND OPTIMIZATIONS
3.3.1 Modular Flexibility and Upgradability

The modular nature of the proposed architecture, which allows
components to be developed, tested, optimized, and replaced independently,
is one of its primary benefits. The system comprises four integrated but
distinct units:
● A hybrid Decision Tree–KNN static classifier (Stage 1),
● A Deep Belief Network (DBN) for hierarchical static feature
learning (Stage 2),
● A Gated Recurrent Unit (GRU) model for runtime behavior
modeling (Stage 3),
● A DBN-GRU hybrid deep learning fusion layer (Stage 4).
This modularity allows small, incremental improvements. For
instance, if new, more effective feature extraction methods (e.g.,
Transformer-based encoders or Graph Neural Networks) become available,
the current static or dynamic analysis modules can be upgraded without
redesigning the entire architecture. Likewise, if lighter models are needed
for edge deployment, the DBN or GRU modules can be replaced by pruned or
quantized versions, or by alternative models such as MobileNet or
TinyLSTM. In subsequent versions, the architecture can be extended to
explore ensemble combinations (such as LSTM + GRU or DBN + CNN) or the
addition of real-time streaming data. Since the fusion layer only takes
embeddings as input, it does not need to be reengineered to work with
different upstream models. This helps future-proof the architecture and
supports experimentation and scalability.
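
The decoupling between the fusion layer and its upstream encoders can be sketched with a small interface. All class names here are hypothetical stand-ins, and the encoders are deliberately trivial; the point is only that the fusion code depends on an `encode` method rather than on a particular model.

```python
from typing import List, Protocol, Sequence

class Encoder(Protocol):
    """Any upstream module that turns raw features into an embedding."""
    def encode(self, features: Sequence[float]) -> List[float]: ...

class MeanPoolEncoder:
    """Hypothetical stand-in for a static encoder (e.g. a DBN)."""
    def encode(self, features):
        return [sum(features) / len(features)]

class MinMaxEncoder:
    """Hypothetical stand-in for a dynamic encoder (e.g. a GRU)."""
    def encode(self, features):
        return [min(features), max(features)]

class FusionLayer:
    """Concatenates embeddings; it never inspects how they were produced."""
    def __init__(self, static_encoder: Encoder, dynamic_encoder: Encoder):
        self.static_encoder = static_encoder
        self.dynamic_encoder = dynamic_encoder

    def fuse(self, static_features, dynamic_features):
        return (self.static_encoder.encode(static_features)
                + self.dynamic_encoder.encode(dynamic_features))

fusion = FusionLayer(MeanPoolEncoder(), MinMaxEncoder())
first = fusion.fuse([1, 0, 1, 0], [0.2, 0.9, 0.4])
print(first)   # [0.5, 0.2, 0.9]

# Swapping the static encoder requires no change to FusionLayer.
fusion.static_encoder = MinMaxEncoder()
second = fusion.fuse([1, 0, 1, 0], [0.2, 0.9, 0.4])
print(second)  # [0, 1, 0.2, 0.9]
```

Note that when a replacement encoder changes the embedding size, as in the second call, the classifier that consumes the fused vector would still need to be retrained for the new dimensionality.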
3.3.2 Latency and Memory Footprint Optimization

Latency and memory usage are crucial for deploying malware
detection on mobile devices or in real-time threat monitoring systems. To
meet these constraints, the proposed system incorporates multiple
optimization strategies across the pipeline:
a. Lightweight Classifier in Stage 1

The hybrid Decision Tree–KNN model provides fast inference (8.00
seconds on average), compared with about 385 seconds for a deep CNN
baseline, with only a marginal compromise in accuracy. It serves as a
low-latency filter, eliminating clearly benign apps without invoking deeper
models.
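
The pre-filtering role of Stage 1 can be sketched as a two-step cascade with an early exit. The 0.9 benign threshold, the 0.5 decision threshold, and the stub scoring functions are assumptions made for illustration; the source does not state how "clearly benign" is decided.

```python
def cascade(app, stage1, deep_pipeline, benign_threshold=0.9):
    """Run the cheap Stage 1 filter first; invoke deeper stages only if needed."""
    p_benign = stage1(app)
    if p_benign >= benign_threshold:
        return {"label": "benign", "stage": 1}
    p_malicious = deep_pipeline(app)
    label = "malicious" if p_malicious >= 0.5 else "benign"
    return {"label": label, "stage": 4}

calls = {"deep": 0}

def stage1_stub(app):   # hypothetical DT-KNN benign probability
    return app["p_benign_stage1"]

def deep_stub(app):     # hypothetical DBN-GRU malicious probability
    calls["deep"] += 1
    return app["p_malicious_deep"]

apps = [
    {"p_benign_stage1": 0.97, "p_malicious_deep": 0.01},
    {"p_benign_stage1": 0.40, "p_malicious_deep": 0.93},
]
results = [cascade(a, stage1_stub, deep_stub) for a in apps]
print(results, calls["deep"])
```

Only the second app reaches the deeper pipeline, so the expensive stages run once for two inputs.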
b. Efficient Static Analysis via DBN

The DBN utilizes stacked Restricted Boltzmann Machines (RBMs),
which perform unsupervised layer-wise training with Contrastive
Divergence. This architecture reduces parameter complexity and memory
usage compared to fully connected deep neural networks. With an inference
time of 6.8 seconds, the DBN strikes a good balance between depth and
efficiency.

c. Streamlined GRU for Dynamic Behavior

The GRU module uses a single-layer GRU with 64 hidden units,
avoiding the computational overhead of more complex architectures like
LSTMs or attention-based RNNs. Despite this simplification, it achieves
97.9% accuracy and only 12.6 seconds inference time, due to its ability to
model long-range dependencies in sequences with fewer parameters.
d. Memory Footprint Management

All feature preprocessing, including PCA for dimensionality reduction
and correlation-based feature selection, reduces memory consumption by
eliminating irrelevant or redundant features before training.
Table 3.13 Model Inference Time and Memory Comparison

Module            | Inference Time (s) | Memory Usage (MB, approx.) | Optimizations Applied
Decision Tree–KNN | 8.00               | ~200                       | Leaf node refinement, pre-pruning
DBN               | 6.80               | ~300                       | PCA, CD-1 pretraining
GRU               | 12.60              | ~350                       | Sequence truncation, dropout
CNN (baseline)    | 385.24             | >800                       | None (benchmark model)

Together, these techniques ensure that the full architecture remains
deployable on mid-range mobile hardware while maintaining real-time
capabilities.

3.3.3 Scalability Across Datasets and Hardware Platforms

The DBN-GRU hybrid architecture is inherently scalable and has
been validated using the Drebin dataset consisting of 129,013 apps,
including 5,560 malware samples. This demonstrates that the model can
handle large-scale input datasets with both class imbalance and high
dimensionality.
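
The degree of class imbalance follows directly from the dataset figures above. The short calculation below derives the benign count and malware share; the inverse-frequency class weights are shown only as one common way to compensate for imbalance and are an assumption, not a method stated in the source.

```python
total_apps = 129_013
malware = 5_560
benign = total_apps - malware

malware_share = malware / total_apps
print(f"benign = {benign}, malware share = {malware_share:.2%}")

# Inverse-frequency class weights (an assumption, not the source's method).
weights = {
    "benign": total_apps / (2 * benign),
    "malware": total_apps / (2 * malware),
}
print(weights)
```

Malware makes up roughly 4.3% of the samples, so an unweighted classifier could reach high accuracy while missing many malicious apps, which is why recall and AUC are reported alongside accuracy.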
a. Dataset Scalability

The design promotes stable training even during extended
sessions by using batch normalization, mini-batch training (batch size =
32), and early stopping. Categorical cross-entropy loss and the Adam
optimizer are used to speed up convergence and to cope with class
imbalance, supporting efficient learning from a variety of malware types
and benign applications.
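
Mini-batching and early stopping can be illustrated without a deep learning framework. In this sketch the batch size of 32 comes from the text, while the patience value and the validation losses are hypothetical.

```python
def mini_batches(samples, batch_size=32):
    """Yield consecutive slices of at most batch_size samples."""
    for start in range(0, len(samples), batch_size):
        yield samples[start:start + batch_size]

def early_stopping_epoch(val_losses, patience=3):
    """Return (best_epoch, best_loss), stopping after `patience` bad epochs."""
    best_loss = float("inf")
    best_epoch = -1
    waited = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch, waited = loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                break
    return best_epoch, best_loss

batch_sizes = [len(batch) for batch in mini_batches(list(range(100)))]
print(batch_sizes)  # [32, 32, 32, 4]

# Hypothetical validation losses; training halts before the final epoch.
losses = [0.9, 0.6, 0.5, 0.52, 0.51, 0.53, 0.4]
print(early_stopping_epoch(losses))  # (2, 0.5)
```

The example also shows the usual trade-off: with a patience of 3, training stops at epoch 5 and never sees the later improvement at epoch 6.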
b. Platform Scalability

The system was developed with edge compatibility in mind, but it
has also been tested on high-performance configurations (such as an Intel
i7 CPU, 32GB RAM, and an NVIDIA RTX 3060 GPU). The DBN's short
feature vector (post-PCA) and the shallow design of the GRU model allow
them to be adapted to on-device inference runtimes and accelerators such as:
● Android Neural Networks API (NNAPI),

● TensorFlow Lite (TFLite),

● Qualcomm Hexagon DSP.

With minor adjustments like model pruning, quantization-aware
training, and float16 conversion, the complete DBN-GRU architecture can
be compressed to meet mobile inference constraints. This makes the system
scalable across:
● On-device analysis (Android smartphones),

● Cloud inference (e.g., Firebase ML, AWS SageMaker),

● Enterprise-grade SOCs (Security Operations Centers).

3.3.4 Support for Explainability (XAI Integration)

Transparency and explainability are important when AI is used in
cybersecurity, especially in regulated sectors such as government
infrastructure, healthcare, and finance. Although the DBN and GRU models
are effectively black-box networks, the proposed system offers multiple
integration points for XAI.
a. Static Pipeline (DBN)

SHAP (SHapley Additive exPlanations) or LIME (Local
Interpretable Model-agnostic Explanations) can be used to show feature
importance for the DBN component. Because the DBN is fed a condensed
set of static features (such as permissions and intents), malware decisions
can be attributed to particular features, such as "SEND_SMS" or
"BOOT_COMPLETED", in an interpretable way.

b. Dynamic Pipeline (GRU)

As an enhancement, the GRU architecture can be extended with a
temporal attention mechanism. Attention weights can then highlight the
time steps that most strongly influence the decision to classify the app as
malicious, e.g., a burst of network activity at t = 43 seconds. This helps
analysts locate unusual behavior in execution logs.
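
A temporal attention readout of this kind can be sketched with dot-product scoring. The hidden states and the query vector below are hypothetical; in a trained model the query (or a small scoring network) would be learned jointly with the GRU.

```python
import math

def attention_weights(hidden_states, query):
    """Softmax over dot-product scores between each hidden state and a query."""
    scores = [sum(h_k * q_k for h_k, q_k in zip(h_t, query))
              for h_t in hidden_states]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical GRU hidden states for 4 timesteps (hidden size 2).
hidden_states = [[0.1, 0.0], [0.2, 0.1], [0.9, 0.8], [0.3, 0.1]]
query = [2.0, 2.0]

alpha = attention_weights(hidden_states, query)
top_step = max(range(len(alpha)), key=alpha.__getitem__)
print(top_step, alpha)
```

The timestep with the largest weight (index 2 here) is the one an analyst would inspect first in the execution log.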
c. Fusion Layer Insight

The fused representation $f_i = v_{\mathrm{DBN}} \oplus h_{\mathrm{GRU}}$
can be analyzed with post-hoc attribution methods (e.g., DeepLIFT or
Integrated Gradients) to show the relative contribution of each modality
(dynamic or static) to the final decision.
d. Visualizations for Analysts

A dashboard interface could be integrated for visualizing:

● Feature contributions (bar plots for permissions/API usage),

● Time-step level activity (heatmaps for GRU hidden states),

● Anomaly scores (for triggered behaviors like system calls).

This transparency enhances user trust, supports compliance with
cybersecurity regulations, and facilitates manual malware auditing in
enterprise settings.

In addition to its reported performance (98.7% accuracy, 98.9%
recall, and 0.99 AUC), the hybrid malware detection system introduced here
is distinguished by the design rationale behind its architecture and
deployment. Its modular structure provides flexibility and upgradability.
It performs detection while keeping latency and memory consumption low.
The architecture also accommodates explainability modules to improve
transparency and trust, and it is scalable across hardware platforms and
real-world datasets. Together, these advantages support its viability as an
operational, real-time malware detection application for critical digital
infrastructures and mobile security ecosystems.