3.2 Agent: Deep Q-Network (DQN)
A Deep Q-Network (DQN) is employed as the RL agent. The DQN consists of the following components (a minimal implementation sketch is given after the list):
• State Space: The state space, S, is defined as [measured eye height, measured eye width, PVT parameters (voltage, temperature), time step].
• Action Space: The action space, A, represents the available phase adjustment steps (+0.5 ps, −0.5 ps, no change).
• Q-Network: A Convolutional Neural Network (CNN) approximates the Q-function, Q(s, a), which estimates the expected future reward for taking action a in state s. The CNN architecture comprises 3 convolutional layers (3×3 kernels, ReLU activation) followed by 2 fully connected layers.
• Experience Replay Buffer: Stores past experiences (s, a, r, s') for off-policy learning.
• Target Network: A periodically updated copy of the Q-Network used to stabilize learning.
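The sketch below (PyTorch) shows one way these components could be wired together. The layer widths, channel counts, and buffer capacity are illustrative assumptions, and the stated 3×3 2-D kernels are approximated with 1-D convolutions because the state here is a flat 5-element vector; it is a sketch, not the authors' implementation.

```python
# Minimal sketch of the DQN components listed above (PyTorch).
# Layer widths, channel counts, and buffer capacity are assumptions.
import random
from collections import deque, namedtuple

import torch
import torch.nn as nn

Transition = namedtuple("Transition", ["s", "a", "r", "s_next"])

class QNetwork(nn.Module):
    """Approximates Q(s, a): 3 convolutional layers (ReLU) followed by
    2 fully connected layers, one output per phase-adjustment action."""
    def __init__(self, state_dim: int = 5, n_actions: int = 3):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv1d(16, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv1d(16, 16, kernel_size=3, padding=1), nn.ReLU(),
        )
        self.fc = nn.Sequential(
            nn.Linear(16 * state_dim, 64), nn.ReLU(),
            nn.Linear(64, n_actions),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        x = self.conv(state.unsqueeze(1))        # (batch, 1, state_dim)
        return self.fc(x.flatten(start_dim=1))   # (batch, n_actions)

class ReplayBuffer:
    """Experience replay buffer storing (s, a, r, s') for off-policy learning."""
    def __init__(self, capacity: int = 100_000):
        self.buffer = deque(maxlen=capacity)

    def push(self, s, a, r, s_next):
        self.buffer.append(Transition(s, a, r, s_next))

    def sample(self, batch_size: int):
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)

# Online Q-network, its periodically synchronized target copy, and the buffer.
q_net = QNetwork()
target_net = QNetwork()
target_net.load_state_dict(q_net.state_dict())
buffer = ReplayBuffer()
```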
3.3 Reward Function
The reward function, R(s, a), guides the RL agent towards optimal phase
calibration:
R(s, a) = α * ΔEyeHeight + β * ΔEyeWidth - γ * AbsolutePhaseAdjustment
Where:
• ΔEyeHeight = eye height post-adjustment − eye height pre-adjustment.
• ΔEyeWidth = eye width post-adjustment − eye width pre-adjustment.
• AbsolutePhaseAdjustment = absolute value of the phase adjustment applied.
• α, β, and γ are weighting parameters (α = 0.6, β = 0.4, γ = 0.1) tuned to balance eye improvement against phase-adjustment stability, preventing excessive oscillation. A computational sketch of this reward follows the list.
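The reward can be computed directly from these definitions, as in the sketch below; the units in the usage comment (mV for eye height, ps for eye width) are illustrative assumptions, since the text does not fix them.

```python
# Sketch of R(s, a) with the stated weights alpha=0.6, beta=0.4, gamma=0.1.
ALPHA, BETA, GAMMA = 0.6, 0.4, 0.1

def reward(eye_h_pre: float, eye_h_post: float,
           eye_w_pre: float, eye_w_post: float,
           phase_adjustment_ps: float) -> float:
    delta_eye_height = eye_h_post - eye_h_pre   # eye-height change after the action
    delta_eye_width = eye_w_post - eye_w_pre    # eye-width change after the action
    return (ALPHA * delta_eye_height
            + BETA * delta_eye_width
            - GAMMA * abs(phase_adjustment_ps))

# Example (illustrative units): a +5 mV eye-height gain and a +2 ps eye-width
# gain after a 0.5 ps adjustment yields 0.6*5 + 0.4*2 - 0.1*0.5 = 3.75.
```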
4. Experimental Design and Data Analysis
The training process runs for 1,000 epochs, each consisting of 10,000 simulation steps. At each step the RL agent interacts with the environment, receives a reward, and updates its Q-Network weights; a skeleton of this loop is sketched below.
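The skeleton reuses the QNetwork, ReplayBuffer, and reward sketches from Section 3. The `env` object (reset/step), the optimizer, discount factor, batch size, epsilon-greedy rate, and target-sync interval are assumptions introduced for illustration, not details given in the text.

```python
# Skeleton of the training loop: 1,000 epochs x 10,000 simulation steps.
# `env` is a hypothetical simulator interface; q_net, target_net, and buffer
# come from the Section 3.2 sketch. Hyperparameters below are assumptions.
import random

import torch
import torch.nn.functional as F

DISCOUNT = 0.99
BATCH_SIZE = 64
TARGET_SYNC_STEPS = 1_000
EPSILON = 0.1                                  # epsilon-greedy exploration rate

optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-4)

for epoch in range(1_000):
    state = env.reset()                        # hypothetical: returns the 5-element state
    for step in range(10_000):
        # Epsilon-greedy choice among the three phase-adjustment actions.
        if random.random() < EPSILON:
            action = random.randrange(3)
        else:
            with torch.no_grad():
                q_values = q_net(torch.tensor([state], dtype=torch.float32))
                action = int(q_values.argmax())

        next_state, r = env.step(action)       # hypothetical: applies the phase step, returns (s', R(s, a))
        buffer.push(state, action, r, next_state)
        state = next_state

        # Standard DQN temporal-difference update from a replayed minibatch.
        if len(buffer) >= BATCH_SIZE:
            batch = buffer.sample(BATCH_SIZE)
            s = torch.tensor([t.s for t in batch], dtype=torch.float32)
            a = torch.tensor([t.a for t in batch]).unsqueeze(1)
            rew = torch.tensor([t.r for t in batch], dtype=torch.float32)
            s_next = torch.tensor([t.s_next for t in batch], dtype=torch.float32)

            q_sa = q_net(s).gather(1, a).squeeze(1)
            with torch.no_grad():
                target = rew + DISCOUNT * target_net(s_next).max(dim=1).values
            loss = F.smooth_l1_loss(q_sa, target)

            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

        # Periodically copy the online weights into the target network.
        if step % TARGET_SYNC_STEPS == 0:
            target_net.load_state_dict(q_net.state_dict())
```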