Multi-Agent Reinforcement Learning.pptx

amanbolt88 · 19 slides · Oct 12, 2025

About This Presentation

About multi-agent reinforcement learning (MARL).


Slide Content

Multi-Agent Reinforcement Learning: An Overview
IE-690 Reinforcement Learning & Control
Pragathi Jha, Aman Pandey, Sanjeev Kumar, Akshdeep Singh Ahluwalia

Single-Agent RL
Agent: Makes decisions and takes actions (A_t) to maximize rewards.
Environment: Provides states (S_t) and rewards (R_t) based on the agent's actions.
State (S_t): Represents the current condition of the environment.
Action (A_t): Decisions made by the agent to influence the environment.
Reward (R_t): Feedback from the environment used to evaluate actions.
State Transition: The environment updates to a new state (S_{t+1}) after the agent's action.
DeepMind's Atari game: Paper
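
For concreteness, here is a minimal sketch of the agent-environment loop described above. The two-state toy environment and the random policy are invented for illustration; this is not the Atari setup from the linked paper.

```python
import random

class ToyEnv:
    """Hypothetical two-state environment: action 1 taken in state 1 yields reward +1."""
    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        reward = 1.0 if (self.state == 1 and action == 1) else 0.0
        self.state = random.choice([0, 1])   # environment moves to S_{t+1}
        done = random.random() < 0.1         # episode ends with probability 0.1
        return self.state, reward, done

env = ToyEnv()
state, total_reward, done = env.reset(), 0.0, False
while not done:
    action = random.choice([0, 1])            # A_t chosen by the agent (random policy here)
    state, reward, done = env.step(action)    # environment returns S_{t+1} and R_t
    total_reward += reward
print("episode return:", total_reward)
```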

Game Theory: Background
Game theory provides the tools to analyze and predict agent behaviors in MARL.
Extensive form games (e.g., Matching Pennies) use decision trees to model sequential decision-making. Nodes represent states, and edges represent actions taken by players in turns. Matching Pennies demonstrates sequential decision-making with alternating player moves.
Normal form games (e.g., the Prisoner's Dilemma) use a matrix to represent simultaneous decisions by players. Each cell shows payoffs based on the players' chosen strategies. The Prisoner's Dilemma illustrates how simultaneous decisions impact individual and collective payoffs.
Prisoner's Dilemma payoffs (rows: P1, columns: P2; each cell lists P1's payoff, P2's payoff):
                  P2 Cooperate   P2 Defect
P1 Cooperate        -1, -1         -5,  0
P1 Defect            0, -5         -2, -2
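
As a concrete reading of the normal-form representation, the sketch below stores the Prisoner's Dilemma matrix above as a lookup from joint actions to payoffs; the names and structure are illustrative, not part of the slides.

```python
# Normal-form (matrix) game: the Prisoner's Dilemma payoffs from the slide above.
# Each entry maps a joint action (a1, a2) to the payoff pair (r1, r2).
COOPERATE, DEFECT = "C", "D"

PAYOFFS = {
    (COOPERATE, COOPERATE): (-1, -1),
    (COOPERATE, DEFECT):    (-5,  0),
    (DEFECT,    COOPERATE): ( 0, -5),
    (DEFECT,    DEFECT):    (-2, -2),
}

def play(a1, a2):
    """Both players choose simultaneously; each receives its own payoff."""
    return PAYOFFS[(a1, a2)]

print(play(DEFECT, COOPERATE))   # (0, -5): defecting against a cooperator
print(play(DEFECT, DEFECT))      # (-2, -2): mutual defection
```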

What is MARL? Multi-Agent Path Finding: GitHub link

Applications of MARL
Traffic management systems
Autonomous vehicles
Swarm robotics: search & rescue operations and drone deliveries
Multiplayer games: Dota 2 & StarCraft
Optimizing energy distribution in smart grids
Multi-agent bidding and trading in financial markets and auctions

Characteristics of MARL
Interaction Dynamics: Agents may collaborate, compete, or operate in mixed settings based on the problem.
Non-stationarity: The environment dynamics change as agents simultaneously adapt and learn.
Credit Assignment: Difficulty in attributing individual contributions to team rewards (global reward structure).
Partial Observability: Agents often have limited access to the global state, relying on local observations.
Complexity and Scalability: As the number of agents increases, the computational and algorithmic complexity grows exponentially.
Centrality: Combines centralized training (global information sharing) with decentralized execution (local actions).
Agent Types: Homogeneous and heterogeneous agents, which differ in architecture, capabilities, roles, or objectives.

Problem Representations in MARL
[Diagram: problem settings classified along two axes: partial vs. full observability, and collaborative vs. competitive agents.]

Problem Representations in MARL
In partial observability, agents have access to only local information, which adds complexity to decision-making. In full observability, all agents have complete access to the global state of the environment.

Problem Representations in MARL ​ Agents can either cooperate to achieve a common goal, compete to maximize individual rewards, or operate in mixed scenarios involving both collaboration and competition.

Problem Representations in MARL ​ Unlike single-agent RL, MARL introduces non-stationarity, where the environment evolves due to the simultaneous learning and decision-making of multiple agents.

Frameworks in MARL ​

Frameworks in MARL: MDP
Markov Decision Process: single-agent learning. One agent interacts with the environment. At each step, the agent observes the state S, takes an action A, and receives a reward R. Goal: maximize cumulative rewards over time.
An MDP is the tuple (S, A, P, R, γ), where:
S: A finite set of states.
A: A finite set of actions.
P: A transition probability function P(s' | s, a) that defines the probability of transitioning to state s' from state s when action a is taken.
R: A reward function R(s, a) that gives the immediate reward for taking action a in state s.
γ: A discount factor (0 ≤ γ ≤ 1) that represents the degree to which future rewards are valued over immediate rewards.
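
A minimal sketch of the MDP tuple written out in code, using a hypothetical two-state example (the states, transitions, and rewards are made up for illustration), with a few sweeps of value iteration to show how P, R, and γ are used together.

```python
# Hypothetical two-state MDP written as an explicit (S, A, P, R, gamma) tuple.
S = ["s0", "s1"]            # finite state set
A = ["stay", "move"]        # finite action set
gamma = 0.9                 # discount factor

# P[(s, a)] -> {s': probability}, i.e. the transition function P(s' | s, a)
P = {
    ("s0", "stay"): {"s0": 1.0}, ("s0", "move"): {"s1": 1.0},
    ("s1", "stay"): {"s1": 1.0}, ("s1", "move"): {"s0": 1.0},
}

# R[(s, a)] -> immediate reward R(s, a); only staying in s1 pays off
R = {
    ("s0", "stay"): 0.0, ("s0", "move"): 0.0,
    ("s1", "stay"): 1.0, ("s1", "move"): 0.0,
}

# Value iteration: V(s) <- max_a [ R(s, a) + gamma * sum_s' P(s' | s, a) V(s') ]
V = {s: 0.0 for s in S}
for _ in range(50):
    V = {s: max(R[(s, a)] + gamma * sum(p * V[s2] for s2, p in P[(s, a)].items())
                for a in A)
         for s in S}
print(V)   # s1 (where reward is collected) ends up with the higher value
```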

Frameworks in MARL: Markov Game
Markov Game: the multi-agent generalization of the MDP. Multiple agents interact with the environment and with each other. Agents observe the state, take actions A_1, A_2, …, and receive rewards R_1, R_2, …. Rewards can be cooperative, competitive, or mixed.
A Markov game is the tuple (S, {A_i}, P, {R_i}, γ), where:
S: A finite set of states.
A_i: An action space for each agent i of the N players.
P: A transition probability function P(s' | s, a_1, a_2, …, a_N) that defines the probability of transitioning to state s' from state s when the agents take the joint action (a_1, a_2, …, a_N).
R_i: A reward function R_i(s, a_1, a_2, …, a_N) that gives agent i's immediate reward for the joint action (a_1, a_2, …, a_N) in state s.
γ: A discount factor (0 ≤ γ ≤ 1) that represents the degree to which future rewards are valued over immediate rewards.
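
A minimal sketch of a short rollout of a two-agent Markov game: a shared state, a joint action, a joint-action transition, and per-agent rewards. The toy dynamics are invented, and the rewards reuse the Prisoner's Dilemma payoffs from the game-theory slide.

```python
import random

ACTIONS = ["C", "D"]   # A_1 = A_2 = {Cooperate, Defect}

def transition(state, a1, a2):
    """P(s' | s, a1, a2): the next state depends on the joint action (toy dynamics)."""
    return "tense" if "D" in (a1, a2) else "calm"

def rewards(state, a1, a2):
    """R_1 and R_2: each agent receives its own payoff for the joint action."""
    table = {("C", "C"): (-1, -1), ("C", "D"): (-5, 0),
             ("D", "C"): (0, -5),  ("D", "D"): (-2, -2)}
    return table[(a1, a2)]

state = "calm"
for t in range(3):
    a1, a2 = random.choice(ACTIONS), random.choice(ACTIONS)   # both agents act simultaneously
    r1, r2 = rewards(state, a1, a2)
    state = transition(state, a1, a2)
    print(t, (a1, a2), (r1, r2), "->", state)
```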

Frameworks in MARL: POMG
Partially Observable Markov Game: a Markov game with partial observability. Agents receive private observations O_1, O_2, … rather than the full state. Actions A_1, A_2, … are taken based on partial knowledge. Rewards R_1, R_2, … depend on the hidden global state.
A POMG is the tuple (S, {A_i}, {O_i}, P, {R_i}, γ), where:
S: A finite set of states.
A_i: An action space for each agent i of the N players.
O_i: An observation/perception space for each agent i of the N players.
P: A transition probability function P(s' | s, a_1, a_2, …); the probability of transitioning to state s' from state s when the agents take the joint action (a_1, a_2, …, a_N).
R_i: A reward function R_i(s, a_1, a_2, …, a_N) for the joint action (a_1, a_2, …, a_N) in state s.
γ: A discount factor (0 ≤ γ ≤ 1) that represents the degree to which future rewards are valued over immediate rewards.
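
What the POMG adds on top of a Markov game is the per-agent observation: each agent sees only a local view of the hidden global state. A small sketch of such an observation function follows; the state encoding and the visibility rule are invented for illustration.

```python
# Hidden global state: positions of two agents on a 1-D line.
state = {"agent_1": 2, "agent_2": 7}

def observe(state, agent, view_radius=3):
    """O_i: agent i sees its own position exactly, and the other agent's
    position only if it lies within view_radius (otherwise None)."""
    own = state[agent]
    other = next(p for name, p in state.items() if name != agent)
    visible = other if abs(other - own) <= view_radius else None
    return {"own_position": own, "other_position": visible}

# The agents are more than 3 cells apart, so each one gets only a partial view.
print(observe(state, "agent_1"))   # {'own_position': 2, 'other_position': None}
print(observe(state, "agent_2"))   # {'own_position': 7, 'other_position': None}
```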

Frameworks in MARL: Dec-POMG
Decentralized Partially Observable Markov Game: a cooperative POMG. Multiple agents receive individual observations O_1, O_2, … Agents act independently to optimize a shared reward R. Coordination is crucial since agents don't have global information.
A Dec-POMG is the tuple (S, {A_i}, {O_i}, P, R, γ, {π_i}), where:
S: A finite set of states.
A_i: An action space for each agent i of the N players.
O_i: An observation/perception space for each agent i of the N players.
P: A transition probability function P(s' | s, a_1, a_2, …); the probability of transitioning to state s' from state s when the agents take the joint action (a_1, a_2, …, a_N).
R: A shared reward function R(s, a_1, a_2, …, a_N) for the joint action (a_1, a_2, …, a_N) in state s.
γ: A discount factor (0 ≤ γ ≤ 1) that represents the degree to which future rewards are valued over immediate rewards.
π_i: A policy for each agent i, which defines that agent's strategy.
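
A minimal sketch of the decentralized-execution idea described above: each agent's policy maps only its own local observation to an action, while the environment returns a single team reward shared by all. The box-pushing scenario, observations, and reward rule are invented for illustration.

```python
import random

def policy(agent_id, observation):
    """pi_i: each agent acts on its own local observation only."""
    return "push" if observation["sees_box"] else "search"

def shared_reward(joint_action):
    """R: one team reward; the box moves only if both agents push at the same time."""
    return 1.0 if all(a == "push" for a in joint_action) else 0.0

observations = [{"sees_box": random.random() < 0.5} for _ in range(2)]   # O_1, O_2
joint_action = [policy(i, obs) for i, obs in enumerate(observations)]    # decentralized actions
print(joint_action, "team reward:", shared_reward(joint_action))
```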

Frameworks in MARL: Extensive Form Game
Extensive Form Game: a tree-based representation of agent interactions. Agents take actions sequentially, leading to different branches of the game tree. Each agent receives a reward R_1, R_2, … based on the branch reached. Ideal for turn-based strategic games.
An extensive form game is described by (N, {H_i}, {P_i}, {R_i}, γ), where:
N: The set of players.
H_i: A set of histories for each player i of the N players.
P_i: A transition probability function P_i(s' | s, a_1, a_2, …); the probability of transitioning to state s' from state s when action a_i is taken by agent i.
R_i: A reward function R_i(s, a_1, a_2, …, a_N) for the joint action (a_1, a_2, …, a_N) in state s.
γ: A discount factor (0 ≤ γ ≤ 1) that represents the degree to which future rewards are valued over immediate rewards.
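
A minimal sketch of an extensive form game as an explicit tree, using the sequential Matching Pennies example from the game-theory slide. The node layout and the payoff convention (Player 1 wins +1 when the coins match) are illustrative.

```python
# Sequential Matching Pennies as a small game tree.
# Player 1 chooses Heads/Tails first; Player 2 responds after seeing the move.
# Leaves hold the rewards (R_1, R_2): Player 1 gets +1 if the coins match.
game_tree = {
    "H": {"H": (+1, -1), "T": (-1, +1)},
    "T": {"H": (-1, +1), "T": (+1, -1)},
}

def play(p1_move, p2_move):
    """Walk one branch of the tree and return the rewards at its leaf."""
    return game_tree[p1_move][p2_move]

# Player 2 observes Player 1's move and plays the best response (mismatching).
p1_move = "H"
p2_move = min(game_tree[p1_move], key=lambda m: game_tree[p1_move][m][0])
print(p1_move, p2_move, play(p1_move, p2_move))   # H T (-1, 1)
```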

Frameworks in MARL: how the models relate
MDP → Markov game: add agents.
Markov game → POMG: add partial observability.
POMG → Dec-POMG: restrict to the cooperative format.
Extensive form game: a sequential game with histories over information sets.

Challenges and Solutions
Information sharing
Reward sharing

Fin!