UPO
Erasmus+ KA3 - European Policy Experimentation
EU

Reinforcement Learning Simulation

Train and optimize personalised learning pathways using virtual student clones and reinforcement learning algorithms.

Simulation Status
Development

Technical Preview

RL Architecture Overview
State (S): Student profile variables
Action (A): Learning activity selection
Reward (R): Learning outcome signal
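The state, action and reward roles above can be sketched as a small environment with a Gymnasium-style `reset`/`step` interface. This is a minimal illustration using only the standard library, not the project's actual environment: the class name `ToyLearningEnv`, the 17-value state vector, the 12-step episode cap, and the mastery and reward dynamics are all assumptions made for the example. Only the 48-activity action space comes from the console output on this page.

```python
import random

N_ACTIVITIES = 48  # size of the activity space reported in the console output


class ToyLearningEnv:
    """Gymnasium-style environment sketch (hypothetical, stdlib only)."""

    def __init__(self, state_dim=17, max_steps=12, seed=0):
        self.state_dim = state_dim    # assumed length of the profile vector
        self.max_steps = max_steps    # assumed cap on steps per episode
        self._rng = random.Random(seed)
        self._state = [0.0] * state_dim
        self._steps = 0

    def reset(self):
        # State (S): a student profile, here random values in [0, 1].
        self._state = [self._rng.random() for _ in range(self.state_dim)]
        self._steps = 0
        return list(self._state), {}

    def step(self, action):
        # Action (A): index of the selected learning activity.
        if not 0 <= action < N_ACTIVITIES:
            raise ValueError(f"action must be in [0, {N_ACTIVITIES})")
        self._steps += 1
        # Assumption: the last state entry is a mastery score, and the
        # activity index maps linearly onto a difficulty in [0, 1].
        mastery = self._state[-1]
        difficulty = action / (N_ACTIVITIES - 1)
        # Reward (R): higher when difficulty matches current mastery.
        reward = 1.0 - abs(difficulty - mastery)
        self._state[-1] = min(1.0, mastery + 0.1 * reward)
        terminated = self._state[-1] >= 1.0
        truncated = self._steps >= self.max_steps
        return list(self._state), reward, terminated, truncated, {}


def run_random_episode(env, policy_seed=1):
    """Roll out one episode with a uniformly random policy."""
    policy_rng = random.Random(policy_seed)
    env.reset()
    total, steps, done = 0.0, 0, False
    while not done:
        action = policy_rng.randrange(N_ACTIVITIES)
        _, reward, terminated, truncated, _ = env.step(action)
        total += reward
        steps += 1
        done = terminated or truncated
    return total, steps
```

A trained agent would replace the random policy in `run_random_episode` with one that maps the observed state to an activity index.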

State Space Components
Demographic variables (V1-V4)
Emotional factors (V5-V6)
Cognitive factors (V7-V12)
Current performance (V16-V20)
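One way to turn these variable groups into a fixed-order state vector is sketched below. The helper names (`STATE_GROUPS`, `variable_names`, `encode_state`, `group_slices`) and the choice of a default value for missing variables are assumptions for illustration. V13-V15 are not described on this page, so the sketch leaves them out rather than guessing what they contain.

```python
# Variable groups as listed above. V13-V15 are not described in the source,
# so they are omitted here rather than guessed.
STATE_GROUPS = {
    "demographic": range(1, 5),    # V1-V4
    "emotional": range(5, 7),      # V5-V6
    "cognitive": range(7, 13),     # V7-V12
    "performance": range(16, 21),  # V16-V20
}


def variable_names():
    """Return variable names in a fixed order: V1, ..., V12, V16, ..., V20."""
    return [f"V{i}" for ids in STATE_GROUPS.values() for i in ids]


def encode_state(profile, default=0.0):
    """Flatten a {"V1": value, ...} mapping into a list of floats.

    Missing variables fall back to `default` (an assumption of this sketch).
    """
    return [float(profile.get(name, default)) for name in variable_names()]


def group_slices():
    """Map each group name to its slice within the encoded vector."""
    slices, start = {}, 0
    for name, ids in STATE_GROUPS.items():
        slices[name] = slice(start, start + len(ids))
        start += len(ids)
    return slices
```

For example, `encode_state({"V1": 1, "V20": 0.5})` yields a 17-element vector, and `group_slices()["cognitive"]` selects the six cognitive entries from it.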
Simulation Console

> Initializing InfiniteLearner RL Environment...
> Loading student profiles from database...
> Students loaded: 3 virtual clones
> Activity space: 48 learning activities
> Difficulty levels: [Basic, Elementary, Intermediate, Advanced]
>
> Starting training episode 1/1000...
> Agent: PPO (Proximal Policy Optimization)
> Learning rate: 0.0003
>
> Episode 1: Reward = 0.45 | Steps = 12
> Episode 50: Reward = 0.68 | Steps = 10
> Episode 100: Reward = 0.82 | Steps = 8
> Episode 500: Reward = 0.91 | Steps = 7
> Training in progress...
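The console names PPO as the agent. As a rough illustration of what PPO optimizes, the sketch below computes the clipped surrogate policy loss in plain Python. The function name `ppo_clip_loss` and the clip range of 0.2 are assumptions (0.2 is a commonly used value, not one stated on this page), and the value-function and entropy terms of a full PPO loss are left out. With the Stable-Baselines3 library listed in the technology stack, this computation would normally be handled inside the library rather than written by hand.

```python
import math


def ppo_clip_loss(new_log_probs, old_log_probs, advantages, clip_range=0.2):
    """Negative PPO clipped surrogate objective, averaged over samples.

    clip_range=0.2 is an assumed value for this sketch.
    """
    n = len(advantages)
    if n == 0 or not (len(new_log_probs) == len(old_log_probs) == n):
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        # Probability ratio between the updated and the data-collecting policy.
        ratio = math.exp(new_lp - old_lp)
        # Keep the ratio within [1 - clip_range, 1 + clip_range].
        clipped = min(max(ratio, 1.0 - clip_range), 1.0 + clip_range)
        # Pessimistic (minimum) choice between unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    # Return a loss to minimize, hence the sign flip.
    return -total / n
```

The clipping removes the incentive to move the policy far from the one that collected the data when doing so would increase the objective, which is the main stabilizing idea behind PPO.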

Technology Stack
Python
PyTorch
Gymnasium
Stable-Baselines3
PostgreSQL
Flask API
Virtual Student Clones
Maria (Clone): High performer profile
Pablo (Clone): Average performer profile
Ana (Clone): Needs support profile

Training Metrics
Average Reward: 0.91
Policy Loss: 0.023
Episodes Complete: 512/1000

Explore the Source Code

The RL simulation module is open source and available on GitHub for research purposes.