Architecture
Agent: RL algorithms implemented with RLLTE modules; see the training sketch after the table.
Type | Algo. | Box | Dis. | M.B. | M.D. | M.P. | NPU | 💰 | 🔭 |
---|---|---|---|---|---|---|---|---|---|
On-Policy | A2C | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ❌ |
On-Policy | PPO | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ❌ |
On-Policy | DrAC | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ |
On-Policy | DAAC | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ❌ |
On-Policy | DrDAAC | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ |
On-Policy | PPG | ✔️ | ✔️ | ✔️ | ❌ | ✔️ | ✔️ | ✔️ | ❌ |
Off-Policy | DQN | ✔️ | ❌ | ❌ | ❌ | ✔️ | ✔️ | ✔️ | ❌ |
Off-Policy | DDPG | ✔️ | ❌ | ❌ | ❌ | ✔️ | ✔️ | ✔️ | ❌ |
Off-Policy | SAC | ✔️ | ❌ | ❌ | ❌ | ✔️ | ✔️ | ✔️ | ❌ |
Off-Policy | SAC-Discrete | ❌ | ✔️ | ❌ | ❌ | ✔️ | ✔️ | ✔️ | ❌ |
Off-Policy | TD3 | ✔️ | ❌ | ❌ | ❌ | ✔️ | ✔️ | ✔️ | ❌ |
Off-Policy | DrQ-v2 | ✔️ | ❌ | ❌ | ❌ | ❌ | ✔️ | ✔️ | ✔️ |
Distributed | IMPALA | ✔️ | ✔️ | ❌ | ❌ | ✔️ | ❌ | ❌ | ❌ |
- Dis., M.B., M.D.: Discrete, MultiBinary, and MultiDiscrete action spaces;
- M.P.: Multi processing;
- 🐌: Developing;
- 💰: Support intrinsic reward shaping;
- 🔭: Support observation augmentation.
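A complete training run only requires picking an environment and an agent. A minimal sketch, assuming the standard `rllte.env` and `rllte.agent` entry points and default hyperparameters (the step count is illustrative):

```python
from rllte.env import make_dmc_env
from rllte.agent import DrQv2

if __name__ == "__main__":
    device = "cuda:0"
    # create the training and evaluation environments
    env = make_dmc_env(env_id="cartpole_balance", device=device)
    eval_env = make_dmc_env(env_id="cartpole_balance", device=device)
    # create the agent with its default modules
    agent = DrQv2(env=env, eval_env=eval_env, device=device, tag="drqv2_dmc_pixel")
    # start training
    agent.train(num_train_steps=500_000)
```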
Xploit: Modules that focus on exploitation in RL.
Policy: Policies for interaction and learning.
Module | Type | Remark |
---|---|---|
OnPolicySharedActorCritic | On-policy | Actor-Critic networks with a shared encoder. |
OnPolicyDecoupledActorCritic | On-policy | Actor-Critic networks with two separate encoders. |
OffPolicyDoubleQNetwork | Off-policy | Double Q-network. |
OffPolicyDoubleActorDoubleCritic | Off-policy | Double deterministic actor network and double-critic network. |
OffPolicyDetActorDoubleCritic | Off-policy | Deterministic actor network and double-critic network. |
OffPolicyStochActorDoubleCritic | Off-policy | Stochastic actor network and double-critic network. |
DistributedActorLearner | Distributed | Memory-shared actor and learner networks |
Encoder: Neural network-based encoders for processing observations.
Module | Input | Reference | Target Task |
---|---|---|---|
EspeholtResidualEncoder | Images | Paper | Atari or Procgen games |
MnihCnnEncoder | Images | Paper | Atari games |
TassaCnnEncoder | Images | Paper | DeepMind Control Suite: pixel |
PathakCnnEncoder | Images | Paper | Atari or MiniGrid games |
IdentityEncoder | States | N/A | DeepMind Control Suite: state |
VanillaMlpEncoder | States | N/A | DeepMind Control Suite: state |
RaffinCombinedEncoder | Dict | Paper | Highway |
- Naming Rule: Surname of the first author + Backbone + Encoder;
- Target Task: The testing tasks in their paper or potential tasks.
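Encoders can be swapped without touching the rest of the agent. A sketch, assuming the agent exposes a set() method for module replacement and that EspeholtResidualEncoder accepts the observation space and a feature dimension (argument names are assumptions):

```python
from rllte.env import make_atari_env
from rllte.agent import PPO
from rllte.xploit.encoder import EspeholtResidualEncoder

if __name__ == "__main__":
    device = "cuda:0"
    env = make_atari_env(device=device)
    agent = PPO(env=env, device=device, tag="ppo_atari")
    # replace the default encoder with the residual encoder from the IMPALA paper;
    # feature_dim is assumed to set the size of the encoded observation vector
    encoder = EspeholtResidualEncoder(observation_space=env.observation_space, feature_dim=512)
    agent.set(encoder=encoder)
    agent.train(num_train_steps=5_000)
```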
Storage: Experience storage and sampling.
Module | Type | Remark |
---|---|---|
VanillaRolloutStorage | On-policy | |
DictRolloutStorage | On-policy | |
VanillaReplayStorage | Off-policy | |
DictReplayStorage | Off-policy | |
NStepReplayStorage | Off-policy | |
PrioritizedReplayStorage | Off-policy | |
HerReplayStorage | Off-policy | |
VanillaDistributedStorage | Distributed | |
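A storage can be exchanged in the same way as the other modules. A hedged sketch, assuming agent.set(storage=...) is available and that PrioritizedReplayStorage takes the spaces, device, capacity, and batch size shown here (all constructor arguments are assumptions):

```python
from rllte.env import make_atari_env
from rllte.agent import DQN
from rllte.xploit.storage import PrioritizedReplayStorage

if __name__ == "__main__":
    device = "cuda:0"
    env = make_atari_env(device=device)
    agent = DQN(env=env, device=device, tag="dqn_atari")
    # swap the default replay buffer for a prioritized one;
    # storage_size and batch_size are assumed constructor arguments
    storage = PrioritizedReplayStorage(
        observation_space=env.observation_space,
        action_space=env.action_space,
        device=device,
        storage_size=100_000,
        batch_size=32,
    )
    agent.set(storage=storage)
    agent.train(num_train_steps=5_000)
```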
Xplore: Modules that focus on exploration in RL.
Augmentation: PyTorch.nn-like modules for observation augmentation.
Module | Input | Reference |
---|---|---|
GaussianNoise | States | Paper |
RandomAmplitudeScaling | States | Paper |
GrayScale | Images | Paper |
RandomColorJitter | Images | Paper |
RandomConvolution | Images | Paper |
RandomCrop | Images | Paper |
RandomCutout | Images | Paper |
RandomCutoutColor | Images | Paper |
RandomFlip | Images | Paper |
RandomRotate | Images | Paper |
RandomShift | Images | Paper |
RandomTranslate | Images | Paper |
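Since the augmentations are PyTorch.nn-like modules, they can be attached to any agent that supports observation augmentation (the 🔭 column above). A sketch, assuming agent.set(augmentation=...) is the attachment point and that RandomShift takes a padding argument (both assumptions):

```python
from rllte.env import make_dmc_env
from rllte.agent import DrQv2
from rllte.xplore.augmentation import RandomShift

if __name__ == "__main__":
    device = "cuda:0"
    env = make_dmc_env(env_id="cartpole_balance", device=device)
    agent = DrQv2(env=env, device=device, tag="drqv2_dmc_pixel")
    # apply random-shift augmentation to image observations during training;
    # pad is an assumed argument controlling the shift range in pixels
    agent.set(augmentation=RandomShift(pad=4))
    agent.train(num_train_steps=5_000)
```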
Distribution: Distributions for sampling actions.
Module | Type | Reference |
---|---|---|
NormalNoise | Noise | Paper |
OrnsteinUhlenbeckNoise | Noise | Paper |
TruncatedNormalNoise | Noise | Paper |
Bernoulli | Distribution | Paper |
Categorical | Distribution | Paper |
MultiCategorical | Distribution | Paper |
DiagonalGaussian | Distribution | Paper |
SquashedNormal | Distribution | Paper |
- In RLLTE, action noise is implemented as a Distribution, so noise and probability distributions share a unified sampling interface.
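Because of that unified interface, the action noise of an off-policy agent can be exchanged like any other distribution. A sketch, assuming agent.set(distribution=...) accepts a distribution instance (an assumption):

```python
from rllte.env import make_dmc_env
from rllte.agent import DrQv2
from rllte.xplore.distribution import OrnsteinUhlenbeckNoise

if __name__ == "__main__":
    device = "cuda:0"
    env = make_dmc_env(env_id="cartpole_balance", device=device)
    agent = DrQv2(env=env, device=device, tag="drqv2_dmc_pixel")
    # replace the default exploration noise with Ornstein-Uhlenbeck noise
    agent.set(distribution=OrnsteinUhlenbeckNoise())
    agent.train(num_train_steps=5_000)
```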
Reward: Intrinsic reward modules for enhancing exploration.
Type | Modules |
---|---|
Count-based | PseudoCounts, RND |
Curiosity-driven | ICM, GIRM, RIDE |
Memory-based | NGU |
Information theory-based | RE3, RISE, REVD |
See Tutorials: Use Intrinsic Reward and Observation Augmentation for usage examples.
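As a quick illustration, an intrinsic reward module is built from the environment's spaces and attached to the agent. A sketch, assuming RE3 accepts the observation and action spaces and that agent.set(reward=...) is the attachment point (argument names are assumptions):

```python
from rllte.env import make_atari_env
from rllte.agent import PPO
from rllte.xplore.reward import RE3

if __name__ == "__main__":
    device = "cuda:0"
    env = make_atari_env(device=device)
    agent = PPO(env=env, device=device, tag="ppo_atari")
    # build the RE3 intrinsic reward from the environment spaces (assumed signature)
    intrinsic_reward = RE3(
        observation_space=env.observation_space,
        action_space=env.action_space,
        device=device,
    )
    # attach it so the agent adds the intrinsic reward during training
    agent.set(reward=intrinsic_reward)
    agent.train(num_train_steps=5_000)
```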
Env: Packaged environments (e.g., Atari games) for fast invocation.
Function | Name | Remark | Reference |
---|---|---|---|
make_atari_env | Atari Games | Discrete control | Paper |
make_bullet_env | PyBullet Robotics Environments | Continuous control | Paper |
make_dmc_env | DeepMind Control Suite | Continuous control | Paper |
make_minigrid_env | MiniGrid Games | Discrete control | Paper |
make_procgen_env | Procgen Games | Discrete control | Paper |
make_robosuite_env | Robosuite Robotics Environments | Continuous control | Paper |
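Each factory returns a vectorized, device-aware environment that plugs directly into the agents above. A sketch, assuming env_id, num_envs, and seed are accepted keyword arguments and using illustrative task ids (these are assumptions):

```python
from rllte.env import make_atari_env, make_dmc_env

if __name__ == "__main__":
    device = "cuda:0"
    # eight parallel Atari environments for discrete control
    atari_env = make_atari_env(env_id="Alien-v5", num_envs=8, seed=1, device=device)
    # a pixel-based DeepMind Control Suite task for continuous control
    dmc_env = make_dmc_env(env_id="cartpole_balance", seed=1, device=device)
    print(atari_env.observation_space, atari_env.action_space)
    print(dmc_env.observation_space, dmc_env.action_space)
```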
Copilot: Large language model-empowered copilot.
See Copilot.
Hub: Fast training APIs and reusable benchmarks.
See Benchmarks.
Evaluation: Reasonable and reliable metrics for algorithm evaluation.
See Tutorials: Model Evaluation.