
Architecture

Agent: RL algorithms implemented with RLLTE modules.

Type Algo. Box Dis. M.B. M.D. M.P. NPU 💰 🔭
On-Policy A2C ✔️ ✔️ ✔️ ✔️ ✔️ ✔️ ✔️
On-Policy PPO ✔️ ✔️ ✔️ ✔️ ✔️ ✔️ ✔️
On-Policy DrAC ✔️ ✔️ ✔️ ✔️ ✔️ ✔️ ✔️ ✔️
On-Policy DAAC ✔️ ✔️ ✔️ ✔️ ✔️ ✔️ ✔️
On-Policy DrDAAC ✔️ ✔️ ✔️ ✔️ ✔️ ✔️ ✔️ ✔️
On-Policy PPG ✔️ ✔️ ✔️ ✔️ ✔️ ✔️
Off-Policy DQN ✔️ ✔️ ✔️ ✔️
Off-Policy DDPG ✔️ ✔️ ✔️ ✔️
Off-Policy SAC ✔️ ✔️ ✔️ ✔️
Off-Policy SAC-Discrete ✔️ ✔️ ✔️ ✔️
Off-Policy TD3 ✔️ ✔️ ✔️ ✔️
Off-Policy DrQ-v2 ✔️ ✔️ ✔️ ✔️
Distributed IMPALA ✔️ ✔️ ✔️
  • Dis., M.B., M.D.: Discrete, MultiBinary, and MultiDiscrete action spaces;
  • M.P.: Multiprocessing;
  • 🐌: Developing;
  • 💰: Support intrinsic reward shaping;
  • 🔭: Support observation augmentation.
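
Any of these agents can be trained with a few lines of code. The following is a minimal sketch, assuming the `PPO`/`make_atari_env` quick-start interface and that arguments such as `tag` and `num_train_steps` keep their usual names:

```python
# Minimal training sketch; constructor arguments such as `tag` may differ by version.
from rllte.agent import PPO
from rllte.env import make_atari_env

if __name__ == "__main__":
    device = "cuda:0"                    # or "cpu", or an NPU device
    env = make_atari_env(device=device)  # vectorized Atari environments
    agent = PPO(env=env, device=device, tag="ppo_atari")
    agent.train(num_train_steps=5000)    # start training and logging
```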

Xploit: Modules that focus on exploitation in RL.

Policy: Policies for interaction and learning.

| Module | Type | Remark |
|---|---|---|
| OnPolicySharedActorCritic | On-policy | Actor-Critic networks with a shared encoder. |
| OnPolicyDecoupledActorCritic | On-policy | Actor-Critic networks with two separate encoders. |
| OffPolicyDoubleQNetwork | Off-policy | Double Q-network. |
| OffPolicyDoubleActorDoubleCritic | Off-policy | Double deterministic actor network and double-critic network. |
| OffPolicyDetActorDoubleCritic | Off-policy | Deterministic actor network and double-critic network. |
| OffPolicyStochActorDoubleCritic | Off-policy | Stochastic actor network and double-critic network. |
| DistributedActorLearner | Distributed | Memory-shared actor and learner networks. |

Encoder: Neural network-based encoders for processing observations.

| Module | Input | Reference | Target Task |
|---|---|---|---|
| EspeholtResidualEncoder | Images | Paper | Atari or Procgen games |
| MnihCnnEncoder | Images | Paper | Atari games |
| TassaCnnEncoder | Images | Paper | DeepMind Control Suite: pixel |
| PathakCnnEncoder | Images | Paper | Atari or MiniGrid games |
| IdentityEncoder | States | N/A | DeepMind Control Suite: state |
| VanillaMlpEncoder | States | N/A | DeepMind Control Suite: state |
| RaffinCombinedEncoder | Dict | Paper | Highway |
  • Naming Rule: Surname of the first author + Backbone + Encoder
  • Target Task: the tasks evaluated in the corresponding paper, or other tasks the encoder is suited to.
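
Because encoders are decoupled from the agents, they can be swapped without touching the training loop. Below is a hedged sketch of such a replacement, assuming the agent exposes the `.set()` helper and that `EspeholtResidualEncoder` and `PPO` accept `observation_space` and `feature_dim` as shown:

```python
# Sketch of swapping in a different encoder before training.
from rllte.agent import PPO
from rllte.env import make_atari_env
from rllte.xploit.encoder import EspeholtResidualEncoder

if __name__ == "__main__":
    device, feature_dim = "cuda:0", 512
    env = make_atari_env(device=device)
    agent = PPO(env=env, device=device, tag="ppo_atari", feature_dim=feature_dim)
    # Build the residual encoder and hand it to the agent via `.set()`.
    encoder = EspeholtResidualEncoder(
        observation_space=env.observation_space,
        feature_dim=feature_dim,
    )
    agent.set(encoder=encoder)
    agent.train(num_train_steps=5000)
```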

Storage: Experience storage and sampling.

| Module | Type | Remark |
|---|---|---|
| VanillaRolloutStorage | On-policy | |
| DictRolloutStorage | On-policy | |
| VanillaReplayStorage | Off-policy | |
| DictReplayStorage | Off-policy | |
| NStepReplayStorage | Off-policy | |
| PrioritizedReplayStorage | Off-policy | |
| HerReplayStorage | Off-policy | |
| VanillaDistributedStorage | Distributed | |

Xplore: Modules that focus on exploration in RL.

Augmentation: torch.nn-style modules for observation augmentation.

| Module | Input | Reference |
|---|---|---|
| GaussianNoise | States | Paper |
| RandomAmplitudeScaling | States | Paper |
| GrayScale | Images | Paper |
| RandomColorJitter | Images | Paper |
| RandomConvolution | Images | Paper |
| RandomCrop | Images | Paper |
| RandomCutout | Images | Paper |
| RandomCutoutColor | Images | Paper |
| RandomFlip | Images | Paper |
| RandomRotate | Images | Paper |
| RandomShift | Images | Paper |
| RandomTranslate | Images | Paper |
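
Since these augmentations behave like `torch.nn` modules, they can be applied directly to batched observation tensors. A minimal sketch, assuming `RandomShift` takes a `pad` argument:

```python
# Apply an augmentation directly to a batch of image observations (B, C, H, W).
import torch
from rllte.xplore.augmentation import RandomShift

obs = torch.rand(8, 3, 84, 84)  # dummy batch of pixel observations
aug = RandomShift(pad=4)        # pad-and-crop style random shift
augmented = aug(obs)            # same shape as the input
```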

Distribution: Distributions for sampling actions.

| Module | Type | Reference |
|---|---|---|
| NormalNoise | Noise | Paper |
| OrnsteinUhlenbeckNoise | Noise | Paper |
| TruncatedNormalNoise | Noise | Paper |
| Bernoulli | Distribution | Paper |
| Categorical | Distribution | Paper |
| MultiCategorical | Distribution | Paper |
| DiagonalGaussian | Distribution | Paper |
| SquashedNormal | Distribution | Paper |
  • In RLLTE, action noise is implemented as a Distribution to provide a unified sampling interface.

Reward: Intrinsic reward modules for enhancing exploration.

| Type | Modules |
|---|---|
| Count-based | PseudoCounts, RND |
| Curiosity-driven | ICM, GIRM, RIDE |
| Memory-based | NGU |
| Information theory-based | RE3, RISE, REVD |

See Tutorials: Use Intrinsic Reward and Observation Augmentation for usage examples.
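
Both module families plug into an agent through the same `.set()` helper. A hedged sketch, assuming `.set()` accepts `reward` and `augmentation` keywords and that the `RE3` constructor takes `observation_space`, `action_space`, and `device`:

```python
# Sketch: attach an intrinsic reward and an observation augmentation to DrQ-v2.
from rllte.agent import DrQv2
from rllte.env import make_dmc_env
from rllte.xplore.augmentation import RandomShift
from rllte.xplore.reward import RE3

if __name__ == "__main__":
    device = "cuda:0"
    env = make_dmc_env(env_id="cartpole_balance", device=device)
    agent = DrQv2(env=env, device=device, tag="drqv2_dmc_pixel")
    # Intrinsic reward module (constructor arguments are assumptions).
    re3 = RE3(
        observation_space=env.observation_space,
        action_space=env.action_space,
        device=device,
    )
    agent.set(reward=re3, augmentation=RandomShift(pad=4))
    agent.train(num_train_steps=5000)
```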


Env: Packaged environments (e.g., Atari games) for fast invocation.

| Function | Name | Remark | Reference |
|---|---|---|---|
| make_atari_env | Atari Games | Discrete control | Paper |
| make_bullet_env | PyBullet Robotics Environments | Continuous control | Paper |
| make_dmc_env | DeepMind Control Suite | Continuous control | Paper |
| make_minigrid_env | MiniGrid Games | Discrete control | Paper |
| make_procgen_env | Procgen Games | Discrete control | Paper |
| make_robosuite_env | Robosuite Robotics Environments | Continuous control | Paper |
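
Each factory returns a vectorized, device-aware environment that the agents consume directly. A small sketch, assuming the common `env_id`, `num_envs`, `seed`, and `device` arguments:

```python
# Create a vectorized Atari environment and inspect its spaces.
from rllte.env import make_atari_env

env = make_atari_env(env_id="Pong-v5", num_envs=8, seed=1, device="cpu")
print(env.observation_space, env.action_space)
```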

Copilot: Large language model-empowered copilot.

See Copilot.


Hub: Fast training APIs and reusable benchmarks.

See Benchmarks.


Evaluation: Reasonable and reliable metrics for algorithm evaluation.

See Tutorials: Model Evaluation.
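
As a rough illustration, aggregate metrics can be computed from a matrix of scores. A hedged sketch, assuming a `Performance` helper with `aggregate_iqm()` and `aggregate_median()` methods:

```python
# Sketch: robust aggregate metrics from a (num_runs, num_tasks) score matrix.
import numpy as np
from rllte.evaluation import Performance

scores = np.random.rand(10, 5)   # placeholder scores: 10 runs x 5 tasks
perf = Performance(scores=scores)
print(perf.aggregate_iqm())      # interquartile mean across runs and tasks
print(perf.aggregate_median())   # median as an alternative aggregate
```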


Pre-training: Methods of pre-training in RL.

See Tutorials: Pre-training.


Deployment: Convenient APIs for model deployment.

See Tutorials: Model Deployment.