Architecture
Agent: RL algorithms implemented with RLLTE modules; see the training sketch after the table.
Type | Algo. | Box | Dis. | M.B. | M.D. | M.P. | NPU | 💰 | 🔭 |
---|---|---|---|---|---|---|---|---|---|
On-Policy | A2C | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ❌ |
On-Policy | PPO | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ❌ |
On-Policy | DrAC | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ |
On-Policy | DAAC | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ❌ |
On-Policy | DrDAAC | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ |
On-Policy | PPG | ✔️ | ✔️ | ✔️ | ❌ | ✔️ | ✔️ | ✔️ | ❌ |
Off-Policy | DQN | ✔️ | ❌ | ❌ | ❌ | ✔️ | ✔️ | ✔️ | ❌ |
Off-Policy | DDPG | ✔️ | ❌ | ❌ | ❌ | ✔️ | ✔️ | ✔️ | ❌ |
Off-Policy | SAC | ✔️ | ❌ | ❌ | ❌ | ✔️ | ✔️ | ✔️ | ❌ |
Off-Policy | SAC-Discrete | ❌ | ✔️ | ❌ | ❌ | ✔️ | ✔️ | ✔️ | ❌ |
Off-Policy | TD3 | ✔️ | ❌ | ❌ | ❌ | ✔️ | ✔️ | ✔️ | ❌ |
Off-Policy | DrQ-v2 | ✔️ | ❌ | ❌ | ❌ | ❌ | ✔️ | ✔️ | ✔️ |
Distributed | IMPALA | ✔️ | ✔️ | ❌ | ❌ | ✔️ | ❌ | ❌ | ❌ |
- Dis., M.B., M.D.: Discrete, MultiBinary, and MultiDiscrete action spaces;
- M.P.: Multi processing;
- 🐌: Developing;
- 💰: Support intrinsic reward shaping;
- 🔭: Support observation augmentation.
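A complete training run only requires picking an environment and an agent. A minimal sketch, assuming the standard `rllte.env` and `rllte.agent` entry points and default hyperparameters (the step count is illustrative):

```python
from rllte.env import make_dmc_env
from rllte.agent import DrQv2

if __name__ == "__main__":
    device = "cuda:0"
    # create the training and evaluation environments
    env = make_dmc_env(env_id="cartpole_balance", device=device)
    eval_env = make_dmc_env(env_id="cartpole_balance", device=device)
    # create the agent with its default modules
    agent = DrQv2(env=env, eval_env=eval_env, device=device, tag="drqv2_dmc_pixel")
    # start training
    agent.train(num_train_steps=500_000)
```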
Xploit: Modules that focus on exploitation in RL.
Policy: Policies for interaction and learning.
Module | Type | Remark |
---|---|---|
OnPolicySharedActorCritic | On-policy | Actor-Critic networks with a shared encoder. |
OnPolicyDecoupledActorCritic | On-policy | Actor-Critic networks with two separate encoders. |
OffPolicyDoubleQNetwork | Off-policy | Double Q-network. |
OffPolicyDoubleActorDoubleCritic | Off-policy | Double deterministic actor network and double-critic network. |
OffPolicyDetActorDoubleCritic | Off-policy | Deterministic actor network and double-critic network. |
OffPolicyStochActorDoubleCritic | Off-policy | Stochastic actor network and double-critic network. |
DistributedActorLearner | Distributed | Memory-shared actor and learner networks |
Encoder: Neural network-based encoders for processing observations.
Module | Input | Reference | Target Task |
---|---|---|---|
EspeholtResidualEncoder | Images | Paper | Atari or Procgen games |
MnihCnnEncoder | Images | Paper | Atari games |
TassaCnnEncoder | Images | Paper | DeepMind Control Suite: pixel |
PathakCnnEncoder | Images | Paper | Atari or MiniGrid games |
IdentityEncoder | States | N/A | DeepMind Control Suite: state |
VanillaMlpEncoder | States | N/A | DeepMind Control Suite: state |
RaffinCombinedEncoder | Dict | Paper | Highway |
- Naming Rule: Surname of the first author + Backbone + Encoder;
- Target Task: The testing tasks in their paper or potential tasks.
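Encoders can be swapped without touching the rest of the agent. A sketch, assuming the agent exposes a set() method for module replacement and that EspeholtResidualEncoder accepts the observation space and a feature dimension (argument names are assumptions):

```python
from rllte.env import make_atari_env
from rllte.agent import PPO
from rllte.xploit.encoder import EspeholtResidualEncoder

if __name__ == "__main__":
    device = "cuda:0"
    env = make_atari_env(device=device)
    agent = PPO(env=env, device=device, tag="ppo_atari")
    # replace the default encoder with the residual encoder from the IMPALA paper;
    # feature_dim is assumed to set the size of the encoded observation vector
    encoder = EspeholtResidualEncoder(observation_space=env.observation_space, feature_dim=512)
    agent.set(encoder=encoder)
    agent.train(num_train_steps=5_000)
```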
Storage: Experience storage and sampling.
Module | Type | Remark |
---|---|---|
VanillaRolloutStorage | On-policy | |
DictRolloutStorage | On-policy | |
VanillaReplayStorage | Off-policy | |
DictReplayStorage | Off-policy | |
NStepReplayStorage | Off-policy | |
PrioritizedReplayStorage | Off-policy | |
HerReplayStorage | Off-policy | |
VanillaDistributedStorage | Distributed | |
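A storage can be exchanged in the same way as the other modules. A hedged sketch, assuming agent.set(storage=...) is available and that PrioritizedReplayStorage takes the spaces, device, capacity, and batch size shown here (all constructor arguments are assumptions):

```python
from rllte.env import make_atari_env
from rllte.agent import DQN
from rllte.xploit.storage import PrioritizedReplayStorage

if __name__ == "__main__":
    device = "cuda:0"
    env = make_atari_env(device=device)
    agent = DQN(env=env, device=device, tag="dqn_atari")
    # swap the default replay buffer for a prioritized one;
    # storage_size and batch_size are assumed constructor arguments
    storage = PrioritizedReplayStorage(
        observation_space=env.observation_space,
        action_space=env.action_space,
        device=device,
        storage_size=100_000,
        batch_size=32,
    )
    agent.set(storage=storage)
    agent.train(num_train_steps=5_000)
```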
Xplore: Modules that focus on exploration in RL.
Augmentation: PyTorch.nn-like modules for observation augmentation.
Module | Input | Reference |
---|---|---|
GaussianNoise | States | Paper |
RandomAmplitudeScaling | States | Paper |
GrayScale | Images | Paper |
RandomColorJitter | Images | Paper |
RandomConvolution | Images | Paper |
RandomCrop | Images | Paper |
RandomCutout | Images | Paper |
RandomCutoutColor | Images | Paper |
RandomFlip | Images | Paper |
RandomRotate | Images | Paper |
RandomShift | Images | Paper |
RandomTranslate | Images | Paper |
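Since the augmentations are PyTorch.nn-like modules, they can be attached to any agent that supports observation augmentation (the 🔭 column above). A sketch, assuming agent.set(augmentation=...) is the attachment point and that RandomShift takes a padding argument (both assumptions):

```python
from rllte.env import make_dmc_env
from rllte.agent import DrQv2
from rllte.xplore.augmentation import RandomShift

if __name__ == "__main__":
    device = "cuda:0"
    env = make_dmc_env(env_id="cartpole_balance", device=device)
    agent = DrQv2(env=env, device=device, tag="drqv2_dmc_pixel")
    # apply random-shift augmentation to image observations during training;
    # pad is an assumed argument controlling the shift range in pixels
    agent.set(augmentation=RandomShift(pad=4))
    agent.train(num_train_steps=5_000)
```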
Distribution: Distributions for sampling actions.
Module | Type | Reference |
---|---|---|
NormalNoise | Noise | Paper |
OrnsteinUhlenbeckNoise | Noise | Paper |
TruncatedNormalNoise | Noise | Paper |
Bernoulli | Distribution | Paper |
Categorical | Distribution | Paper |
MultiCategorical | Distribution | Paper |
DiagonalGaussian | Distribution | Paper |
SquashedNormal | Distribution | Paper |
- In RLLTE, action noise is implemented as a Distribution, so noise and probability distributions share a unified sampling interface.
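Because of that unified interface, the action noise of an off-policy agent can be exchanged like any other distribution. A sketch, assuming agent.set(distribution=...) accepts a distribution instance (an assumption):

```python
from rllte.env import make_dmc_env
from rllte.agent import DrQv2
from rllte.xplore.distribution import OrnsteinUhlenbeckNoise

if __name__ == "__main__":
    device = "cuda:0"
    env = make_dmc_env(env_id="cartpole_balance", device=device)
    agent = DrQv2(env=env, device=device, tag="drqv2_dmc_pixel")
    # replace the default exploration noise with Ornstein-Uhlenbeck noise
    agent.set(distribution=OrnsteinUhlenbeckNoise())
    agent.train(num_train_steps=5_000)
```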
Reward: Intrinsic reward modules for enhancing exploration.
Type | Modules |
---|---|
Count-based | PseudoCounts, RND |
Curiosity-driven | ICM, GIRM, RIDE |
Memory-based | NGU |
Information theory-based | RE3, RISE, REVD |
See Tutorials: Use Intrinsic Reward and Observation Augmentation for usage examples.
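As a quick illustration, an intrinsic reward module is built from the environment's spaces and attached to the agent. A sketch, assuming RE3 accepts the observation and action spaces and that agent.set(reward=...) is the attachment point (argument names are assumptions):

```python
from rllte.env import make_atari_env
from rllte.agent import PPO
from rllte.xplore.reward import RE3

if __name__ == "__main__":
    device = "cuda:0"
    env = make_atari_env(device=device)
    agent = PPO(env=env, device=device, tag="ppo_atari")
    # build the RE3 intrinsic reward from the environment spaces (assumed signature)
    intrinsic_reward = RE3(
        observation_space=env.observation_space,
        action_space=env.action_space,
        device=device,
    )
    # attach it so the agent adds the intrinsic reward during training
    agent.set(reward=intrinsic_reward)
    agent.train(num_train_steps=5_000)
```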
Env: Packaged environments (e.g., Atari games) for fast invocation.
Function | Name | Remark | Reference |
---|---|---|---|
make_atari_env | Atari Games | Discrete control | Paper |
make_bullet_env | PyBullet Robotics Environments | Continuous control | Paper |
make_dmc_env | DeepMind Control Suite | Continuous control | Paper |
make_minigrid_env | MiniGrid Games | Discrete control | Paper |
make_procgen_env | Procgen Games | Discrete control | Paper |
make_robosuite_env | Robosuite Robotics Environments | Continuous control | Paper |
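Each factory returns a vectorized, device-aware environment that plugs directly into the agents above. A sketch, assuming env_id, num_envs, and seed are accepted keyword arguments and using illustrative task ids (these are assumptions):

```python
from rllte.env import make_atari_env, make_dmc_env

if __name__ == "__main__":
    device = "cuda:0"
    # eight parallel Atari environments for discrete control
    atari_env = make_atari_env(env_id="Alien-v5", num_envs=8, seed=1, device=device)
    # a pixel-based DeepMind Control Suite task for continuous control
    dmc_env = make_dmc_env(env_id="cartpole_balance", seed=1, device=device)
    print(atari_env.observation_space, atari_env.action_space)
    print(dmc_env.observation_space, dmc_env.action_space)
```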
Copilot: Large language model-empowered copilot.
See Copilot.
Hub: Fast training APIs and reusable benchmarks.
See Benchmarks.
Evaluation: Reasonable and reliable metrics for algorithm evaluation.
See Tutorials: Model Evaluation.