BaseIntrinsicRewardModule
BaseIntrinsicRewardModule(
observation_space: gym.Space, action_space: gym.Space, device: str = 'cpu',
beta: float = 0.05, kappa: float = 2.5e-05
)
Base class for intrinsic reward modules.
Args
- observation_space (gym.Space) : The observation space of environment.
- action_space (gym.Space) : The action space of environment.
- device (str) : Device (cpu, cuda, ...) on which the code should be run.
- beta (float) : The initial weighting coefficient of the intrinsic rewards.
- kappa (float) : The decay rate of the weighting coefficient beta.
Returns
Instance of the base intrinsic reward module.
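The constructor describes beta as the initial weighting coefficient and kappa as its decay rate, but this page does not state the schedule. A common choice, assumed here rather than taken from the source, is exponential decay over the global training step t: beta_t = beta * (1 - kappa)^t. The helper `decayed_beta` is hypothetical and not part of this class.

```python
def decayed_beta(beta: float = 0.05, kappa: float = 2.5e-05, step: int = 0) -> float:
    """Weighting coefficient at a given global step.

    ASSUMPTION: exponential decay of the initial coefficient beta at rate kappa.
    Defaults mirror the documented constructor defaults.
    """
    return beta * (1.0 - kappa) ** step


# At step 0 the coefficient equals beta; it shrinks as training proceeds.
print(decayed_beta(step=0))        # 0.05
print(decayed_beta(step=100_000))  # roughly 0.05 * exp(-2.5)
```

With the default kappa, the coefficient falls to about 8% of its initial value after 100,000 steps, so the intrinsic bonus mainly shapes early exploration.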
Methods:
.compute_irs
Compute the intrinsic rewards for the current samples.
Args
- samples (Dict) : The collected samples. A Python dict like
{obs (n_steps, n_envs, *obs_shape), actions (n_steps, n_envs, *action_shape), rewards (n_steps, n_envs), next_obs (n_steps, n_envs, *obs_shape)}.
- step (int) : The global training step.
Returns
The intrinsic rewards.
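The sketch below illustrates the `compute_irs` contract: it reads a samples dict with the documented layout and returns one reward per (step, environment) pair. The class `StepDistanceReward`, its distance-based reward, the (n_steps, n_envs) output shape, and the decay schedule are all hypothetical illustrations, not this library's code.

```python
import numpy as np


class StepDistanceReward:
    """Hypothetical toy module following the documented compute_irs interface.

    The reward (distance between consecutive observations) is an illustrative
    choice, not a method from this library.
    """

    def __init__(self, beta: float = 0.05, kappa: float = 2.5e-05) -> None:
        self.beta = beta
        self.kappa = kappa

    def compute_irs(self, samples: dict, step: int = 0) -> np.ndarray:
        obs = np.asarray(samples["obs"], dtype=np.float64)  # (n_steps, n_envs, *obs_shape)
        next_obs = np.asarray(samples["next_obs"], dtype=np.float64)
        n_steps, n_envs = obs.shape[:2]
        # Flatten the observation dimensions so any obs_shape works.
        diff = (next_obs - obs).reshape(n_steps, n_envs, -1)
        novelty = np.linalg.norm(diff, axis=-1)  # (n_steps, n_envs)
        beta_t = self.beta * (1.0 - self.kappa) ** step  # ASSUMED decay schedule
        return beta_t * novelty


rng = np.random.default_rng(0)
n_steps, n_envs, obs_shape, action_shape = 4, 2, (3,), (1,)
samples = {
    "obs": rng.normal(size=(n_steps, n_envs, *obs_shape)),
    "actions": rng.normal(size=(n_steps, n_envs, *action_shape)),
    "rewards": np.zeros((n_steps, n_envs)),
    "next_obs": rng.normal(size=(n_steps, n_envs, *obs_shape)),
}
irs = StepDistanceReward().compute_irs(samples, step=0)
print(irs.shape)  # (4, 2)
```

Flattening the trailing observation dimensions keeps the same code valid for vector and image observations alike.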
.update
Update the intrinsic reward module if necessary.
Args
- samples (Dict) : The collected samples. A Python dict like
{obs (n_steps, n_envs, *obs_shape), actions (n_steps, n_envs, *action_shape), rewards (n_steps, n_envs), next_obs (n_steps, n_envs, *obs_shape)}.
Returns
None
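`update` is where a module with learnable parts would train them. As a minimal, hypothetical sketch, the class below fits a linear forward model (predicting next_obs from obs and actions) by least squares in `update` and uses its squared prediction error as the reward in `compute_irs`. This is a simplified curiosity-style stand-in, not the library's implementation; it omits the beta weighting, and `update` must be called before `compute_irs`.

```python
import numpy as np


class LinearForwardModelReward:
    """Hypothetical sketch: a learned component refreshed in update().

    A linear forward model predicts next_obs from (obs, action); its squared
    prediction error is the intrinsic reward. beta weighting is omitted.
    """

    def __init__(self) -> None:
        self.weights = None  # fitted in update()

    @staticmethod
    def _inputs(samples: dict) -> np.ndarray:
        obs = np.asarray(samples["obs"], dtype=np.float64)
        act = np.asarray(samples["actions"], dtype=np.float64)
        rows = obs.shape[0] * obs.shape[1]  # n_steps * n_envs
        flat = np.concatenate([obs.reshape(rows, -1), act.reshape(rows, -1)], axis=1)
        # Append a bias column so the model can learn an offset.
        return np.hstack([flat, np.ones((rows, 1))])

    def update(self, samples: dict) -> None:
        x = self._inputs(samples)
        y = np.asarray(samples["next_obs"], dtype=np.float64).reshape(x.shape[0], -1)
        # Least-squares fit of next_obs ~ [obs, action, 1] @ weights.
        self.weights, *_ = np.linalg.lstsq(x, y, rcond=None)

    def compute_irs(self, samples: dict, step: int = 0) -> np.ndarray:
        n_steps, n_envs = np.asarray(samples["obs"]).shape[:2]
        x = self._inputs(samples)
        y = np.asarray(samples["next_obs"], dtype=np.float64).reshape(x.shape[0], -1)
        error = np.square(y - x @ self.weights).sum(axis=1)
        return error.reshape(n_steps, n_envs)


rng = np.random.default_rng(0)
obs = rng.normal(size=(8, 2, 3))
actions = rng.normal(size=(8, 2, 1))
samples = {"obs": obs, "actions": actions, "rewards": np.zeros((8, 2)),
           "next_obs": 0.5 * obs + actions}  # exactly linear dynamics
module = LinearForwardModelReward()
module.update(samples)             # fit the forward model
irs = module.compute_irs(samples)  # near zero: the model explains these transitions
```

Transitions the model already predicts well earn little reward, while unfamiliar dynamics produce large errors, which is the intuition behind prediction-error bonuses.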
.add
Add the samples to the intrinsic reward module if necessary.
Used for modules like RE3
that have a storage component.
Args
- samples (Dict) : The collected samples. A Python dict like
{obs (n_steps, n_envs, *obs_shape), actions (n_steps, n_envs, *action_shape), rewards (n_steps, n_envs), next_obs (n_steps, n_envs, *obs_shape)}.
Returns
None
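For modules with a storage component, `add` accumulates data that `compute_irs` later queries. The hypothetical sketch below is loosely modeled on RE3: a fixed random linear projection stands in for the random encoder, and each observation is rewarded with log(1 + distance to its k-th nearest stored embedding). The class name, its parameters, and these simplifications are assumptions for illustration only.

```python
import numpy as np


class KNNStorageReward:
    """Hypothetical RE3-flavoured sketch of a module with a storage component.

    add() stores embeddings of observed states; compute_irs() rewards each
    observation by log(1 + distance to its k-th nearest stored embedding).
    """

    def __init__(self, obs_dim: int, embed_dim: int = 8, k: int = 3, seed: int = 0) -> None:
        rng = np.random.default_rng(seed)
        # Fixed random projection standing in for a random (never trained) encoder.
        self.proj = rng.normal(size=(obs_dim, embed_dim))
        self.k = k
        self.storage = np.empty((0, embed_dim))

    def _embed(self, obs: np.ndarray) -> np.ndarray:
        # Flatten (n_steps, n_envs, *obs_shape) to (n_steps * n_envs, obs_dim).
        return obs.reshape(-1, self.proj.shape[0]) @ self.proj

    def add(self, samples: dict) -> None:
        obs = np.asarray(samples["obs"], dtype=np.float64)
        self.storage = np.vstack([self.storage, self._embed(obs)])

    def compute_irs(self, samples: dict, step: int = 0) -> np.ndarray:
        obs = np.asarray(samples["obs"], dtype=np.float64)
        n_steps, n_envs = obs.shape[:2]
        emb = self._embed(obs)
        # Pairwise distances to every stored embedding (the sample itself counts if stored).
        dists = np.linalg.norm(emb[:, None, :] - self.storage[None, :, :], axis=-1)
        kth = np.sort(dists, axis=1)[:, self.k - 1]  # k-th smallest distance; needs len(storage) >= k
        return np.log1p(kth).reshape(n_steps, n_envs)


rng = np.random.default_rng(1)
module = KNNStorageReward(obs_dim=4, k=3)
batch = {"obs": rng.normal(size=(5, 2, 4))}  # only "obs" is read by this sketch
module.add(batch)
print(module.storage.shape)  # (10, 8)
irs = module.compute_irs(batch)
print(irs.shape)  # (5, 2)
```

Storing every embedding makes each k-NN query grow linearly with the buffer, so a practical module would bound or subsample its storage.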