DistributedActorLearner
DistributedActorLearner(
    observation_space: gym.Space,
    action_space: gym.Space,
    feature_dim: int,
    hidden_dim: int = 512,
    opt_class: Type[th.optim.Optimizer] = th.optim.Adam,
    opt_kwargs: Optional[Dict[str, Any]] = None,
    init_fn: str = 'orthogonal',
    use_lstm: bool = False
)
Actor-Learner network for IMPALA.
Args
- observation_space (gym.Space) : Observation space.
- action_space (gym.Space) : Action space.
- feature_dim (int) : Number of features accepted (i.e., the output dimension of the encoder).
- hidden_dim (int) : Number of units per hidden layer.
- opt_class (Type[th.optim.Optimizer]) : Optimizer class.
- opt_kwargs (Dict[str, Any]) : Optimizer keyword arguments.
- init_fn (str) : Parameter initialization method.
- use_lstm (bool) : Whether to use LSTM module.
Returns
Actor-Learner network instance.
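Example
A minimal construction sketch. The import path below is a placeholder (substitute the module that actually hosts DistributedActorLearner in your installation), and the spaces, feature_dim, and learning rate are illustrative.

import gymnasium as gym  # assumed; plain gym exposes the same space types
import torch as th

# Placeholder import path; substitute the real module.
from your_package.policy import DistributedActorLearner

# Illustrative spaces for a small discrete-action task.
obs_space = gym.spaces.Box(low=-1.0, high=1.0, shape=(8,))
act_space = gym.spaces.Discrete(4)

policy = DistributedActorLearner(
    observation_space=obs_space,
    action_space=act_space,
    feature_dim=64,               # encoder output size
    hidden_dim=512,               # default
    opt_class=th.optim.Adam,      # default
    opt_kwargs={"lr": 1e-4},      # forwarded to the optimizer
    init_fn='orthogonal',         # default
    use_lstm=False,               # set True for a recurrent policy
)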
Methods:
.describe
Describe the policy.
.explore
Explore the environment and randomly generate actions.
Args
- obs (th.Tensor) : Observation from the environment.
Returns
Sampled actions.
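Example
Continuing the construction example above; the batch size and observation shape are illustrative.

# A single observation with a leading batch dimension.
obs = th.as_tensor(obs_space.sample()).unsqueeze(0)
actions = policy.explore(obs)  # randomly sampled actions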
.freeze
Freeze all the elements, such as encoder and dist.
Args
- encoder (nn.Module) : Encoder network.
- dist (Distribution) : Distribution.
Returns
None.
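Example
A sketch of binding and freezing. Both the encoder module and the distribution below are stand-ins; supply whatever encoder and distribution classes your setup actually provides.

import torch.nn as nn

# Stand-in encoder: maps the (1, 8) observations above to feature_dim=64 features.
encoder = nn.Sequential(nn.Flatten(), nn.Linear(8, 64), nn.ReLU())
# Stand-in distribution; the exact type expected here depends on the library.
dist = th.distributions.Categorical
policy.freeze(encoder=encoder, dist=dist)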
.forward
Forward pass; used only for inference.
.to
Move only the learner to the target device; the actor stays on the CPU.
Args
- device (th.device) : Device to use.
Returns
None.
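Example
Continuing the example above; the actor is left untouched on the CPU, presumably so actor workers can run inference without device transfers.

# Only the learner is moved; the actor remains on the CPU.
device = th.device("cuda" if th.cuda.is_available() else "cpu")
policy.to(device)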
.save
Save models.
Args
- path (Path) : Save path.
- pretraining (bool) : Pre-training mode.
- global_step (int) : Global training step.
Returns
None.
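Example
The directory name and step count are illustrative; what exactly gets written under path is up to the implementation.

from pathlib import Path

policy.save(path=Path("./ckpt"), pretraining=False, global_step=10_000)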
.load
Load initial parameters.
Args
- path (str) : Import path.
- device (th.device) : Device to use.
Returns
None.
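Example
The checkpoint filename below is illustrative; point path at the parameters saved earlier.

policy.load(path="./ckpt/model.pth", device=th.device("cpu"))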