harl.models.policy_models package¶
Submodules¶
harl.models.policy_models.deterministic_policy module¶
- class harl.models.policy_models.deterministic_policy.DeterministicPolicy(args, obs_space, action_space, device=device(type='cpu'))[source]¶
Bases: Module
Deterministic policy network for continuous action space.
- forward(obs)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
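A minimal usage sketch (not part of the generated API docs). The keys in model_args are assumptions based on typical HARL model configs; check the algorithm config of your installed version for the exact names.

import torch
from gym import spaces
from harl.models.policy_models.deterministic_policy import DeterministicPolicy

# Assumed (possibly incomplete) model arguments; align the keys with your HARL config.
model_args = {
    "hidden_sizes": [128, 128],
    "activation_func": "relu",
    "final_activation_func": "tanh",
}
obs_space = spaces.Box(low=-1.0, high=1.0, shape=(8,))
act_space = spaces.Box(low=-1.0, high=1.0, shape=(2,))

policy = DeterministicPolicy(model_args, obs_space, act_space, device=torch.device("cpu"))
obs = torch.rand(4, 8)   # a batch of 4 observations
actions = policy(obs)    # call the Module instance, not forward(), so registered hooks run
print(actions.shape)     # expected: torch.Size([4, 2])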
harl.models.policy_models.squashed_gaussian_policy module¶
- class harl.models.policy_models.squashed_gaussian_policy.SquashedGaussianPolicy(args, obs_space, action_space, device=device(type='cpu'))[source]¶
Bases: Module
Squashed Gaussian policy network for HASAC.
- forward(obs, stochastic=True, with_logprob=True)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
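A hedged usage sketch for the HASAC actor. The model_args keys and the assumption that forward returns an (actions, log_prob) pair are illustrative; verify both against the HASAC config and source of your installed version.

import torch
from gym import spaces
from harl.models.policy_models.squashed_gaussian_policy import SquashedGaussianPolicy

model_args = {  # assumed, possibly incomplete keys
    "hidden_sizes": [256, 256],
    "activation_func": "relu",
}
obs_space = spaces.Box(low=-1.0, high=1.0, shape=(10,))
act_space = spaces.Box(low=-1.0, high=1.0, shape=(3,))

policy = SquashedGaussianPolicy(model_args, obs_space, act_space)
obs = torch.rand(16, 10)

# Exploration: sample tanh-squashed actions and their log probabilities for the SAC-style loss.
actions, log_probs = policy(obs, stochastic=True, with_logprob=True)

# Evaluation: use the deterministic mean action; no log probability is needed.
eval_actions, _ = policy(obs, stochastic=False, with_logprob=False)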
harl.models.policy_models.stochastic_mlp_policy module¶
- class harl.models.policy_models.stochastic_mlp_policy.StochasticMlpPolicy(args, obs_space, action_space, device=device(type='cpu'))[source]¶
Bases: Module
Stochastic policy model that uses only an MLP network. Outputs actions given observations.
- forward(obs, available_actions=None, stochastic=True)[source]¶
Compute actions from the given inputs.
- Parameters:
obs – (np.ndarray / torch.Tensor) observation inputs into the network.
available_actions – (np.ndarray / torch.Tensor) denotes which actions are available to the agent (if None, all actions are available).
stochastic – (bool) whether to sample from the action distribution or return its mode.
- Returns:
actions – (torch.Tensor) actions to take.
- get_logits(obs, available_actions=None)[source]¶
Get action logits from the given inputs.
- Parameters:
obs – (np.ndarray / torch.Tensor) input to the network.
available_actions – (np.ndarray / torch.Tensor) denotes which actions are available to the agent (if None, all actions are available).
- Returns:
action_logits – (torch.Tensor) logits of actions for the given inputs.
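A hedged usage sketch combining forward and get_logits for a discrete action space. The model_args keys are assumptions, and the available_actions mask is assumed to use 1 for available and 0 for unavailable actions.

import torch
from gym import spaces
from harl.models.policy_models.stochastic_mlp_policy import StochasticMlpPolicy

model_args = {  # assumed, possibly incomplete keys
    "hidden_sizes": [128, 128],
    "activation_func": "relu",
    "initialization_method": "orthogonal_",
    "gain": 0.01,
}
obs_space = spaces.Box(low=0.0, high=1.0, shape=(6,))
act_space = spaces.Discrete(4)

policy = StochasticMlpPolicy(model_args, obs_space, act_space)
obs = torch.rand(8, 6)

# Mask out action 3 for the whole batch (1 = available, 0 = unavailable).
available_actions = torch.ones(8, 4)
available_actions[:, 3] = 0

sampled = policy(obs, available_actions=available_actions, stochastic=True)   # sample from the distribution
greedy = policy(obs, available_actions=available_actions, stochastic=False)   # take the mode instead
logits = policy.get_logits(obs, available_actions=available_actions)          # raw action logits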
harl.models.policy_models.stochastic_policy module¶
- class harl.models.policy_models.stochastic_policy.StochasticPolicy(args, obs_space, action_space, device=device(type='cpu'))[source]¶
Bases: Module
Stochastic policy model. Outputs actions given observations.
- evaluate_actions(obs, rnn_states, action, masks, available_actions=None, active_masks=None)[source]¶
Compute action log probability, distribution entropy, and action distribution.
- Parameters:
obs – (np.ndarray / torch.Tensor) observation inputs into the network.
rnn_states – (np.ndarray / torch.Tensor) hidden states for the RNN, if an RNN network is used.
action – (np.ndarray / torch.Tensor) actions whose entropy and log probability are to be evaluated.
masks – (np.ndarray / torch.Tensor) mask tensor denoting whether hidden states should be reinitialized to zeros.
available_actions – (np.ndarray / torch.Tensor) denotes which actions are available to the agent (if None, all actions are available).
active_masks – (np.ndarray / torch.Tensor) denotes whether an agent is active or dead.
- Returns:
action_log_probs – (torch.Tensor) log probabilities of the input actions.
dist_entropy – (torch.Tensor) action distribution entropy for the given inputs.
action_distribution – (torch.distributions) action distribution.
- forward(obs, rnn_states, masks, available_actions=None, deterministic=False)[source]¶
Compute actions from the given inputs.
- Parameters:
obs – (np.ndarray / torch.Tensor) observation inputs into the network.
rnn_states – (np.ndarray / torch.Tensor) hidden states for the RNN, if an RNN network is used.
masks – (np.ndarray / torch.Tensor) mask tensor denoting whether hidden states should be reinitialized to zeros.
available_actions – (np.ndarray / torch.Tensor) denotes which actions are available to the agent (if None, all actions are available).
deterministic – (bool) whether to return the mode of the action distribution instead of sampling from it.
- Returns:
actions – (torch.Tensor) actions to take.
action_log_probs – (torch.Tensor) log probabilities of the taken actions.
rnn_states – (torch.Tensor) updated RNN hidden states.
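A hedged end-to-end sketch: collect actions during a rollout with forward, then re-evaluate the stored actions during training with evaluate_actions. The model_args keys and the assumed RNN state shape (n_envs, recurrent_n, hidden_sizes[-1]) are illustrative and should be checked against your HARL version.

import numpy as np
import torch
from gym import spaces
from harl.models.policy_models.stochastic_policy import StochasticPolicy

model_args = {  # assumed, possibly incomplete keys; recurrent settings depend on your config
    "hidden_sizes": [128, 128],
    "activation_func": "relu",
    "use_naive_recurrent_policy": False,
    "use_recurrent_policy": True,
    "recurrent_n": 1,
    "initialization_method": "orthogonal_",
    "gain": 0.01,
}
obs_space = spaces.Box(low=-1.0, high=1.0, shape=(12,))
act_space = spaces.Discrete(5)

policy = StochasticPolicy(model_args, obs_space, act_space)

n_envs = 8
obs = np.random.rand(n_envs, 12).astype(np.float32)
# Assumed RNN state layout: one state per environment, per recurrent layer.
rnn_states = np.zeros((n_envs, model_args["recurrent_n"], model_args["hidden_sizes"][-1]), dtype=np.float32)
masks = np.ones((n_envs, 1), dtype=np.float32)  # set to 0 where an episode has just ended

# Rollout: sample actions and carry the updated RNN states to the next step.
actions, action_log_probs, rnn_states = policy(obs, rnn_states, masks, deterministic=False)

# Training: re-evaluate the stored actions to get log probabilities, entropy, and the distribution.
action_log_probs, dist_entropy, action_dist = policy.evaluate_actions(obs, rnn_states, actions, masks)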