harl.models.policy_models package

Submodules

harl.models.policy_models.deterministic_policy module

class harl.models.policy_models.deterministic_policy.DeterministicPolicy(args, obs_space, action_space, device=device(type='cpu'))[source]

Bases: Module

Deterministic policy network for continuous action spaces.

forward(obs)[source]

Compute deterministic actions for the given observations.

Parameters:
  • obs – (np.ndarray / torch.Tensor) observation input to the network.

Returns:
  • actions – (torch.Tensor) deterministic actions to take.
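The following standalone sketch (plain PyTorch, with made-up dimensions and a hypothetical TinyDeterministicPolicy class) illustrates the general shape of such a policy: an MLP maps observations to actions squashed by tanh and scaled to the action bound. It is only a conceptual illustration, not the HARL implementation.

    import torch
    import torch.nn as nn

    class TinyDeterministicPolicy(nn.Module):
        """Illustrative stand-in: maps observations to bounded continuous actions."""

        def __init__(self, obs_dim, act_dim, act_high, hidden=64):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(obs_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, act_dim), nn.Tanh(),   # squash to (-1, 1)
            )
            self.act_high = act_high                     # scale to the action-space bound

        def forward(self, obs):
            return self.act_high * self.net(obs)

    policy = TinyDeterministicPolicy(obs_dim=8, act_dim=2, act_high=1.0)
    obs = torch.randn(4, 8)        # a batch of 4 observations
    actions = policy(obs)          # shape (4, 2); the same obs always yields the same actions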

harl.models.policy_models.squashed_gaussian_policy module

class harl.models.policy_models.squashed_gaussian_policy.SquashedGaussianPolicy(args, obs_space, action_space, device=device(type='cpu'))[source]

Bases: Module

Squashed Gaussian policy network for HASAC.

forward(obs, stochastic=True, with_logprob=True)[source]

Compute actions, and optionally their log probabilities, for the given observations.

Parameters:
  • obs – (np.ndarray / torch.Tensor) observation input to the network.

  • stochastic – (bool) whether to sample from the squashed Gaussian distribution or return its deterministic mode.

  • with_logprob – (bool) whether to also compute the log probabilities of the actions.

Returns:
  • actions – (torch.Tensor) actions to take.

  • action log probabilities – (torch.Tensor) log probabilities of the actions, computed when with_logprob is True.
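The mechanism behind a squashed Gaussian policy can be shown in a few lines of plain PyTorch: draw a reparameterized Gaussian sample, squash it with tanh, and correct the log probability for the change of variables. The sketch below uses placeholder mean and log-standard-deviation tensors (the network producing them is omitted) and is a conceptual illustration rather than the HARL code itself.

    import torch
    import torch.nn.functional as F
    from torch.distributions import Normal

    def squashed_gaussian_sample(mean, log_std, with_logprob=True):
        """Sample a tanh-squashed Gaussian action and, optionally, its log probability."""
        dist = Normal(mean, log_std.exp())
        pre_tanh = dist.rsample()              # reparameterized sample
        action = torch.tanh(pre_tanh)          # squash into (-1, 1)
        if not with_logprob:
            return action, None
        # change of variables: log(1 - tanh(x)^2) = 2 * (log 2 - x - softplus(-2x))
        logp = dist.log_prob(pre_tanh).sum(-1)
        logp = logp - (2 * (torch.log(torch.tensor(2.0)) - pre_tanh - F.softplus(-2 * pre_tanh))).sum(-1)
        return action, logp

    # Placeholder statistics for a batch of 4 two-dimensional actions.
    actions, logp = squashed_gaussian_sample(torch.zeros(4, 2), torch.full((4, 2), -0.5))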

harl.models.policy_models.stochastic_mlp_policy module

class harl.models.policy_models.stochastic_mlp_policy.StochasticMlpPolicy(args, obs_space, action_space, device=device(type='cpu'))[source]

Bases: Module

Stochastic policy model that uses only an MLP network. Outputs actions given observations.

forward(obs, available_actions=None, stochastic=True)[source]

Compute actions from the given inputs.

Parameters:
  • obs – (np.ndarray / torch.Tensor) observation input to the network.

  • available_actions – (np.ndarray / torch.Tensor) denotes which actions are available to the agent (if None, all actions are available).

  • stochastic – (bool) whether to sample from the action distribution or return its mode.

Returns:
  • actions – (torch.Tensor) actions to take.
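Conceptually, the stochastic flag switches between drawing a sample from the action distribution and taking its mode. The snippet below illustrates the distinction with a plain categorical distribution over placeholder logits; it is not the class's internal code.

    import torch
    from torch.distributions import Categorical

    logits = torch.randn(4, 5)                # placeholder: 4 observations, 5 discrete actions
    dist = Categorical(logits=logits)

    sampled_actions = dist.sample()           # stochastic=True: sample from the distribution
    mode_actions = dist.probs.argmax(dim=-1)  # stochastic=False: take the mode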

get_logits(obs, available_actions=None)[source]

Get action logits from the given inputs.

Parameters:
  • obs – (np.ndarray / torch.Tensor) input to the network.

  • available_actions – (np.ndarray / torch.Tensor) denotes which actions are available to the agent (if None, all actions are available).

Returns:
  • action_logits – (torch.Tensor) logits of actions for the given inputs.
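A common way to honour available_actions when working with action logits is to mask out unavailable entries before building the distribution, as sketched below with placeholder tensors; the exact masking used inside HARL may differ in detail.

    import torch
    from torch.distributions import Categorical

    logits = torch.randn(4, 5)
    available_actions = torch.tensor([[1, 1, 0, 1, 0]] * 4, dtype=torch.bool)

    # Unavailable actions get -inf logits and therefore zero probability.
    masked_logits = logits.masked_fill(~available_actions, float("-inf"))
    actions = Categorical(logits=masked_logits).sample()   # never selects a masked action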

harl.models.policy_models.stochastic_policy module

class harl.models.policy_models.stochastic_policy.StochasticPolicy(args, obs_space, action_space, device=device(type='cpu'))[source]

Bases: Module

Stochastic policy model. Outputs actions given observations.

evaluate_actions(obs, rnn_states, action, masks, available_actions=None, active_masks=None)[source]

Compute action log probability, distribution entropy, and action distribution.

Parameters:
  • obs – (np.ndarray / torch.Tensor) observation input to the network.

  • rnn_states – (np.ndarray / torch.Tensor) hidden states for the RNN, if an RNN network is used.

  • action – (np.ndarray / torch.Tensor) actions whose entropy and log probability are to be evaluated.

  • masks – (np.ndarray / torch.Tensor) mask tensor denoting whether hidden states should be reinitialized to zeros.

  • available_actions – (np.ndarray / torch.Tensor) denotes which actions are available to the agent (if None, all actions are available).

  • active_masks – (np.ndarray / torch.Tensor) denotes whether an agent is active or dead.

Returns:
  • action_log_probs – (torch.Tensor) log probabilities of the input actions.

  • dist_entropy – (torch.Tensor) action distribution entropy for the given inputs.

  • action_distribution – (torch.distributions) action distribution.
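These outputs are the usual ingredients of an actor update. The sketch below builds a PPO-style clipped surrogate loss from them, using placeholder tensors and placeholder clip/entropy coefficients; it illustrates how such outputs are commonly consumed and is not HARL's training code.

    import torch

    # Placeholders standing in for evaluate_actions outputs and stored rollout data.
    action_log_probs = torch.randn(8, 1)
    old_action_log_probs = torch.randn(8, 1)
    advantages = torch.randn(8, 1)
    dist_entropy = torch.tensor(1.2)

    ratio = torch.exp(action_log_probs - old_action_log_probs)
    surrogate = torch.min(ratio * advantages, torch.clamp(ratio, 0.8, 1.2) * advantages)
    actor_loss = -surrogate.mean() - 0.01 * dist_entropy   # placeholder entropy coefficient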

forward(obs, rnn_states, masks, available_actions=None, deterministic=False)[source]

Compute actions from the given inputs.

Parameters:
  • obs – (np.ndarray / torch.Tensor) observation input to the network.

  • rnn_states – (np.ndarray / torch.Tensor) hidden states for the RNN, if an RNN network is used.

  • masks – (np.ndarray / torch.Tensor) mask tensor denoting whether hidden states should be reinitialized to zeros.

  • available_actions – (np.ndarray / torch.Tensor) denotes which actions are available to the agent (if None, all actions are available).

  • deterministic – (bool) whether to return the mode of the action distribution instead of sampling from it.

Returns:
  • actions – (torch.Tensor) actions to take.

  • action_log_probs – (torch.Tensor) log probabilities of the taken actions.

  • rnn_states – (torch.Tensor) updated RNN hidden states.
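The interplay between rnn_states and masks is easiest to see in a rollout loop: masks are 0.0 for environments whose episode has just ended, so the corresponding hidden states are reset to zeros before the next forward call. The sketch below uses placeholder shapes, and the actual policy call is left commented out because constructing the class requires the repository's args and spaces.

    import torch

    num_envs, hidden_size = 4, 64
    rnn_states = torch.zeros(num_envs, hidden_size)
    masks = torch.ones(num_envs, 1)            # 1.0 = episode running, 0.0 = just ended

    for step in range(5):
        rnn_states = rnn_states * masks        # zero the hidden state of finished episodes
        # actions, action_log_probs, rnn_states = policy(obs, rnn_states, masks)
        masks = (torch.rand(num_envs, 1) > 0.1).float()   # pretend some episodes terminate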

Module contents