harl.runners package¶
Submodules¶
harl.runners.off_policy_base_runner module¶
Base runner for off-policy algorithms.
- class harl.runners.off_policy_base_runner.OffPolicyBaseRunner(args, algo_args, env_args)¶
Bases:
object
Base runner for off-policy algorithms.
- close()¶
Close environment, writer, and log file.
- eval(step)¶
Evaluate the model.
- get_actions(obs, available_actions=None, add_random=True)¶
Get actions for rollout.
- Parameters:
obs – (np.ndarray) input observation, shape is (n_threads, n_agents, dim)
available_actions – (np.ndarray) denotes which actions are available to each agent (if None, all actions available), shape is (n_threads, n_agents, action_number) or (n_threads,) of None
add_random – (bool) whether to add randomness
- Returns:
actions – (np.ndarray) agent actions, shape is (n_threads, n_agents, dim)
- Return type:
np.ndarray
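For illustration, a minimal call sketch for get_actions, assuming an already-constructed runner instance and a discrete action space; all shapes and numbers below are placeholders, not HARL defaults:
    import numpy as np

    # Placeholder sizes: 8 env threads, 3 agents, obs dim 16, 5 discrete actions.
    n_threads, n_agents, obs_dim, action_number = 8, 3, 16, 5
    obs = np.zeros((n_threads, n_agents, obs_dim), dtype=np.float32)

    # Mask the last action everywhere; passing None means all actions available.
    available_actions = np.ones((n_threads, n_agents, action_number), dtype=np.int64)
    available_actions[..., -1] = 0

    # `runner` is assumed to be a constructed OffPolicyBaseRunner subclass.
    # add_random=True injects exploration for rollouts; evaluation would use False.
    actions = runner.get_actions(obs, available_actions=available_actions, add_random=True)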
- insert(data)¶
Insert data into the buffer.
- render()¶
Render the model.
- restore()¶
Restore the model.
- run()¶
Run the training (or rendering) pipeline.
- sample_actions(available_actions=None)¶
Sample random actions for warmup.
- Parameters:
available_actions – (np.ndarray) denotes which actions are available to each agent (if None, all actions available), shape is (n_threads, n_agents, action_number) or (n_threads,) of None
- Returns:
actions – (np.ndarray) sampled actions, shape is (n_threads, n_agents, dim)
- Return type:
np.ndarray
- save()¶
Save the model.
- train()¶
Train the model.
- warmup()¶
Warm up the replay buffer with random actions.
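As a rough, numpy-only illustration of the warmup phase (which relies on the same random sampling as sample_actions above), assuming discrete actions; the step count, shapes, and the commented environment call are placeholders:
    import numpy as np

    # Fill the replay buffer with uniformly random actions before learning starts,
    # so early training sees diverse transitions. All sizes are illustrative.
    n_threads, n_agents, action_number = 8, 3, 5
    rng = np.random.default_rng(0)
    for _ in range(100):  # placeholder number of warmup steps
        actions = rng.integers(0, action_number, size=(n_threads, n_agents, 1))
        # obs, rewards, dones = envs.step(actions)  # transitions then go into the buffer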
harl.runners.off_policy_ha_runner module¶
Runner for off-policy HARL algorithms.
- class harl.runners.off_policy_ha_runner.OffPolicyHARunner(args, algo_args, env_args)¶
Bases:
OffPolicyBaseRunner
Runner for off-policy HA algorithms.
harl.runners.off_policy_ma_runner module¶
Runner for off-policy MA algorithms.
- class harl.runners.off_policy_ma_runner.OffPolicyMARunner(args, algo_args, env_args)¶
Bases:
OffPolicyBaseRunner
Runner for off-policy MA algorithms.
harl.runners.on_policy_base_runner module¶
Base runner for on-policy algorithms.
- class harl.runners.on_policy_base_runner.OnPolicyBaseRunner(args, algo_args, env_args)¶
Bases:
object
Base runner for on-policy algorithms.
- after_update()¶
Do the necessary data operations after an update: copy the data at the last step to the first position of the buffer, which will then be used for generating new actions.
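A self-contained sketch of that copy, with illustrative buffer shapes rather than the actual buffer class:
    import numpy as np

    # Rollout buffers span (episode_length + 1, n_threads, n_agents, dim); after
    # an update, slot 0 is seeded with the final step so the next rollout starts
    # from the latest observations.
    episode_length, n_threads, n_agents, dim = 200, 8, 3, 16
    obs_buffer = np.random.rand(episode_length + 1, n_threads, n_agents, dim)
    obs_buffer[0] = obs_buffer[-1].copy()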
- close()¶
Close environment, writer, and logger.
- collect(step)¶
Collect actions and values from actors and critics.
- Parameters:
step – (int) step in the episode
- Returns:
values, actions, action_log_probs, rnn_states, rnn_states_critic
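The returned tuple drives the per-step rollout; a hedged fragment of the typical calling pattern (runner and envs are assumed constructed, and the exact contents of the tuple handed to insert are illustrative):
    episode_length = 200  # placeholder
    for step in range(episode_length):
        # Query actors and critics at the current buffer position.
        values, actions, action_log_probs, rnn_states, rnn_states_critic = runner.collect(step)
        # Step the vectorized environments (this return signature of envs.step
        # is an assumption, not a guaranteed API).
        obs, share_obs, rewards, dones, infos, available_actions = envs.step(actions)
        runner.insert((obs, share_obs, rewards, dones, infos, available_actions,
                       values, actions, action_log_probs, rnn_states, rnn_states_critic))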
- compute()¶
Compute returns and advantages. Compute critic evaluation of the last state, and then let buffer compute returns, which will be used during training.
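To make the bootstrapping concrete, a runnable sketch using plain discounted returns; the actual buffer may use GAE instead, and all shapes here are placeholders:
    import numpy as np

    gamma = 0.99
    episode_length, n_threads = 200, 8
    rewards = np.random.rand(episode_length, n_threads)
    last_value = np.random.rand(n_threads)  # critic's evaluation of the last state

    returns = np.zeros((episode_length + 1, n_threads))
    returns[-1] = last_value  # bootstrap from the value of the final state
    for t in reversed(range(episode_length)):
        returns[t] = rewards[t] + gamma * returns[t + 1]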
- dump_metrics_to_csv(metrics, eval_episode)¶
Dump collected metrics to a CSV file for all agents in one go.
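A hypothetical sketch of such a dump using the standard csv module; the file name, metric names, and row layout are assumptions, not the actual output format:
    import csv

    eval_episode = 10  # hypothetical evaluation index
    metrics = {"agent0": {"eval_return": 10.2}, "agent1": {"eval_return": 9.7}}

    # One row per agent, written in a single pass ("in one go").
    with open("eval_metrics.csv", "a", newline="") as f:
        writer = csv.writer(f)
        for agent, agent_metrics in metrics.items():
            writer.writerow([eval_episode, agent, agent_metrics["eval_return"]])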
- eval()¶
Evaluate the model.
- insert(data)¶
Insert data into buffer.
- prep_rollout()¶
Prepare for rollout.
- prep_training()¶
Prepare for training.
- render()¶
Render the model.
- restore()¶
Restore model parameters.
- run()¶
Run the training (or rendering) pipeline.
- save()¶
Save model parameters.
- train()¶
Train the model.
- warmup()¶
Warm up the replay buffer.
harl.runners.on_policy_ha_runner module¶
Runner for on-policy HARL algorithms.
- class harl.runners.on_policy_ha_runner.OnPolicyHARunner(args, algo_args, env_args)¶
Bases:
OnPolicyBaseRunner
Runner for on-policy HA algorithms.
harl.runners.on_policy_ma_runner module¶
Runner for on-policy MA algorithms.
- class harl.runners.on_policy_ma_runner.OnPolicyMARunner(args, algo_args, env_args)¶
Bases:
OnPolicyBaseRunner
Runner for on-policy MA algorithms.
Module contents¶
Runner registry.
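The registry presumably maps algorithm names to runner classes. A typical entry point, assuming RUNNER_REGISTRY is the mapping exported here and using placeholder configuration dicts (real runs populate them from HARL's yaml configs and command line):
    from harl.runners import RUNNER_REGISTRY

    # Placeholder configs; in practice these carry the full algorithm and
    # environment settings.
    args = {"algo": "happo", "env": "pettingzoo_mpe", "exp_name": "demo"}
    algo_args, env_args = {}, {}

    runner = RUNNER_REGISTRY[args["algo"]](args, algo_args, env_args)
    runner.run()    # training (or rendering) pipeline
    runner.close()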