harl.runners package¶
Submodules¶
harl.runners.off_policy_base_runner module¶
Base runner for off-policy algorithms.
- class harl.runners.off_policy_base_runner.OffPolicyBaseRunner(args, algo_args, env_args)¶
Bases:
object
Base runner for off-policy algorithms.
- close()¶
Close environment, writer, and log file.
- eval(step)¶
Evaluate the model.
- get_actions(obs, available_actions=None, add_random=True)¶
Get actions for rollout.
- Parameters:
obs – (np.ndarray) input observation, shape is (n_threads, n_agents, dim)
available_actions – (np.ndarray) denotes which actions are available to each agent (if None, all actions available), shape is (n_threads, n_agents, action_number) or (n_threads,) of None
add_random – (bool) whether to add randomness
- Returns:
actions – (np.ndarray) agent actions, shape is (n_threads, n_agents, dim)
- Return type:
np.ndarray
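For illustration, a minimal call sketch for get_actions, assuming an already-constructed runner instance and a discrete action space; all shapes and numbers below are placeholders, not HARL defaults:
    import numpy as np

    # Placeholder sizes: 8 env threads, 3 agents, obs dim 16, 5 discrete actions.
    n_threads, n_agents, obs_dim, action_number = 8, 3, 16, 5
    obs = np.zeros((n_threads, n_agents, obs_dim), dtype=np.float32)

    # Mask the last action everywhere; passing None means all actions available.
    available_actions = np.ones((n_threads, n_agents, action_number), dtype=np.int64)
    available_actions[..., -1] = 0

    # `runner` is assumed to be a constructed OffPolicyBaseRunner subclass.
    # add_random=True injects exploration for rollouts; evaluation would use False.
    actions = runner.get_actions(obs, available_actions=available_actions, add_random=True)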
- insert(data)¶
Insert data into the buffer.
- render()¶
Render the model.
- restore()¶
Restore the model.
- run()¶
Run the training (or rendering) pipeline.
- sample_actions(available_actions=None)¶
Sample random actions for warmup.
- Parameters:
available_actions – (np.ndarray) denotes which actions are available to each agent (if None, all actions available), shape is (n_threads, n_agents, action_number) or (n_threads,) of None
- Returns:
actions – (np.ndarray) sampled actions, shape is (n_threads, n_agents, dim)
- Return type:
np.ndarray
- save()¶
Save the model.
- train()¶
Train the model.
- warmup()¶
Warm up the replay buffer with random actions.
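As a rough, numpy-only illustration of the warmup phase (which relies on the same random sampling as sample_actions above), assuming discrete actions; the step count, shapes, and the commented environment call are placeholders:
    import numpy as np

    # Fill the replay buffer with uniformly random actions before learning starts,
    # so early training sees diverse transitions. All sizes are illustrative.
    n_threads, n_agents, action_number = 8, 3, 5
    rng = np.random.default_rng(0)
    for _ in range(100):  # placeholder number of warmup steps
        actions = rng.integers(0, action_number, size=(n_threads, n_agents, 1))
        # obs, rewards, dones = envs.step(actions)  # transitions then go into the buffer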
harl.runners.off_policy_ha_runner module¶
Runner for off-policy HARL algorithms.
- class harl.runners.off_policy_ha_runner.OffPolicyHARunner(args, algo_args, env_args)¶
Bases:
OffPolicyBaseRunner
Runner for off-policy HA algorithms.
harl.runners.off_policy_ma_runner module¶
Runner for off-policy MA algorithms.
- class harl.runners.off_policy_ma_runner.OffPolicyMARunner(args, algo_args, env_args)¶
Bases:
OffPolicyBaseRunner
Runner for off-policy MA algorithms.
harl.runners.on_policy_base_runner module¶
Base runner for on-policy algorithms.
- class harl.runners.on_policy_base_runner.OnPolicyBaseRunner(args, algo_args, env_args)¶
Bases:
object
Base runner for on-policy algorithms.
- after_update()¶
Do the necessary data operations after an update: copy the data at the last step to the first position of the buffer, which will then be used for generating new actions.
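A self-contained sketch of that copy, with illustrative buffer shapes rather than the actual buffer class:
    import numpy as np

    # Rollout buffers span (episode_length + 1, n_threads, n_agents, dim); after
    # an update, slot 0 is seeded with the final step so the next rollout starts
    # from the latest observations.
    episode_length, n_threads, n_agents, dim = 200, 8, 3, 16
    obs_buffer = np.random.rand(episode_length + 1, n_threads, n_agents, dim)
    obs_buffer[0] = obs_buffer[-1].copy()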
- close()¶
Close environment, writer, and logger.
- collect(step)¶
Collect actions and values from actors and critics.
- Parameters:
step – (int) step in the episode
- Returns:
values, actions, action_log_probs, rnn_states, rnn_states_critic
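The returned tuple drives the per-step rollout; a hedged fragment of the typical calling pattern (runner and envs are assumed constructed, and the exact contents of the tuple handed to insert are illustrative):
    episode_length = 200  # placeholder
    for step in range(episode_length):
        # Query actors and critics at the current buffer position.
        values, actions, action_log_probs, rnn_states, rnn_states_critic = runner.collect(step)
        # Step the vectorized environments (this return signature of envs.step
        # is an assumption, not a guaranteed API).
        obs, share_obs, rewards, dones, infos, available_actions = envs.step(actions)
        runner.insert((obs, share_obs, rewards, dones, infos, available_actions,
                       values, actions, action_log_probs, rnn_states, rnn_states_critic))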
- compute()¶
Compute returns and advantages. Compute critic evaluation of the last state, and then let buffer compute returns, which will be used during training.
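To make the bootstrapping concrete, a runnable sketch using plain discounted returns; the actual buffer may use GAE instead, and all shapes here are placeholders:
    import numpy as np

    gamma = 0.99
    episode_length, n_threads = 200, 8
    rewards = np.random.rand(episode_length, n_threads)
    last_value = np.random.rand(n_threads)  # critic's evaluation of the last state

    returns = np.zeros((episode_length + 1, n_threads))
    returns[-1] = last_value  # bootstrap from the value of the final state
    for t in reversed(range(episode_length)):
        returns[t] = rewards[t] + gamma * returns[t + 1]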
- dump_metrics_to_csv(metrics, eval_episode)¶
Dump collected metrics to a CSV file for all agents in one go.
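A hypothetical sketch of such a dump using the standard csv module; the file name, metric names, and row layout are assumptions, not the actual output format:
    import csv

    eval_episode = 10  # hypothetical evaluation index
    metrics = {"agent0": {"eval_return": 10.2}, "agent1": {"eval_return": 9.7}}

    # One row per agent, written in a single pass ("in one go").
    with open("eval_metrics.csv", "a", newline="") as f:
        writer = csv.writer(f)
        for agent, agent_metrics in metrics.items():
            writer.writerow([eval_episode, agent, agent_metrics["eval_return"]])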
- eval()¶
Evaluate the model.
- insert(data)¶
Insert data into buffer.
- prep_rollout()¶
Prepare for rollout.
- prep_training()¶
Prepare for training.
- render()¶
Render the model.
- restore()¶
Restore model parameters.
- run()¶
Run the training (or rendering) pipeline.
- save()¶
Save model parameters.
- train()¶
Train the model.
- warmup()¶
Warm up the replay buffer.
harl.runners.on_policy_ha_runner module¶
Runner for on-policy HARL algorithms.
- class harl.runners.on_policy_ha_runner.OnPolicyHARunner(args, algo_args, env_args)¶
Bases:
OnPolicyBaseRunner
Runner for on-policy HA algorithms.
harl.runners.on_policy_ma_runner module¶
Runner for on-policy MA algorithms.
- class harl.runners.on_policy_ma_runner.OnPolicyMARunner(args, algo_args, env_args)¶
Bases:
OnPolicyBaseRunner
Runner for on-policy MA algorithms.
Module contents¶
Runner registry.
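The registry presumably maps algorithm names to runner classes. A typical entry point, assuming RUNNER_REGISTRY is the mapping exported here and using placeholder configuration dicts (real runs populate them from HARL's yaml configs and command line):
    from harl.runners import RUNNER_REGISTRY

    # Placeholder configs; in practice these carry the full algorithm and
    # environment settings.
    args = {"algo": "happo", "env": "pettingzoo_mpe", "exp_name": "demo"}
    algo_args, env_args = {}, {}

    runner = RUNNER_REGISTRY[args["algo"]](args, algo_args, env_args)
    runner.run()    # training (or rendering) pipeline
    runner.close()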