harl.utils package

Submodules

harl.utils.configs_tools module

Tools for loading and updating configs.

harl.utils.configs_tools.convert_json(obj)[source]

Convert obj to a version which can be serialized with JSON.

harl.utils.configs_tools.get_defaults_yaml_args(algo, env)[source]

Load config file for user-specified algo and env.

Parameters:

algo: (str) Algorithm name.
env: (str) Environment name.

Returns:

algo_args: (dict) Algorithm config.
env_args: (dict) Environment config.
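
A minimal usage sketch (assuming the algorithm and environment names below correspond to YAML config files shipped with the package):

    from harl.utils.configs_tools import get_defaults_yaml_args

    # "happo" and "smac" are illustrative names; any algo/env pair with a
    # shipped YAML config works the same way.
    algo_args, env_args = get_defaults_yaml_args("happo", "smac")
    print(sorted(algo_args), sorted(env_args))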

harl.utils.configs_tools.get_task_name(env, env_args)[source]

Get task name.

harl.utils.configs_tools.init_dir(env, env_args, algo, exp_name, seed, logger_path)[source]

Init directory for saving results.

harl.utils.configs_tools.is_json_serializable(value)[source]

Check if value is JSON serializable.

harl.utils.configs_tools.save_config(args, algo_args, env_args, run_dir)[source]

Save the configuration of the program.

harl.utils.configs_tools.update_args(unparsed_dict, *args)[source]

Update loaded config with unparsed command-line arguments.

Parameters:

unparsed_dict: (dict) Unparsed command-line arguments.
*args: (list[dict]) Argument dicts to be updated.
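
A sketch of how command-line overrides fold into the loaded configs; the override key below is an illustrative assumption:

    from harl.utils.configs_tools import get_defaults_yaml_args, update_args

    algo_args, env_args = get_defaults_yaml_args("happo", "smac")

    # Overrides collected from unparsed command-line arguments; "lr" is an
    # illustrative key assumed to already exist in algo_args.
    unparsed_dict = {"lr": 0.0005}

    # Each override is written into whichever argument dict holds that key.
    update_args(unparsed_dict, algo_args, env_args)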

harl.utils.discrete_util module

harl.utils.discrete_util.gumbel_softmax(logits, device, temperature=1.0, hard=False)[source]

Sample from the Gumbel-Softmax distribution and optionally discretize.

Parameters:

logits: [batch_size, n_class] unnormalized log-probs.
temperature: Non-negative scalar.
hard: If True, take argmax, but differentiate w.r.t. the soft sample y.

Returns:

[batch_size, n_class] sample from the Gumbel-Softmax distribution. If hard=True, the returned sample will be one-hot; otherwise it will be a probability distribution that sums to 1 across classes.
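
A minimal usage sketch:

    import torch
    from harl.utils.discrete_util import gumbel_softmax

    device = torch.device("cpu")
    logits = torch.randn(4, 3)  # [batch_size=4, n_class=3] unnormalized log-probs

    # Soft sample: a distribution over classes, differentiable w.r.t. logits.
    y_soft = gumbel_softmax(logits, device, temperature=1.0, hard=False)

    # Hard sample: one-hot in the forward pass, straight-through gradient backward.
    y_hard = gumbel_softmax(logits, device, temperature=1.0, hard=True)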

harl.utils.discrete_util.gumbel_softmax_sample(logits, temperature, device)[source]

Draw a sample from the Gumbel-Softmax distribution.

harl.utils.discrete_util.onehot_from_logits(logits, eps=0.0)[source]

Given a batch of logits, return a one-hot sample using an epsilon-greedy strategy (based on the given epsilon).

harl.utils.discrete_util.sample_gumbel(shape, device, eps=1e-20, tens_type=<class 'torch.FloatTensor'>)[source]

Sample from Gumbel(0, 1).
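
For reference, the Gumbel(0, 1) trick these helpers build on fits in a few lines; the sketch below is generic and not necessarily the package's exact implementation:

    import torch

    def sample_gumbel_sketch(shape, eps=1e-20):
        """Sample Gumbel(0, 1) noise as -log(-log(U)), with U ~ Uniform(0, 1)."""
        u = torch.rand(shape)
        return -torch.log(-torch.log(u + eps) + eps)

    # Adding Gumbel noise to logits and applying a temperature-scaled softmax
    # yields a Gumbel-Softmax sample.
    logits = torch.randn(4, 3)
    temperature = 1.0
    y = torch.softmax((logits + sample_gumbel_sketch(logits.shape)) / temperature, dim=-1)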

harl.utils.envs_tools module

Tools for HARL.

harl.utils.envs_tools.check(value)[source]

Check if value is a numpy array; if so, convert it to a torch tensor.

harl.utils.envs_tools.get_num_agents(env, env_args, envs)[source]

Get the number of agents in the environment.

harl.utils.envs_tools.get_shape_from_act_space(act_space)[source]

Get shape from action space.

Parameters:

act_space: (gym.spaces) Action space.

Returns:

act_shape: (tuple) Action shape.

harl.utils.envs_tools.get_shape_from_obs_space(obs_space)[source]

Get shape from observation space.

Parameters:

obs_space: (gym.spaces or list) Observation space.

Returns:

obs_shape: (tuple) Observation shape.
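
A usage sketch with standard Gym spaces; the exact return format depends on the space type:

    from gym import spaces
    from harl.utils.envs_tools import (
        get_shape_from_act_space,
        get_shape_from_obs_space,
    )

    obs_space = spaces.Box(low=-1.0, high=1.0, shape=(18,))
    act_space = spaces.Discrete(5)

    obs_shape = get_shape_from_obs_space(obs_space)
    act_shape = get_shape_from_act_space(act_space)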

harl.utils.envs_tools.make_eval_env(env_name, seed, n_threads, env_args)[source]

Make env for evaluation.

harl.utils.envs_tools.make_render_env(env_name, seed, env_args)[source]

Make env for rendering.

harl.utils.envs_tools.make_train_env(env_name, seed, n_threads, env_args)[source]

Make env for training.

harl.utils.envs_tools.set_seed(args)[source]

Seed the program.
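
A sketch of the usual setup pipeline; the seed and environment argument dicts are config-defined, so the keys below are illustrative assumptions:

    from harl.utils.envs_tools import make_eval_env, make_train_env, set_seed

    # Illustrative seed-argument structure; the real keys come from the YAML config.
    set_seed({"seed_specify": True, "seed": 1})

    env_args = {"map_name": "3m"}  # environment-specific; illustrative
    envs = make_train_env("smac", seed=1, n_threads=8, env_args=env_args)
    eval_envs = make_eval_env("smac", seed=1, n_threads=4, env_args=env_args)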

harl.utils.models_tools module

Tools for HARL.

harl.utils.models_tools.get_active_func(activation_func)[source]

Get the activation function.

Parameters:

activation_func: (str) Activation function name.

Returns:

(torch.nn) Activation function.
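
A usage sketch; the accepted string keys are defined by the package config, and "relu" is an assumption:

    import torch
    from harl.utils.models_tools import get_active_func

    act = get_active_func("relu")         # maps a config string to a torch.nn module
    out = act(torch.tensor([-1.0, 2.0]))  # tensor([0., 2.])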

harl.utils.models_tools.get_clones(module, N)[source]

Clone the module N times.

harl.utils.models_tools.get_grad_norm(parameters)[source]

Get gradient norm.

harl.utils.models_tools.get_init_method(initialization_method)[source]

Get the initialization method.

Parameters:

initialization_method: (str) Initialization method name.

Returns:

(torch.nn) Initialization method.

harl.utils.models_tools.huber_loss(e, d)[source]

Huber loss.
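
The Huber loss is quadratic for errors within the threshold d and linear beyond it, which caps the gradient magnitude for outliers. A minimal elementwise sketch of the standard form (not necessarily the package's exact code):

    import torch

    def huber_loss_sketch(e, d):
        """0.5 * e^2 where |e| <= d, else d * (|e| - 0.5 * d)."""
        quadratic = (e.abs() <= d).float()
        linear = (e.abs() > d).float()
        return quadratic * 0.5 * e ** 2 + linear * d * (e.abs() - 0.5 * d)

    print(huber_loss_sketch(torch.tensor([0.5, 2.0]), d=1.0))  # tensor([0.1250, 1.5000])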

harl.utils.models_tools.init(module, weight_init, bias_init, gain=1)[source]

Init module.

Parameters:

module: (torch.nn) Module.
weight_init: (torch.nn) Weight init.
bias_init: (torch.nn) Bias init.
gain: (float) Gain.

Returns:

module: (torch.nn) Module.
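
A common usage pattern in on-policy MARL codebases; the string key and gain value below are illustrative assumptions:

    import torch.nn as nn
    from harl.utils.models_tools import get_init_method, init

    init_method = get_init_method("orthogonal_")  # assumed string key
    layer = init(
        nn.Linear(64, 5),
        init_method,
        lambda x: nn.init.constant_(x, 0),  # zero-initialize biases
        gain=0.01,  # a small gain is common for policy output layers
    )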

harl.utils.models_tools.init_device(args)[source]

Init device.

Parameters:

args: (dict) Arguments.

Returns:

device: (torch.device) Device.

harl.utils.models_tools.mse_loss(e)[source]

MSE loss.

harl.utils.models_tools.update_linear_schedule(optimizer, epoch, total_num_epochs, initial_lr)[source]

Decrease the learning rate linearly.

Parameters:

optimizer: (torch.optim) Optimizer.
epoch: (int) Current epoch.
total_num_epochs: (int) Total number of epochs.
initial_lr: (float) Initial learning rate.
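
Linear decay drives the learning rate from initial_lr toward zero as epoch approaches total_num_epochs. A minimal sketch of the usual implementation:

    def update_linear_schedule_sketch(optimizer, epoch, total_num_epochs, initial_lr):
        """Anneal the learning rate of every param group linearly toward 0."""
        lr = initial_lr - initial_lr * (epoch / float(total_num_epochs))
        for param_group in optimizer.param_groups:
            param_group["lr"] = lr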

harl.utils.trans_tools module

Tools for HARL.

harl.utils.trpo_util module

TRPO utility functions.

harl.utils.trpo_util.conjugate_gradient(actor, obs, rnn_states, action, masks, available_actions, active_masks, b, nsteps, device, residual_tol=1e-10)[source]

Conjugate gradient algorithm. Refer to https://github.com/openai/baselines/blob/master/baselines/common/cg.py.
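
The HARL signature threads the actor inputs through so the Fisher-vector product can be recomputed at every iteration; stripped of those, conjugate gradient solves A x = b given only a matrix-vector product, as in the referenced baselines code. A generic sketch:

    import torch

    def conjugate_gradient_sketch(Avp, b, nsteps, residual_tol=1e-10):
        """Solve A x = b, where Avp(p) computes the matrix-vector product A @ p."""
        x = torch.zeros_like(b)
        r = b.clone()            # residual; x starts at zero, so r = b
        p = r.clone()            # search direction
        rdotr = torch.dot(r, r)
        for _ in range(nsteps):
            Ap = Avp(p)
            alpha = rdotr / torch.dot(p, Ap)
            x += alpha * p
            r -= alpha * Ap
            new_rdotr = torch.dot(r, r)
            if new_rdotr < residual_tol:
                break
            p = r + (new_rdotr / rdotr) * p
            rdotr = new_rdotr
        return x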

harl.utils.trpo_util.fisher_vector_product(actor, obs, rnn_states, action, masks, available_actions, active_masks, p)[source]

Fisher vector product.

harl.utils.trpo_util.flat_grad(grads)[source]

Flatten the gradients.

harl.utils.trpo_util.flat_hessian(hessians)[source]

Flatten the Hessians.

harl.utils.trpo_util.flat_params(model)[source]

Flatten the parameters.

harl.utils.trpo_util.kl_approx(p, q)[source]

Approximate KL divergence between two distributions.

harl.utils.trpo_util.kl_divergence(obs, rnn_states, action, masks, available_actions, active_masks, new_actor, old_actor)[source]

KL divergence between two distributions.

harl.utils.trpo_util.update_model(model, new_params)[source]

Update the model parameters.
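
Together, flat_params and update_model let TRPO treat all model parameters as a single flat vector. A sketch of the round-trip under the usual definitions (not necessarily the package's exact code):

    import torch
    import torch.nn as nn

    model = nn.Linear(4, 2)

    # Flatten: concatenate every parameter tensor into one 1-D vector.
    flat = torch.cat([p.data.view(-1) for p in model.parameters()])

    # Update: slice the vector back into each parameter's original shape.
    offset = 0
    for p in model.parameters():
        n = p.numel()
        p.data.copy_(flat[offset:offset + n].view_as(p))
        offset += n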

Module contents