harl.utils package

Submodules

harl.utils.configs_tools module

Tools for loading and updating configs.

harl.utils.configs_tools.convert_json(obj): Convert obj to a form that can be serialized as JSON.

harl.utils.configs_tools.get_defaults_yaml_args(algo, env)

Load the config file for the user-specified algo and env.
:param algo: (str) Algorithm name.
:param env: (str) Environment name.

Returns:
algo_args: (dict) Algorithm config.
env_args: (dict) Environment config.

harl.utils.configs_tools.get_task_name(env, env_args): Get task name.

harl.utils.configs_tools.init_dir(env, env_args, algo, exp_name, seed, logger_path): Initialize the directory for saving results.

harl.utils.configs_tools.is_json_serializable(value): Check if value is JSON serializable.
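To illustrate how a serializability check and a fallback conversion can fit together, here is a minimal sketch using only the standard library. The function bodies are assumptions written for illustration; they reuse the names documented above but are not taken from the HARL source, and the fallback to `str` for unknown objects is one possible choice among several.

```python
import json


def is_json_serializable(value):
    """Return True if json.dumps accepts the value as-is."""
    try:
        json.dumps(value)
        return True
    except (TypeError, ValueError, OverflowError):
        return False


def convert_json(obj):
    """Recursively convert obj into something json.dumps can handle.

    Assumed policy: keep serializable values untouched, walk containers,
    and fall back to str() for anything else.
    """
    if is_json_serializable(obj):
        return obj
    if isinstance(obj, dict):
        # JSON object keys must be strings, so keys are stringified.
        return {str(k): convert_json(v) for k, v in obj.items()}
    if isinstance(obj, (list, tuple, set)):
        return [convert_json(x) for x in obj]
    return str(obj)


# Example: a set is not JSON serializable, so it is converted to a list.
config = {"layers": (64, 64), "tags": {"debug"}}
encoded = json.dumps(convert_json(config))
```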

harl.utils.configs_tools.save_config(args, algo_args, env_args, run_dir): Save the configuration of the program.

harl.utils.configs_tools.update_args(unparsed_dict, *args)

Update loaded config with unparsed command-line arguments.
:param unparsed_dict: (dict) Unparsed command-line arguments.
:param *args: (list[dict]) Argument dicts to be updated.
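One plausible way to apply command-line overrides to nested config dicts is to walk each dict recursively and overwrite any leaf whose key also appears in the overrides. The sketch below assumes exactly that matching rule and in-place mutation; `update_args_sketch` is a hypothetical name, and the actual `update_args` may behave differently.

```python
def update_args_sketch(unparsed_dict, *args):
    """Overwrite matching leaf keys in each config dict, in place.

    Assumption: a key in unparsed_dict overrides every leaf with the
    same name, at any nesting depth.
    """

    def _update(overrides, target):
        for key, value in target.items():
            if isinstance(value, dict):
                _update(overrides, value)  # descend into nested sections
            elif key in overrides:
                target[key] = overrides[key]

    for args_dict in args:
        _update(unparsed_dict, args_dict)


# Example with two hypothetical config dicts.
algo_args = {"train": {"lr": 0.1, "n_steps": 5}}
env_args = {"map_name": "a"}
update_args_sketch({"lr": 0.01, "map_name": "b"}, algo_args, env_args)
```

Keys that appear only in the overrides are ignored here, so a mistyped flag would silently have no effect; a stricter variant could collect unmatched keys and raise an error.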

harl.utils.discrete_util module

harl.utils.discrete_util.gumbel_softmax(logits, device, temperature=1.0, hard=False)

Sample from the Gumbel-Softmax distribution and optionally discretize.
:param logits: [batch_size, n_class] unnormalized log-probs
:param temperature: positive scalar
:param hard: if True, take argmax, but differentiate w.r.t. soft sample y

Returns: [batch_size, n_class] sample from the Gumbel-Softmax distribution. If hard=True, the returned sample is one-hot; otherwise it is a probability distribution that sums to 1 across classes.

harl.utils.discrete_util.gumbel_softmax_sample(logits, temperature, device): Draw a sample from the Gumbel-Softmax distribution.

harl.utils.discrete_util.onehot_from_logits(logits, eps=0.0): Given a batch of logits, return one-hot samples using an epsilon-greedy strategy (based on the given epsilon).

harl.utils.discrete_util.sample_gumbel(shape, device, eps=1e-20, tens_type=torch.FloatTensor): Sample from Gumbel(0, 1).
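The sampling steps described in this module can be sketched without a deep-learning framework. The NumPy version below is an illustration under stated assumptions, not the HARL code: it takes a NumPy random generator (`rng`, a hypothetical parameter) instead of `device`, and because NumPy has no autograd, the `hard=True` branch only returns the one-hot sample and cannot show the straight-through gradient.

```python
import numpy as np


def sample_gumbel(shape, rng, eps=1e-20):
    """Sample Gumbel(0, 1) noise via the inverse-CDF trick."""
    u = rng.uniform(0.0, 1.0, size=shape)
    return -np.log(-np.log(u + eps) + eps)


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def gumbel_softmax_sample(logits, temperature, rng):
    """Perturb logits with Gumbel noise, then apply a tempered softmax."""
    y = logits + sample_gumbel(logits.shape, rng)
    return softmax(y / temperature, axis=-1)


def gumbel_softmax(logits, rng, temperature=1.0, hard=False):
    """Return a soft sample, or its one-hot argmax when hard=True."""
    y = gumbel_softmax_sample(logits, temperature, rng)
    if not hard:
        return y
    one_hot = np.zeros_like(y)
    one_hot[np.arange(y.shape[0]), y.argmax(axis=-1)] = 1.0
    # In an autograd framework, the one-hot value would be combined with
    # the soft sample so that gradients flow through y.
    return one_hot


rng = np.random.default_rng(0)
logits = np.zeros((4, 3))
soft = gumbel_softmax(logits, rng, temperature=0.5)
hard = gumbel_softmax(logits, rng, temperature=0.5, hard=True)
```

Lower temperatures push the soft sample closer to one-hot, while higher temperatures flatten it toward uniform.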

harl.utils.envs_tools module

Tools for HARL.

harl.utils.envs_tools.check(value): Check whether value is a numpy array; if so, convert it to a torch tensor.

harl.utils.envs_tools.get_num_agents(env, env_args, envs): Get the number of agents in the environment.

harl.utils.envs_tools.get_shape_from_act_space(act_space)

Get shape from action space.
:param act_space: (gym.spaces) action space

Returns:
act_shape: (tuple) action shape

harl.utils.envs_tools.get_shape_from_obs_space(obs_space)

Get shape from observation space.
:param obs_space: (gym.spaces or list) observation space

Returns:
obs_shape: (tuple) observation shape

harl.utils.envs_tools.make_eval_env(env_name, seed, n_threads, env_args): Make env for evaluation.

harl.utils.envs_tools.make_render_env(env_name, seed, env_args): Make env for rendering.

harl.utils.envs_tools.make_train_env(env_name, seed, n_threads, env_args): Make env for training.

harl.utils.envs_tools.set_seed(args): Seed the program.

harl.utils.models_tools module

Tools for HARL.

harl.utils.models_tools.get_active_func(activation_func)

Get the activation function.
:param activation_func: (str) activation function

Returns: (torch.nn) activation function

harl.utils.models_tools.get_clones(module, N): Clone module N times.

harl.utils.models_tools.get_grad_norm(parameters): Get gradient norm.
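A gradient norm is commonly computed as the global L2 norm over all parameter gradients. The sketch below assumes that definition and uses plain NumPy arrays in place of model parameters; `grad_norm_sketch` is a hypothetical name, and the entry above does not state which norm the library uses.

```python
import math

import numpy as np


def grad_norm_sketch(grads):
    """Global L2 norm over a list of gradient arrays.

    Entries that are None (parameters without a gradient) are skipped.
    """
    total = 0.0
    for g in grads:
        if g is None:
            continue
        total += float(np.sum(np.square(g)))
    return math.sqrt(total)


norm = grad_norm_sketch([np.array([3.0]), None, np.array([[4.0]])])
```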

harl.utils.models_tools.get_init_method(initialization_method)

Get the initialization method.
:param initialization_method: (str) initialization method

Returns: (torch.nn) initialization method

harl.utils.models_tools.huber_loss(e, d): Huber loss.
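The standard Huber loss is quadratic for small errors and linear for large ones. The NumPy sketch below assumes that `e` is the elementwise error and `d` is the threshold between the two regimes; the entry above does not spell out the argument meanings, so treat this as an illustration of the textbook formula rather than the library's code.

```python
import numpy as np


def huber_loss_sketch(e, d):
    """Elementwise Huber loss with threshold d (assumed d > 0)."""
    e = np.asarray(e, dtype=float)
    abs_e = np.abs(e)
    quadratic = 0.5 * e ** 2            # used where |e| <= d
    linear = d * (abs_e - 0.5 * d)      # used where |e| > d
    return np.where(abs_e <= d, quadratic, linear)


losses = huber_loss_sketch([0.5, -2.0, 1.0], d=1.0)
```

The two branches agree at |e| = d, so the loss is continuous, and the linear tail makes it less sensitive to outliers than a squared error.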

harl.utils.models_tools.init(module, weight_init, bias_init, gain=1)

Init module.
:param module: (torch.nn) module
:param weight_init: (torch.nn) weight init
:param bias_init: (torch.nn) bias init
:param gain: (float) gain

Returns:
module: (torch.nn) module

harl.utils.models_tools.init_device(args)

Init device.
:param args: (dict) arguments

Returns:
device: (torch.device) device

harl.utils.models_tools.mse_loss(e): MSE loss.

harl.utils.models_tools.update_linear_schedule(optimizer, epoch, total_num_epochs, initial_lr)

Decrease the learning rate linearly.
:param optimizer: (torch.optim) optimizer
:param epoch: (int) current epoch
:param total_num_epochs: (int) total number of epochs
:param initial_lr: (float) initial learning rate
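A linear decay of this kind can be sketched as follows. The formula (the rate falls from `initial_lr` at epoch 0 to zero at `total_num_epochs`) is an assumption about what "linearly" means here, and `FakeOptimizer` is a hypothetical stand-in that only mimics the `param_groups` attribute of a `torch.optim` optimizer so the example runs without PyTorch.

```python
class FakeOptimizer:
    """Hypothetical stand-in exposing param_groups like torch.optim optimizers."""

    def __init__(self, lr):
        self.param_groups = [{"lr": lr}]


def linear_schedule_sketch(optimizer, epoch, total_num_epochs, initial_lr):
    """Set every param group's lr to a linearly decayed value and return it."""
    lr = initial_lr - initial_lr * (epoch / float(total_num_epochs))
    for group in optimizer.param_groups:
        group["lr"] = lr
    return lr


opt = FakeOptimizer(lr=1.0)
halfway_lr = linear_schedule_sketch(opt, epoch=5, total_num_epochs=10, initial_lr=1.0)
```

Calling the function once per epoch with the current epoch index keeps the schedule stateless, since the rate is always recomputed from `initial_lr`.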

harl.utils.trans_tools module

Tools for HARL.

harl.utils.trpo_util module

TRPO utility functions.

harl.utils.trpo_util.conjugate_gradient(actor, obs, rnn_states, action, masks, available_actions, active_masks, b, nsteps, device, residual_tol=1e-10): Conjugate gradient algorithm. Refer to https://github.com/openai/baselines/blob/master/baselines/common/cg.py
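For readers unfamiliar with the method, the conjugate gradient iteration solves A x = b using only matrix-vector products. The sketch below is a generic NumPy version: `cg_solve` and `matvec` are hypothetical names, and an explicit matrix stands in for the product with `p` that, in TRPO-style methods, would come from a Fisher-vector product. It is not the HARL function, which takes the actor and batch data instead.

```python
import numpy as np


def cg_solve(matvec, b, nsteps=10, residual_tol=1e-10):
    """Approximately solve A x = b, where matvec(p) returns A @ p.

    A is assumed symmetric positive definite.
    """
    x = np.zeros_like(b)
    r = b.copy()          # residual b - A x, with x = 0 initially
    p = b.copy()          # current search direction
    rdotr = r @ r
    for _ in range(nsteps):
        Ap = matvec(p)
        alpha = rdotr / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        new_rdotr = r @ r
        if new_rdotr < residual_tol:
            break
        p = r + (new_rdotr / rdotr) * p
        rdotr = new_rdotr
    return x


A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
x = cg_solve(lambda p: A @ p, b)
```

Because only `matvec` is needed, the matrix never has to be formed explicitly, which is what makes the method practical when A is a large curvature matrix.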

harl.utils.trpo_util.fisher_vector_product(actor, obs, rnn_states, action, masks, available_actions, active_masks, p): Fisher vector product.

harl.utils.trpo_util.flat_grad(grads): Flatten the gradients.

harl.utils.trpo_util.flat_hessian(hessians): Flatten the hessians.

harl.utils.trpo_util.flat_params(model): Flatten the parameters.

harl.utils.trpo_util.kl_approx(p, q): KL divergence between two distributions.

harl.utils.trpo_util.kl_divergence(obs, rnn_states, action, masks, available_actions, active_masks, new_actor, old_actor): KL divergence between two distributions.

harl.utils.trpo_util.update_model(model, new_params): Update the model parameters.
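Flattening parameters into one vector and writing a vector back are inverse operations, which is what lets a TRPO-style update work on a single flat parameter vector. The NumPy sketch below illustrates the round trip with a list of arrays standing in for a model's parameters; `flatten_arrays` and `assign_flat` are hypothetical names, not the functions listed above.

```python
import numpy as np


def flatten_arrays(arrays):
    """Concatenate all arrays into one 1-D vector, in list order."""
    return np.concatenate([a.reshape(-1) for a in arrays])


def assign_flat(arrays, flat):
    """Write slices of a flat vector back into the arrays, in place."""
    index = 0
    for a in arrays:
        n = a.size
        a[...] = flat[index:index + n].reshape(a.shape)
        index += n


params = [np.zeros((2, 2)), np.zeros(3)]
flat = flatten_arrays(params)
assign_flat(params, np.arange(7.0))
```

The slices must be consumed in the same order used for flattening; otherwise values land in the wrong parameters without raising an error.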
