Reward Functions

SustainDC lets the end user either train each agent on its own reward feedback, independently of the other agents, or use a collaborative approach in which the agents share reward feedback.

Independent rewards

Agent | Default reward
agent_ls | \( r_{ls} = -(\text{Penalty} \cdot \text{UL}) - (\text{CI} \cdot \text{Total EC}) \)
agent_dc | \( r_{dc} = -\text{Total EC} \)
agent_bat | \( r_{bat} = -(\text{CI} \cdot \text{Total EC}) \)

Total EC is the total building energy consumption (HVAC + IT). CI is the carbon intensity of the power grid; a high CI indicates that little green energy is available. UL is the amount of flexible computational workload that was left unassigned. The Penalty term applies to the load-shifting agent when it fails to schedule all of the required load within the time horizon N.
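The default independent rewards can be sketched in Python as below. This is a minimal illustration of the formulas above, not SustainDC's own code: the function name independent_rewards and its arguments are hypothetical, and the inputs are plain numbers in whatever units the environment reports.

```python
def independent_rewards(total_ec, ci, ul, penalty):
    """Hypothetical helper returning (r_ls, r_dc, r_bat) from the default formulas.

    total_ec: total building energy consumption (HVAC + IT)
    ci:       grid carbon intensity
    ul:       amount of unassigned flexible workload
    penalty:  penalty coefficient applied to the unassigned workload
    """
    # Load shifting: penalized for unassigned load and for carbon-weighted energy use.
    r_ls = -(penalty * ul) - (ci * total_ec)
    # Data center cooling: penalized for total energy use only.
    r_dc = -total_ec
    # Battery: penalized for carbon-weighted energy use.
    r_bat = -(ci * total_ec)
    return r_ls, r_dc, r_bat


# Arbitrary example inputs, chosen only to show the arithmetic.
print(independent_rewards(total_ec=100.0, ci=0.5, ul=2.0, penalty=10.0))
# (-70.0, -100.0, -50.0)
```

Note that only agent_ls sees the penalty term, so when UL is zero its reward coincides with the battery agent's reward.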

Collaborative rewards

The reward-sharing mechanism lets each agent receive feedback on how its actions affect the other agents' environments. Users can choose the level of collaboration between the agents by setting the value of \(\eta\) in the script: each agent keeps a fraction \(\eta\) of its own reward and receives a fraction \((1-\eta)/2\) of each of the other two agents' rewards.

Agent | Reward
agent_ls | \( \eta \cdot r_{ls} + \frac{1-\eta}{2} \cdot r_{dc} + \frac{1-\eta}{2} \cdot r_{bat} \)
agent_dc | \( \frac{1-\eta}{2} \cdot r_{ls} + \eta \cdot r_{dc} + \frac{1-\eta}{2} \cdot r_{bat} \)
agent_bat | \( \frac{1-\eta}{2} \cdot r_{ls} + \frac{1-\eta}{2} \cdot r_{dc} + \eta \cdot r_{bat} \)
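The collaborative formulas amount to a fixed weighted mix of the three independent rewards. The sketch below shows that mixing step; the function name collaborative_rewards is hypothetical and the input rewards are arbitrary example values, so treat it as an illustration of the formulas rather than SustainDC's implementation.

```python
def collaborative_rewards(r_ls, r_dc, r_bat, eta):
    """Hypothetical helper mixing the independent rewards with own-weight eta.

    Each agent keeps eta of its own reward and receives (1 - eta) / 2
    of each of the other two agents' rewards.
    """
    if not 0.0 <= eta <= 1.0:
        raise ValueError("eta must lie in [0, 1]")
    share = (1.0 - eta) / 2.0
    shared_ls = eta * r_ls + share * r_dc + share * r_bat
    shared_dc = share * r_ls + eta * r_dc + share * r_bat
    shared_bat = share * r_ls + share * r_dc + eta * r_bat
    return shared_ls, shared_dc, shared_bat


# eta = 1.0: each agent sees only its own reward (independent setting).
print(collaborative_rewards(-70.0, -100.0, -50.0, eta=1.0))
# (-70.0, -100.0, -50.0)

# eta = 0.8: 0.8 of the own reward plus roughly 0.1 of each other reward.
print(collaborative_rewards(-70.0, -100.0, -50.0, eta=0.8))
```

Because the three weights in each row sum to one, changing \(\eta\) redistributes feedback between agents without changing the overall scale of the rewards.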

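The table below gives example \(\eta\) values for a collaborative, an independent, and a custom-weighted setup; in the configuration, \(\eta\) is set through individual_reward_weight. The higher the value of \(\eta\), the less the agents collaborate: \(\eta = 1\) reduces each agent's reward to its own independent reward, \(\eta \approx 1/3\) weights all three rewards roughly equally, and \(\eta = 0.8\) gives each agent 0.8 of its own reward plus 0.1 of each of the other two.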

Reward scheme | Implementation
Collaborative | 'individual_reward_weight': 0.33
Independent | 'individual_reward_weight': 1.0
Default (weighted) | 'individual_reward_weight': 0.8
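To see what each setting in the table above means for the agents, the sketch below turns an individual_reward_weight value into the full table of weights each agent applies to every agent's reward. The REWARD_SCHEMES dictionary and the weight_matrix helper are hypothetical; they assume only that individual_reward_weight plays the role of \(\eta\) in the formulas above, and they do not call any SustainDC API.

```python
# Hypothetical configurations copied from the table above (eta = individual_reward_weight).
REWARD_SCHEMES = {
    "Collaborative": {"individual_reward_weight": 0.33},
    "Independent": {"individual_reward_weight": 1.0},
    "Default (weighted)": {"individual_reward_weight": 0.8},
}

AGENTS = ("agent_ls", "agent_dc", "agent_bat")


def weight_matrix(config):
    """Return weights[agent][source]: the fraction of source's reward that agent receives."""
    eta = config["individual_reward_weight"]
    share = (1.0 - eta) / 2.0
    return {
        agent: {source: (eta if source == agent else share) for source in AGENTS}
        for agent in AGENTS
    }


# Print the weights seen by the load-shifting agent under each scheme.
for name, config in REWARD_SCHEMES.items():
    row = weight_matrix(config)["agent_ls"]
    print(name, {source: round(w, 3) for source, w in row.items()})
```

With 0.33 the three weights are nearly equal (0.33 and 0.335), which is why that value stands in for a fully collaborative setup; exactly equal weights would require \(\eta = 1/3\).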