Reward Functions

SustainDC lets the user train each agent on its own reward only, independently of the other agents' reward feedback, or on a collaborative reward that also takes the other agents' rewards into account.

Independent rewards

Agent	Default Reward
agent_ls	\[r_{ls} = -(\text{Penalty} \cdot UL) - (CI \cdot \text{Total EC})\]
agent_dc	\[r_{dc} = -(\text{Total EC})\]
agent_bat	\[r_{bat} = -(CI \cdot \text{Total EC})\]

Total EC is the total building energy consumption (HVAC + IT), CI is the carbon intensity of the power grid, which varies inversely with the availability of green energy, and UL is the amount of unassigned flexible computational workload. The Penalty term is applied to the load-shifting agent if it fails to schedule all the required load within the time horizon N.
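As a rough illustration of the default rewards in the table above, the following Python sketch evaluates the three formulas for a single time step. The function name `independent_rewards`, its argument names, and the sample values are hypothetical; they are not taken from the SustainDC codebase, which may scale or normalize these quantities differently.

```python
def independent_rewards(total_ec, ci, ul, penalty):
    """Evaluate the default per-agent reward formulas for one time step.

    All argument names are illustrative assumptions:
      total_ec : total building energy consumption (HVAC + IT)
      ci       : grid carbon intensity
      ul       : unassigned flexible workload
      penalty  : penalty coefficient applied to unassigned workload
    """
    r_ls = -(penalty * ul) - (ci * total_ec)   # load-shifting agent
    r_dc = -total_ec                           # data center (cooling) agent
    r_bat = -(ci * total_ec)                   # battery agent
    return {"agent_ls": r_ls, "agent_dc": r_dc, "agent_bat": r_bat}


# Made-up sample values, chosen only to keep the arithmetic easy to follow.
rewards = independent_rewards(total_ec=100.0, ci=0.5, ul=2.0, penalty=10.0)
print(rewards)
```

With no unassigned workload (`ul=0`), the load-shifting and battery rewards coincide, since both reduce to the negative carbon-weighted energy term.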

Collaborative rewards

The reward-sharing mechanism allows the agents to estimate the feedback from their actions in other environments. Users can choose the level of collaboration between the agents by specifying the \(\eta\) value in the script.

Agent	Reward
agent_ls	\[\eta \cdot r_{ls} + (1-\eta) /2 \cdot r_{dc} + (1-\eta)/2 \cdot r_{bat}\]
agent_dc	\[(1-\eta) /2 \cdot r_{ls} + \eta \cdot r_{dc} + (1-\eta)/2 \cdot r_{bat}\]
agent_bat	\[(1-\eta) /2 \cdot r_{ls} + (1-\eta)/2 \cdot r_{dc} + \eta \cdot r_{bat}\]
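The mixing rule in the table above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not SustainDC's own implementation: the function name `collaborative_rewards` and the sample reward values are hypothetical.

```python
def collaborative_rewards(r_ls, r_dc, r_bat, eta):
    """Blend each agent's own reward with the other two agents' rewards.

    Each agent weights its own reward by eta and each of the other
    two rewards by (1 - eta) / 2, as in the table above.
    """
    own = {"agent_ls": r_ls, "agent_dc": r_dc, "agent_bat": r_bat}
    share = (1.0 - eta) / 2.0
    mixed = {}
    for agent, r_own in own.items():
        # Sum of the rewards of the two other agents.
        others = sum(r for name, r in own.items() if name != agent)
        mixed[agent] = eta * r_own + share * others
    return mixed


# Made-up independent rewards, used only for illustration.
mixed = collaborative_rewards(r_ls=-70.0, r_dc=-100.0, r_bat=-50.0, eta=0.8)
print(mixed)
```

Because the three weights are \(\eta\), \((1-\eta)/2\), and \((1-\eta)/2\), they always sum to 1, so each mixed reward is a weighted average of the three independent rewards.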

The table below gives example \(\eta\) values for collaborative, independent, and custom-weighted setups. The higher the value of \(\eta\), the more weight each agent places on its own reward and the less the agents collaborate; at \(\eta = 1\) the other agents' rewards receive zero weight and the rewards are fully independent.

Reward Scheme	Implementation
Collaborative	`'individual_reward_weight': 0.33`
Independent	`'individual_reward_weight': 1.0`
Default (weighted)	`'individual_reward_weight': 0.8`
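To see what each setting in the table means in terms of the weights from the collaborative reward formulas, the sketch below computes the own-reward and other-reward weights for the three example values. Only the key `individual_reward_weight` comes from the table; the dictionary name `env_config` and the helper `reward_weights` are assumptions made for illustration, and where this key lives in the actual configuration should be checked against the SustainDC scripts.

```python
def reward_weights(eta):
    """Return (own_weight, weight_per_other_agent) for a given eta."""
    return eta, (1.0 - eta) / 2.0


schemes = {
    "Collaborative": 0.33,
    "Independent": 1.0,
    "Default (weighted)": 0.8,
}

weights = {}
for name, eta in schemes.items():
    # Hypothetical config fragment; only the key name is from the table.
    env_config = {"individual_reward_weight": eta}
    weights[name] = reward_weights(env_config["individual_reward_weight"])
    own, other = weights[name]
    print(f"{name}: own={own:.3f}, each other agent={other:.3f}")
```

Note that 0.33 is a rounded stand-in for 1/3: with exactly 1/3, all three rewards are weighted equally, while with 0.33 the other agents' rewards each receive a slightly larger weight (0.335) than the agent's own.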