Custom Reward Functions

SustainDC provides an interface through which the end user can either train agents independently of each other's reward feedback or adopt a collaborative reward approach.

Independent reward functions

| Agent | Default Reward |
| --- | --- |
| agent_ls | \(r_{ls} = -(\mathrm{Penalty} \cdot \mathrm{UL}) - (\mathrm{CI} \cdot \mathrm{Total\ EC})\) |
| agent_dc | \(r_{dc} = -(\mathrm{Total\ EC})\) |
| agent_bat | \(r_{bat} = -(\mathrm{CI} \cdot \mathrm{Total\ EC})\) |

Total EC refers to the total building energy consumption (HVAC + IT), CI is the carbon intensity of the power grid (the inverse of the availability of green energy), and UL is the amount of unassigned flexible computational workload. A penalty is applied to the load-shifting agent if it fails to schedule all of the required load within the time horizon N.
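As an illustration, these default rewards can be expressed in terms of the reward_params entries documented further down this page. The following is only a minimal sketch, not the exact code in utils/reward_creator.py; it assumes that Total EC maps to bat_total_energy_with_battery_KWh, CI to bat_avg_CI, and the unassigned load to ls_unasigned_day_load_left guarded by ls_penalty_flag, with an illustrative penalty constant.

def sketch_ls_reward(params: dict) -> float:
    # Minimal sketch of the default load-shifting reward (illustrative penalty constant)
    penalty = 1e3
    ci = params['bat_avg_CI']
    total_ec = params['bat_total_energy_with_battery_KWh']
    # Unassigned load is only penalized once the end-of-day penalty flag is raised
    unassigned = params['ls_unasigned_day_load_left'] if params['ls_penalty_flag'] else 0.0
    return -(penalty * unassigned) - (ci * total_ec)

def sketch_dc_reward(params: dict) -> float:
    # Minimal sketch of the default data-center reward
    return -params['bat_total_energy_with_battery_KWh']

def sketch_bat_reward(params: dict) -> float:
    # Minimal sketch of the default battery reward
    return -(params['bat_avg_CI'] * params['bat_total_energy_with_battery_KWh'])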

Collaborative reward functions

The reward-sharing mechanism allows the agents to account for the feedback that their actions produce in the other environments. Users can choose the level of collaboration between the agents by specifying the \(\eta\) value (the individual_reward_weight parameter) in the configuration script.

| Agent | Reward |
| --- | --- |
| agent_ls | \(\eta \cdot r_{ls} + \frac{1-\eta}{2} \cdot r_{dc} + \frac{1-\eta}{2} \cdot r_{bat}\) |
| agent_dc | \(\frac{1-\eta}{2} \cdot r_{ls} + \eta \cdot r_{dc} + \frac{1-\eta}{2} \cdot r_{bat}\) |
| agent_bat | \(\frac{1-\eta}{2} \cdot r_{ls} + \frac{1-\eta}{2} \cdot r_{dc} + \eta \cdot r_{bat}\) |

Example \(\eta\) values for setting up collaborative, independent, and custom weighted environments are given in the table below. The higher the value of \(\eta\), the less collaboration there is between the agents in the environment.

| Reward Scheme | Implementation |
| --- | --- |
| Collaborative | 'individual_reward_weight': 0.33 |
| Independent | 'individual_reward_weight': 1.0 |
| Default (weighted) | 'individual_reward_weight': 0.8 |
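The weighted combination above can be sketched as follows. This is an illustrative helper, not the mixing code used inside SustainDC; it assumes the three individual rewards have already been computed and that \(\eta\) is the individual_reward_weight value from the table above.

def combine_rewards(r_ls: float, r_dc: float, r_bat: float, eta: float = 0.8) -> dict:
    # Illustrative reward sharing: eta corresponds to individual_reward_weight
    shared = (1.0 - eta) / 2.0
    return {
        'agent_ls': eta * r_ls + shared * r_dc + shared * r_bat,
        'agent_dc': shared * r_ls + eta * r_dc + shared * r_bat,
        'agent_bat': shared * r_ls + shared * r_dc + eta * r_bat,
    }

With eta = 1.0 each agent keeps only its own reward (independent), while eta = 0.33 weights the three rewards roughly equally (collaborative).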

Custom reward functions

SustainDC allows users to define custom reward structures to promote collaborative optimization across different DC components. Users can modify the reward functions in the utils/reward_creator.py file to suit their specific optimization goals. These functions should follow the schema:

def custom_agent_reward(params: dict) -> float:
    # Read the reward input parameters from the params dict
    # Perform the custom reward calculations
    custom_reward = 0.0  # update with custom reward shaping
    return custom_reward

Next, users need to add the new custom reward function(s) to the REWARD_METHOD_MAP dictionary:

REWARD_METHOD_MAP = {
    'default_dc_reward': default_dc_reward,
    'default_bat_reward': default_bat_reward,
    'default_ls_reward': default_ls_reward,
    # Add custom reward methods here
    'custom_agent_reward': custom_agent_reward,
    'tou_reward': tou_reward,
    'renewable_energy_reward': renewable_energy_reward,
    'energy_efficiency_reward': energy_efficiency_reward,
    'energy_PUE_reward': energy_PUE_reward,
    'temperature_efficiency_reward': temperature_efficiency_reward,
    'water_usage_efficiency_reward': water_usage_efficiency_reward,
}

A dictionary of environment parameters (reward_params) is made available to users in sustaindc_env.py. This object consists of the information dictionary of each environment plus some global variables such as the time, day, carbon intensity, and outside temperature. If users want to rely on additional custom parameters, they must add them to the reward_params dictionary so that those variables are visible inside the reward function (a short sketch of this is given right after the table below). The following environment parameters are available within the dictionary:

| Parameter | Example | Description |
| --- | --- | --- |
| bat_action | 0 | Action of the battery agent |
| bat_SOC | 0.15 | Battery state of charge |
| bat_CO2_footprint | 380899.11 | CO2 footprint obtained after the battery action |
| bat_avg_CI | 500.63 | Carbon intensity used by the battery (same as the current CI) |
| bat_total_energy_without_battery_KWh | 610.82 | Total energy (kWh) consumed before the battery action |
| bat_total_energy_with_battery_KWh | 760.82 | Total energy (kWh) consumed after the battery action |
| bat_max_bat_cap | 2 | Battery maximum capacity (kWh) |
| bat_a_t | charge | Name of the action of the battery agent |
| bat_dcload_min | 0.6 | Data center minimum energy consumption (MWh); used in normalization |
| bat_dcload_max | 1.81 | Data center maximum energy consumption (MWh); used in normalization |
| ls_original_workload | 0.69 | Normalized original workload (%) before the load-shifting agent |
| ls_shifted_workload | 0.69 | Normalized shifted workload (%) after the load-shifting agent |
| ls_action | 1 | Action of the load-shifting agent |
| ls_norm_load_left | 1 | Normalized current unassigned flexible workload |
| ls_unasigned_day_load_left | 0 | Total unassigned flexible workload at the end of the day |
| ls_penalty_flag | FALSE | Flag indicating that some workload remains unassigned at the end of the day |
| dc_ITE_total_power_kW | 2443.31 | IT total power consumption (kW) |
| dc_HVAC_total_power_kW | 0 | HVAC total power consumption (kW). It does not include the CRAC fan load, CRAC cooling load, or the compressor load, because these have constant power consumption |
| dc_total_power_kW | 2443.31 | Total power consumption (kW) |
| dc_energy_lb_kW | 40000 | Lower bound used in the normalization of the data center energy consumption (W) |
| dc_energy_ub_kW | 160000 | Upper bound used in the normalization of the data center energy consumption (W) |
| dc_crac_setpoint_delta | -5 | HVAC temperature delta change in the setpoint |
| dc_crac_setpoint | 15 | HVAC current temperature setpoint (°C) |
| dc_cpu_workload_percent | 0.69 | Normalized shifted workload (%) used by the dc agent (same as ls_shifted_workload) |
| dc_int_temperature | 29.60 | Room temperature (°C) |
| outside_temp | 3.50 | Outside temperature (°C) |
| day_of_the_year | 0 | Day of the year |
| hour | 0.25 | Hour of the day, including the fractional part of the hour |
| day_workload | 0 | Total flexible workload at the start of the day |
| norm_CI | — | Current normalized carbon intensity |
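As a hedged sketch of the custom-parameter route mentioned before the table, the snippet below adds a hypothetical water_price key to reward_params and consumes it in a custom reward function. The key name, the current_water_price variable, and the exact place where reward_params is assembled inside sustaindc_env.py are illustrative assumptions; dc_water_usage is the key also used in Example 6 below.

# In sustaindc_env.py, expose the extra variable to the reward functions
# ('water_price' and current_water_price are hypothetical, for illustration only):
reward_params['water_price'] = current_water_price  # e.g. read from an external dataset

# In utils/reward_creator.py, a custom reward can then consume the new key:
def water_cost_reward(params: dict) -> float:
    assert params.get('water_price') is not None, 'water_price must be added to reward_params'
    return -params['water_price'] * params['dc_water_usage']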

Depending on the objective and requirements, users can combine these parameters to define their customized reward functions, or use one of the reward functions already provided.

Some examples of custom rewards are listed below:

Example 1: Reward function based on power usage effectiveness (PUE)

def energy_PUE_reward(params: dict) -> float:
    """
    Calculates a reward value based on Power Usage Effectiveness (PUE).

    Args:
        params (dict): Dictionary containing parameters:
            total_energy_consumption (float): Total energy consumption of the data center.
            it_equipment_energy (float): Energy consumed by the IT equipment.

    Returns:
        float: Reward value.
    """
    total_energy_consumption = params['total_energy_consumption']
    it_equipment_energy = params['it_equipment_energy']

    # Calculate PUE
    pue = total_energy_consumption / it_equipment_energy if it_equipment_energy != 0 else float('inf')

    # We aim to get PUE as close to 1 as possible, hence we take the absolute difference between PUE and 1
    # We use a negative sign since RL seeks to maximize reward, but we want to minimize PUE
    reward = -abs(pue - 1)

    return reward

Example 2: Reward function based on time of use (ToU) of energy

def tou_reward(params: dict) -> float:
    """
    Calculates a reward value based on the Time of Use (ToU) of energy.

    Args:
        params (dict): Dictionary containing parameters:
            energy_usage (float): The energy usage of the agent.
            hour (int): The current hour of the day (24-hour format).

    Returns:
        float: Reward value.
    """

    # ToU dict: {Hour, price}
    tou = {0: 0.25,
        1: 0.25,
        2: 0.25,
        3: 0.25,
        4: 0.25,
        5: 0.25,
        6: 0.41,
        7: 0.41,
        8: 0.41,
        9: 0.41,
        10: 0.41,
        11: 0.30,
        12: 0.30,
        13: 0.30,
        14: 0.30,
        15: 0.30,
        16: 0.27,
        17: 0.27,
        18: 0.27,
        19: 0.27,
        20: 0.27,
        21: 0.27,
        22: 0.25,
        23: 0.25,
        }

    # Obtain the price of electricity at the current hour
    # (the hour parameter can carry a fractional part, so truncate it to an integer key)
    current_price = tou[int(params['hour'])]
    # Obtain the energy usage
    energy_usage = params['bat_total_energy_with_battery_KWh']

    # The reward is negative as the agent's objective is to minimize energy cost
    tou_reward = -1.0 * energy_usage * current_price

    return tou_reward

Example 3: Reward function based on the usage of renewable energy sources

def renewable_energy_reward(params: dict) -> float:
    """
    Calculates a reward value based on the usage of renewable energy sources.

    Args:
        params (dict): Dictionary containing parameters:
            renewable_energy_ratio (float): Ratio of energy coming from renewable sources.
            total_energy_consumption (float): Total energy consumption of the data center.

    Returns:
        float: Reward value.
    """
    assert params.get('renewable_energy_ratio') is not None, 'renewable_energy_ratio is not defined. This parameter should be included using some external dataset and added to the reward_info dictionary'
    renewable_energy_ratio = params['renewable_energy_ratio'] # This parameter should be included using some external dataset
    total_energy_consumption = params['bat_total_energy_with_battery_KWh']
    factor = 1.0 # factor to scale the weight of the renewable energy usage

    # Reward = maximize renewable energy usage - minimize total energy consumption
    reward = factor * renewable_energy_ratio - 1.0 * total_energy_consumption
    return reward

Example 4: Reward function based on energy efficiency

def energy_efficiency_reward(params: dict) -> float:
    """
    Calculates a reward value based on energy efficiency.

    Args:
        params (dict): Dictionary containing parameters:
            dc_ITE_total_power_kW (float): The amount of power spent on computation (useful work).
            dc_total_power_kW (float): Total power consumption of the data center.

    Returns:
        float: Reward value.
    """
    it_equipment_power = params['dc_ITE_total_power_kW']
    total_power_consumption = params['dc_total_power_kW']

    reward = it_equipment_power / total_power_consumption
    return reward

Example 5: Reward function based on the efficiency of cooling in the data center

def temperature_efficiency_reward(params: dict) -> float:
    """
    Calculates a reward value based on the efficiency of cooling in the data center.

    Args:
        params (dict): Dictionary containing parameters:
            current_temperature (float): Current temperature in the data center.
            optimal_temperature_range (tuple): Tuple containing the minimum and maximum optimal temperatures for the data center.

    Returns:
        float: Reward value.
    """
    assert params.get('optimal_temperature_range') is not None, 'optimal_temperature_range is not defined. This parameter should be added to the reward_info dictionary'
    current_temperature = params['dc_int_temperature']
    optimal_temperature_range = params['optimal_temperature_range']
    min_temp, max_temp = optimal_temperature_range

    if min_temp <= current_temperature <= max_temp:
        reward = 1.0
    else:
        if current_temperature < min_temp:
            reward = -abs(current_temperature - min_temp)
        else:
            reward = -abs(current_temperature - max_temp)
    return reward

Example 6: Reward function based on the efficiency of water usage in the data center

def water_usage_efficiency_reward(params: dict) -> float:
    """
    Calculates a reward value based on the efficiency of water usage in the data center.

    A lower value of water usage results in a higher reward, promoting sustainability
    and efficiency in water consumption.

    Args:
        params (dict): Dictionary containing parameters:
            dc_water_usage (float): The amount of water used by the data center in a given period.

    Returns:
        float: Reward value. The reward is higher for lower values of water usage,
        promoting reduced water consumption.
    """
    dc_water_usage = params['dc_water_usage']

    # Calculate the reward. This is a simple inverse relationship; many other functions could be applied.
    # Adjust the scalar as needed to fit the scale of your rewards or to emphasize the importance of water savings.
    reward = -0.01 * dc_water_usage

    return reward

By leveraging these customization options, users can create highly specific and optimized simulations that reflect the unique requirements and challenges of their DC operations.