Reward Functions
Sustain-Cluster’s reward framework defines how the Top-Level Agent is incentivized to make spatio-temporal scheduling decisions. All reward classes inherit from rewards.base_reward.BaseReward and are registered via the @register_reward(name) decorator. At each simulation step, the agent computes a scalar reward by calling the selected reward function with the current cluster state (cluster_info), the list of tasks under consideration (current_tasks), and the current timestamp (current_time).
Base Class and Registry
- class BaseReward(**kwargs)
Abstract base class for all rewards. Subclasses must implement:
__call__(cluster_info: dict, current_tasks: list, current_time: Any) -> float
get_last_value() -> float
Common behavior:
Stores the last computed reward in self.last_reward.
Supports arbitrary constructor arguments via **kwargs.
Note
To expose a custom reward, subclass BaseReward and annotate it with @register_reward("your_reward_name"). The class will then be discoverable via get_reward_function("your_reward_name", **args).
Built-in Reward Classes
Carbon Emissions Reward
- class CarbonEmissionsReward(normalize_factor: float = 100.0)
Parameters
normalize_factor (float) – Divisor to scale total CO₂ (kg) into reward units.
Description
Penalizes total carbon emissions across the cluster.
Computation
Sum dc_info["__common__"]["carbon_emissions_kg"] for every datacenter.
Compute reward = - total_emissions / normalize_factor.
Store the value in self.last_reward.
\[E_{\mathrm{tot}} = \sum_{d \in D} e_{d},\quad R = -\frac{E_{\mathrm{tot}}}{\mathrm{normalize\_factor}}\]
Variables
\(D\) – Set of all datacenters.
\(e_{d}\) – Carbon emissions (kg) of datacenter \(d\).
\(E_{\mathrm{tot}}\) – Total carbon emissions across \(D\).
\(\mathrm{normalize\_factor}\) – Constructor parameter.
\(R\) – Resulting reward.
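A minimal standalone sketch of the computation above. It assumes, as in the SLA Penalty Reward section, that the per-datacenter dicts live in cluster_info["datacenter_infos"]; it does not inherit from BaseReward so that it runs on its own, and the sample cluster_info is hypothetical.

```python
from typing import Any


class CarbonEmissionsReward:
    """Standalone sketch of the documented carbon-emissions penalty."""

    def __init__(self, normalize_factor: float = 100.0) -> None:
        self.normalize_factor = normalize_factor
        self.last_reward = 0.0

    def __call__(self, cluster_info: dict, current_tasks: list, current_time: Any) -> float:
        # Assumed layout: cluster_info["datacenter_infos"] maps DC names to info dicts.
        total_emissions = sum(
            dc_info["__common__"]["carbon_emissions_kg"]
            for dc_info in cluster_info["datacenter_infos"].values()
        )
        self.last_reward = -total_emissions / self.normalize_factor
        return self.last_reward

    def get_last_value(self) -> float:
        return self.last_reward


# Hypothetical snapshot: two datacenters emitting 120 kg and 80 kg this step.
cluster_info = {
    "datacenter_infos": {
        "DC1": {"__common__": {"carbon_emissions_kg": 120.0}},
        "DC2": {"__common__": {"carbon_emissions_kg": 80.0}},
    }
}
print(CarbonEmissionsReward()(cluster_info, [], None))  # -2.0
```

With the default normalize_factor of 100.0, every 100 kg of CO₂ across the cluster costs one reward unit.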
Energy Consumption Reward
- class EnergyConsumptionReward(normalize_factor: float = 1000.0)
Parameters
normalize_factor (float) – Divisor to scale total kWh into reward units.
Description
Penalizes total energy usage across all datacenters.
Computation
Sum dc_info["__common__"]["energy_consumption_kwh"] for every datacenter.
Compute reward = - total_energy / normalize_factor.
Store in self.last_reward.
\[E_{\mathrm{tot}} = \sum_{d \in D} \mathrm{energy\_consumption\_kWh}_{d},\quad R = -\frac{E_{\mathrm{tot}}}{\mathrm{normalize\_factor}}\]
Variables
\(D\) – Set of all datacenters.
\(\mathrm{energy\_consumption\_kWh}_{d}\) – Energy consumed (kWh) by datacenter \(d\).
\(E_{\mathrm{tot}}\) – Total energy consumption across \(D\).
\(R\) – Resulting reward.
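As a purely illustrative calculation with hypothetical numbers, suppose two datacenters consume 1200 kWh and 800 kWh in a step and the default normalize_factor of 1000.0 is used:

```latex
\[E_{\mathrm{tot}} = 1200 + 800 = 2000\ \mathrm{kWh},\quad R = -\frac{2000}{1000} = -2.0\]
```

Each additional 1000 kWh consumed across the cluster therefore lowers the reward by one unit.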
Energy Price Reward
- class EnergyPriceReward(normalize_factor: float = 100000)
Parameters
normalize_factor (float) – Divisor to scale USD cost into reward units.
Description
Penalizes monetary cost of energy consumed by scheduled tasks, using real-time prices.
Computation
For each task in current_tasks:
  Retrieve price = dest_dc.price_manager.get_current_price() from the task's destination datacenter dest_dc.
  Compute task_energy = task.cores_req * task.duration, a core-hour quantity that the reward treats as the task's energy in kWh.
  Compute task_cost = task_energy * price.
Sum all task_cost values into total_task_cost.
Compute reward = - total_task_cost / normalize_factor.
Store in self.last_reward.
\[C_{\mathrm{tot}} = \sum_{t \in T} p_{t}\,c_{t}\,\tau_{t},\quad R = -\frac{C_{\mathrm{tot}}}{\mathrm{normalize\_factor}}\]
Variables
\(T\) – Set of tasks in current_tasks.
\(p_{t}\) – Price (USD/kWh) returned by dest_dc.price_manager.get_current_price() for task \(t\).
\(c_{t}\) – task.cores_req (number of cores) for task \(t\).
\(\tau_{t}\) – task.duration (hours) for task \(t\).
\(C_{\mathrm{tot}}\) – Total energy cost (USD).
\(R\) – Resulting reward.
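A minimal standalone sketch of the per-task cost loop. The PriceManager, Datacenter, and Task dataclasses are hypothetical stand-ins, and the sketch assumes each task exposes its destination datacenter as task.dest_dc; only get_current_price(), cores_req, and duration are taken from this page.

```python
from dataclasses import dataclass
from typing import Any, List


@dataclass
class PriceManager:  # hypothetical stand-in for dest_dc.price_manager
    current_price: float  # USD/kWh

    def get_current_price(self) -> float:
        return self.current_price


@dataclass
class Datacenter:  # hypothetical stand-in for a task's destination datacenter
    price_manager: PriceManager


@dataclass
class Task:  # hypothetical; assumes the destination is stored as dest_dc
    cores_req: int
    duration: float  # hours
    dest_dc: Datacenter


class EnergyPriceReward:
    """Standalone sketch of the documented energy-cost penalty."""

    def __init__(self, normalize_factor: float = 100000) -> None:
        self.normalize_factor = normalize_factor
        self.last_reward = 0.0

    def __call__(self, cluster_info: dict, current_tasks: List[Task], current_time: Any) -> float:
        total_task_cost = 0.0
        for task in current_tasks:
            price = task.dest_dc.price_manager.get_current_price()
            task_energy = task.cores_req * task.duration  # core-hours used as the energy term
            total_task_cost += task_energy * price
        self.last_reward = -total_task_cost / self.normalize_factor
        return self.last_reward

    def get_last_value(self) -> float:
        return self.last_reward
```

Because the price is looked up per task at its destination, the same workload costs less when routed to a datacenter with a lower current price, which is the signal this reward gives the agent.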
SLA Penalty Reward
- class SLAPenaltyReward(penalty_per_violation: float = 10.0)
Parameters
penalty_per_violation (float) – Penalty per SLA breach.
Description
Penalizes missed service-level agreements across the cluster.
Computation
Count violations across all datacenters:
violations = sum(dc_info["__common__"]["__sla__"]["violated"] for dc_info in cluster_info["datacenter_infos"].values())
Compute reward = - penalty_per_violation * violations.
Store in self.last_reward.
\[V = \sum_{d \in D} v_{d},\quad R = -\,\mathrm{penalty\_per\_violation}\;\times V\]
Variables
\(D\) – Set of all datacenters.
\(v_{d}\) – Number of SLA violations in datacenter \(d\).
\(V\) – Total SLA violations across \(D\).
\(\mathrm{penalty\_per\_violation}\) – Constructor parameter.
\(R\) – Resulting reward.
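A minimal standalone sketch that reuses the violation-count expression shown above. The sample cluster_info is hypothetical, and the class does not inherit from BaseReward so that it runs on its own.

```python
from typing import Any


class SLAPenaltyReward:
    """Standalone sketch of the documented SLA penalty."""

    def __init__(self, penalty_per_violation: float = 10.0) -> None:
        self.penalty_per_violation = penalty_per_violation
        self.last_reward = 0.0

    def __call__(self, cluster_info: dict, current_tasks: list, current_time: Any) -> float:
        # Total SLA violations summed over every datacenter's __sla__ record.
        violations = sum(
            dc_info["__common__"]["__sla__"]["violated"]
            for dc_info in cluster_info["datacenter_infos"].values()
        )
        self.last_reward = -self.penalty_per_violation * violations
        return self.last_reward

    def get_last_value(self) -> float:
        return self.last_reward


# Hypothetical snapshot: two datacenters with 2 and 1 violations.
cluster_info = {
    "datacenter_infos": {
        "DC1": {"__common__": {"__sla__": {"violated": 2}}},
        "DC2": {"__common__": {"__sla__": {"violated": 1}}},
    }
}
print(SLAPenaltyReward()(cluster_info, [], None))  # -30.0
```

Unlike the normalized rewards, this penalty grows linearly with the violation count, so penalty_per_violation directly sets how much one missed SLA costs.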
Transmission Cost Reward
- class TransmissionCostReward(normalize_factor: float = 100.0)
Parameters
normalize_factor (float) – Divisor to scale USD transmission cost.
Description
Penalizes cumulative inter-datacenter bandwidth costs.
Computation
Read cost = cluster_info["transmission_cost_total_usd"].
Compute reward = - cost / normalize_factor.
Store in self.last_reward.
\[C = \mathrm{transmission\_cost\_total\_usd},\quad R = -\frac{C}{\mathrm{normalize\_factor}}\]
Variables
\(C\) – Total inter-datacenter transmission cost (USD).
\(R\) – Resulting reward.
Transmission Emissions Reward
- class TransmissionEmissionsReward(normalize_factor: float = 1.0)
Parameters
normalize_factor (float) – Divisor to scale kg CO₂ from transmission.
Description
Penalizes carbon emissions incurred by data transfer between datacenters.
Computation
Read emissions_kg = cluster_info["transmission_emissions_total_kg"].
Compute reward = - emissions_kg / normalize_factor.
Store in self.last_reward.
\[E_{\mathrm{tr}} = \mathrm{transmission\_emissions\_total\_kg},\quad R = -\frac{E_{\mathrm{tr}}}{\mathrm{normalize\_factor}}\]
Variables
\(E_{\mathrm{tr}}\) – Total transmission emissions (kg CO₂).
\(R\) – Resulting reward.
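Both transmission rewards read a single cluster-level total and scale it, so a sketch can share one helper. The _TotalKeyPenalty base class is a hypothetical design choice for this illustration, not part of the documented API; the cluster_info keys and default normalize_factor values come from the sections above, and the sample values are made up.

```python
from typing import Any


class _TotalKeyPenalty:
    """Hypothetical helper: reward = -cluster_info[key] / normalize_factor."""

    key = ""

    def __init__(self, normalize_factor: float) -> None:
        self.normalize_factor = normalize_factor
        self.last_reward = 0.0

    def __call__(self, cluster_info: dict, current_tasks: list, current_time: Any) -> float:
        self.last_reward = -cluster_info[self.key] / self.normalize_factor
        return self.last_reward

    def get_last_value(self) -> float:
        return self.last_reward


class TransmissionCostReward(_TotalKeyPenalty):
    key = "transmission_cost_total_usd"

    def __init__(self, normalize_factor: float = 100.0) -> None:
        super().__init__(normalize_factor)


class TransmissionEmissionsReward(_TotalKeyPenalty):
    key = "transmission_emissions_total_kg"

    def __init__(self, normalize_factor: float = 1.0) -> None:
        super().__init__(normalize_factor)


# Hypothetical cluster-level totals for one step.
cluster_info = {"transmission_cost_total_usd": 50.0, "transmission_emissions_total_kg": 3.0}
print(TransmissionCostReward()(cluster_info, [], None))       # -0.5
print(TransmissionEmissionsReward()(cluster_info, [], None))  # -3.0
```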
Efficiency Reward
- class EfficiencyReward(normalize_factor: float = 1000.0)
Parameters
normalize_factor (float) – Divisor to scale energy per task.
Description
Encourages high energy efficiency per scheduled task.
Computation
Sum each datacenter's energy consumption (kWh) into total_energy.
Read total_tasks = cluster_info.get("scheduled_tasks", 0).
If total_tasks == 0, return 0.
Compute reward = - (total_energy / total_tasks).
Store in self.last_reward.
\[E_{\mathrm{tot}} = \sum_{d \in D} \mathrm{energy\_consumption\_kWh}_{d},\quad N = \mathrm{total\_tasks},\quad R = -\frac{E_{\mathrm{tot}}}{N}\]
Variables
\(N\) – Number of scheduled tasks.
\(E_{\mathrm{tot}}\) – Total energy consumption (kWh).
\(R\) – Resulting reward.
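A minimal standalone sketch of the energy-per-task computation above. It assumes the per-datacenter energy is read from dc_info["__common__"]["energy_consumption_kwh"] under cluster_info["datacenter_infos"], mirroring the Energy Consumption Reward, and it also stores 0.0 in last_reward on the zero-task branch, which is an assumption. As documented, normalize_factor does not appear in the computation, so the sketch keeps it for signature parity but does not apply it.

```python
from typing import Any


class EfficiencyReward:
    """Standalone sketch of the documented energy-per-task penalty."""

    def __init__(self, normalize_factor: float = 1000.0) -> None:
        # Kept for signature parity; the documented formula does not use it.
        self.normalize_factor = normalize_factor
        self.last_reward = 0.0

    def __call__(self, cluster_info: dict, current_tasks: list, current_time: Any) -> float:
        # Assumed layout, mirroring EnergyConsumptionReward.
        total_energy = sum(
            dc_info["__common__"]["energy_consumption_kwh"]
            for dc_info in cluster_info["datacenter_infos"].values()
        )
        total_tasks = cluster_info.get("scheduled_tasks", 0)
        if total_tasks == 0:
            # No scheduled tasks: return 0 instead of dividing by zero.
            self.last_reward = 0.0
            return 0.0
        self.last_reward = -(total_energy / total_tasks)
        return self.last_reward

    def get_last_value(self) -> float:
        return self.last_reward


# Hypothetical snapshot: 400 kWh spread over 8 scheduled tasks.
cluster_info = {
    "datacenter_infos": {
        "DC1": {"__common__": {"energy_consumption_kwh": 300.0}},
        "DC2": {"__common__": {"energy_consumption_kwh": 100.0}},
    },
    "scheduled_tasks": 8,
}
print(EfficiencyReward()(cluster_info, [], None))  # -50.0
```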
Composite Reward
- class CompositeReward(components: dict, normalize: bool = True, epsilon: float = 1e-8)
Parameters
components (dict) – Mapping from reward name to a dict with the keys weight (float) and args (constructor kwargs for that reward).
normalize (bool) – If True, z-score each component.
epsilon (float) – Small constant to avoid division by zero.
Description
Combines multiple reward signals into a single scalar via a weighted sum.
Internal State
running_stats – Per-component running mean, variance, and count.
last_values – Last raw values before normalization.
Computation
For each (name, weight, fn) in components, call raw = fn(cluster_info, current_tasks, current_time).
If normalize is True, update the running stats for name and compute component_value = (raw - mean) / (std + epsilon); otherwise set component_value = raw.
Add weight * component_value to total.
Set self.last_reward = total and return total.
\[\begin{split}\hat{v}_{i} = \begin{cases} \dfrac{raw_{i} - \mu_{i}}{\sigma_{i} + \epsilon}, & \text{if normalize} \\ raw_{i}, & \text{otherwise} \end{cases},\quad R = \sum_{i} w_{i}\,\hat{v}_{i}\end{split}\]
Variables
\(raw_{i}\) – Raw value of component \(i\).
\(\mu_{i}, \sigma_{i}\) – Running mean and standard deviation of component \(i\).
\(\epsilon\) – Small constant to avoid division by zero.
\(\hat{v}_{i}\) – (Possibly normalized) component value.
\(w_{i}\) – Weight for component \(i\).
\(R\) – Resulting composite reward.
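A minimal standalone sketch of the weighted, optionally z-scored sum. Several parts are assumptions: the running statistics use Welford's online algorithm with the population standard deviation (the page does not say which estimator is used), and the keyword-only build_fn parameter is a hypothetical stand-in for get_reward_function(name, **args) so that the sketch runs without the real registry. The component names in the usage example are also hypothetical.

```python
import math
from typing import Any, Callable, Dict


class RunningStats:
    """Welford's online algorithm for a running mean and variance."""

    def __init__(self) -> None:
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the running mean

    def update(self, x: float) -> None:
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    @property
    def std(self) -> float:
        # Population standard deviation; 0.0 until two samples have been seen.
        return math.sqrt(self.m2 / self.count) if self.count > 1 else 0.0


class CompositeReward:
    def __init__(
        self,
        components: Dict[str, dict],
        normalize: bool = True,
        epsilon: float = 1e-8,
        *,
        build_fn: Callable[..., Callable[[dict, list, Any], float]],
    ) -> None:
        self.normalize = normalize
        self.epsilon = epsilon
        # (name, weight, fn) triples; build_fn stands in for get_reward_function.
        self.components = [
            (name, spec["weight"], build_fn(name, **spec.get("args", {})))
            for name, spec in components.items()
        ]
        self.running_stats = {name: RunningStats() for name, _, _ in self.components}
        self.last_values: Dict[str, float] = {}
        self.last_reward = 0.0

    def __call__(self, cluster_info: dict, current_tasks: list, current_time: Any) -> float:
        total = 0.0
        for name, weight, fn in self.components:
            raw = fn(cluster_info, current_tasks, current_time)
            self.last_values[name] = raw  # raw value before normalization
            if self.normalize:
                stats = self.running_stats[name]
                stats.update(raw)  # update first, then z-score against the new stats
                component_value = (raw - stats.mean) / (stats.std + self.epsilon)
            else:
                component_value = raw
            total += weight * component_value
        self.last_reward = total
        return total

    def get_last_value(self) -> float:
        return self.last_reward


# Hypothetical components returning fixed raw values.
fixed = {"component_a": -1.0, "component_b": -4.0}


def build_constant(name: str, **args: Any) -> Callable[[dict, list, Any], float]:
    return lambda cluster_info, current_tasks, current_time: fixed[name]


composite = CompositeReward(
    {"component_a": {"weight": 2.0}, "component_b": {"weight": 0.5, "args": {}}},
    normalize=False,
    build_fn=build_constant,
)
print(composite({}, [], None))  # -4.0
```

With normalize=True, the first call for each component contributes 0, since the value equals its own running mean; z-scoring only starts to differentiate components once their statistics have accumulated several samples.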
Registry and Invocation
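from rewards.registry_utils import get_reward_function

# Instantiate the reward registered as "energy_price" with a custom scale
reward_fn = get_reward_function("energy_price", normalize_factor=50000)
# Compute the reward for the current step
value = reward_fn(cluster_info, current_tasks, current_time)
# Retrieve the most recently computed reward (the same value as above)
raw_val = reward_fn.get_last_value()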