.. _reward-functions:

Reward Functions
================

Sustain-Cluster’s reward framework defines how the Top-Level Agent is incentivized to make spatio-temporal scheduling decisions. All reward classes inherit from ``rewards.base_reward.BaseReward`` and are registered via the ``@register_reward(name)`` decorator. At each simulation step, the agent computes a scalar reward by calling the selected reward function with the current cluster state, the list of tasks under consideration, and the current timestamp.


Base Class and Registry
-----------------------

.. py:class:: BaseReward(**kwargs**)

   Abstract base class for all rewards. Subclasses must implement:

   - ``__call__(cluster_info: dict, current_tasks: list, current_time: Any) -> float``
   - ``get_last_value() -> float``

   Common behavior:

   - Stores the last computed reward in ``self.last_reward``.
   - Supports arbitrary constructor arguments via ``**kwargs``.

.. note::

   To expose a custom reward, subclass ``BaseReward`` and annotate with
   ``@register_reward("your_reward_name")``. The class will then be discoverable
   via ``get_reward_function("your_reward_name", **args)``.


Built-in Reward Classes
-----------------------

Carbon Emissions Reward
~~~~~~~~~~~~~~~~~~~~~~~

.. py:class:: CarbonEmissionsReward(normalize_factor: float = 100.0)

   **Parameters**

   - **normalize_factor** (float) – Divisor to scale total CO₂ (kg) into reward units.

   **Description**

   Penalizes total carbon emissions across the cluster.

   **Computation**

   1. Sum ``dc_info["__common__"]["carbon_emissions_kg"]`` for every datacenter.  
   2. Compute ``reward = - total_emissions / normalize_factor``.  
   3. Store value in ``self.last_reward``.

   .. math::

      E_{\mathrm{tot}} = \sum_{d \in D} e_{d},\quad
      R = -\frac{E_{\mathrm{tot}}}{\mathrm{normalize\_factor}}

   **Variables**

   - :math:`D` – Set of all datacenters.  
   - :math:`e_{d}` – Carbon emissions (kg) of datacenter :math:`d`.  
   - :math:`E_{\mathrm{tot}}` – Total carbon emissions across :math:`D`.  
   - :math:`\mathrm{normalize\_factor}` – Constructor parameter.  
   - :math:`R` – Resulting reward.


Energy Consumption Reward
~~~~~~~~~~~~~~~~~~~~~~~~~

.. py:class:: EnergyConsumptionReward(normalize_factor: float = 1000.0)

   **Parameters**

   - **normalize_factor** (float) – Divisor to scale total kWh into reward units.

   **Description**

   Penalizes total energy usage across all datacenters.

   **Computation**

   1. Sum ``dc_info["__common__"]["energy_consumption_kwh"]`` for every datacenter.  
   2. Compute ``reward = - total_energy / normalize_factor``.  
   3. Store in ``self.last_reward``.

   .. math::

      E_{\mathrm{tot}} = \sum_{d \in D} \mathrm{energy\_consumption\_kWh}_{d},\quad
      R = -\frac{E_{\mathrm{tot}}}{\mathrm{normalize\_factor}}

   **Variables**

   - :math:`D` – Set of all datacenters.  
   - :math:`\mathrm{energy\_consumption\_kWh}_{d}` – Energy consumed (kWh) by datacenter :math:`d`.  
   - :math:`E_{\mathrm{tot}}` – Total energy consumption across :math:`D`.  
   - :math:`R` – Resulting reward.


Energy Price Reward
~~~~~~~~~~~~~~~~~~~

.. py:class:: EnergyPriceReward(normalize_factor: float = 100000)

   **Parameters**

   - **normalize_factor** (float) – Divisor to scale USD cost into reward units.

   **Description**

   Penalizes monetary cost of energy consumed by scheduled tasks, using real-time prices.

   **Computation**

   1. For each task in ``current_tasks``:

      - Retrieve ``price = dest_dc.price_manager.get_current_price()``.  
      - Compute ``task_energy = task.cores_req * task.duration`` (kWh).  
      - Compute ``task_cost = task_energy * price``.  

   2. Sum all ``task_cost`` values.  
   3. Compute ``reward = - total_task_cost / normalize_factor``.  
   4. Store in ``self.last_reward``.

   .. math::

      C_{\mathrm{tot}} = \sum_{t \in T} p_{t}\,c_{t}\,\tau_{t},\quad
      R = -\frac{C_{\mathrm{tot}}}{\mathrm{normalize\_factor}}

   **Variables**

   - :math:`T` – Set of tasks in ``current_tasks``.  
   - :math:`p_{t}` – Price (USD/kWh) returned by ``dest_dc.price_manager.get_current_price()`` for task :math:`t`.  
   - :math:`c_{t}` – ``task.cores_req`` (number of cores) for task :math:`t`.  
   - :math:`\tau_{t}` – ``task.duration`` (hours) for task :math:`t`.  
   - :math:`C_{\mathrm{tot}}` – Total energy cost (USD).  
   - :math:`R` – Resulting reward.


SLA Penalty Reward
~~~~~~~~~~~~~~~~~~

.. py:class:: SLAPenaltyReward(penalty_per_violation: float = 10.0)

   **Parameters**

   - **penalty_per_violation** (float) – Penalty per SLA breach.

   **Description**

   Penalizes missed service-level agreements across the cluster.

   **Computation**

   1. Count violations across all datacenters:  
      ``violations = sum(dc_info["__common__"]["__sla__"]["violated"] for dc_info in cluster_info["datacenter_infos"].values())``  
   2. Compute ``reward = - penalty_per_violation * violations``.  
   3. Store in ``self.last_reward``.

   .. math::

      V = \sum_{d \in D} v_{d},\quad
      R = -\,\mathrm{penalty\_per\_violation}\;\times V

   **Variables**

   - :math:`D` – Set of all datacenters.  
   - :math:`v_{d}` – Number of SLA violations in datacenter :math:`d`.  
   - :math:`V` – Total SLA violations across :math:`D`.  
   - :math:`\mathrm{penalty\_per\_violation}` – Constructor parameter.  
   - :math:`R` – Resulting reward.


Transmission Cost Reward
~~~~~~~~~~~~~~~~~~~~~~~~

.. py:class:: TransmissionCostReward(normalize_factor: float = 100.0)

   **Parameters**

   - **normalize_factor** (float) – Divisor to scale USD transmission cost.

   **Description**

   Penalizes cumulative inter-datacenter bandwidth costs.

   **Computation**

   1. Read ``cost = cluster_info["transmission_cost_total_usd"]``.  
   2. Compute ``reward = - cost / normalize_factor``.  
   3. Store in ``self.last_reward``.

   .. math::

      C = \mathrm{transmission\_cost\_total\_usd},\quad
      R = -\frac{C}{\mathrm{normalize\_factor}}

   **Variables**

   - :math:`C` – Total inter-datacenter transmission cost (USD).  
   - :math:`R` – Resulting reward.


Transmission Emissions Reward
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. py:class:: TransmissionEmissionsReward(normalize_factor: float = 1.0)

   **Parameters**

   - **normalize_factor** (float) – Divisor to scale kg CO₂ from transmission.

   **Description**

   Penalizes carbon emissions incurred by data transfer between datacenters.

   **Computation**

   1. Read ``emissions_kg = cluster_info["transmission_emissions_total_kg"]``.  
   2. Compute ``reward = - emissions_kg / normalize_factor``.  
   3. Store in ``self.last_reward``.

   .. math::

      E_{\mathrm{tr}} = \mathrm{transmission\_emissions\_total_kg},\quad
      R = -\frac{E_{\mathrm{tr}}}{\mathrm{normalize\_factor}}

   **Variables**

   - :math:`E_{\mathrm{tr}}` – Total transmission emissions (kg CO₂).  
   - :math:`R` – Resulting reward.


Efficiency Reward
~~~~~~~~~~~~~~~~~

.. py:class:: EfficiencyReward(normalize_factor: float = 1000.0)

   **Parameters**

   - **normalize_factor** (float) – Divisor to scale energy per task.

   **Description**

   Encourages high energy efficiency per scheduled task.

   **Computation**

   1. Sum ``total_energy`` across datacenters.  
   2. Read ``total_tasks = cluster_info.get("scheduled_tasks", 0)``.  
   3. If ``total_tasks == 0``, return 0.  
   4. Compute ``reward = - (total_energy / total_tasks)``.  
   5. Store in ``self.last_reward``.

   .. math::

      E_{\mathrm{tot}} = \sum_{d \in D} \mathrm{energy\_consumption\_kWh}_{d},\quad
      N = \mathrm{total\_tasks},\quad
      R = -\frac{E_{\mathrm{tot}}}{N}

   **Variables**

   - :math:`N` – Number of scheduled tasks.  
   - :math:`E_{\mathrm{tot}}` – Total energy consumption (kWh).  
   - :math:`R` – Resulting reward.


Composite Reward
----------------

.. py:class:: CompositeReward(components: dict, normalize: bool = True, epsilon: float = 1e-8)

   **Parameters**

   - **components** (dict) – Mapping from reward name to a dict with keys:  
     - **weight** (float)  
     - **args** (constructor kwargs)  
   - **normalize** (bool) – If True, z-score each component.  
   - **epsilon** (float) – Small constant to avoid division by zero.

   **Description**

   Combines multiple reward signals into a single scalar via a weighted sum.

   **Internal State**

   - ``running_stats`` – Per-component running mean, variance, and count.  
   - ``last_values`` – Last raw values before normalization.

   **Computation**

   1. For each `(name, weight, fn)` in `components`, call:  
      ``raw = fn(cluster_info, current_tasks, current_time)``.  
   2. If `normalize` is True, update running stats for `name` and compute  
      ``component_value = (raw - mean) / (std + epsilon)``, otherwise set  
      ``component_value = raw``.  
   3. Add ``weight * component_value`` to `total`.  
   4. Set ``self.last_reward = total`` and return `total`.

   .. math::

      \hat{v}_{i} =
      \begin{cases}
        \dfrac{raw_{i} - \mu_{i}}{\sigma_{i} + \epsilon}, & \text{if normalize} \\
        raw_{i}, & \text{otherwise}
      \end{cases},\quad
      R = \sum_{i} w_{i}\,\hat{v}_{i}

   **Variables**

   - :math:`raw_{i}` – Raw value of component :math:`i`.  
   - :math:`\mu_{i}, \sigma_{i}` – Running mean and standard deviation of component :math:`i`.  
   - :math:`\epsilon` – Small constant to avoid division by zero.  
   - :math:`\hat{v}_{i}` – (Possibly normalized) component value.  
   - :math:`w_{i}` – Weight for component :math:`i`.  
   - :math:`R` – Resulting composite reward.


Registry and Invocation
-----------------------

.. code-block:: python

   from rewards.registry_utils import get_reward_function

   reward_fn = get_reward_function("energy_price", normalize_factor=50000)
   value   = reward_fn(cluster_info, current_tasks, current_time)
   raw_val = reward_fn.get_last_value()