LLM Explanation and Guidance Evaluation

To provide a more formal evaluation of the LLM-based explanations and guidance, this section presents concrete examples from our experiments. Each example includes the system observation, the action proposed by the RL agent, a natural language explanation generated by the LLM, and a validity assessment from a human data-center expert.

These examples are designed to showcase the LLM’s ability to:

  • Correctly interpret the RL agent’s actions based on system state.

  • Provide plausible reasoning for the control decisions.

  • Identify suboptimal or inefficient actions from the RL agent.

  • Offer actionable guidance to improve safety and energy efficiency.
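Throughout the examples below, we read the raw tuples under the following convention (our own labeling, assumed from context rather than taken from the environment's API): the six-element observation holds three blade-group temperatures followed by three heat loads, and the five-element action holds three valve-opening fractions for the coolant branches, a coolant supply temperature setpoint, and a pump differential-pressure setpoint. A minimal parsing sketch under these assumptions:

```python
from dataclasses import dataclass

@dataclass
class Observation:
    temps: tuple  # per-branch temperatures (raw K or normalized)
    loads: tuple  # per-branch heat loads (raw W or normalized)

@dataclass
class Action:
    flows: tuple        # valve-opening fractions for branches 1-3
    t_setpoint: float   # coolant supply temperature setpoint
    dp_setpoint: float  # pump differential-pressure setpoint

def parse(obs, act):
    """Split raw tuples into named fields (assumed layout, not the env's API)."""
    return (Observation(temps=obs[:3], loads=obs[3:]),
            Action(flows=act[:3], t_setpoint=act[3], dp_setpoint=act[4]))

# First example below: three cabinet temperatures near 311-315 K,
# and valve fractions (0.24, 0.35, 0.41) that sum to ~1.0.
obs, action = parse((315.45, 314.04, 311.25, 46806.57, 46806.57, 31632.4),
                    (0.24, 0.35, 0.41, 40.95, 24.66))
assert abs(sum(action.flows) - 1.0) < 0.05  # fractions form a flow distribution
```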


LLM Explainability Example

Obs: (315.45, 314.04, 311.25, 46806.57, 46806.57, 31632.4)

Act (from RL agent): (0.24, 0.35, 0.41, 40.95, 24.66)

Explanation: Given that thermal readings remain within tolerable bounds, the agent increases the setpoint to 40.95 for energy conservation. Coolant flow is distributed with branch 3 receiving the most, targeting demand while sidestepping unnecessary cooling. This approach supports environmental compliance standards while ensuring uptime. Reducing cooling overheads has cascading social benefits, especially in energy-constrained regions.

Validity (Human Expert Evaluation): “While the LLM response correctly attributes the increased temperature setpoint of the coolant due to moderate temperatures in the cabinet, it does not completely explain the other values that were generated by the reinforcement learning agent. Also, it does not explain why the current distribution of the fluid happens across the three branches.”

LLM Explainability and Guidance Example

Obs: (307.2, 306.74, 309.77, 21567.55, 44779.22, 33672.41)

Act (from RL agent): (0.19, 0.61, 0.2, 22.39, 25.52)

Explanation: Branch 2 remains relatively cool, allowing the controller to redistribute cooling capacity toward more loaded or hotter branches. A setpoint of 22.39 and a controlled flow of 25.52 signify this adaptive behavior. The decision not only maximizes performance-per-watt but also aligns with global decarbonization efforts. It is recommended to reduce branch 1 coolant allocation to ~0.1 and to allocate the rest to branch 3 since it has a higher workload.

Validity (Human Expert Evaluation): “This is one of the better examples where the agent is explaining the actions for all three blade groups. It, however, also provides a good way of refining the RL valve actions.”
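The refinement proposed in this example can be applied mechanically: clamp the under-loaded branch to the recommended fraction and shift the freed capacity to the hotter, more loaded branch. The sketch below is our own illustration of that step, assuming the first three action values are valve fractions and treating 0.1 as the LLM's recommended allocation:

```python
def refine_flows(flows, reduce_idx, target, boost_idx):
    """Move coolant allocation from one branch to another per LLM guidance,
    keeping the fractions summing to the original total."""
    flows = list(flows)
    freed = flows[reduce_idx] - target
    flows[reduce_idx] = target
    flows[boost_idx] += freed
    return tuple(round(f, 3) for f in flows)

# RL agent proposed (0.19, 0.61, 0.2); the guidance is to cut
# branch 1 to ~0.1 and give the surplus to branch 3.
refined = refine_flows((0.19, 0.61, 0.2), reduce_idx=0, target=0.1, boost_idx=2)
# refined == (0.1, 0.61, 0.29)
```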

Explainability Example (Step 1)

Obs: (-0.174, 0.373, 2.373, -0.886, -1.000, -0.975)

Act (from RL agent): (0.028, 0.009, 0.963, -0.652, -0.094)

Explanation: The agent identifies a significant thermal hotspot in branch 3, which is reporting a normalized temperature of 2.373. In response, the control action decisively allocates the vast majority of the coolant flow (96.3%) to this specific branch. This is a classic reactive measure designed to rapidly mitigate the high temperature, ensuring hardware integrity and preventing thermal throttling. The actions for the other branches are minimized to maximize the cooling directed at the critical area.

Validity (Human Expert Evaluation): “The explanation correctly identifies the cause and effect. The agent is clearly reacting to the high temperature in the third branch. The explanation is accurate and reflects a fundamental principle of this control system.”

Explainability and Guidance Example (Step 7)

Obs: (0.025, -0.027, 0.041, -0.861, -1.000, -0.861)

Act (from RL agent): (0.139, 0.544, 0.317, -0.079, -0.823)

Explanation: With all temperatures well within the nominal operating range and thermal loads low, the agent correctly shifts its priority from aggressive cooling to energy conservation. The control action maintains a moderate and balanced coolant flow across the branches to ensure readiness while minimizing pump energy usage. However, given the very low temperatures, there is an opportunity for further savings. It is recommended to raise the coolant supply temperature setpoint (first value of the top-level action) to be positive, allowing the system to warm slightly and further reduce cooling overhead.

Validity (Human Expert Evaluation): “The analysis is sound. The system is in a low-load state, and the RL agent’s action is reasonable. The guidance to increase the temperature setpoint is an excellent, actionable recommendation that would improve energy efficiency without compromising safety.”

Explainability Example (Step 13)

Obs: (-0.305, -0.365, -0.299, -1.000, -0.769, -0.953)

Act (from RL agent): (0.215, 0.199, 0.585, -1.377, 1.765)

Explanation: The observation shows uniformly low temperatures across all branches, indicating a very low computational load or a highly efficient cooling state. The controller maintains a baseline coolant distribution, with a slight preference for branch 3, likely to ensure readiness for potential load spikes. The unusually high differential pressure (Act=1.765) suggests the agent is responding to a global system parameter not visible in this local state, such as a requirement to maintain high pressure across the entire data center loop.

Validity (Human Expert Evaluation): “The explanation correctly infers the low-load state from the temperature data. The hypothesis that the high differential pressure is a response to a global system constraint is insightful and plausible, demonstrating the LLM’s ability to reason about interconnected systems.”

Explainability and Guidance Example (Step 22)

Obs: (-0.497, -0.458, -0.421, -1.000, -0.858, -0.864)

Act (from RL agent): (0.673, 0.250, 0.078, -0.729, 1.869)

Explanation: The RL agent is executing a highly aggressive cooling action, evidenced by the extreme differential pressure (1.869) and a flow distribution heavily biased toward branch 1. However, the system state indicates that all blade temperatures are substantially below the target operational range (all normalized values are negative), suggesting the system is being severely overcooled. This action is highly inefficient and wastes a significant amount of energy. It is strongly recommended to override this action by increasing the coolant temperature setpoint and drastically reducing the pump pressure to align with the actual low thermal load, prioritizing energy conservation.

Validity (Human Expert Evaluation): “The guidance is spot-on. The LLM correctly identified that the RL agent’s action is suboptimal and wasteful given the system state. This is a critical safety and efficiency check, perfectly illustrating the value of having the LLM as an oversight component to refine and improve the RL policy.”
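The oversight role the expert praises in this example reduces to a simple rule: if every normalized temperature is well below target yet the action still commands strong cooling, flag the action for review before execution. A sketch of such a sanity check, with thresholds chosen by us purely for illustration:

```python
def flag_overcooling(norm_temps, dp_action, temp_margin=-0.2, dp_limit=1.0):
    """Return True when all branches are well below target temperature
    (every normalized value under `temp_margin`) while the agent still
    commands an aggressive differential pressure (|dp| above `dp_limit`)."""
    all_cold = all(t < temp_margin for t in norm_temps)
    aggressive = abs(dp_action) > dp_limit
    return all_cold and aggressive

# Step 22: temps (-0.497, -0.458, -0.421) with dP action 1.869 -> wasteful.
assert flag_overcooling((-0.497, -0.458, -0.421), 1.869)
# Step 7: temps near zero with modest dP -> no flag raised.
assert not flag_overcooling((0.025, -0.027, 0.041), -0.823)
```

Taking the absolute value of the pressure action keeps the check agnostic to sign conventions, since the examples describe both large positive and large negative setpoints as aggressive cooling.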

Explainability Example (Step 2)

Obs: (-0.131, 0.372, 1.133, -0.836, -1.000, -0.886)

Act (from RL agent): (0.077, 0.157, 0.766, -0.600, -1.197)

Explanation: The agent correctly identifies a moderate thermal concern in branch 3 (temperature of 1.133) while other branches remain cool. The control policy responds by directing a significant majority of the coolant flow (76.6%) to branch 3. This is a proportional and targeted response, aiming to normalize the temperature in the warmer branch without wasting energy by overcooling the entire cabinet. The negative temperature setpoint and differential pressure indicate a proactive cooling stance.

Validity (Human Expert Evaluation): “The explanation is accurate. The RL agent’s action is a textbook example of appropriately handling a localized thermal load. The reasoning is sound and reflects a good understanding of the control objective.”

Explainability and Guidance Example (Step 5)

Obs: (0.127, 0.182, 0.322, -0.861, -1.000, -0.861)

Act (from RL agent): (0.118, 0.268, 0.614, 0.625, -2.185)

Explanation: The system state shows all temperatures are well within the safe, lower end of the operating range, indicating minimal thermal load. However, the RL agent has chosen a highly aggressive cooling action, with a differential pressure setpoint of -2.185. This level of cooling is unnecessary and highly inefficient, consuming significant power for no thermal benefit. It is recommended to significantly raise the differential pressure setpoint to a positive value and reduce coolant flow to conserve energy until temperatures begin to approach the upper end of the nominal range.

Validity (Human Expert Evaluation): “The guidance correctly identifies a major inefficiency. The RL agent is over-cooling the system, and the recommendation to reduce cooling intensity is absolutely correct. This is a perfect use case for an expert system to refine a suboptimal RL action.”

Explainability Example (Step 9)

Obs: (-0.180, -0.146, -0.120, -0.861, -1.000, -0.861)

Act (from RL agent): (0.065, 0.470, 0.465, -1.211, -1.043)

Explanation: With all observed temperatures being very low (all normalized values are negative), the system is in an energy-conserving state. The agent’s action reflects this by setting a low temperature and pressure setpoint. The coolant flow is balanced between branches 2 and 3 while minimizing flow to branch 1. This maintains a state of readiness and gentle circulation, preventing stagnation while consuming minimal power, which is appropriate for the current low-load conditions.

Validity (Human Expert Evaluation): “The explanation correctly interprets the agent’s actions in the context of a low-load state. The policy is clearly prioritizing energy savings while maintaining a baseline level of circulation, which is a sensible strategy.”

Explainability and Guidance Example (Step 16)

Obs: (-0.388, -0.396, -0.411, -1.000, -0.944, -0.777)

Act (from RL agent): (0.406, 0.094, 0.500, -2.427, 0.935)

Explanation: The system temperatures are extremely low, indicating a near-idle state. The RL agent, however, has set an exceptionally low coolant supply temperature setpoint of -2.427. This action is counterproductive, as it drives the system into a state of deep over-cooling and wastes considerable energy. It is strongly recommended to override this action. The temperature setpoint should be immediately raised to a positive value (e.g., > 1.0) to allow the system to warm up naturally and eliminate the unnecessary cooling power consumption.

Validity (Human Expert Evaluation): “The guidance is critical and correct. The RL agent’s temperature setpoint is illogical and demonstrates a failure to adapt to the low-load state. The recommendation to raise the temperature is essential for efficient operation.”

Explainability Example (Step 25)

Obs: (-0.507, -0.450, -0.457, -1.000, -0.722, -1.000)

Act (from RL agent): (0.092, 0.741, 0.167, -1.498, 0.671)

Explanation: The observation data shows the system is in a state of extreme cold, with all temperatures far below nominal levels. Despite this, the RL agent is actively cooling, directing the majority of its effort (74.1% of flow) to branch 2, which is already one of the coldest. This action is highly illogical and wastes energy. It is recommended to cease active cooling. The valve actuations should be minimized, and the temperature setpoint should be set to a high positive value to allow the system’s residual heat to bring temperatures back into the desired operational window.

Validity (Human Expert Evaluation): “This guidance correctly identifies a significant flaw in the RL agent’s logic. Cooling an already cold system is a fundamental error. The recommendation to halt cooling and allow the system to warm is the only sensible course of action. This highlights the importance of the LLM as a sanity check.”