Why the AI Does Not Control Relays

The AI does not flip relays. That is the central safety rule.

The AI planning agent writes bounded tactical intent: hysteresis, misting thresholds, fog escalation, dwell gates, water budgets, and rationale. The dispatcher validates those writes and owns the push from crop policy to firmware-enforced setpoints. The ESP32 firmware decides physical state every 5 seconds and owns the relays.

The exact planner-writable parameters are listed in AI Tunables Traceability.

If the planner is offline, publishing breaks, or the network path to the planner fails, the greenhouse controller keeps running.

Firmware changes to that controller are intentionally harder than content or planner changes. The Firmware Change Protocol section documents the replay, invariant, OTA, and rollback gates used before changing the ESP32 path.

Safety Boundary

AI planning agent

Chooses bounded tactics from forecast, scorecards, lessons, and physical constraints.

MCP

Accepts only registry-approved writes with trigger and planner-instance metadata.

Dispatcher

Clamps, pushes, and confirms setpoints through firmware cfg_* readbacks.

ESP32 firmware

Runs the priority-ordered local controller and owns physical relay state.

Telemetry

Exposes whether the physical result matched the plan and safety boundary.

That split is enforced through explicit ownership:

Crop policy: Biology

Defines the temperature and VPD bands the plants should experience. The crop catalog owns crop-specific facts.

AI planning agent: Tactics

Writes bounded setpoints and a measurable hypothesis through MCP tools only.

Dispatcher: Delivery

Rejects unknown or out-of-range values, pushes changes over the ESPHome native API, and records readback status.

ESP32 firmware: Actuation

Evaluates trusted greenhouse state locally and controls fans, heat, fog, misters, vent, lights, and pumps.

Telemetry: Evidence

Records whether the plan, dispatcher, firmware, and equipment produced the intended physical outcome.
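The dispatcher's clamp-or-reject step can be sketched in a few lines of Python. This is an illustration, not the dispatcher's actual code: the registry entries, parameter names, ranges, and the `policy` switch are all hypothetical, and the real planner-writable parameters are the ones listed in AI Tunables Traceability.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Tunable:
    """Allowed range for one planner-writable parameter (hypothetical shape)."""
    low: float
    high: float


# Hypothetical registry entries, for illustration only.
REGISTRY = {
    "vpd_hysteresis_kpa": Tunable(0.05, 0.40),
    "mist_water_budget_l": Tunable(0.0, 50.0),
}


def validate_write(name, value, policy="clamp"):
    """Return (accepted_value, note); accepted_value is None when rejected."""
    tunable = REGISTRY.get(name)
    if tunable is None:
        return None, f"rejected: {name} is not in the registry"
    if not math.isfinite(value):
        # NaN and infinities cannot be clamped meaningfully, so refuse them.
        return None, f"rejected: {value} is not a finite number"
    if tunable.low <= value <= tunable.high:
        return value, "accepted"
    if policy == "reject":
        return None, f"rejected: {value} outside [{tunable.low}, {tunable.high}]"
    clamped = min(max(value, tunable.low), tunable.high)
    return clamped, f"clamped: {value} -> {clamped}"
```

Returning a note alongside the value mirrors the idea that clamps and rejections should be logged rather than applied silently.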

Firmware Safety Shape

The current controller is a priority-ordered firmware state machine, not an LLM loop. The state list published here follows firmware/lib/greenhouse_types.h: seven non-idle safety/control states plus IDLE, with relay outputs resolved separately.

SENSOR_FAULT: Bad or missing core input

Fail closed when the controller cannot trust the climate reading.

SAFETY_COOL: Hard high-temperature rail

Emergency cooling shape preempts normal planning.

SAFETY_HEAT: Hard low-temperature rail

Emergency heating shape preempts normal planning.

SEALED_MIST: VPD recovery

Humidification posture when the greenhouse needs moisture more than ventilation.

THERMAL_RELIEF: Heat flush

Transient vent/fan relief so sealed misting cannot trap heat indefinitely.

VENTILATE: Cooling and exchange

Fan/vent posture when heat rejection or outdoor exchange is the right tradeoff.

DEHUM_VENT: Humidity dump

Ventilation branch when VPD is below the safe band and excess humidity needs to leave.

IDLE: Stable state

No-action branch when readings are trusted and the greenhouse is inside the safe operating band.

Safety rails and transient relief preempt normal tuning. The planner can make the controller more or less aggressive inside allowed ranges. It cannot remove the firmware’s hard boundaries.

Those allowed ranges and defaults are documented in AI Tunables Traceability. Plant wetting has another firmware permission gate tied to the biological activity window, zone offsets, drydown holds, and minimum temperature. The operating policy for that gate lives on Operations.
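A priority-ordered selector of this kind can be sketched as follows. The state names come from the list above; everything else is an assumption made for illustration: the threshold values, the field names, the exact ordering among the non-safety states, and the omission of hysteresis, dwell gates, and relay resolution. The real logic is C++ in the firmware, not this Python.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Mode(Enum):
    SENSOR_FAULT = "SENSOR_FAULT"
    SAFETY_COOL = "SAFETY_COOL"
    SAFETY_HEAT = "SAFETY_HEAT"
    SEALED_MIST = "SEALED_MIST"
    THERMAL_RELIEF = "THERMAL_RELIEF"
    VENTILATE = "VENTILATE"
    DEHUM_VENT = "DEHUM_VENT"
    IDLE = "IDLE"


@dataclass
class Reading:
    temp_f: Optional[float]   # None models a missing or untrusted sensor
    vpd_kpa: Optional[float]


@dataclass
class Setpoints:
    # Planner-tunable values (hypothetical names and defaults).
    temp_high_f: float = 82.0
    vpd_high_kpa: float = 1.4
    vpd_low_kpa: float = 0.6


# Hard rails fixed in the controller, not planner-writable (hypothetical values).
SAFETY_MAX_F = 95.0
SAFETY_MIN_F = 45.0
RELIEF_F = 88.0


def select_mode(reading: Reading, sp: Setpoints, sealed_misting: bool) -> Mode:
    """Evaluate branches from highest to lowest priority; first match wins."""
    if reading.temp_f is None or reading.vpd_kpa is None:
        return Mode.SENSOR_FAULT
    if reading.temp_f >= SAFETY_MAX_F:
        return Mode.SAFETY_COOL
    if reading.temp_f <= SAFETY_MIN_F:
        return Mode.SAFETY_HEAT
    if sealed_misting and reading.temp_f >= RELIEF_F:
        return Mode.THERMAL_RELIEF
    if reading.vpd_kpa > sp.vpd_high_kpa:
        return Mode.SEALED_MIST
    if reading.temp_f > sp.temp_high_f:
        return Mode.VENTILATE
    if reading.vpd_kpa < sp.vpd_low_kpa:
        return Mode.DEHUM_VENT
    return Mode.IDLE
```

Because the rail checks run before any setpoint is consulted, a planner that writes an extreme `temp_high_f` changes only the tuning branches; `select_mode(Reading(99.0, 1.0), Setpoints(temp_high_f=200.0), False)` still returns `Mode.SAFETY_COOL`.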

Firmware Change Protocol

The ESP32 is the only controller that keeps acting when the planner, ingestor, network, or cloud services are unavailable. It runs the greenhouse loop locally and keeps the latest accepted setpoints and override logic. That makes firmware changes different from website or planner changes: the first evidence that a change is bad cannot come from “watch production and see what happens.”

The Firmware Change Protocol is the current review and deploy discipline for that boundary. It answers four questions before a build is allowed near the controller:

Does the native control logic still pass its pinned decision tests?
Does replay show the same mode, relay, and mist decisions unless divergence was intentionally approved?
Is the greenhouse in a safe operational state for OTA?
Did the flashed controller come back with the expected version and live telemetry?

This section owns firmware change discipline. Runtime health, relay-boundary behavior, and full system flow stay on their canonical pages: Operations, Safety Architecture, and System Architecture.

Bench Gates Before Review

make test-firmware compiles the native C++ control tests and the replay harness on the development host. This is the fast gate for decision logic in firmware/lib/greenhouse_logic.h: mode selection, relay resolution, override flags, hysteresis, staging, and fault handling. The same rule applies to every test: given these sensors and setpoints, the controller must make this decision.

make firmware-invariants runs the invariant suite against the replay corpus. These are not product claims; they are regression guards for behavior that must never be broken silently.

make firmware-replay-worktree OLD=<ref> compares an uncommitted candidate against a committed baseline. Once both sides are committed refs, make firmware-replay OLD=<base> NEW=HEAD runs the dual-ref replay diff. The diff is column-aware: mode, relay, and mist-stage divergence is a hard review item, while diagnostic-only changes are reported separately. The default allowed divergence is zero. Planned behavior changes require an explicit threshold override and coordinator review.
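The column-aware comparison can be pictured with a small Python sketch. It is not the replay harness itself: the column names, the row format (one dict per replayed corpus row), and the choice to count divergence in rows are assumptions made for illustration.

```python
# Columns whose divergence is a hard review item (hypothetical names).
HARD_COLUMNS = ("mode", "relays", "mist_stage")


def replay_diff(old_rows, new_rows, allowed_divergence=0):
    """Compare two replay runs row by row and split differences by severity."""
    if len(old_rows) != len(new_rows):
        raise ValueError("replay runs must cover the same corpus rows")
    hard, diagnostic = [], []
    for index, (old, new) in enumerate(zip(old_rows, new_rows)):
        for column in sorted(set(old) | set(new)):
            if old.get(column) != new.get(column):
                bucket = hard if column in HARD_COLUMNS else diagnostic
                bucket.append((index, column, old.get(column), new.get(column)))
    # Assumption: the threshold counts rows with any hard divergence.
    hard_row_count = len({index for index, *_ in hard})
    return {
        "hard": hard,
        "diagnostic": diagnostic,
        "passed": hard_row_count <= allowed_divergence,
    }
```

With the default threshold of zero, a single changed relay decision fails the gate, while a reworded diagnostic string is reported without blocking review.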

make firmware-check compiles the ESPHome firmware from the current worktree. Passing native tests is not enough; the generated firmware must still compile as the ESP32 build.

Required PR Evidence

Firmware work is PR-scoped. A firmware PR carries the replay diff output, invariant-suite output, and unit-test delta. Interface-level changes also need planner concurrence, because firmware semantics are part of the planner contract.

The replay gate exists because unit tests are too narrow for control firmware. A hand-built test can prove one branch works. Replay can show whether a changed branch alters relay decisions across recorded greenhouse states, or whether an override path that should exist has gone quiet. That is why the PR artifact is the diff, not only a green test line.

OTA Preflight

make firmware-deploy starts with scripts/firmware-deploy-preflight.sh. The preflight blocks OTA when unresolved critical or legacy-high alerts are open, when the rollback artifact is missing, when the 48-hour bake gate is not satisfied, or when the weekly OTA limit has already been used. A forecast above 85°F is reported as operator context, not as an automatic block.

Operator overrides exist for exceptional cases, but they require an explicit reason and signoff. A dirty-worktree deploy is also refused unless it is marked as an operator-approved emergency. The default path is a clean, reviewed build from the repo.
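The preflight rules read naturally as a list of independent blockers plus non-blocking notes. The Python sketch below shows that shape only; it is not scripts/firmware-deploy-preflight.sh. The field names, the weekly limit of one OTA, measuring the bake window in hours since the last deploy, and treating one reason-plus-signoff pair as covering every blocker are all assumptions.

```python
from dataclasses import dataclass

BAKE_HOURS = 48          # from the text above
WEEKLY_OTA_LIMIT = 1     # assumption: the text does not state the number
FORECAST_NOTE_F = 85.0   # reported as context, never a block


@dataclass
class PreflightState:
    open_blocking_alerts: int        # unresolved critical or legacy-high alerts
    rollback_artifact_present: bool
    hours_since_last_deploy: float
    otas_this_week: int
    forecast_high_f: float
    worktree_dirty: bool


def preflight(state, override_reason=None, override_signoff=None):
    """Return blockers, notes, and whether the OTA may proceed."""
    blockers, notes = [], []
    if state.open_blocking_alerts > 0:
        blockers.append("unresolved critical or legacy-high alerts")
    if not state.rollback_artifact_present:
        blockers.append("rollback artifact missing")
    if state.hours_since_last_deploy < BAKE_HOURS:
        blockers.append("48-hour bake gate not satisfied")
    if state.otas_this_week >= WEEKLY_OTA_LIMIT:
        blockers.append("weekly OTA limit already used")
    if state.worktree_dirty:
        blockers.append("dirty worktree")
    if state.forecast_high_f > FORECAST_NOTE_F:
        notes.append(f"forecast high {state.forecast_high_f}F exceeds 85F")
    # An override needs both a reason and a signoff; either alone is not enough.
    overridden = bool(override_reason and override_signoff)
    return {
        "blockers": blockers,
        "notes": notes,
        "allowed": not blockers or overridden,
    }
```

Keeping the forecast in `notes` rather than `blockers` matches the rule that a hot forecast informs the operator without stopping the deploy.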

Deploy Acceptance And Rollback

After preflight, make firmware-deploy compiles the firmware with a version string, uploads it over OTA, waits for reboot, and then waits for the diagnostics stream to report the expected firmware version. The deploy then runs make sensor-health SINCE='5 minutes'.

The sensor-health sweep checks fresh climate values, Modbus timeout distribution, active probe count, ESP32 diagnostics, firmware version, new deploy-window alerts, and unexpected override-event storms. Critical failures roll the ESP32 back to firmware/artifacts/last-good.ota.bin and rerun health validation on the rolled-back firmware.

Passing sensor health accepts the deploy and archives the build outputs. It does not immediately promote that build to last-good.ota.bin; the rollback target is promoted explicitly after the bake window with make firmware-promote-last-good FW_VERSION=<archived-version>.
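The accept-or-rollback sequence can be outlined in Python with the hardware-facing steps passed in as callables. This is a sketch under assumptions, not the deploy target's implementation: the function and parameter names are hypothetical, `sensor_health` is assumed to return a list of critical failures, and a version that never appears is treated here as a critical failure.

```python
def deploy(expected_version, upload, wait_for_version, sensor_health,
           rollback, archive):
    """Upload, verify, then either archive the build or roll back."""
    upload(expected_version)
    if wait_for_version(expected_version):
        critical = sensor_health()
    else:
        # Assumption: a missing version report counts as a critical failure.
        critical = ["expected firmware version never reported"]
    if critical:
        rollback()
        # Health validation is rerun on the rolled-back firmware.
        return {
            "outcome": "rolled_back",
            "critical": critical,
            "post_rollback_critical": sensor_health(),
        }
    # Acceptance archives the build; it does not promote it to last-good.
    archive(expected_version)
    return {"outcome": "accepted", "critical": [], "post_rollback_critical": []}
```

Promotion is deliberately absent from the function: in the flow described above it is a separate, explicit step taken after the bake window.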

What The Gates Prove

Native tests prove pinned control decisions still hold.
Replay diff proves the candidate did not change actuator behavior across the replay corpus unless that divergence was reviewed.
Invariants prove the replay corpus did not breach hard safety and stability constraints.
ESPHome compile proves the actual firmware build still produces an ESP32 image.
Deploy preflight proves the greenhouse is not already in a severe alert state and that rollback capacity exists.
Sensor health proves the flashed controller is reporting the expected version and the required telemetry is live.

None of those gates prove the plants are optimized. Longer-term plant response, band performance, water use, and energy behavior belong to the operations and evidence pages. The protocol is narrower: it keeps firmware changes from becoming silent production experiments.

Command Surface

make test-firmware                         # native control tests + replay harness
make firmware-invariants                   # invariant suite against replay corpus
make firmware-replay-worktree OLD=<ref>    # committed baseline vs current worktree
make firmware-replay OLD=<base> NEW=HEAD   # dual-ref replay diff
make firmware-check                        # ESPHome compile only
make firmware-deploy                       # preflight, OTA, version wait, sensor health, rollback on failure
make firmware-promote-last-good FW_VERSION=<archived-version>
make sensor-health SINCE='5 minutes'

make check includes lint, tests, lighting audit, native firmware tests, and firmware compile. Firmware PRs still carry the dedicated replay and invariant artifacts because review needs the actual behavior delta, not just a pass/fail status.

What Happens When Things Fail

Planner outage: Planner stops writing new plans

The ESP32 keeps enforcing the most recent valid setpoints and safety rails. The public plan archive shows the gap.

Wi-Fi or network interruption: Setpoints may stop updating

Local firmware continues running. Public data freshness degrades visibly through the data-health cards.

Bad planner value: Clamp or rejection

The dispatcher and tunable registry constrain values before they reach the controller. Readbacks and clamp logs make silent failures visible.

Sensor fault: Confidence loss

Firmware and alerting favor safe shapes when core readings go stale or invalid.

Dispatcher or DB outage: No fresh delivery

The ESP32 continues on last confirmed values; freshness and readback checks make the gap visible before new planning claims are trusted.

ESP32 reboot: Firmware defaults first

Boot defaults are bounded by safety rails; dispatcher readbacks show whether the live controller has recovered the active plan.

Relay or actuator fault: Commanded state may not equal physical effect

Equipment-state telemetry, cycle counters, alerts, and operator checks expose physical systems that no longer match software intent.

Physical limit: Physics wins

If sun, outdoor humidity, vent area, or cooling capacity cannot meet the band, the scorecard records stress instead of pretending the plan worked.

Why Not Direct LLM Control?

Direct LLM relay control is the wrong abstraction for a greenhouse. Relays need deterministic timing, hysteresis, interlocks, and safety preemption. The AI planning agent is useful here because it can weigh context: forecast, crop needs, prior failures, plan memory, cost, water, equipment state, and the documented physical constraints of the room. It is not useful as a real-time relay loop.

The pattern is edge-safe and cloud-smart:

Planner intelligence

The AI planning agent uses greenhouse memory, retrieval context, forecasts, and scorecards for slow-loop decisions.

Validated tools

MCP and dispatcher checks constrain what a plan can write before it reaches firmware.

Local-safe

The ESP32 runs the deterministic control loop and owns relay state.

Auditable

Every plan is judged by measured telemetry and a public scorecard.

Next: Planning Loop shows how a bounded plan reaches the dispatcher, and AI Greenhouse FAQ answers the common objections directly.
