Baseline vs AI Planning Agent

This is an operational comparison, not a controlled A/B test, and it does not isolate the AI planning agent as the only variable. The baseline window is the April 22-25, 2026 planner-offline run already documented in the public outage story. The comparison window is April 26-May 2, 2026, when normal AI-assisted planning resumed.

The comparison is still useful because it answers the skeptical reader’s first question: when the planning loop is online, do the public scorecards look different from the period when the ESP32 had to keep running without normal AI plans?

For the exact parameters the AI planning agent can change when it is online, see AI Tunables Traceability.

For the broader caveat language, see the AI Greenhouse FAQ. For the live receipts behind this page, use the Scorecard, Operations, and the planning archive.

Periods Compared

2026-04-22 to 2026-04-25

Planner offline window: 4 days, 0.0 AI planning-agent plans/day.

2026-04-26 to 2026-05-02

Planner online window: 7 days, 3.0 AI planning-agent plans/day.

Summary Table

Metric	Planner offline	Planner online	Change
Average AI planning-agent plans/day	0.0	3.0	+3.0
Both-axis compliance (binary, house)	20.6%	54.7%	+34.1 pts
Temperature compliance	44.9%	70.9%	+26.0 pts
VPD compliance	30.3%	74.5%	+44.2 pts
Graded compliance (controller-attributable)	63.4%	67.3%	+3.9 pts
Cumulative stress-axis hours/day	29.9h	13.1h	16.8h lower
Water/day	429.6 gal	240.1 gal	189.5 gal lower
Runtime-modeled electric energy/day	19.1 kWh	29.1 kWh	10.0 kWh higher
Cost/day	USD 7.89	USD 8.31	USD 0.42 higher
Planner score	25.9	52.7	+26.8

Confounders To Keep In View

This comparison is useful, but it is not weather-normalized proof that the AI planning agent caused every improvement. The planner-online window was cooler, more humid, and lower-solar on average, which likely made VPD and heat stress easier. The table below makes those confounders explicit instead of burying them in caveats.

Factor	Planner offline	Planner online	Reading
Outdoor temperature	55.9°F avg / 83.6°F max	47.8°F avg / 73.5°F max	The planner-online window was cooler, reducing heat-load pressure.
Outdoor VPD / humidity	1.19 kPa avg	0.55 kPa avg	The planner-online window had less dry-air pressure, so VPD compliance was easier to recover.
Solar irradiance	262 W/m² avg	205 W/m² avg	Lower solar load reduces overheating and evaporative demand.
Manual interventions	0 logged crop events	0 logged crop events	Logged event counts are shown explicitly, but operator activity is not controlled like a lab experiment.
Hardware changes	No major hardware change is asserted here	No major hardware change is asserted here	The comparison uses the same greenhouse and controller boundary, but it is not a locked hardware trial.
Crop mix / active bands	Same public crop-control model	Same public crop-control model, plants still aging	Crop targets are comparable enough for an operational receipt, not for yield attribution or agronomic proof.

How To Read The Receipt

This page uses generated tables instead of re-embedding the Scorecard dashboard. The comparison window is fixed, the metrics are pulled from the same tables behind the public scorecard, and the caveats stay on the page with the numbers.

For the live rolling charts, use the Scorecard.

Resource Tradeoffs

The comparison is more useful when stress is shown beside what the greenhouse spent trying to reduce it. Cost, water, and misting are not success metrics by themselves. They matter because a better headline score is only credible if the AI planning agent reduces plant stress without hiding the resource bill.

Daily Rows

Date	Plans	Both-axis	Graded (attrib.)	Score	Stress	VPD-high	Heat	Cost
2026-04-22	0	30.9%	60.9%	36.5	26.2h	12.8h	9.4h	USD 6.14
2026-04-23	0	19.2%	70.8%	23.1	27.8h	15.6h	2.0h	USD 9.16
2026-04-24	0	3.0%	57.5%	11.2	40.8h	22.5h	7.1h	USD 8.43
2026-04-25	0	29.2%	64.3%	32.9	24.7h	12.7h	8.7h	USD 7.82
2026-04-26	4	39.7%	66.3%	39.8	16.3h	2.7h	4.3h	USD 8.97
2026-04-27	2	24.7%	60.6%	32.3	23.6h	4.6h	1.9h	USD 5.58
2026-04-28	3	53.4%	65.5%	54.4	14.3h	4.5h	3.2h	USD 6.21
2026-04-29	3	73.7%	70.5%	67.7	6.9h	3.3h	0.7h	USD 8.44
2026-04-30	2	69.2%	73.5%	60.7	7.4h	4.8h	0.0h	USD 10.97
2026-05-01	2	65.6%	64.2%	62.0	9.5h	5.3h	1.4h	USD 7.86
2026-05-02	5	56.6%	70.6%	51.8	13.7h	7.2h	3.3h	USD 10.13
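
The window averages in the Summary Table can be cross-checked against the daily rows above. The following is a minimal Python sketch, assuming each summary figure is a plain unweighted mean of the daily values. The page does not state the aggregation method, and the tuple layout and function name are illustrative, not taken from the generator script.

```python
from statistics import mean

# (date, plans, both_axis_pct, graded_pct, score, stress_hours, cost_usd),
# copied by hand from the Daily Rows table.
DAILY_ROWS = [
    ("2026-04-22", 0, 30.9, 60.9, 36.5, 26.2, 6.14),
    ("2026-04-23", 0, 19.2, 70.8, 23.1, 27.8, 9.16),
    ("2026-04-24", 0, 3.0, 57.5, 11.2, 40.8, 8.43),
    ("2026-04-25", 0, 29.2, 64.3, 32.9, 24.7, 7.82),
    ("2026-04-26", 4, 39.7, 66.3, 39.8, 16.3, 8.97),
    ("2026-04-27", 2, 24.7, 60.6, 32.3, 23.6, 5.58),
    ("2026-04-28", 3, 53.4, 65.5, 54.4, 14.3, 6.21),
    ("2026-04-29", 3, 73.7, 70.5, 67.7, 6.9, 8.44),
    ("2026-04-30", 2, 69.2, 73.5, 60.7, 7.4, 10.97),
    ("2026-05-01", 2, 65.6, 64.2, 62.0, 9.5, 7.86),
    ("2026-05-02", 5, 56.6, 70.6, 51.8, 13.7, 10.13),
]

OFFLINE_END = "2026-04-25"  # last day of the planner-offline window


def window_means(rows):
    """Unweighted daily means for one window (assumed aggregation)."""
    return {
        "plans": round(mean(r[1] for r in rows), 1),
        "both_axis_pct": round(mean(r[2] for r in rows), 1),
        "graded_pct": round(mean(r[3] for r in rows), 1),
        "score": round(mean(r[4] for r in rows), 1),
        "stress_hours": round(mean(r[5] for r in rows), 1),
        "cost_usd": round(mean(r[6] for r in rows), 2),
    }


# ISO dates compare correctly as strings, so a string cutoff is enough here.
offline = window_means([r for r in DAILY_ROWS if r[0] <= OFFLINE_END])
online = window_means([r for r in DAILY_ROWS if r[0] > OFFLINE_END])
print(offline)
print(online)
```

Under that assumption, the rounded means agree with the Summary Table for every column carried in the daily rows, for example 20.6% versus 54.7% both-axis compliance and USD 7.89 versus USD 8.31 cost/day.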

Definitions

Both-axis compliance (binary, house): daily_summary.compliance_pct

Percent of samples where the house-average temperature and VPD were both inside the single served control band. This is a binary, house-level pass/fail: a reading 0.1°F out of band scores the same as one 15°F out, and it grades against the served band rather than per-zone agronomic targets. It does not assert that every zone, or every plant, was inside a firmware-enforced band.
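
The binary pass/fail rule can be sketched in a few lines of Python. This is an illustration only: the band limits, the inclusive bounds, and the function name are assumptions, not values or code from the controller.

```python
def both_axis_compliance_pct(samples, temp_band, vpd_band):
    """Percent of samples with temperature AND VPD inside their bands.

    samples: iterable of (temp_f, vpd_kpa) house-average readings.
    temp_band, vpd_band: (low, high) tuples, treated as inclusive (assumed).
    """
    samples = list(samples)
    if not samples:
        return None  # no data, no score
    passed = sum(
        1
        for temp_f, vpd_kpa in samples
        if temp_band[0] <= temp_f <= temp_band[1]
        and vpd_band[0] <= vpd_kpa <= vpd_band[1]
    )
    return 100.0 * passed / len(samples)


# Hypothetical band: 65-80 °F and 0.8-1.5 kPa.
readings = [
    (72.0, 1.0),  # inside both bands: pass
    (80.1, 1.0),  # 0.1 °F too warm: fail
    (95.0, 1.0),  # 15 °F too warm: the same fail
    (72.0, 1.8),  # temperature fine, VPD too high: fail
]
pct = both_axis_compliance_pct(readings, (65.0, 80.0), (0.8, 1.5))
print(pct)  # 25.0
```

The second and third readings score identically, which is the property that makes this metric strict and easy to audit but blind to how far out of band a reading was.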

Graded compliance (controller-attributable): daily_summary.compliance_v2_attributable_pct

A graded, per-zone, feasibility-aware compliance figure (band-compliance design §6-§7). It gives full credit inside the ideal band, linear partial credit through the stress band, and zero beyond, aggregated across occupied zones by priority weight; misses the controller could not physically prevent (for example, an exhaust-only box that cannot cool below outdoor air) are credited as controller-attributable. It is reported as context and is populated once the graded compliance engine is promoted; until then this column shows a dash.
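
The grading shape described here can be sketched as follows. This is one possible reading, not the band-compliance design itself: it assumes credit falls linearly from 1 at the ideal-band edge to 0 at the stress-band edge, it uses hypothetical band values, and it leaves out the feasibility adjustment entirely.

```python
def axis_credit(value, ideal, stress):
    """Credit in [0, 1] for one reading on one axis.

    ideal, stress: (low, high) tuples, with the stress band wider than
    the ideal band. Full credit inside ideal, zero at or beyond stress,
    linear in between (assumed interpolation).
    """
    ideal_lo, ideal_hi = ideal
    stress_lo, stress_hi = stress
    if ideal_lo <= value <= ideal_hi:
        return 1.0
    if value < ideal_lo:
        if value <= stress_lo:
            return 0.0
        return (value - stress_lo) / (ideal_lo - stress_lo)
    if value >= stress_hi:
        return 0.0
    return (stress_hi - value) / (stress_hi - ideal_hi)


def graded_compliance_pct(zone_credits):
    """Priority-weighted mean credit across occupied zones, as a percent.

    zone_credits: list of (credit, priority_weight) pairs.
    """
    total_weight = sum(weight for _, weight in zone_credits)
    if total_weight == 0:
        return None  # no occupied zones to grade
    weighted = sum(credit * weight for credit, weight in zone_credits)
    return 100.0 * weighted / total_weight


# Hypothetical temperature bands: ideal 65-80 °F, stress 55-90 °F.
IDEAL, STRESS = (65.0, 80.0), (55.0, 90.0)
zones = [
    (axis_credit(75.0, IDEAL, STRESS), 2.0),  # in band, high priority
    (axis_credit(85.0, IDEAL, STRESS), 1.0),  # halfway through stress band
]
print(graded_compliance_pct(zones))
```

Unlike the binary figure, a reading 5°F past the ideal band here keeps half its credit instead of dropping to zero.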

Cumulative stress-axis hours/day: Heat + cold + VPD-high + VPD-low

Summed daily stress duration from corrected daily summary fields. This is not capped at one stress type; a hot-dry hour can count on more than one axis.
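
A small Python sketch shows why the summed figure can exceed the number of clock hours under stress, and even exceed 24 hours in a day. The hourly sampling and the flag layout are assumptions made for illustration; the real summary fields may be built differently.

```python
AXES = ("heat", "cold", "vpd_high", "vpd_low")


def stress_axis_hours(hourly_flags):
    """Return (per-axis hours, cumulative hours) for one day.

    hourly_flags: list with one set per hour, holding the names of the
    axes that were in stress during that hour (hypothetical layout).
    """
    per_axis = {axis: 0.0 for axis in AXES}
    for flags in hourly_flags:
        for axis in flags:
            per_axis[axis] += 1.0  # each stressed axis adds a full hour
    return per_axis, sum(per_axis.values())


# Hypothetical day: 6 hot-dry hours, 4 dry-only hours, 14 clear hours.
day = [{"heat", "vpd_high"}] * 6 + [{"vpd_high"}] * 4 + [set()] * 14
per_axis, total = stress_axis_hours(day)
print(per_axis["heat"], per_axis["vpd_high"], total)  # 6.0 10.0 16.0
```

Only 10 clock hours were stressed in this example, but the cumulative figure is 16 hours because each hot-dry hour counts once for heat and once for VPD-high.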

Planner score: v_planner_performance.planner_score

Composite score: 80% compliance and 20% cost efficiency. It is useful as an operational KPI, not as a yield claim.
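
The weighting can be sketched as a simple weighted sum. This is a hedged illustration: the page does not say which compliance figure feeds the score or how cost efficiency is normalized, so both inputs are assumed here to be on a 0-100 scale, and the function name is hypothetical.

```python
def planner_score(compliance_pct, cost_efficiency_pct,
                  w_compliance=0.8, w_cost=0.2):
    """Weighted composite of two 0-100 inputs (assumed scales)."""
    return w_compliance * compliance_pct + w_cost * cost_efficiency_pct


# Hypothetical day: 50% compliance with a perfect cost-efficiency term.
example_score = planner_score(50.0, 100.0)
print(example_score)  # 60.0
```

With an 80/20 split, the cost term can move the score by at most 20 points, so compliance dominates the result.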

Runtime-modeled electric energy/day: daily_summary.kwh_estimated

Electric energy from published equipment wattage multiplied by observed on-time; metered kWh is retained separately as diagnostic evidence.
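
The wattage-times-runtime model reduces to a short calculation. The sketch below uses hypothetical equipment and wattages; the real equipment list and the way on-time is logged are not described on this page.

```python
def runtime_modeled_kwh(equipment):
    """Sum rated watts x observed on-time, returned in kWh.

    equipment: list of (rated_watts, on_seconds) pairs for one day.
    """
    watt_hours = sum(watts * on_seconds / 3600.0
                     for watts, on_seconds in equipment)
    return watt_hours / 1000.0


# Hypothetical day: a 500 W fan on for 4 h and a 250 W pump on for 2 h.
day_equipment = [(500.0, 4 * 3600), (250.0, 2 * 3600)]
modeled_kwh = runtime_modeled_kwh(day_equipment)
print(modeled_kwh)  # 2.5
```

Because the inputs are nameplate wattages rather than meter readings, the result is a model estimate, which is why metered kWh is kept separately as a cross-check.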

Cost/day: Electric + gas + water

Resource spend comes from estimated daily summary fields unless marked measured. The greenhouse is solar-aligned but still uses grid electricity and gas heat.

Caveats

Weather, crop load, hardware state, and operator activity were not identical across the two windows.
The baseline is a real outage window, not a hand-picked fixed-rule controller experiment.
The strongest claim is not that the AI planning agent guarantees better outcomes every day. The useful claim is that the system makes planner availability, physical stress, cost, and score visible enough to audit.
This is not a yield, profit, or controlled-trial claim. It is a public operational receipt; see the AI Greenhouse FAQ for the claim boundary.
Known physical and instrumentation limits still apply, including weather, sensor coverage, water attribution, and firmware-change risk. See Climate Control, Resource Use, and Safety Architecture.

Reproducibility

This page is generated by scripts/generate-baseline-vs-iris-page.py from daily_summary, plan_journal, v_planner_performance, climate, and crop_events.

For raw public-safe data, use the 7-day climate CSV, 30-day plan outcomes CSV, and dataset notes. The current public snapshot is available from the evidence snapshot API.

Where To Go Next

Why the AI Does Not Control Relays explains the safety split behind the outage window.
Planning Loop shows how the AI planning agent writes hypotheses and waypoints.
AI Tunables Traceability lists the bounded control surface behind those waypoints.
Scorecard shows the live scorecard and forecast-plan-outcome panels.
Generated Lessons shows what the planner reads before future plans.
Data Model explains the tables, views, and sample exports behind this comparison.
