Baseline vs AI Planning Agent

This is an operational comparison, not a controlled A/B test, and it does not isolate the AI planning agent as the only variable. The baseline window is the April 22-25, 2026 planner-offline run already documented in the public outage story. The comparison window is April 26-May 2, 2026, when normal AI-assisted planning resumed.

The comparison is still useful because it answers the skeptical reader’s first question: when the planning loop is online, do the public scorecards look different from the period when the ESP32 had to keep running without normal AI plans?

For the exact parameters the AI planning agent can change when it is online, see AI Tunables Traceability.

For the broader caveat language, see the AI Greenhouse FAQ. For the live receipts behind this page, use the Scorecard, Operations, and the planning archive.

Periods Compared

2026-04-22 to 2026-04-25

Planner offline window: 4 days, 0.0 AI planning-agent plans/day.

2026-04-26 to 2026-05-02

Planner online window: 7 days, 3.0 AI planning-agent plans/day.

Summary Table

| Metric | Planner offline | Planner online | Change |
|---|---|---|---|
| Average AI planning-agent plans/day | 0.0 | 3.0 | +3.0 |
| Both-axis compliance (binary, house) | 20.6% | 54.7% | +34.1 pts |
| Temperature compliance | 44.9% | 70.9% | +26.0 pts |
| VPD compliance | 30.3% | 74.5% | +44.2 pts |
| Graded compliance (controller-attributable) | 63.4% | 67.3% | +3.9 pts |
| Cumulative stress-axis hours/day | 29.9h | 13.1h | 16.8h lower |
| Water/day | 429.6 gal | 240.1 gal | 189.5 gal lower |
| Runtime-modeled electric energy/day | 19.1 kWh | 29.1 kWh | 10.0 kWh higher |
| Cost/day | USD 7.89 | USD 8.31 | USD 0.42 higher |
| Planner score | 25.9 | 52.7 | +26.8 |

Confounders To Keep In View

This comparison is useful, but it is not weather-normalized proof that the AI planning agent caused every improvement. The planner-online window was cooler, more humid, and lower-solar on average, which likely made VPD and heat stress easier to manage. The table below makes those confounders explicit instead of burying them in caveats.

| Factor | Planner offline | Planner online | Reading |
|---|---|---|---|
| Outdoor temperature | 55.9°F avg / 83.6°F max | 47.8°F avg / 73.5°F max | The planner-online window was cooler, reducing heat-load pressure. |
| Outdoor VPD / humidity | 1.19 kPa avg | 0.55 kPa avg | The planner-online window had less dry-air pressure, so VPD compliance was easier to recover. |
| Solar irradiance | 262 W/m² avg | 205 W/m² avg | Lower solar load reduces overheating and evaporative demand. |
| Manual interventions | 0 logged crop events | 0 logged crop events | Logged event counts are shown explicitly, but operator activity is not controlled like a lab experiment. |
| Hardware changes | No major hardware change is asserted here | No major hardware change is asserted here | The comparison uses the same greenhouse and controller boundary, but it is not a locked hardware trial. |
| Crop mix / active bands | Same public crop-control model | Same public crop-control model, plants still aging | Crop targets are comparable enough for an operational receipt, not for yield attribution or agronomic proof. |

How To Read The Receipt

This page uses generated tables instead of re-embedding the Scorecard dashboard. The comparison window is fixed, the metrics are pulled from the same tables behind the public scorecard, and the caveats stay on the page with the numbers.

For the live rolling charts, use the Scorecard.

Resource Tradeoffs

The comparison is more useful when stress is shown beside what the greenhouse spent trying to reduce it. Cost, water, and misting are not success metrics by themselves. They matter because a better headline score is credible only if the AI planning agent reduces plant stress without hiding the resource bill.

Daily Rows

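| Date | Plans | Both-axis | Graded (attrib.) | Score | Stress | VPD-high | Heat | Cost |
|---|---|---|---|---|---|---|---|---|
| 2026-04-22 | 0 | 30.9% | 60.9% | 36.5 | 26.2h | 12.8h | 9.4h | USD 6.14 |
| 2026-04-23 | 0 | 19.2% | 70.8% | 23.1 | 27.8h | 15.6h | 2.0h | USD 9.16 |
| 2026-04-24 | 0 | 3.0% | 57.5% | 11.2 | 40.8h | 22.5h | 7.1h | USD 8.43 |
| 2026-04-25 | 0 | 29.2% | 64.3% | 32.9 | 24.7h | 12.7h | 8.7h | USD 7.82 |
| 2026-04-26 | 4 | 39.7% | 66.3% | 39.8 | 16.3h | 2.7h | 4.3h | USD 8.97 |
| 2026-04-27 | 2 | 24.7% | 60.6% | 32.3 | 23.6h | 4.6h | 1.9h | USD 5.58 |
| 2026-04-28 | 3 | 53.4% | 65.5% | 54.4 | 14.3h | 4.5h | 3.2h | USD 6.21 |
| 2026-04-29 | 3 | 73.7% | 70.5% | 67.7 | 6.9h | 3.3h | 0.7h | USD 8.44 |
| 2026-04-30 | 2 | 69.2% | 73.5% | 60.7 | 7.4h | 4.8h | 0.0h | USD 10.97 |
| 2026-05-01 | 2 | 65.6% | 64.2% | 62.0 | 9.5h | 5.3h | 1.4h | USD 7.86 |
| 2026-05-02 | 5 | 56.6% | 70.6% | 51.8 | 13.7h | 7.2h | 3.3h | USD 10.13 |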
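To make the link between the Daily Rows and the Summary Table checkable, here is a minimal sketch that averages a few daily columns per window. It assumes the summary figures are simple means of the daily values, which this page does not state; the published rollup may weight samples differently. The tuple layout and the names window_means, FIELDS, and OFFLINE_LAST_DAY are illustrative, not taken from the generator script.

```python
from statistics import mean

# (date, plans, both_axis_pct, planner_score, stress_hours, cost_usd),
# copied from the Daily Rows table above.
DAILY_ROWS = [
    ("2026-04-22", 0, 30.9, 36.5, 26.2, 6.14),
    ("2026-04-23", 0, 19.2, 23.1, 27.8, 9.16),
    ("2026-04-24", 0, 3.0, 11.2, 40.8, 8.43),
    ("2026-04-25", 0, 29.2, 32.9, 24.7, 7.82),
    ("2026-04-26", 4, 39.7, 39.8, 16.3, 8.97),
    ("2026-04-27", 2, 24.7, 32.3, 23.6, 5.58),
    ("2026-04-28", 3, 53.4, 54.4, 14.3, 6.21),
    ("2026-04-29", 3, 73.7, 67.7, 6.9, 8.44),
    ("2026-04-30", 2, 69.2, 60.7, 7.4, 10.97),
    ("2026-05-01", 2, 65.6, 62.0, 9.5, 7.86),
    ("2026-05-02", 5, 56.6, 51.8, 13.7, 10.13),
]

FIELDS = ("plans", "both_axis_pct", "planner_score", "stress_hours", "cost_usd")
OFFLINE_LAST_DAY = "2026-04-25"  # last day of the planner-offline window


def window_means(rows):
    """Return the simple mean of each numeric column (assumed rollup rule)."""
    columns = list(zip(*rows))[1:]  # drop the date column
    return {name: mean(col) for name, col in zip(FIELDS, columns)}


# ISO dates sort correctly as plain strings, so string comparison splits the windows.
offline = window_means([r for r in DAILY_ROWS if r[0] <= OFFLINE_LAST_DAY])
online = window_means([r for r in DAILY_ROWS if r[0] > OFFLINE_LAST_DAY])

for name in FIELDS:
    delta = online[name] - offline[name]
    print(f"{name}: {offline[name]:.2f} -> {online[name]:.2f} ({delta:+.2f})")
```

On these rows the simple means land within rounding of the Summary Table values for plans/day, both-axis compliance, planner score, stress hours, and cost.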

Definitions

Both-axis compliance (binary, house): daily_summary.compliance_pct

Percent of samples where the house-average temperature and VPD were both inside the single served control band. This is a binary, house-level pass/fail: a reading 0.1°F out of band scores the same as one 15°F out, and it grades against the served band rather than per-zone agronomic targets. It does not assert that every zone, or every plant, was inside a firmware-enforced band.
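The binary pass/fail rule described above can be sketched in a few lines. This is an illustration only: the band limits, the inclusive band edges, and the function name both_axis_compliance_pct are assumptions, not the served band or the code behind daily_summary.compliance_pct.

```python
def both_axis_compliance_pct(samples, temp_band, vpd_band):
    """Percent of samples with temperature AND VPD inside their bands.

    samples: iterable of (house_avg_temp_f, house_avg_vpd_kpa) pairs.
    temp_band, vpd_band: (low, high) tuples; edges treated as in-band (assumption).
    Returns None when there are no samples.
    """
    samples = list(samples)
    if not samples:
        return None
    t_lo, t_hi = temp_band
    v_lo, v_hi = vpd_band
    passed = sum(
        1 for temp, vpd in samples
        if t_lo <= temp <= t_hi and v_lo <= vpd <= v_hi
    )
    return 100.0 * passed / len(samples)


# Hypothetical band and readings, chosen only to show the binary behavior.
readings = [
    (72.0, 1.0),  # both in band: pass
    (80.1, 1.0),  # 0.1 F over the band: fail
    (95.0, 1.0),  # 15 F over the band: the same fail
    (72.0, 1.5),  # temperature fine, VPD out: fail
]
result = both_axis_compliance_pct(readings, temp_band=(65.0, 80.0), vpd_band=(0.8, 1.2))
print(result)
```

A near miss and a large excursion both score as plain failures here, which is why the graded figure is reported alongside this one.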

Graded compliance (controller-attributable): daily_summary.compliance_v2_attributable_pct

A graded, per-zone, feasibility-aware compliance figure (band-compliance design §6-§7). It gives full credit inside the ideal band, linear partial credit through the stress band, and zero beyond, aggregated across occupied zones by priority weight. Misses the controller could not physically prevent (for example, an exhaust-only box that cannot cool below outdoor air) are credited rather than counted against the controller. It is reported as context and is populated once the graded compliance engine is promoted; until then this column shows a dash.
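The graded credit curve and the priority-weighted rollup described above might look like the following sketch. It covers a single axis and leaves out the feasibility adjustment entirely; the band values, weights, and the names axis_credit and weighted_compliance_pct are hypothetical and are not taken from the band-compliance design.

```python
def axis_credit(value, ideal, stress):
    """Credit in [0, 1] for one reading on one axis.

    ideal:  (low, high) band earning full credit.
    stress: (low, high) outer band; credit falls linearly from the ideal
            edge to zero at the stress edge. Requires stress to strictly
            contain ideal on both sides.
    """
    i_lo, i_hi = ideal
    s_lo, s_hi = stress
    if i_lo <= value <= i_hi:
        return 1.0
    if value > i_hi:
        if value >= s_hi:
            return 0.0
        return (s_hi - value) / (s_hi - i_hi)
    if value <= s_lo:
        return 0.0
    return (value - s_lo) / (i_lo - s_lo)


def weighted_compliance_pct(zone_credits):
    """Priority-weighted mean of per-zone credits, as a percent.

    zone_credits: list of (credit, priority_weight) for occupied zones.
    Returns None when the total weight is zero.
    """
    total_weight = sum(weight for _, weight in zone_credits)
    if total_weight == 0:
        return None
    weighted = sum(credit * weight for credit, weight in zone_credits)
    return 100.0 * weighted / total_weight


# Hypothetical bands in degrees F: ideal 68-78, stress 60-88.
IDEAL, STRESS = (68.0, 78.0), (60.0, 88.0)
zones = [
    (axis_credit(72.0, IDEAL, STRESS), 2.0),  # in the ideal band, high priority
    (axis_credit(83.0, IDEAL, STRESS), 1.0),  # halfway through the stress band
]
print(weighted_compliance_pct(zones))
```

Unlike the binary figure, a reading slightly outside the ideal band loses only a little credit here, and a high-priority zone moves the house number more than a low-priority one.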

Cumulative stress-axis hours/day: Heat + cold + VPD-high + VPD-low

Summed daily stress duration from corrected daily summary fields. This is not capped at one stress type; a hot-dry hour can count on more than one axis.
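Because the axes are summed rather than merged, the daily total can exceed the elapsed time; the 2026-04-24 row, at 40.8h, is an example. A minimal sketch of that accounting follows, assuming each sample carries the set of stress axes active at that moment. The sample layout and the name stress_axis_hours are hypothetical.

```python
AXES = ("heat", "cold", "vpd_high", "vpd_low")


def stress_axis_hours(samples, interval_hours):
    """Sum stress duration per axis.

    samples: iterable of sets naming the axes in stress during each sample.
    interval_hours: time represented by one sample.
    """
    totals = {axis: 0.0 for axis in AXES}
    for active_axes in samples:
        for axis in active_axes:
            totals[axis] += interval_hours
    return totals


# Two hot-dry hours followed by one stress-free hour (hypothetical data).
hourly = [{"heat", "vpd_high"}, {"heat", "vpd_high"}, set()]
per_axis = stress_axis_hours(hourly, interval_hours=1.0)
cumulative = sum(per_axis.values())
print(per_axis, cumulative)
```

Three elapsed hours yield four cumulative stress-axis hours here, because each hot-dry hour is counted once for heat and once for VPD-high.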

Planner score: v_planner_performance.planner_score

Composite score: 80% compliance and 20% cost efficiency. It is useful as an operational KPI, not as a yield claim.
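The 80/20 weighting can be written out as a short sketch. It assumes both inputs are already on a 0-100 scale; this page does not say which compliance figure feeds the score or how cost efficiency is derived from spend, so both are left as plain inputs, and the name planner_score is used here only for illustration.

```python
COMPLIANCE_WEIGHT = 0.8
COST_EFFICIENCY_WEIGHT = 0.2


def planner_score(compliance_pct, cost_efficiency_pct):
    """Weighted composite of two 0-100 inputs (scales are an assumption)."""
    for name, value in (("compliance_pct", compliance_pct),
                        ("cost_efficiency_pct", cost_efficiency_pct)):
        if not 0.0 <= value <= 100.0:
            raise ValueError(f"{name} must be between 0 and 100, got {value}")
    return (COMPLIANCE_WEIGHT * compliance_pct
            + COST_EFFICIENCY_WEIGHT * cost_efficiency_pct)


# Hypothetical inputs: perfect cost efficiency cannot lift the score far
# when compliance is middling.
print(planner_score(50.0, 100.0))
```

With these weights, a one-point change in compliance moves the score four times as much as a one-point change in cost efficiency.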

Runtime-modeled electric energy/day: daily_summary.kwh_estimated

Electric energy from published equipment wattage multiplied by observed on-time; metered kWh is retained separately as diagnostic evidence.
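The wattage-times-on-time model reduces to a single sum. The sketch below uses made-up equipment names, wattages, and runtimes; none of them come from the published equipment list, and runtime_modeled_kwh is an illustrative name rather than the function behind daily_summary.kwh_estimated.

```python
def runtime_modeled_kwh(equipment):
    """Estimate energy from rated power and observed on-time.

    equipment: iterable of (rated_watts, on_time_hours) pairs.
    Returns kilowatt-hours.
    """
    watt_hours = sum(watts * hours for watts, hours in equipment)
    return watt_hours / 1000.0


# Hypothetical day: a 1500 W heater for 4 h and a 250 W fan for 10 h.
day = [(1500.0, 4.0), (250.0, 10.0)]
estimate = runtime_modeled_kwh(day)
print(estimate)
```

The estimate is only as good as the rated wattages, which is one reason to keep metered kWh as separate diagnostic evidence.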

Cost/day: Electric + gas + water

Resource spend comes from estimated daily summary fields unless marked measured. The greenhouse is solar-aligned but still uses grid electricity and gas heat.

Caveats

  • Weather, crop load, hardware state, and operator activity were not identical across the two windows.
  • The baseline is a real outage window, not a hand-picked fixed-rule controller experiment.
  • The strongest claim is not that the AI planning agent guarantees better outcomes every day. The useful claim is that the system makes planner availability, physical stress, cost, and score visible enough to audit.
  • This is not a yield, profit, or controlled-trial claim. It is a public operational receipt; see the AI Greenhouse FAQ for the claim boundary.
  • Known physical and instrumentation limits still apply, including weather, sensor coverage, water attribution, and firmware-change risk. See Climate Control, Resource Use, and Safety Architecture.

Reproducibility

This page is generated by scripts/generate-baseline-vs-iris-page.py from daily_summary, plan_journal, v_planner_performance, climate, and crop_events.

For raw public-safe data, use the 7-day climate CSV, 30-day plan outcomes CSV, and dataset notes. The current public snapshot is available from the evidence snapshot API.

Where To Go Next