Baseline vs AI Planning Agent
This is an operational comparison, not a controlled A/B test, and it does not isolate the AI planning agent as the only variable. The baseline window is the April 22-25, 2026 planner-offline run already documented in the public outage story. The comparison window is April 26-May 2, 2026, when normal AI-assisted planning resumed.
The comparison is still useful because it answers the skeptical reader’s first question: when the planning loop is online, do the public scorecards look different from the period when the ESP32 had to keep running without normal AI plans?
For the exact parameters the AI planning agent can change when it is online, see AI Tunables Traceability.
For the broader caveat language, see the AI Greenhouse FAQ. For the live receipts behind this page, use the Scorecard, Operations, and the planning archive.
Periods Compared
Planner offline window: 4 days, 0.0 AI planning-agent plans/day.
Planner online window: 7 days, 3.0 AI planning-agent plans/day.
Summary Table
| Metric | Planner offline | Planner online | Change |
|---|---|---|---|
| Average AI planning-agent plans/day | 0.0 | 3.0 | +3.0 |
| Both-axis compliance (binary, house) | 20.6% | 54.7% | +34.1 pts |
| Temperature compliance | 44.9% | 70.9% | +26.0 pts |
| VPD compliance | 30.3% | 74.5% | +44.2 pts |
| Graded compliance (controller-attributable) | 63.4% | 67.3% | +3.9 pts |
| Cumulative stress-axis hours/day | 29.9h | 13.1h | 16.8h lower |
| Water/day | 429.6 gal | 240.1 gal | 189.5 gal lower |
| Runtime-modeled electric energy/day | 19.1 kWh | 29.1 kWh | 10.0 kWh higher |
| Cost/day | USD 7.89 | USD 8.31 | USD 0.42 higher |
| Planner score | 25.9 | 52.7 | +26.8 |
Confounders To Keep In View
This comparison is useful, but it is not weather-normalized proof that the AI planning agent caused every improvement. The planner-online window was cooler, more humid, and lower-solar on average, which likely made VPD and heat stress easier. The table below makes those confounders explicit instead of burying them in caveats.
| Factor | Planner offline | Planner online | Reading |
|---|---|---|---|
| Outdoor temperature | 55.9°F avg / 83.6°F max | 47.8°F avg / 73.5°F max | The planner-online window was cooler, reducing heat-load pressure. |
| Outdoor VPD / humidity | 1.19 kPa avg | 0.55 kPa avg | The planner-online window had less dry-air pressure, so VPD compliance was easier to recover. |
| Solar irradiance | 262 W/m² avg | 205 W/m² avg | Lower solar load reduces overheating and evaporative demand. |
| Manual interventions | 0 logged crop events | 0 logged crop events | Logged event counts are shown explicitly, but operator activity is not controlled like a lab experiment. |
| Hardware changes | No major hardware change is asserted here | No major hardware change is asserted here | The comparison uses the same greenhouse and controller boundary, but it is not a locked hardware trial. |
| Crop mix / active bands | Same public crop-control model | Same public crop-control model, plants still aging | Crop targets are comparable enough for an operational receipt, not for yield attribution or agronomic proof. |
How To Read The Receipt
This page uses generated tables instead of re-embedding the Scorecard dashboard. The comparison window is fixed, the metrics are pulled from the same tables behind the public scorecard, and the caveats stay on the page with the numbers.
For the live rolling charts, use the Scorecard.
Resource Tradeoffs
The comparison is more useful when stress is shown beside what the greenhouse spent trying to reduce it. Cost, water, and misting are not success metrics by themselves. They matter because a better headline score only counts if the AI planning agent reduced plant stress without hiding the resource bill.
Daily Rows
| Date | Plans | Both-axis | Graded (attrib.) | Score | Stress | VPD-high | Heat | Cost |
|---|---|---|---|---|---|---|---|---|
| 2026-04-22 | 0 | 30.9% | 60.9% | 36.5 | 26.2h | 12.8h | 9.4h | USD 6.14 |
| 2026-04-23 | 0 | 19.2% | 70.8% | 23.1 | 27.8h | 15.6h | 2.0h | USD 9.16 |
| 2026-04-24 | 0 | 3.0% | 57.5% | 11.2 | 40.8h | 22.5h | 7.1h | USD 8.43 |
| 2026-04-25 | 0 | 29.2% | 64.3% | 32.9 | 24.7h | 12.7h | 8.7h | USD 7.82 |
| 2026-04-26 | 4 | 39.7% | 66.3% | 39.8 | 16.3h | 2.7h | 4.3h | USD 8.97 |
| 2026-04-27 | 2 | 24.7% | 60.6% | 32.3 | 23.6h | 4.6h | 1.9h | USD 5.58 |
| 2026-04-28 | 3 | 53.4% | 65.5% | 54.4 | 14.3h | 4.5h | 3.2h | USD 6.21 |
| 2026-04-29 | 3 | 73.7% | 70.5% | 67.7 | 6.9h | 3.3h | 0.7h | USD 8.44 |
| 2026-04-30 | 2 | 69.2% | 73.5% | 60.7 | 7.4h | 4.8h | 0.0h | USD 10.97 |
| 2026-05-01 | 2 | 65.6% | 64.2% | 62.0 | 9.5h | 5.3h | 1.4h | USD 7.86 |
| 2026-05-02 | 5 | 56.6% | 70.6% | 51.8 | 13.7h | 7.2h | 3.3h | USD 10.13 |
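The window averages in the Summary Table can be rechecked against the daily rows above. The following is a minimal Python sketch, not the generator script: the rows are hand-copied from this page rather than queried from daily_summary, and the tuple layout, OFFLINE_END, and window_mean are illustrative names assumed here.

```python
from statistics import mean

# (date, plans, both_axis_pct, stress_hours, cost_usd), hand-copied from the Daily Rows table.
ROWS = [
    ("2026-04-22", 0, 30.9, 26.2, 6.14),
    ("2026-04-23", 0, 19.2, 27.8, 9.16),
    ("2026-04-24", 0, 3.0, 40.8, 8.43),
    ("2026-04-25", 0, 29.2, 24.7, 7.82),
    ("2026-04-26", 4, 39.7, 16.3, 8.97),
    ("2026-04-27", 2, 24.7, 23.6, 5.58),
    ("2026-04-28", 3, 53.4, 14.3, 6.21),
    ("2026-04-29", 3, 73.7, 6.9, 8.44),
    ("2026-04-30", 2, 69.2, 7.4, 10.97),
    ("2026-05-01", 2, 65.6, 9.5, 7.86),
    ("2026-05-02", 5, 56.6, 13.7, 10.13),
]

OFFLINE_END = "2026-04-25"  # last day of the planner-offline window


def window_mean(rows, column, digits=1):
    """Average one tuple column over the given rows, rounded for display."""
    return round(mean(r[column] for r in rows), digits)


# ISO dates compare correctly as strings, so a plain comparison splits the windows.
offline = [r for r in ROWS if r[0] <= OFFLINE_END]
online = [r for r in ROWS if r[0] > OFFLINE_END]

for label, column, digits in (("both-axis %", 2, 1), ("stress h/day", 3, 1), ("cost USD/day", 4, 2)):
    off = window_mean(offline, column, digits)
    on = window_mean(online, column, digits)
    print(f"{label}: offline {off}, online {on}, change {round(on - off, digits):+}")
```

The change is computed from the rounded averages, which is how the Summary Table figures line up (for example 54.7 minus 20.6 for both-axis compliance).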
Definitions
daily_summary.compliance_pct: Percent of samples where the house-average temperature and VPD were both inside the single served control band. This is a binary, house-level pass/fail: a reading 0.1°F out of band scores the same as one 15°F out, and it grades against the served band rather than per-zone agronomic targets. It does not assert that every zone, or every plant, was inside a firmware-enforced band.
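As a rough illustration of the binary pass/fail described above, here is a minimal Python sketch. The Band class, the function name, the inclusive band edges, and the band limits and readings are all assumptions made for the example; they are not the served band or the query behind daily_summary.compliance_pct.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Band:
    low: float
    high: float

    def contains(self, value: float) -> bool:
        # Inclusive edges are an assumption of this sketch.
        return self.low <= value <= self.high


def both_axis_compliance_pct(samples, temp_band, vpd_band):
    """Percent of (temp_f, vpd_kpa) samples with both readings inside their band."""
    if not samples:
        return 0.0
    passed = sum(1 for t, v in samples if temp_band.contains(t) and vpd_band.contains(v))
    return 100.0 * passed / len(samples)


# Hypothetical band limits and house-average readings, for illustration only.
temp_band = Band(65.0, 80.0)
vpd_band = Band(0.8, 1.2)
samples = [
    (72.0, 1.0),  # both in band: pass
    (80.1, 1.0),  # 0.1°F over: fail
    (95.0, 1.0),  # 15°F over: fail, scored the same as the row above
    (72.0, 1.5),  # VPD out of band: fail
]
print(both_axis_compliance_pct(samples, temp_band, vpd_band))  # 25.0
```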
daily_summary.compliance_v2_attributable_pct: A graded, per-zone, feasibility-aware compliance figure (band-compliance design §6-§7). It gives full credit inside the ideal band, linear partial credit through the stress band, and zero beyond, aggregated across occupied zones by priority weight; misses the controller could not physically prevent (for example, an exhaust-only box that cannot cool below outdoor air) are credited as controller-attributable. It is reported as context and is populated once the graded compliance engine is promoted; until then this column shows a dash.
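The graded credit shape (full inside the ideal band, linear through the stress band, zero beyond) and the priority-weighted aggregation can be sketched as follows. This is a hedged illustration, not the graded compliance engine: the function names, band limits, and weights are hypothetical, and the feasibility adjustment for misses the controller could not prevent is deliberately left out.

```python
def graded_credit(value, ideal, stress):
    """Credit in [0, 1]: 1 inside the ideal band, linear falloff to 0 at the stress-band edge."""
    ideal_lo, ideal_hi = ideal
    stress_lo, stress_hi = stress
    if ideal_lo <= value <= ideal_hi:
        return 1.0
    if value < ideal_lo:
        if value <= stress_lo:
            return 0.0
        return (value - stress_lo) / (ideal_lo - stress_lo)
    if value >= stress_hi:
        return 0.0
    return (stress_hi - value) / (stress_hi - ideal_hi)


def weighted_compliance_pct(zones):
    """Aggregate (credit, priority_weight) pairs for occupied zones into a percentage."""
    total_weight = sum(weight for _, weight in zones)
    if total_weight == 0:
        return 0.0
    return 100.0 * sum(credit * weight for credit, weight in zones) / total_weight


# Hypothetical temperature bands in °F: ideal 68-78, stress band out to 60 and 86.
ideal, stress = (68.0, 78.0), (60.0, 86.0)
print(graded_credit(70.0, ideal, stress))  # 1.0, inside the ideal band
print(graded_credit(82.0, ideal, stress))  # 0.5, halfway through the upper stress band
print(graded_credit(90.0, ideal, stress))  # 0.0, beyond the stress band

# Two occupied zones with hypothetical priority weights 2.0 and 1.0.
print(weighted_compliance_pct([(1.0, 2.0), (0.5, 1.0)]))
```

Unlike the binary figure, a reading slightly outside the ideal band loses only a little credit here, which is one reason the graded column moves less between the two windows.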
Cumulative stress-axis hours: Summed daily stress duration from corrected daily summary fields. This is not capped at one stress type; a hot-dry hour can count on more than one axis.
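Because an hour can count on more than one axis, the daily total can exceed 24 hours, as the 40.8h row for 2026-04-24 shows. A minimal Python sketch of that summing rule follows; the axis names, the per-sample flag sets, and the function name are assumptions for illustration and do not reflect the actual daily summary fields.

```python
def stress_axis_hours(samples, interval_hours):
    """Sum hours per stress axis; a sample flagged on two axes counts on both."""
    totals = {}
    for flags in samples:
        for axis in flags:
            totals[axis] = totals.get(axis, 0.0) + interval_hours
    return totals


# Three hypothetical hourly samples: one hot-dry hour, one dry hour, one unstressed hour.
samples = [{"heat", "vpd_high"}, {"vpd_high"}, set()]
totals = stress_axis_hours(samples, interval_hours=1.0)
cumulative = sum(totals.values())
print(totals["heat"], totals["vpd_high"], cumulative)  # 1.0 2.0 3.0
```

Only two of the three wall-clock hours were stressed, yet the cumulative figure is 3.0 because the hot-dry hour is counted on both axes.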
v_planner_performance.planner_score: Composite score: 80% compliance and 20% cost efficiency. It is useful as an operational KPI, not as a yield claim.
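The 80/20 weighting can be written as a one-line function. This sketch assumes both inputs are already on a 0-100 scale; how the view derives the cost-efficiency term, and which compliance figure it uses, is not stated on this page, so the function name and inputs below are hypothetical.

```python
COMPLIANCE_WEIGHT = 0.8
COST_EFFICIENCY_WEIGHT = 0.2


def planner_score(compliance_pct, cost_efficiency_pct):
    """Weighted composite of two 0-100 inputs (the input scales are an assumption)."""
    return COMPLIANCE_WEIGHT * compliance_pct + COST_EFFICIENCY_WEIGHT * cost_efficiency_pct


# Hypothetical inputs, not taken from v_planner_performance.
print(planner_score(50.0, 60.0))  # 0.8 * 50 + 0.2 * 60 = 52.0
```

With these weights, a 10-point swing in compliance moves the score by 8 points, while the same swing in cost efficiency moves it by 2.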
daily_summary.kwh_estimated: Electric energy from published equipment wattage multiplied by observed on-time; metered kWh is retained separately as diagnostic evidence.
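The runtime-modeled estimate is wattage times on-time, summed over equipment and converted to kWh. A minimal Python sketch, assuming a simple list of (rated watts, on-time hours) pairs; the wattages and runtimes are made-up values, not the greenhouse's published equipment figures.

```python
def estimated_kwh(equipment):
    """Sum rated_watts * on_time_hours over equipment and convert Wh to kWh."""
    watt_hours = sum(watts * hours for watts, hours in equipment)
    return watt_hours / 1000.0


# Hypothetical day: a 1500 W device on for 4 h and a 250 W device on for 12 h.
equipment = [(1500.0, 4.0), (250.0, 12.0)]
print(estimated_kwh(equipment))  # (6000 + 3000) Wh = 9.0 kWh
```

An estimate built this way assumes each device draws its rated wattage whenever it is on, which is one reason to keep metered kWh alongside it.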
Resource spend comes from estimated daily summary fields unless marked measured. The greenhouse is solar-aligned but still uses grid electricity and gas heat.
Caveats
- Weather, crop load, hardware state, and operator activity were not identical across the two windows.
- The baseline is a real outage window, not a hand-picked fixed-rule controller experiment.
- The strongest claim is not that the AI planning agent guarantees better outcomes every day. The useful claim is that the system makes planner availability, physical stress, cost, and score visible enough to audit.
- This is not a yield, profit, or controlled-trial claim. It is a public operational receipt; see the AI Greenhouse FAQ for the claim boundary.
- Known physical and instrumentation limits still apply, including weather, sensor coverage, water attribution, and firmware-change risk. See Climate Control, Resource Use, and Safety Architecture.
Reproducibility
This page is generated by scripts/generate-baseline-vs-iris-page.py from daily_summary, plan_journal, v_planner_performance, climate, and crop_events.
For raw public-safe data, use the 7-day climate CSV, 30-day plan outcomes CSV, and dataset notes. The current public snapshot is available from the evidence snapshot API.
Where To Go Next
- Why the AI Does Not Control Relays explains the safety split behind the outage window.
- Planning Loop shows how the AI planning agent writes hypotheses and waypoints.
- AI Tunables Traceability lists the bounded control surface behind those waypoints.
- Scorecard shows the live scorecard and forecast-plan-outcome panels.
- Generated Lessons shows what the planner reads before future plans.
- Data Model explains the tables, views, and sample exports behind this comparison.