AI Forecasting Parity Audit
5 bundles
Frontier Model Parity Audit
Rogo's "no single best model" claim is substantially accurate but temporally bounded: on the Big Finance Bench published May 27, 2026, Claude Opus 4.7, GPT-5.5, and Claude Sonnet 4.6 were separated by less than 0.3 percentage points on a rubric-graded aggregate score, with the…
AI Forecasting Parity Audit
"Green Tree" is real but mischaracterized: Google DeepMind did submit a system codenamed "green tree" to ForecastBench, and it ranks #2 overall — but the claimed March 15, 2026 parity date is unverified; the only sourced reference to this milestone appears in a single newsletter…
AI Forecasting Parity Evidence
"Green Tree" is real but mischaracterized: Google DeepMind's "green tree" is a confirmed ForecastBench tournament submission, currently ranked #2 overall with a Brier Index of approximately 67.8–67.9 as of the May 23, 2026 leaderboard update — meaningfully below the…
AI Forecasting Parity Audit
"Green Tree" is real but mischaracterized: Google DeepMind does operate a submission labeled "green tree" on the ForecastBench tournament leaderboard, but the specific claim that it "hit parity with top human superforecasters on March 15, 2026" originates from a single…
AI Forecasting Parity Check
"Green Tree" is real but mischaracterized: Google DeepMind's "green tree" is an anonymized leaderboard entry on the Forecasting Research Institute's ForecastBench — not a standalone branded system — and the March 15, 2026 parity date cited in popular coverage appears to conflate…