Most weather APIs give you a single forecast from a single model. WeatherOracle does something different: it queries seven independent numerical weather prediction (NWP) models simultaneously, then combines them into a single probability estimate calibrated for prediction market thresholds.
This document explains the methodology — which models we use, why we chose them, how we combine their outputs, and why this approach outperforms any single model for the specific task of estimating "will the high temperature exceed X degrees?"
Prediction markets like Kalshi ask binary questions: "Will the daily high temperature in Chicago exceed 73°F?" A single weather model gives you a point estimate — say, 71.4°F — but that tells you nothing about how confident to be. Is 73°F clearly out of reach, or is it a coin flip?
The answer lies in ensemble spread. When multiple independent models agree on 71°F, the probability that it hits 73°F is genuinely low. When they disagree — one says 68°F and another says 76°F — the uncertainty is high and 73°F becomes quite plausible.
Key insight: Model disagreement is information. A 5°F spread across 7 models tells you the forecast is genuinely uncertain. A 1°F spread tells you the atmosphere is in a highly predictable state. WeatherOracle captures both.
All seven models are accessed for free via Open-Meteo, which caches and serves NWP output in a normalized format. Coordinates are aligned to the NWS ASOS station locations used by Kalshi and other prediction markets for official temperature observations.
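The following is a minimal Python sketch of pulling one day's forecast high from several models in a single Open-Meteo request, using only the standard library. It assumes the public `https://api.open-meteo.com/v1/forecast` endpoint, its `daily=temperature_2m_max`, `temperature_unit`, `timezone`, and `models` query parameters, and that multi-model responses key each series as `temperature_2m_max_<model>`. Check these, and the exact model identifiers, against Open-Meteo's documentation. The function names are hypothetical, not WeatherOracle's code.

```python
from __future__ import annotations

import json
import urllib.parse
import urllib.request

# Assumed endpoint: Open-Meteo's public forecast API (verify against current docs).
FORECAST_URL = "https://api.open-meteo.com/v1/forecast"


def build_forecast_url(lat: float, lon: float, models: list[str]) -> str:
    """Build one request for daily max temperature (°F) from several models."""
    params = {
        "latitude": lat,
        "longitude": lon,
        "daily": "temperature_2m_max",
        "temperature_unit": "fahrenheit",
        "timezone": "auto",  # daily values follow the location's local calendar day
        "models": ",".join(models),
    }
    return FORECAST_URL + "?" + urllib.parse.urlencode(params)


def parse_daily_max(payload: dict, models: list[str], day_index: int = 0) -> dict[str, float | None]:
    """Pick one day's forecast high per model; None marks a model with no data."""
    daily = payload.get("daily", {})
    result: dict[str, float | None] = {}
    for model in models:
        # Assumption: multi-model responses key each series as temperature_2m_max_<model>.
        series = daily.get(f"temperature_2m_max_{model}")
        if series and day_index < len(series):
            result[model] = series[day_index]  # may itself be None if the model has a gap
        else:
            result[model] = None
    return result


def fetch_daily_max(lat: float, lon: float, models: list[str], day_index: int = 0) -> dict[str, float | None]:
    """Network call: build the URL, fetch it, and parse the JSON response."""
    with urllib.request.urlopen(build_forecast_url(lat, lon, models), timeout=10) as resp:
        return parse_daily_max(json.load(resp), models, day_index)
```

Returning `None` for a missing model, rather than raising, leaves the decision about dropping that model to the filtering step.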
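Daily maximum temperatures don't follow a normal (bell curve) distribution; they follow an extreme value distribution. Specifically, the Gumbel distribution (Type I extreme value) is the theoretically appropriate model for the maximum of a large set of independent, identically distributed observations: it is the limiting distribution of such maxima when the underlying observations have light tails, as normally distributed ones do.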
This matters because prediction markets almost always ask about extremes: "Will the HIGH exceed X?" A symmetric normal distribution underestimates the probability of unusually high values far out in the right tail. The Gumbel distribution is right-skewed, and its right tail decays more slowly than a normal's, so it assigns more probability to those outcomes.
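To illustrate the tail claim, the sketch below compares right-tail probabilities for a normal and a Gumbel distribution that share the same mean and standard deviation. The moment matching is an illustrative choice made here, not part of WeatherOracle's method. It uses the standard Gumbel moments: mean μ + γβ and standard deviation πβ/√6.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler–Mascheroni constant


def normal_tail(k: float) -> float:
    """P(X > mean + k*sd) for a normal distribution."""
    return 0.5 * math.erfc(k / math.sqrt(2))


def gumbel_tail(k: float) -> float:
    """P(X > mean + k*sd) for a Gumbel (maximum) distribution with mean 0 and sd 1."""
    beta = math.sqrt(6) / math.pi  # sd = pi * beta / sqrt(6) = 1
    mu = -EULER_GAMMA * beta       # mean = mu + gamma * beta = 0
    z = (k - mu) / beta
    return -math.expm1(-math.exp(-z))  # 1 - exp(-exp(-z)), computed stably


for k in (1, 2, 3):
    print(f"k={k}: normal={normal_tail(k):.4f}  gumbel={gumbel_tail(k):.4f}")
```

With mean and standard deviation matched, the Gumbel assigns nearly twice the normal's probability two standard deviations above the mean and roughly nine times as much at three. At one standard deviation, the normal is actually slightly higher, so the difference is concentrated in the far right tail.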
The scale parameter β is derived from the inter-model spread: it is set to one quarter of the range between the highest and lowest model forecasts. When the models agree closely, β is small and the distribution is tight. When they disagree significantly, β is larger and the probability mass spreads across a wider temperature range.
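For reference, the standard Gumbel (maximum) CDF and the resulting exceedance probability are given below, with the parameter choices from the pipeline table that follows. The symbols T (the market threshold), t_i (the forecast high from model i), and n (the number of models that returned data) are notation introduced here, not the source's.

```latex
F(x;\mu,\beta) = \exp\!\left[-\exp\!\left(-\frac{x-\mu}{\beta}\right)\right]
\qquad
P(\text{high} > T) = 1 - F(T;\mu,\beta)

\mu = \frac{1}{n}\sum_{i=1}^{n} t_i
\qquad
\beta = \frac{\max_i t_i - \min_i t_i}{4}
```

For a Gumbel distribution, μ is the mode. The distribution's mean is μ + γβ, where γ ≈ 0.5772 is the Euler–Mascheroni constant. Setting μ to the ensemble mean therefore places the distribution's own mean γβ above the consensus value.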
WeatherOracle treats each model's output as an independent observation of the true temperature distribution:
| Step | What we do | Why |
|---|---|---|
| 1. Fetch | Query all 7 models for daily maximum temperature at target coordinates | Independent data sources reduce systematic bias |
| 2. Filter | Drop any model that fails to return data | Graceful degradation — 5/7 models is still reliable |
| 3. Consensus | Compute the mean and range (max − min) across all available models | Equal-weighted ensemble reduces individual model error |
| 4. Gumbel parameters | Set the Gumbel location μ = mean and scale β = range/4 | Gumbel is the appropriate distribution for daily temperature maxima |
| 5. Probability | Compute 1 − CDF at the market threshold | The probability that the high exceeds the threshold directly answers the prediction market question |
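The five steps can be sketched end to end in Python, assuming per-model forecast highs in °F have already been fetched. The function name and the `m1`…`m7` model keys are hypothetical placeholders. The returned keys borrow the `mean_high_f`, `spread_f`, and `prob_exceeds` names listed under Returns, but this is not WeatherOracle's implementation. How the service handles perfect agreement (β = 0) isn't stated here, so the step-function fallback is an assumption.

```python
from __future__ import annotations

import math
from statistics import fmean


def exceedance_probability(forecasts: dict[str, float | None], threshold_f: float) -> dict[str, float]:
    """Estimate P(daily high > threshold_f) from per-model forecast highs in °F."""
    # Step 2 (Filter): drop models that returned no data.
    values = [v for v in forecasts.values() if v is not None]
    if not values:
        raise ValueError("no model returned data")

    # Step 3 (Consensus): equal-weighted mean and max-minus-min range.
    mu = fmean(values)
    spread = max(values) - min(values)

    # Step 4 (Gumbel parameters): location = mean, scale = range / 4.
    beta = spread / 4

    # Step 5 (Probability): 1 - Gumbel CDF at the threshold.
    if beta == 0:
        # Assumption: identical forecasts (or a single model) give a step function.
        prob = 1.0 if mu > threshold_f else 0.0
    else:
        z = max((threshold_f - mu) / beta, -700.0)  # clamp so exp(-z) cannot overflow
        prob = -math.expm1(-math.exp(-z))  # equals 1 - exp(-exp(-z)), accurate for small values
    return {"mean_high_f": mu, "spread_f": spread, "prob_exceeds": prob}


tight = {"m1": 71.0, "m2": 71.5, "m3": 70.8, "m4": 71.2, "m5": 71.4, "m6": 70.9, "m7": None}
wide = {"m1": 68.0, "m2": 76.0, "m3": 70.0, "m4": 73.5, "m5": 71.0, "m6": 74.0, "m7": 69.5}
print(exceedance_probability(tight, 73.0))
print(exceedance_probability(wide, 73.0))
```

Both made-up ensembles center near 71°F. In the tight one, six models sit within 0.7°F and one is missing, so the probability of exceeding 73°F comes out near zero. In the wide one, the models span 8°F, and the probability rises to roughly 0.4.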
WeatherOracle returns a confidence label alongside the probability estimate:
| Label | Spread | Meaning |
|---|---|---|
| High | < 3°F | Models strongly agree. Probability estimate is reliable. |
| Medium | 3–6°F | Moderate disagreement. Estimate is useful but not precise. |
| Low | > 6°F | High model uncertainty. Use with caution. |
For prediction market applications, we recommend acting only on High or Medium confidence signals. A 7°F model spread, for example, falls in the Low band: the atmosphere is in a hard-to-predict state, and no forecast system is likely to be reliable.
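The labeling rule and the High/Medium recommendation can be sketched as two small functions. They read the table's boundaries as below 3°F for High, 3°F through 6°F for Medium, and above 6°F for Low. They also assume "spread" is the same max-minus-min range used in the pipeline. Both function names are hypothetical.

```python
def confidence_label(spread_f: float) -> str:
    """Map the inter-model spread in °F to the High / Medium / Low bands."""
    if spread_f < 3:
        return "High"
    if spread_f <= 6:
        return "Medium"
    return "Low"


def is_actionable(spread_f: float) -> bool:
    """Act only on High or Medium confidence signals."""
    return confidence_label(spread_f) != "Low"
```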
In backtesting against NWS ASOS observations across 10 US cities, the 7-model ensemble consistently achieves a better Brier score than any individual model (the Brier score is the standard metric for probabilistic forecast accuracy; lower is better). The benefit is largest in the 2–5 day forecast window, where individual models diverge most.
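A minimal Python sketch of the binary Brier score follows. It is the mean squared difference between each forecast probability and the observed outcome (1 if the high exceeded the threshold, 0 otherwise). The backtest's exact thresholds and forecast windows aren't described here, so any inputs you pass are your own, not the results above.

```python
def brier_score(probs: list[float], outcomes: list[int]) -> float:
    """Mean squared error between forecast probabilities and 0/1 outcomes (lower is better)."""
    if not probs or len(probs) != len(outcomes):
        raise ValueError("need equal-length, non-empty inputs")
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)
```

A perfect, fully confident forecaster scores 0. Always answering 0.5 scores 0.25 regardless of outcomes, which is a useful baseline when comparing models.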
ECMWF alone is excellent but has documented warm biases in certain regions. GFS is widely used but can lag on rapid pattern changes. HRRR is the most accurate for same-day forecasts but degrades quickly beyond 18 hours. Combining all seven lets each model's strengths contribute while no single model's weakness dominates the estimate.
Returns: mean_high_f, spread_f, prob_exceeds, confidence, and all 7 individual model values. Price: $0.02 USDC via x402.
Available cities: New York, Chicago, Los Angeles, Miami, Houston, Phoenix, Seattle, Denver, Atlanta, Boston. Data sourced from the Open-Meteo free tier. Coordinates aligned to NWS ASOS stations.