
"Calibeating": Beating Forecasters at Their Own Game

Dean P. Foster and Sergiu Hart




(TE, SHORT version), with Corrections (*)

(arXiv, LONG version), with Corrections (**)

"Stronger expert" (see also below)

(Acrobat PDF files)

(***) ADDENDUM: Multi-Calibeating Beats the Stronger Expert

We show here that calibeating is a stronger notion than that of the so-called "stronger expert."

As shown in the paper (see (2) and the last paragraph of Section 2), the refinement score is the minimal Brier score over all relabelings of the bins; i.e., \[ \mathcal{R}_{t}=\min_{\phi }\mathcal{B}_{t}^{\phi (\mathbf{c})}, \] where the minimum is taken over all functions \(\phi :C\rightarrow C\) (from current labels to new labels), and we write \(\mathcal{B}_{t}^{\phi (\mathbf{c})}\) for the Brier score where the sequence \(\mathbf{c}\) is replaced by \(\phi (\mathbf{c})=(\phi (c_{s}))_{s=1,2,\ldots }\).

Therefore, taking \(C=\Delta (A)\), the multi-calibeating guarantee is expressed in terms of the minimal Brier score over all relabelings of the (joint) bins.

By comparison, when all forecasts are probability distributions on \(A\), i.e., all \(B^{n}\) are subsets of \(\Delta (A)\), the "stronger expert" notion of Foster (1991) (see (#) below) is expressed in terms of linear combinations of the experts' forecasts only. Whereas multi-calibeating thus takes into account all bin relabelings, stronger-expert notions do not go beyond linear-combination relabelings; since a linear combination of the forecasts is just one particular relabeling of the joint bins, minimizing over all relabelings yields a (weakly) stronger guarantee (see the numerical sketch below). Therefore, as claimed,

Calibeating is stronger than being the "stronger expert."


(#) In the extensive literature on experts, these notions are referred to as "prediction with no regret"; see Appendix A.9 of the full paper for the parallel results with the logarithmic scoring rule.
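
To make the comparison concrete, here is a small self-contained numerical sketch (an illustration added here, not taken from the paper; it uses binary outcomes, scalar forecasts in [0,1], and the one-coordinate squared-error version of the Brier score). It checks the decomposition of the Brier score into calibration plus refinement, computes the refinement score of the joint bins of two "experts" as the minimum of the Brier score over all relabelings of those bins (attained by relabeling each bin with its empirical frequency), and verifies that this minimum is weakly below the Brier score of the best fixed convex combination of the experts, such a combination being just one particular relabeling of the joint bins.

import numpy as np

rng = np.random.default_rng(0)
T = 2000
c1 = rng.choice([0.2, 0.5, 0.8], size=T)       # expert 1's forecasts (probability of "1")
c2 = rng.choice([0.3, 0.7], size=T)            # expert 2's forecasts
a = (rng.random(T) < c1).astype(float)         # binary outcomes, driven by expert 1's information

def brier(f, a):
    """Average squared error (one-coordinate version of the Brier score)."""
    return float(np.mean((np.asarray(f) - a) ** 2))

def calibration_refinement(c, a):
    """Split the Brier score of forecast c into calibration + refinement,
    taking as bins the distinct forecast values."""
    cal = ref = 0.0
    for v in np.unique(c):
        idx = (c == v)
        abar = a[idx].mean()                   # empirical frequency of "1" in the bin
        cal += idx.sum() * (v - abar) ** 2
        ref += ((a[idx] - abar) ** 2).sum()
    return cal / len(a), ref / len(a)

# (1) Brier = calibration + refinement for expert 1's forecast sequence.
K1, R1 = calibration_refinement(c1, a)
assert np.isclose(K1 + R1, brier(c1, a))

# (2) Refinement score of the JOINT bins (pairs of forecasts): the minimal
#     Brier score over ALL relabelings phi of the joint bins, attained by
#     relabeling each bin with its empirical frequency.
keys = list(zip(c1, c2))
sums, counts = {}, {}
for k, ai in zip(keys, a):
    sums[k] = sums.get(k, 0.0) + ai
    counts[k] = counts.get(k, 0) + 1
best_relabel = np.array([sums[k] / counts[k] for k in keys])
refinement_joint = brier(best_relabel, a)

# (3) Best FIXED convex combination of the two experts: the kind of
#     "linear-combination relabeling" that stronger-expert notions allow.
best_combo = min(brier(w * c1 + (1 - w) * c2, a) for w in np.linspace(0, 1, 201))

# A convex combination depends on the period only through the joint bin, so it
# is one particular relabeling; hence the minimum over all relabelings is weakly smaller.
print(f"min over all relabelings : {refinement_joint:.4f}")
print(f"best convex combination  : {best_combo:.4f}")
assert refinement_joint <= best_combo + 1e-12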



Abstract

In order to identify expertise, forecasters should not be tested by their calibration score, which can always be made arbitrarily small, but rather by their Brier score. The Brier score is the sum of the calibration score and the refinement score; the latter measures how good the sorting into bins with the same forecast is, and thus attests to "expertise." This raises the question of whether one can gain calibration without losing expertise, which we refer to as "calibeating." We provide an easy way to calibeat any forecast, by a deterministic online procedure. We moreover show that calibeating can be achieved by a stochastic procedure that is itself calibrated, and then extend the results to simultaneously calibeating multiple procedures, and to deterministic procedures that are continuously calibrated.
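
The abstract does not spell out the procedures. Purely as an illustration of "gaining calibration without losing expertise," here is a minimal online sketch (an assumption of this note, not a statement of the paper's procedure): at each period it outputs the empirical frequency of past outcomes among the periods in which the expert issued the same forecast, so it keeps the expert's sorting into bins while relabeling each bin with its observed frequency.

from collections import defaultdict
import random

def bin_frequency_forecaster(expert_forecasts, outcomes):
    """At each period, output the empirical frequency of past outcomes among
    the periods in which the expert issued the same forecast (its "bin");
    default to 0.5 when the bin has not been seen yet. Only past data is used."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    follower = []
    for c_t, a_t in zip(expert_forecasts, outcomes):
        d_t = sums[c_t] / counts[c_t] if counts[c_t] > 0 else 0.5
        follower.append(d_t)
        sums[c_t] += a_t
        counts[c_t] += 1
    return follower

def brier(f, a):
    """Average squared error (one-coordinate version of the Brier score)."""
    return sum((x - y) ** 2 for x, y in zip(f, a)) / len(a)

# Toy usage (hypothetical data-generating process): the expert sorts the
# periods into three informative bins but reports miscalibrated values.
random.seed(0)
true_prob = {0.2: 0.05, 0.5: 0.5, 0.8: 0.95}
expert = [random.choice([0.2, 0.5, 0.8]) for _ in range(5000)]
outcomes = [1.0 if random.random() < true_prob[c] else 0.0 for c in expert]
follower = bin_frequency_forecaster(expert, outcomes)
print("expert Brier  :", round(brier(expert, outcomes), 4))
print("follower Brier:", round(brier(follower, outcomes), 4))

In this toy run the follower keeps the expert's bins and relabels them with the observed frequencies, so its Brier score ends up below the expert's; see the paper for the actual procedures and their guarantees.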







   


© Sergiu Hart