
"Calibeating": Beating Forecasters at Their Own Game

Dean P. Foster and Sergiu Hart




(TE, SHORT version), with Corrections (*)

(arXiv, LONG version), with Corrections (**)

"Stronger expert" (see also below)

(Acrobat PDF files)

(***) ADDENDUM: Multi-Calibeating Beats the Stronger Expert

We show here that calibeating is a stronger notion than that of the so-called "stronger expert."

As shown in the paper (see (2) and the last paragraph of Section 2), the refinement score is the minimal Brier score over all relabelings of the bins; i.e., \[ \mathcal{R}_{t}=\min_{\phi }\mathcal{B}_{t}^{\phi (\mathbf{c})}, \] where the minimum is taken over all functions \(\phi :C\rightarrow C\) (from current labels to new labels), and we write \(\mathcal{B}_{t}^{\phi (\mathbf{c})}\) for the Brier score where the sequence \(\mathbf{c}\) is replaced by \(\phi (\mathbf{c})=(\phi (c_{s}))_{s=1,2,\ldots }\).

Therefore, taking \(C=\Delta (A)\), the multi-calibeating guarantee is expressed in terms of the minimal Brier score over all relabelings of the (joint) bins.

By comparison, when all forecasts are probability distributions on \(A\), i.e., all \(B^{n}\) are subsets of \(\Delta (A)\), the "stronger expert" notion of Foster (1991) (see (#) below) is expressed in terms of linear combinations of the experts' forecasts only. Whereas multi-calibeating thus takes into account all bin relabelings, stronger-expert notions do not go beyond linear-combination relabelings; since a linear combination of the forecasts is just one particular relabeling of the joint bins, minimizing over all relabelings yields a (weakly) stronger guarantee (see the numerical sketch below). Therefore, as claimed,

Calibeating is stronger than being the "stronger expert."


(#) In the extensive literature on experts, these notions are referred to as "prediction with no regret"; see Appendix A.9 of the full paper for the parallel results with the logarithmic scoring rule.
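
To make the comparison concrete, here is a small self-contained numerical sketch (an illustration added here, not taken from the paper; it uses binary outcomes, scalar forecasts in [0,1], and the one-coordinate squared-error version of the Brier score). It checks the decomposition of the Brier score into calibration plus refinement, computes the refinement score of the joint bins of two "experts" as the minimum of the Brier score over all relabelings of those bins (attained by relabeling each bin with its empirical frequency), and verifies that this minimum is weakly below the Brier score of the best fixed convex combination of the experts, such a combination being just one particular relabeling of the joint bins.

import numpy as np

rng = np.random.default_rng(0)
T = 2000
c1 = rng.choice([0.2, 0.5, 0.8], size=T)       # expert 1's forecasts (probability of "1")
c2 = rng.choice([0.3, 0.7], size=T)            # expert 2's forecasts
a = (rng.random(T) < c1).astype(float)         # binary outcomes, driven by expert 1's information

def brier(f, a):
    """Average squared error (one-coordinate version of the Brier score)."""
    return float(np.mean((np.asarray(f) - a) ** 2))

def calibration_refinement(c, a):
    """Split the Brier score of forecast c into calibration + refinement,
    taking as bins the distinct forecast values."""
    cal = ref = 0.0
    for v in np.unique(c):
        idx = (c == v)
        abar = a[idx].mean()                   # empirical frequency of "1" in the bin
        cal += idx.sum() * (v - abar) ** 2
        ref += ((a[idx] - abar) ** 2).sum()
    return cal / len(a), ref / len(a)

# (1) Brier = calibration + refinement for expert 1's forecast sequence.
K1, R1 = calibration_refinement(c1, a)
assert np.isclose(K1 + R1, brier(c1, a))

# (2) Refinement score of the JOINT bins (pairs of forecasts): the minimal
#     Brier score over ALL relabelings phi of the joint bins, attained by
#     relabeling each bin with its empirical frequency.
keys = list(zip(c1, c2))
sums, counts = {}, {}
for k, ai in zip(keys, a):
    sums[k] = sums.get(k, 0.0) + ai
    counts[k] = counts.get(k, 0) + 1
best_relabel = np.array([sums[k] / counts[k] for k in keys])
refinement_joint = brier(best_relabel, a)

# (3) Best FIXED convex combination of the two experts: the kind of
#     "linear-combination relabeling" that stronger-expert notions allow.
best_combo = min(brier(w * c1 + (1 - w) * c2, a) for w in np.linspace(0, 1, 201))

# A convex combination depends on the period only through the joint bin, so it
# is one particular relabeling; hence the minimum over all relabelings is weakly smaller.
print(f"min over all relabelings : {refinement_joint:.4f}")
print(f"best convex combination  : {best_combo:.4f}")
assert refinement_joint <= best_combo + 1e-12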



Abstract

In order to identify expertise, forecasters should not be tested by their calibration score, which can always be made arbitrarily small, but rather by their Brier score. The Brier score is the sum of the calibration score and the refinement score; the latter measures how good the sorting into bins with the same forecast is, and thus attests to "expertise." This raises the question of whether one can gain calibration without losing expertise, which we refer to as "calibeating." We provide an easy way to calibeat any forecast, by a deterministic online procedure. We moreover show that calibeating can be achieved by a stochastic procedure that is itself calibrated, and then extend the results to simultaneously calibeating multiple procedures, and to deterministic procedures that are continuously calibrated.
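
The abstract does not spell out the procedures. Purely as an illustration of "gaining calibration without losing expertise," here is a minimal online sketch (an assumption of this note, not a statement of the paper's procedure): at each period it outputs the empirical frequency of past outcomes among the periods in which the expert issued the same forecast, so it keeps the expert's sorting into bins while relabeling each bin with its observed frequency.

from collections import defaultdict
import random

def bin_frequency_forecaster(expert_forecasts, outcomes):
    """At each period, output the empirical frequency of past outcomes among
    the periods in which the expert issued the same forecast (its "bin");
    default to 0.5 when the bin has not been seen yet. Only past data is used."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    follower = []
    for c_t, a_t in zip(expert_forecasts, outcomes):
        d_t = sums[c_t] / counts[c_t] if counts[c_t] > 0 else 0.5
        follower.append(d_t)
        sums[c_t] += a_t
        counts[c_t] += 1
    return follower

def brier(f, a):
    """Average squared error (one-coordinate version of the Brier score)."""
    return sum((x - y) ** 2 for x, y in zip(f, a)) / len(a)

# Toy usage (hypothetical data-generating process): the expert sorts the
# periods into three informative bins but reports miscalibrated values.
random.seed(0)
true_prob = {0.2: 0.05, 0.5: 0.5, 0.8: 0.95}
expert = [random.choice([0.2, 0.5, 0.8]) for _ in range(5000)]
outcomes = [1.0 if random.random() < true_prob[c] else 0.0 for c in expert]
follower = bin_frequency_forecaster(expert, outcomes)
print("expert Brier  :", round(brier(expert, outcomes), 4))
print("follower Brier:", round(brier(follower, outcomes), 4))

In this toy run the follower keeps the expert's bins and relabels them with the observed frequencies, so its Brier score ends up below the expert's; see the paper for the actual procedures and their guarantees.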







   


© Sergiu Hart