I will pay 30 forecasters with excellent track records on Good Judgement Open, Metaculus and Manifold (superforecasters) to make forecasts of long-run treatment effects from randomised controlled trials. This will allow me to provide new evidence on two critical questions in the forecasting space: (1) how does forecast accuracy decay with time horizon, and (2) when are (super)forecasters better than domain experts?
Why forecasts of long-term treatment effects from randomised controlled trials (RCTs)? Firstly, most research on forecasting is about ‘state’ forecasts: what the world will look like in the future. More relevant for those seeking to improve the world are ‘impact’ (or causal) forecasts: the difference between what would happen if we take action X and what would happen if we do not. The treatment effects of RCTs are causal impacts, so by collecting forecasts of them I contribute to this understudied area of forecasting.
Secondly, using RCTs allows us to resolve long-run forecasts more quickly. I will collect forecasts of the 5-10 year results from 7 different RCTs. These RCTs are already underway and the long-run results will be available to me in spring 2023, so I will be able to resolve the long-run forecasts soon. However, the only information available to forecasters about each RCT is a set of short-run results, typically observed 2 years after the trial started. As such, if the long-run results are from year 10, the forecast of those results approximates an 8-year forecast but resolves much more quickly. Forecasters cannot know anything about what happened in each RCT between years 2 and 10, so the forecast is a genuine long-run forecast.
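To make the distinction between forecast horizon and resolution time concrete, here is a minimal sketch of the arithmetic; the elicitation date and follow-up years are hypothetical placeholders, since the actual timing varies across the 7 RCTs (only the spring 2023 results date comes from the proposal).

```python
from datetime import date

# Hypothetical elicitation date and follow-up years, for illustration only.
forecast_date = date(2022, 6, 1)                # when forecasts are elicited (assumed)
long_run_results_available = date(2023, 4, 1)   # long-run results expected spring 2023

short_run_year = 2    # forecasters only see results measured ~2 years after the trial started
long_run_year = 10    # hypothetical trial with a 10-year follow-up

# The forecast covers the unobserved gap between the two measurement points...
effective_horizon_years = long_run_year - short_run_year
# ...but resolves as soon as the long-run results are released.
months_to_resolution = (long_run_results_available - forecast_date).days / 30

print(f"Effective forecast horizon: {effective_horizon_years} years")
print(f"Time until resolution: about {months_to_resolution:.0f} months")
```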
Why care about question (1), how does forecast accuracy decay with time horizon? Firstly, it’s important to know how much we can trust long-range forecasts in a variety of domains when making policies and decisions with long-run impacts. Secondly, a common objection to longtermism is that the effects of our actions on the long-term future are essentially impossible to predict. On this view, despite the huge potential value in the future, extreme uncertainty around long-term impacts means that the expected value of our options is mostly determined by their short-run impacts. However, there is limited empirical evidence on this question, and my study will generate relevant and important information for this crucial consideration.
Why care about question (2), when are (super)forecasters better than domain experts? Tetlock’s research shows that in geopolitics, teams of superforecasters outperform other prediction mechanisms, including domain experts. However, there are very few studies in any domain that explicitly compare domain experts to forecasters without domain expertise. In other important areas, such as economics, we might expect experts’ greater domain knowledge to do more to compensate for their lack of forecasting experience. In general, we need more research in a variety of domains to understand how much we should trust domain experts versus forecasters. I already have many forecasts from academic economists and development practitioners with domain expertise, so I just need forecasts from superforecasters to make this comparison.
For existing research on question (1) see: Charles Dillon, Data on forecasting accuracy across different time horizons and levels of forecaster experience; Niplav, Range and Forecasting Accuracy; Javier Prieto, How accurate are Open Phil’s predictions?; Luke Muehlhauser, How Feasible Is Long-range Forecasting? For existing research on question (2) see: Gavin Leech & Misha Yagudin, Comparing top forecasters and domain experts.
I am already running surveys on the Social Science Prediction Platform (https://socialscienceprediction.org/predict/) and receiving forecasts from academics, practitioners and laypeople (the latter recruited via Prolific). The surveys have been well received, with one respondent, a professor of economics at Stanford, saying it was “cool research” and a “really interesting idea”. The superforecasters will be able to take these same surveys, so no additional work will be required to design and create new surveys.
This project is part of my PhD, which also includes a related research project on using supervised machine learning to estimate long-term treatment effects in cases where we don’t have data on long-term outcomes. For that project, I won the best paper prize at the Global Priorities Institute’s Early Career Conference Program and presented at EA Global 2019. This demonstrates my ability to make useful empirical and methodological contributions to forecasting and global priorities research.
Paper - https://globalprioritiesinstitute.org/david-rhys-bernard-estimating-long-term-treatment-effects-without-long-term-outcome-data/
EAG presentation - https://www.youtube.com/watch?v=mOufR9vFO_U
Presentation transcript - https://www.effectivealtruism.org/articles/david-rhys-bernard-estimating-long-term-effects-without-long-term-data/
I will pay 30 superforecasters $50 per hour to make forecasts for this project. I expect completing the 7 surveys to take around 2 hours, so the total cost will be $3,000 (30 forecasters for 2 hours at $50 per hour). I am not asking for any money to cover living costs or personal expenses as part of this funding. I will recruit forecasters by reaching out to Good Judgement, Metaculus, the Forecasting Research Institute, and personal connections.
If you have a strong track record on Good Judgement Open, Metaculus or Manifold and are interested in making forecasts for this project, please get in touch: david.rhys.bernard@gmail.com
Application status across various funders