Retroactive funding for the Less Wrong post "Don't Dismiss Simple Alignment Solutions".
I believe that this project is especially suitable for retroactive funding since:
• It was only published recently (the further back you go, the less value is delivered by retroactive funding)
• It's much easier to judge the post as a success retroactively than it would have been prospectively
• My runway is currently short enough that funding is likely to lead to counterfactual AI Safety work taking place, rather than just increasing a figure in a bank account.
This post wasn't a huge amount of work to write up, but producing posts of this expected value regularly (say, one each month) would be extremely challenging. See, for example, the large number of posts I had to write before getting this one hit (https://www.alignmentforum.org/users/chris_leong), plus further posts on Less Wrong, though many of those were on niche topics and not intended to be hits. It would therefore make sense to view this as a grant for achieving a hit, rather than a grant for this one specific post.
• Encourage more work of the sort described in the post, whether investigating linearity or other ways in which it may be surprisingly easy to make progress on alignment.
• Encourage more people to give technical alignment a go, rather than just throwing up their hands.
Hopefully, the post has received enough attention to achieve these goals.
If there is a greater than 2% chance that the post leads to someone producing a counterfactual result on the same level as those mentioned in the post, then I expect the post to be worth more than the amount requested.
The project is already complete, so the funding would go toward extending my runway.
Not applicable, since this is a retroactive request. That said, I'm open to improving or augmenting the post if requested.
Perhaps this post was unnecessary, as Beren had already written "Deep learning models might be secretly (almost) linear" (https://www.lesswrong.com/posts/JK9nxcBhQfzEgjjqe/deep-learning-models-might-be-secretly-almost-linear). However, I think there's value in my framing, which is better optimised as a call to action.
Often posts get attention and then are forgotten.
Joseph Bloom suggests in a comment that not everyone may be as excited about these alignment proposals as I am, and that this may cause people to doubt the conclusion.
Perhaps all the low-hanging fruit has already been picked. Perhaps we already have directions to investigate and we should invest in them instead of trying to open up new directions.
Because of its stance (alignment is easier than you think, rather than harder), the post was more likely to become popular and hence might be overrated.
I received a grant to skill up in alignment and do agent foundations research, but this grant was from a certain well-known crypto fund that collapsed, so there is a chance it may be clawed back. It was only a part-time grant anyway, so even if it were guaranteed, I would need to find ways to supplement it.
Application status across various funders