Grant · Not funded
$0 raised

Project summary

[NOTE (25 Aug 2023): I have decided to deprioritise this project in favour of Whitebox Research.]

Blackbelt is a dependency graph of exercises sourced from various alignment materials, accessible via a fully gamified app.

  • It is non-infohazardous by default: users must complete an exercise or test of commitment before they can access the resources associated with it (a minimal sketch of this gating mechanic follows the list below)

  • It is scalable: there is a hands-free turnover process for people verifying submissions, so that previous mentees can in turn teach newer ones; barring that, the more basic exercises can be incrementally automated

  • It is complementary to many projects: human enhancement can benefit from faster training, existing forums can use badges to quickly signal competence, and workshops can be mastered ex post facto as exercises in the graph

  • It is (eventually) epistemically virtuous: fake or shoddy tests will be weeded out by several mechanisms: a) inability to advance to further tests, b) inability to win regular public tournaments, and c) the user’s submissions serving as a semi-permanent record accessible to their peers
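
To make the gating mechanic concrete, here is a minimal sketch in Python. The data model and exercise names are purely hypothetical, chosen for illustration rather than taken from any existing Blackbelt code: exercises form a dependency graph, a user can only attempt an exercise once its prerequisites are complete, and an exercise's resources stay hidden until it is passed.

    # Hypothetical sketch of the exercise dependency graph and resource gating.
    from dataclasses import dataclass, field

    @dataclass
    class Exercise:
        name: str
        prerequisites: tuple[str, ...] = ()   # exercises that must be passed first
        resources: tuple[str, ...] = ()       # materials revealed only after passing

    @dataclass
    class User:
        name: str
        completed: set[str] = field(default_factory=set)

    def unlocked(user: User, exercise: Exercise) -> bool:
        """An exercise is attemptable once all of its prerequisites are completed."""
        return all(p in user.completed for p in exercise.prerequisites)

    def visible_resources(user: User, graph: dict[str, Exercise]) -> list[str]:
        """Resources stay hidden (non-infohazardous by default) until the owning exercise is passed."""
        return [r for name in user.completed for r in graph[name].resources]

    # Toy graph: two basic exercises gate a more advanced one.
    graph = {
        "linear-algebra-drills": Exercise("linear-algebra-drills", resources=("notes/linalg.pdf",)),
        "python-warmup": Exercise("python-warmup", resources=("notes/python.md",)),
        "mech-interp-intro": Exercise(
            "mech-interp-intro",
            prerequisites=("linear-algebra-drills", "python-warmup"),
            resources=("papers/interp-reading-list.md",),
        ),
    }

    alice = User("alice", completed={"linear-algebra-drills"})
    assert not unlocked(alice, graph["mech-interp-intro"])   # still gated
    alice.completed.add("python-warmup")
    assert unlocked(alice, graph["mech-interp-intro"])       # prerequisites satisfied
    print(visible_resources(alice, graph))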

Project goals

As per my longer project description, Blackbelt aims to become a partial solution to the following problems:

  • Upskilling is hard: the available paths are often lonely and uncertain, workshops aren't mass-producing Paul Christianos, and it's hard for people to stay motivated over long periods of time unless they uproot their entire lives and move to London/Berkeley.

  • It takes up to five years for entrants in alignment research to build up their portfolio and do good work–too slow for short timelines.

  • LessWrong–and by extension greenfield alignment–is currently teetering on the edge of an Eternal September: most new people are several hundred thousand words of reading away from automatically avoiding bad ideas, let alone being able to discuss them with good truth-seeking norms.

  • We don't have a reliable way to gauge the potential of someone we've never met to do great work.

The theory of change is as follows: timelines are short enough that we need a Manhattan Project-level commitment to the alignment problem, but barring that, there ought to be a way to replicate that experience (incl. the sourcing of talent from all over) and massively improve not only the quality of alignment research output but also its volume and heterogeneity.

In the limit of this project succeeding, the following must become true:

  1. The community has converged on a graph of exercises that operationalise what it means to be good at alignment research.

  2. As such, it has become a Schelling point for most upskilling and mentorship activities in the community.

  3. A >40% increase in top-notch alignment research done outside of London and Berkeley; 6-8% of alignment grantees in a year having never networked at an EA Global event.

How will this funding be used?
Minimal

$9,000. This would cover four months of my living expenses so I can pursue this full-time; the rest would go to taxes, legal overhead, and hosting costs.

Mainline

$12,000. Everything I mentioned above, plus another dev. We could probably stretch this to six months if we’re smart with living costs.

Comfortable

$24,000. This would allow me to work on this for an entire year, plus leave open the possibility of flying to London/Berkeley so I can shadow alignment researchers and build a graph of exercises for them to ease their mentorship burden. In addition, this would embolden me to collaborate with people doing courses and workshops (e.g. BlueDot Impact) so I can start building out the community’s interstitial knowledge infrastructure.

What is your (team's) track record on similar projects?

Alignment: I attended SERI MATS remotely under John Wentworth last year, and I also facilitated a cohort in the most recent round of AGISF. In 2021, I worked as a computer vision researcher for a video compression startup in SF, and I have been reading up on applied rationality via LessWrong/Overcoming Bias since 2009 (I was invited to the 2020 CFAR workshop but could not attend for obvious reasons).

Entrepreneurship: In 2015, I co-founded one of the first VR/AR startups in the Philippines and built an enterprise crew training product used by the largest airline in the country (Philippine Airlines), as well as bespoke experiences for large brands like SM Cyberzone and Calbee.

Community building: I also co-founded the largest local VR/AR community (now XR Philippines) around the same time, securing key partners and running campaigns strategically enough that, within seven months, we landed an interview on national TV to promote one of our hackathons.

How could this project be actively harmful?

Security breaches could leak sensitive dual-use research, or the app could become so engaging that it wastes people’s time.

What other funding is this person or project getting?

None.


Anton Makiievskyi

over 1 year ago

I'd love it if such a project could happen, but I find it hard to believe it's possible.

I see how a skill tree may be developed for a frequently sought-after and well-described skill, e.g. "making a pitch deck" or "writing a SQL query".

"Alignment" appears not to be that well defined yet.


Clark Urzo

over 1 year ago

I understand where you're coming from, but even alignment papers have to start from existing knowledge. Writing a mechanistic interpretability paper requires mastery of certain aspects of linear algebra and Python, and some proficiency in avoiding anthropocentric explanations for observed phenomena. Singular learning theory needs algebraic geometry and statistical learning theory. Agent foundations require non-standard decision theory and applications of logic and probability theory.

I expect these prerequisites to not only deepen over time as alignment becomes more and more paradigmatic but also to lead to new, distinct techniques that may be hard to learn without having to be personally mentored by a researcher in Berkeley. And researcher time is valuable, so the sooner we can get the knowledge transfer problem off their hands, the better.


Clark Urzo

over 1 year ago

@zrkrlc *anthropomorphic