Wednesday, April 10, 2024

I Welcome Our Robot Overlords & You Should Too (with new preface)



A chapter from Losing My Religions.


Note, April 2024:

Good-faith feedback I've received on this chapter contends I'm too hard on humanity:

Humans don't value torturing animals and they don't make zero attempts to prevent suffering. Humans have gotten pretty good at preventing human suffering [but still -ed], although we've unfortunately backtracked on animal suffering. If you ask people, they'll say they don't want animals to suffer. They just don't think about it because they want their meat.

Most people not wanting to torture animals does not excuse the fact that humanity does torture tens of billions of sentient beings, year after year, decade after decade. 

Regardless of conscious intention, we breed these billions of individuals so they are in pain by default. And then we cram them together by the thousands, wing to wing, living and breathing in their own shit and piss, which makes their lives that much worse.  

No one argues, "AI won't consciously want to enslave us, so AI is just fine."

Intentional or not, humanity is unimaginably sadistic. The reality for untold numbers of non-human individuals is worse than humanity's worst AI nightmare.

Ask yourself: 

How much suffering would we have to cause before you question whether humanity's survival is an unqualified good?

I get it – we are human and we want to stay alive, so we have the ingrained and impenetrable bias of "humanity = good." We can't even consider that this might not be true, or else it would implicitly indict us personally.

But assuming or wishing isn't an argument. 

Or ... it shouldn't be.

Here's the full chapter:


If you don’t know what longtermism is, please skip this chapter. Yay you! 

Longtermism ≈ the interests of oodles of sentient beings / robots in the future are more important than any other concern.

I disagree.

[Longtermists believe] summing up all the possible future joy from (hopefully sentient, but probably not; how could we ever know for sure?) robots vastly and absolutely swamps any concerns of the moment. So keeping humanity on track for this future is what truly matters.

But as far as I can tell, humanity’s continued existence is not a self-evident good. I know Effective Altruists (EAs) tend to be well-off humans who like their own existence and thus personally value humanity’s existence. But this value is not inherent. It’s just a bias. It’s simply an intuition that makes EAs and others assume that humanity’s continued existence is unquestionably a good thing.

That aside, basing decisions on “add up everyone” is where I get off the EA / utilitarian train, as per the previous “Biting the Philosophical Bullet” chapter.

Yes, I understand expected values, but let’s think about what these longtermist calculations say: 

A tiny chance of lowering existential risk – a vanishingly small chance of improving the likelihood that quadzillions of happy robots will take over the universe – is more important than, say, stopping something like the Holocaust. I’m serious. If a longtermist had been alive in 1938 and knew what was going on in Nazi Germany, they would have turned down the opportunity to influence public opinion and policy: “An asteroid might hit Earth someday. The numbers prove we must focus on that.”

Over at Vox, Dylan Matthews’ “I spent a weekend at Google talking with nerds about charity. I came away … worried” captures these problems. Excerpts:

The common response I got to this was, “Yes, sure, but even if there’s a very, very, very small likelihood of us decreasing AI [artificial intelligence] risk, that still trumps global poverty, because infinitesimally increasing the odds that 10^52 people in the future exist saves way more lives than poverty reduction ever could.” 

The problem is that you could use this logic to defend just about anything. Imagine that a wizard showed up and said, “Humans are about to go extinct unless you give me $10 to cast a magical spell.” Even if you only think there’s a, say, 0.00000000000000001 percent chance that he's right, you should still, under this reasoning, give him the $10, because the expected value is that you're saving 10^32 lives. Bostrom calls this scenario “Pascal’s Mugging,” and it’s a huge problem for anyone trying to defend efforts to reduce human risk of extinction to the exclusion of anything else. Ultimately you have to stop being meta ... if you take meta-charity too far, you get a movement that’s really good at expanding itself but not necessarily good at actually helping people.
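To make the wizard's arithmetic concrete, here is a minimal sketch of the expected-value reasoning the quote is lampooning (the probability and payoff figures are illustrative, taken from the numbers quoted above):

```python
# Pascal's Mugging in one line of arithmetic: any tiny but nonzero
# probability gets swamped by a large enough hypothetical payoff.
tiny_probability = 1e-19   # "0.00000000000000001 percent" as a fraction
huge_payoff = 1e52         # future lives, per the figure quoted above
cost_of_spell = 10         # dollars demanded by the wizard

expected_lives_saved = tiny_probability * huge_payoff
print(expected_lives_saved)  # ~1e33 — still astronomical
```

Under naive expected-value reasoning, $10 for ~10^33 expected lives is an unbeatable bargain no matter how absurd the premise, which is exactly the reductio Matthews is making.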

(By the way, if you don’t buy five more copies of this book for your friends, humanity will go extinct. You’ve been warned.)

Or, as Matt Yglesias put it in What's long-term about “longtermism”?

Suppose right now there’s a 0.001 percent chance that climate change could generate a catastrophic feedback mechanism that leads to human extinction, and doing a Thanos snap and killing half of everyone reduces that to 0.0001 percent. A certain kind of longtermist logic says you should do the snap, which I think most people would find odd.

Furthermore, no one can know what the impact might be of their longtermist efforts. This is called sign-uncertainty, aka cluelessness. We simply don’t and can’t know if our actions aimed at the long-term future might have a positive or negative impact.

There are plenty of examples. One involves work on AI. Think about efforts to rein in / slow down the development of AI in Western democracies – e.g., to force researchers to first address the alignment problem. This could lead to an unfettered totalitarian AI from China pre-empting every other attempt. Oops.

Another example: EAs talking about the threat of an engineered virus (à la Margaret Atwood’s fantastic Oryx and Crake) might be what gives a real-world Crake the idea to engineer said virus! This is not a fantasy; as Scott Alexander explains, “al-Qaeda started a bioweapons program after reading scaremongering articles in the Western press about how dangerous bioweapons could be.”

Or longtermists could inspire the creation of a malevolent computer system, as noted in this great thread on longtermism.

Alexander Berger, the very handsome and wise co-head of the Open Philanthropy Project, made yet another important point in his 80,000 Hours podcast:

I think it makes you want to just say wow, this is all really complicated and I should bring a lot of uncertainty and modesty to it. ... 

I think the more you keep considering these deeper levels of philosophy, these deeper levels of uncertainty about the nature of the world, the more you just feel like you’re on extremely unstable ground about everything. ... my life could totally turn out to cause great harm to others due to the complicated, chaotic nature of the universe in spite of my best intentions. ... I think it is true that we cannot in any way predict the impacts of our actions. And if you’re a utilitarian, that’s a very odd, scary, complicated thought. … 

I think the EA community probably comes across as wildly overconfident about this stuff a lot of the time, because it’s like we’ve discovered these deep moral truths, then it’s like, “Wow, we have no idea.” I think we are all really very much – including me – naïve and ignorant about what impact we will have in the future. 

I’m going to rely on my everyday moral intuition that saving lives is good ... I think it’s maximizable, I think if everybody followed it, it would be good.

And from his interview with The Browser:

I’m not prepared to wait. The ethos of the Global Health and Wellbeing team is a bias to improving the world in concrete actionable ways as opposed to overthinking it or trying so hard to optimize that it becomes an obstacle to action. We feel deep, profound uncertainty about a lot of things, but we have a commitment to not let that prevent us from acting. I think there are a lot of ways in which the world is more chaotic than [we think]. [S]ometimes trying to be clever by one extra step can be worse than just using common sense.

Awesome.

Edit: Hardcore Effective Altruist Kat Woods’ “The most important lesson I learned after ten years in EA”:

To be an EA is to find out, again and again and again, that what you thought was the best thing to do was wrong. You think you know what’s highest impact and you’re almost certainly seriously mistaken.

And when people think they have the answer, and it just happens to be their math, sometimes sarcasm works best:

Backstory: EAs determine an issue’s worthiness based on three variables: 1. Scale, 2. Neglectedness, 3. Tractability. (A calculation like this is what led to One Step for Animals.) Taking this literally leads to D0TheMath’s post on the EA Forum, “Every moment of an electron’s existence is suffering.” Excerpts:

Scale: If we think there is only a 1% chance of panpsychism being true (the lowest possible estimate on prediction websites such as Metaculus, so highly conservative), then this still amounts to at least 10^78 electrons impacted in expectation. 

Neglectedness: Basically nobody thinks about electrons, except chemists, physicists, and computer engineers. And they only think about what electrons can do for them, not what they can do for the electrons. This amounts to a moral travesty far larger than factory farms. 

Tractability: It is tremendously easy to affect electrons, as shown by recent advances in computer technology, based solely on the manipulation of electrons inside wires. 

This means every moment of an electron’s existence is pain, and multiplying out this pain by an expected 10^78 produces astronomical levels of expected suffering.

This is funny, but it is very close to how some EAs think! (More funny.) (And some people really do believe in panpsychism. Not funny.) I knew one EA who stopped donating to animal issues to support Christian missionaries. There may be only a small chance they are right about god, but if they are, the payoff for every saved soul is literally infinite! He actually put money on Pascal’s Wager!

I don't know that I’m right; as I mentioned, I’ve changed my mind before. I understand that many smart people think I’m entirely mistaken. But I would at least like them to regularly and overtly admit the opportunity costs, e.g. that writing an endless series of million-word essays about a million years in the future means you are actively choosing not to help the millions who are suffering right now.

You might wonder why I continue to flog this issue. (I blog about it regularly.) It is because I am continually saddened that, in a world filled with so much acute and unnecessary misery, so many brilliant people dedicate their 80,000-hour careers to trying to one-up each other’s expected value.

PS: The day after I finished this chapter, an essay by Open Philanthropy’s Holden Karnofsky landed in my inbox: “AI Could Defeat All Of Us Combined.”

My first reaction was: “Good.”

He is worried about the previously-mentioned “alignment problem” – i.e., that the artificial intelligence(s) we create might not share our values.

Holden writes:

By “defeat,” I don't mean “subtly manipulate us” or “make us less informed” or something like that – I mean a literal “defeat” in the sense that we could all be killed, enslaved or forcibly contained.

Please note that we humans enslave, forcibly contain, and kill billions of fellow sentient beings every year. So if we solved the alignment problem and a “superior” AI actually were to share human values, it seems like they would kill, enslave, and forcibly contain us.

Holden, like almost every other EA and longtermist, simply assumes that humanity shouldn’t be “defeated.” Rarely does anyone note that it is possible, even likely, that on net, things would be much better if AIs did replace us.

The closest Holden comes is when he addresses objections:

Isn’t it fine or maybe good if AIs defeat us? They have rights too.

  • Maybe AIs should have rights; if so, it would be nice if we could reach some “compromise” way of coexisting that respects those rights.
  • But if they’re able to defeat us entirely, that isn’t what I’d plan on getting – instead I’d expect (by default) a world run entirely according to whatever goals AIs happen to have.
  • These goals might have essentially nothing to do with anything humans value, and could be actively counter to it – e.g., placing zero value on beauty and having zero attempts to prevent or avoid suffering.

Zero attempts to prevent suffering? Hey Holden, aren’t you mistaking AIs for humans? Humans are the cause of most of the world’s unnecessary suffering, both to humans and other animals.

Setting aside our inherent tribal loyalties to humanity and our bias for continued existence, it is likely that AIs defeating humanity would be a huge improvement.

Please convince me otherwise. My life would be better if you did – I'd rather be optimistic.
