
Imagining the Car but not the Traffic Jam

[Image: A 1950s illustration of a flying car]

So I promise I'm not trying to single this individual out – this type of argument is really common, and this is just one particular example that struck me as pretty illustrative. Here's a post from someone who had a moderate run of pro-AI posts on Bluesky recently.

@opinionhaver.bsky.social: "I think there’s a ton of ways in which ‘guy who lives in computer’ can increase human flourishing provided the guy becomes reliable enough, sane enough, and cheap enough. Eg: helping people navigate benefit bureaucracies, with an infinite supply of (limited in scope) case workers for simple cases."

Original post. He goes on:

I just got laid off. I have no idea how to navigate the state UI system. I feed the system a couple of documents, answer a few questions about my situation, my application is processed far quicker and approved.

This is a tailor-made rosy example. What if we could replace all that nasty government red tape with a faster automated system? Doesn't that improve people's lives?

But let's actually think through how this would work out in practice. First, and most obviously, the application of LLMs to this type of task has had largely negative effects so far. Right now the internet is crawling with horror stories about LLMs being used for things like legal work, where they generate inaccurate or nonsensical filings.

These types of examples always take a vast increase in LLM reliability as a given, which almost always goes along with a gross underestimation of how reliable the humans who do this type of work actually are. Consider airline pilots: last year there were over 10 million commercial flights in the US, and only one major accident. That puts pilots' accuracy at avoiding accidents on the order of 99.99999%.

Obviously this is an extreme example, but ask yourself: how many patients does the typical nurse treat per year, versus the rate of serious medical errors by nurses? How often do lawyers cite something incorrectly in a filing? In most fields where the consequences for error are serious, human performance is not 99% accuracy or 99.9% accuracy; it's 99.99% accuracy or better.
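To make the size of that gap concrete, here's a back-of-the-envelope sketch of the arithmetic above. The 5,000-claims-a-year benefits office is an illustrative assumption, not a real statistic:

```python
# Back-of-the-envelope reliability arithmetic for the argument above.
# The benefits-office caseload is an illustrative assumption.

def expected_errors(cases_per_year: int, accuracy: float) -> float:
    """Expected number of erroneous decisions per year at a given accuracy."""
    return cases_per_year * (1.0 - accuracy)

# ~10 million US commercial flights last year, one major accident:
flights = 10_000_000
pilot_accuracy = 1 - 1 / flights
print(f"pilot accuracy: {pilot_accuracy:.5%}")  # 99.99999%

# A hypothetical benefits office deciding 5,000 claims a year:
for accuracy in (0.99, 0.999, 0.9999, 0.999999):
    bad = expected_errors(5_000, accuracy)
    print(f"{accuracy:.4%} accurate -> {bad:g} bad decisions/year")
```

At two nines, even a modest office produces a steady stream of wrongly decided claims every year; at five-plus nines, a bad decision is a rare event. That entire gap is what these proposals quietly assume away.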

Humans make mistakes, but human mistakes are (in certain fields) exceptionally rare relative to our everyday sense of human fallibility, which should make us skeptical that LLMs can climb to the level of safety and reliability that we expect in these fields.

So there's already a strong assumption baked in: that LLMs can get not just better, but orders of magnitude more accurate. LLMs fundamentally do not have the kinds of introspective abilities that give humans safeguards against error, like the ability to admit low certainty about a decision, seek a second opinion, or even say "I don't know" reliably. They can perform a facsimile of introspection (which consists of generating text "role-playing" at introspection), but this is pretty obviously not actual introspective thought – an LLM saying it's unsure about something does not derive from any real ground truth about how likely it is to be wrong.

Of course, there is the obvious danger that the allure of "faster and cheaper" will push organizations to accept higher risk, replacing human labor with less consistent chatbot systems that introduce errors. This is one of the more obvious harms of LLM deployment.

But beyond that, there's a failure to even consider what effect these systems have when deployed universally. These arguments feel very much like saying "Imagine if anyone could buy a car and use it to get to work on time." If you actually imagine it, what you see is – of course – the traffic jam. As Asimov put it in 1953:

It is easy to predict an automobile in 1880; it is very hard to predict a traffic problem.

One obvious question is: on whose side is the agent – the person trying to access a benefit, or the agency trying to evaluate claims?

If it's the agency, suddenly you're replacing civil servants with an automated system that can be arbitrarily adjusted at the whims of political appointees. Even if the system is designed generously and cautiously, you're still replacing an accountable human decision with an automated system. Is it worth it if it's faster for the average user, but some percentage of people end up having to laboriously appeal a bad LLM decision? Again: thinking this works requires making huge assumptions about improvements in accuracy.

If the civil servant workforce doing this work is largely replaced by this automated system, does human review (when requested) become prohibitively slow because hardly anyone still understands the system? If the rules or requirements change, are we sure the system would not become less reliable?

On the other hand, if people making claims have agents but those claims go into human review, do agencies then become swamped with low-effort spurious claims?

Right now, every open service on the internet is an attack surface for loosely-directed agents noodling around. What happens when large numbers of people spin up an LLM agent and tell it to "try to get me as many government benefits as you can"?

Does some kind of rate limit or "honest signal" need to be implemented, which ends up adding friction or cost to everybody? Do standards for approving an application rise, essentially further disadvantaging anyone who can't access agents to apply on their behalf?

Most likely, both sides have an agent – the claimant asks their agent to file a claim, which is then read by another agent. Decisions on whether to approve the claim or not are made somewhere in the middle, with no real accountability. If you think an error was made, how do you escalate? Is it just agents all the way up?

Again, if humans are no longer reviewing these applications, where are you going to find a human with authoritative experience in that bureaucratic system to deal with it when things break?

What happens to one of these systems when every aspect of human discretion gets removed? What happens when individual decisions can never be attributed to any person?

This sort of shallow tech solutionism always comes down to taking a political problem (welfare benefits are distributed in unequal ways by systems designed to deny people things they're entitled to) and trying to slot in a solution that may work at the individual level, but is simply conceived in ignorance of the social context.

For every action, there is an equal and opposite reaction. For every agent, there is a counter-agent. Any system that becomes mediated by chatbots that are adversarial to the end user (in our example, an LLM that's trying to filter out invalid claims) is naturally going to be attacked with user-side agents. After all, your personal agent already has access to all your information and documents – a privacy and security nightmare, but one that a lot of people are willingly jumping into – so it's much better equipped to file a claim on your behalf than the bureau of labor's own agent.

Behind the anodyne fantasy of "what if applying for unemployment was easier" is a dystopian reality of a world where everyone is living and dying by the decisions of a pack of adversarial, semi-autonomous systems driven to flatter and please their users and attack everyone else.

What if instead of talking to other people, we expressed our preferences to a crude but effective autonomous system that's capable of misinformation, reputational attacks, and harassment? What if every transactional interaction – buying things, applying for government benefits, getting customer service, scheduling dentist appointments – was mediated through an adversarial pairing of two LLMs trying to trick one another into serving the interests of their respective masters? What if everyone was empowered to be malicious, at all times, to everyone, without experiencing shame or suffering the direct consequences of malicious behavior? What if people largely didn't even perceive it as maliciousness, but just as a normal part of zero-sum life?

If you think this is some far-fetched pessimist view of human nature, look at how LLMs are already being deployed in customer service: essentially as malicious systems meant to remove agency from customers. Once you no longer have to employ real customer service reps who are subject to shame and capable of empathy, some basic limiting factors on how much you can screw your customers get taken out. Is what I'm describing not just the natural consequence of all the customers also being armed with these systems?

Even if all the assumptions about the reliability and quality of the output of LLM agents hold true, what makes us think that a society where these things are the primary medium of communication between people and organizations isn't a socially unraveling nightmare?

Do we really think that this society would be more productive than our current society to the same extent that industrial society is more productive than an agrarian society?

Because this is what the "success" case of mass LLM deployment looks like to me: a world where everyone is constantly burning tokens like they're going out of style in service of a Red Queen's race of absolute malice. It is entirely plausible to me that all of the economic benefit of LLM agents is fed directly into the maw of novel economic costs created by LLM agents.

This is not without precedent, after all; right now we dump a significant amount of resources, as a civilization, into either creating spam or fighting spam – two economic activities that largely cancel out, producing the barest sliver of value for spammers. Just because a technology appears to "do work" – though it creates the appearance of work more than actual work – doesn't mean that work is going anywhere useful.

What the people selling agents are driving us toward is a world where everyone has become a spammer, where all of our preferences and needs are constantly being broadcast out to the world by autonomous systems devoid of ethics or discretion. After all, if you're not using those systems, you're falling behind.

#LLMs #so-called AI