Opinion | The Surprising Thing A.I. Engineers Will Tell You if You Let Them

This is too important to leave to Microsoft, Google and Facebook.
Opinion | The Surprising Thing A.I. Engineers Will Tell You if You Let Them

The European Commission described this approach as “future-proof,” which proved to be predictably arrogant, as new A.I. systems have already thrown the bill’s clean definitions into chaos. Focusing on use cases is fine for narrow systems designed for a specific use, but it’s a category error when it’s applied to generalized systems. Models like GPT-4 don’t do any one thing except predict the next word in a sequence. You can use them to write code, pass the bar exam, draw up contracts, create political campaigns, plot market strategy and power A.I. companions or sexbots. In trying to regulate systems by use case, the Artificial Intelligence Act ends up saying very little about how to regulate the underlying model that’s powering all these use cases.

Unintended consequences abound. The A.I.A. mandates, for example, that in high-risk cases, “training, validation and testing data sets shall be relevant, representative, free of errors and complete.” But what the large language models are showing is that the most powerful systems are those trained on the largest data sets. Those sets can’t plausibly be free of error, and it’s not clear what it would mean for them to be “representative.” There’s a strong case to be made for data transparency, but I don’t think Europe intends to deploy weaker, less capable systems across everything from exam grading to infrastructure.

The other problem with the use case approach is that it treats A.I. as a technology that will, itself, respect boundaries. But its disrespect for boundaries is what most worries the people working on these systems. Imagine that “personal assistant” is rated as a low-risk use case and a hypothetical GPT-6 is deployed to power an absolutely fabulous personal assistant. The system gets tuned to be extremely good at interacting with human beings and accomplishing a diverse set of goals in the real world. That’s great until someone asks it to secure a restaurant reservation at the hottest place in town and the system decides that the only way to do it is to cause a disruption that leads a third of that night’s diners to cancel their bookings.

Sounds like sci-fi? Sorry, but this kind of problem is sci-fact. Anyone training these systems has watched them come up with solutions to problems that human beings would never consider, and for good reason. OpenAI, for instance, trained a system to play the boat racing game CoastRunners, and built in positive reinforcement for racking up a high score. It was assumed that would give the system an incentive to finish the race. But the system instead discovered “an isolated lagoon where it can turn in a large circle and repeatedly knock over three targets, timing its movement so as to always knock over the targets just as they repopulate.” Choosing this strategy meant “repeatedly catching on fire, crashing into other boats, and going the wrong way on the track,” but it also meant the highest scores, so that’s what the model did.

This is an example of “alignment risk,” the danger that what we want the systems to do and what they will actually do could diverge, and perhaps do so violently. Curbing alignment risk requires curbing the systems themselves, not just the ways we permit people to use them.