Dave Guarino

Problems, messes, and LLMs

This post goes into the bucket of "draft" or "thinking out loud"

I — like many others — am very excited by newly accessible large language models (LLMs) like ChatGPT and GPT-4 from OpenAI. I have been experimenting with these tools, and am rapidly developing a viewpoint that LLMs are about to change how we use software fairly dramatically, and fairly quickly.

Recently I tried connecting OpenAI's GPT-3 model via API (using a tool called LlamaIndex) to a large PDF document - specifically a SNAP advocacy guide for Massachusetts residents from the legal services and advocacy org MLRI.
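
For the curious, here's a rough sketch of the shape of that pipeline. This is illustrative: the class names below follow LlamaIndex's current documented API (they've shifted across versions), and the file path and question are made up.

```python
# A minimal sketch of a "chat with a PDF" pipeline using LlamaIndex.
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

# Load the guide (assumes the PDF sits in ./data/).
documents = SimpleDirectoryReader("data").load_data()

# Embed and index the text, backed by the OpenAI API
# (expects OPENAI_API_KEY in the environment).
index = VectorStoreIndex.from_documents(documents)

# Retrieval + answer composition: relevant chunks are pulled from
# the index and handed to the model along with the question.
query_engine = index.as_query_engine()
print(query_engine.query("Can college students in Massachusetts get SNAP?"))
```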

The results were very impressive. Given a general question, the model - using the content from the guide - generally composed a solid answer.

I even gave it a real question from a Massachusetts resident who posted on /r/foodstamps on Reddit and... it answered extremely well.

I then wanted to pop this model into a simple web app so I could share it with some folks and let them play around. Since the code was already in Python, I figured a small Flask app would work great for this. The problem: I hadn't built a Flask app from scratch in a while, and I'd forgotten the syntax and boilerplate.

So... I just asked GPT-4 to write the tiny app for me and... the code worked. Completely and with minimal modification.
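
For flavor, here's roughly the shape of the app. This is my own reconstruction, not GPT-4's literal output, and it reuses the LlamaIndex query engine from the sketch above.

```python
# Roughly the shape of the tiny app (an illustrative reconstruction,
# not GPT-4's literal output).
from flask import Flask, render_template_string, request
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

# Build the query engine over the guide, as in the earlier sketch.
index = VectorStoreIndex.from_documents(SimpleDirectoryReader("data").load_data())
query_engine = index.as_query_engine()

app = Flask(__name__)

PAGE = """
<form method="post">
  <input name="question" style="width: 30em" value="{{ question }}">
  <button type="submit">Ask</button>
</form>
<p>{{ answer }}</p>
"""

@app.route("/", methods=["GET", "POST"])
def ask():
    question, answer = "", ""
    if request.method == "POST":
        question = request.form["question"]
        answer = str(query_engine.query(question))
    return render_template_string(PAGE, question=question, answer=answer)

if __name__ == "__main__":
    app.run(debug=True)
```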

What do I make of all this?

Problems vs. messes #

A tangent first: a lot of how I perceive software getting built is through problem decomposition. Given a stated problem, what series of sub-steps or sub-problems must you tackle that build up to a solution?

A concept I first read about in Lorin Hochstein's great post "The ambiguity of real work" is the distinction between problems and messes, from Russell Ackoff's paper "The Future of Operational Research is Past":

Managers are not confronted with problems that are independent of each other, but with dynamic situations that consist of complex systems of changing problems that interact with each other. I call such situations messes. Problems are abstractions extracted from messes by analysis; they are to messes as atoms are to tables and chairs. We experience messes, tables, and chairs; not problems and atoms.

Because messes are systems of problems, the sum of the optimal solutions to each component problem taken separately is not an optimal solution to the mess. The behavior of a mess depends more on how the solutions to its parts interact than on how they act independently of each other. But the unit in [operations research] is a problem, not a mess. Managers do not solve problems; they manage messes.

This resonates with me in particular because, as much as I love programming, I've always identified with the bundle of characteristics that gets called product engineering. And a key part of that for me has been comfort with ambiguity.

And I will say, it's not been my experience that everyone likes ambiguity! I know plenty of folks whose passion is solving well-scoped problems where the criteria for optimization are clear. It's certainly frustrating to be told your problem is X, work on it, and then find out the problem was in fact Y. But also... the world is messes. To be able to be in a "problem solving" mode at all, someone must first have decomposed an ambiguous, uncertain, subjective mess into a technical problem.

And reasoning about messes is very different from reasoning about problems. Managing messes is often less about decomposition into clear axioms ("a new page in a web form entails additional routes and a view") and more about poking and prodding at a system to see how it responds in actuality; or modeling it from first principles; or or or...

LLMs and a shift in the human role #

What does this have to do with LLMs?

I think what I'm realizing is that LLMs increasingly cover the surface area of many problems. Writing the correct syntax for a piece of logic in a given language is a problem. Transforming or extracting from a corpus of text is fundamentally a problem. These are things that can be solved and optimized.

I also think that means that more of the human operator role in technology or software becomes dealing with messes.

I don't think this is all that different from prior developments in software. Most who know me in a technology context know I really like the web framework Ruby on Rails.

Why? Because what I most enjoy is spelunking complex domain messes in which problems exist. Once something gets to an unambiguous and clearly scoped problem, I will confess I get a little bored!

Rails enabled me to apply software to messes over and over again, because the problems in the bucket of "how you make a web app" were largely taken care of. My own work was overwhelmingly spent applying methods to make sure I was building the right thing, not building the thing right (that part was already taken care of!).

I think what I see with LLMs is that same shape of a shift, just at a much more profound level: that more and more of the human work of software and technology will be reasoning about and managing messes.

Take the example above: given a decent piece of documentation on navigating a public benefit program, creating a query interface a user could use is now quite easy. I do not exaggerate when I say that most similar interfaces in production today that I'm aware of hard-code responses to an intent detector. This is likely your experience with a government chatbot, if you've ever used one.
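
For contrast, that status quo looks something like the sketch below; the intents, canned answers, and keyword matcher are all hypothetical stand-ins, not any particular production system.

```python
# The status quo, roughly: classify the user's message into an intent,
# then return a hard-coded answer. Everything here is a hypothetical
# stand-in for illustration.
CANNED_RESPONSES = {
    "eligibility": "Please see our eligibility page for more information.",
    "check_status": "Log in to your account to check your application status.",
}

def detect_intent(message: str) -> str:
    # Stand-in for a trained intent classifier: crude keyword matching.
    if "eligib" in message.lower():
        return "eligibility"
    if "status" in message.lower():
        return "check_status"
    return "unknown"

def chatbot_reply(message: str) -> str:
    intent = detect_intent(message)
    return CANNED_RESPONSES.get(intent, "Sorry, I didn't understand that.")
```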

As another example, we still have a lot of situations where a human reviews a bunch of uploaded PDFs to get information out of them and types it into a screen. We were already close to that being solvable with hard-coded logic on top, but we are about to get to a world where we just need to say "these documents show information about income — extract how much income they prove and what the timeframe was (month, biweekly, weekly, etc.)" That... might well work in a high percentage of cases out of the box as of... this month.
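
A sketch of what "just saying it" looks like in code: the openai.ChatCompletion.create call is the pre-1.0 OpenAI Python SDK; the prompt wording, the JSON keys, and the extract_income helper are my own illustration, not a tested recipe.

```python
# Prompt-based extraction of income details from document text.
# openai.ChatCompletion.create is the pre-1.0 OpenAI Python SDK call;
# the prompt wording and JSON schema are illustrative assumptions.
import openai

PROMPT = """These documents show information about income. Extract how much
income they prove and what the timeframe was (month, biweekly, weekly, etc.).
Reply as JSON with keys "amount" and "timeframe".

Document text:
{text}
"""

def extract_income(document_text: str) -> str:
    response = openai.ChatCompletion.create(
        model="gpt-4",
        messages=[{"role": "user", "content": PROMPT.format(text=document_text)}],
    )
    return response.choices[0].message.content
```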

So where does human attention (a scarce resource!) go? The messy layers above: ensuring that answers are safe and reasonably confident, building feedback loops, etc. etc. Still a lot of work to be done! And that's not even taking into account all of the institutional mess work entailed in making use of these technologies in large, old, and complex organizations and markets. An LLM can certainly extract text content much better; it is much worse at, say, navigating a relationship with a regulator, or making in-roads into a multi-sided market.

A conclusion for me: tacit knowledge about domain messes might well soon become the top constraint on (and the highest-ROI human work in) deploying this technology. And when I say tacit knowledge, I don't necessarily mean the formal knowledge of a domain. Instead I mean understanding how domains really work, rather than how we like to think they work. (An effective counter-argument here: I love that kind of domain spelunking so, so much that this whole last argument is so squarely in my own interest that it's worth being deeply skeptical of!)

I don't know where this will go. But I'm spending a lot of time right now doing two things:

  1. Testing LLM performance on problems I know exist under messes in domains I know a bit about
  2. Thinking about ways for LLMs to provide leverage on spelunking complex domains

It's always a bit easy and dangerous in a hype-incentivized domain like tech to say this time is different. But best I can tell — this time sure seems different.

