Agent engineering as UI engineering
Like many others, I’m seeing that there really is a lot of magical thinking around LLM agents. There’s this belief that autonomous agents will soon do all the work users currently accomplish through today’s more manual user journeys… except cheaper, faster, better. All of this may well be true, but the real magical thinking is the belief that it can be done simply by connecting LLMs to good data.
I’m no expert in building LLM agents. After casually following the topic for a few months, I only really got my hands dirty during a hackathon at work a little while ago. Still, if I’m getting it right, the beauty of the current generation of LLM agents lies in their conceptual simplicity: give an LLM a task and a set of tools, then let it figure out how to complete the task.
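That “give it a task and tools” loop fits in a few lines. Here’s a minimal sketch of the idea; `run_agent`, `scripted_llm`, and the `lookup_order` tool are all made-up names, and the scripted function stands in for a real model call so the loop can run end to end:

```python
def run_agent(llm, tools, task, max_steps=5):
    """Loop: ask the model what to do, run the tool, feed the result back."""
    history = [f"Task: {task}"]
    for _ in range(max_steps):
        action = llm(history, list(tools))  # model picks a tool and arguments
        if action["tool"] == "finish":
            return action["args"]           # model declares the task done
        result = tools[action["tool"]](**action["args"])
        history.append(f"{action['tool']} -> {result}")  # feedback for next step
    raise RuntimeError("agent did not finish within max_steps")

# A scripted stand-in for a real model call, so the sketch runs end to end:
def scripted_llm(history, tool_names):
    if len(history) == 1:  # first turn: look up the order
        return {"tool": "lookup_order", "args": {"order_id": "A-1"}}
    return {"tool": "finish", "args": {"answer": history[-1]}}

tools = {"lookup_order": lambda order_id: f"order {order_id}: shipped"}
print(run_agent(scripted_llm, tools, "Where is order A-1?"))
```

Real frameworks dress this up with structured tool schemas and streaming, but the core is exactly this loop, which is why the hard part is everything around it.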
Because an LLM agent is so simple in concept, making it useful takes real engineering. LLMs are so smart yet so stupid, and they need a lot of help to be able to handle real work. Some of that help is about giving them the right context and the right tools (not easy, by the way!). Plus, since LLMs are fundamentally unreliable, we need excellent observability, fallback strategies, guardrails, and so on to maintain a degree of sanity in our system.
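To make the guardrails-and-fallbacks point concrete, here’s one minimal pattern (not any particular framework’s API, and the names are mine): validate each output of an unreliable call, retry a bounded number of times, then fall back to something safe.

```python
def with_guardrail(call_model, validate, retries=2,
                   fallback="Sorry, handing this off to a human."):
    """Retry an unreliable call, validate each output, fall back if all fail."""
    for attempt in range(retries + 1):
        output = call_model()
        if validate(output):
            return output
        # observability hook: in a real system, log `attempt` and `output` here
    return fallback

# Simulate a model that produces garbage once, then a valid answer:
outputs = iter(["<malformed>", "The order shipped yesterday."])
print(with_guardrail(lambda: next(outputs), lambda o: not o.startswith("<")))
```

The interesting design decision is the `validate` function: it’s where you encode what “sufficiently reliable” means for your task.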
The same people who ask “how hard is it to add a button to do X?” now ask “how hard is it to build an agent that can do X… and Y and Z?” We need to tell them that LLM agents require more engineering, not less. Not just to say no to new feature requests, but to make visible the practical possibilities and limitations of leveraging agents for the tasks at hand.
* * *
In “UI concerns are verticals”, I wrote:
A product’s UI defines how its users interact with the system and how the system reaches the users. UI makes it possible for users to act on the system and for the system to provide feedback to users. […] UI dictates how users experience the product and how the product comes in contact with its users.
I’m starting to see that agent engineering greatly overlaps with UI engineering. There are two layers to this: 1) agent as user to our system and 2) agent as the interface to our system.
On one hand, LLM agents are our users. In fact, agents are no less unpredictable and inconsistent than humans. No matter how much we want it, our agents won’t always follow the happy paths we build for them—just like humans! So we have to carefully design what they can access, what “buttons” they can “click,” and what feedback they receive upon taking each action, valid or not. We need to monitor what they do and how they fail, so we can improve our “UI” and guide them to complete tasks more reliably.
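What does “feedback upon taking each action, valid or not” look like in practice? One sketch, with hypothetical names: a tool that answers an invalid “click” with an actionable validation message, the way a form would for a human user, instead of crashing or failing silently.

```python
VALID_STATUSES = {"open", "closed"}

def set_ticket_status(ticket_id: str, status: str) -> dict:
    """A 'button' the agent can click, with validation feedback built in."""
    if status not in VALID_STATUSES:
        # Tell the agent what went wrong and what it *can* do, the way a
        # form shows a validation message to a human user.
        return {"ok": False,
                "error": f"unknown status {status!r}; valid options: {sorted(VALID_STATUSES)}"}
    return {"ok": True, "ticket_id": ticket_id, "status": status}

print(set_ticket_status("T-42", "solved"))
```

A structured error like this goes back into the agent’s context, so a capable model can often self-correct on the next step; that only works because we designed the feedback.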
On the other hand, LLM agents do not replace our users. As long as they act on our users’ behalf, agents are ultimately just another UI layer, and what they do in response to user input must be clearly communicated back. Of course, LLMs being what they are, we need a way to ensure that their output is sufficiently reliable and transparent that the end users (the “real” users) can trust it, verify it, and act on it with confidence. The more autonomous our agents are and the greater the impact they can have, the more critical this feedback becomes.
* * *
One major piece of UI for LLM agents is, of course, the set of tools we expose to them. In most cases, we wouldn’t give our human users free, full access to our database; the same goes for our agents. Instead, we want to provide them with a set of structured paths, an interface, to access and process data. For this, wrapping existing API endpoints might be enough; or maybe new, purpose-built tools are needed for the LLM to leverage. Either way, the point is that we need to design this interface instead of letting LLMs go wild. Constraints are the key.
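To illustrate the contrast: instead of handing the agent raw SQL, we expose one narrow, parameterized path into the data. The schema and names below are invented for the sketch; the point is the shape of the interface, not the specifics.

```python
import sqlite3

def recent_orders_for(conn, customer_id: str, limit: int = 5):
    """A constrained tool: one query the agent can run, nothing else."""
    limit = max(1, min(int(limit), 20))  # clamp inputs: constraints are the key
    rows = conn.execute(
        "SELECT id, status FROM orders WHERE customer_id = ? "
        "ORDER BY created_at DESC LIMIT ?",
        (customer_id, limit),
    ).fetchall()
    return [{"id": r[0], "status": r[1]} for r in rows]

# Toy data so the sketch is runnable:
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id TEXT, customer_id TEXT, status TEXT, created_at INTEGER)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", [
    ("o1", "c1", "shipped", 1),
    ("o2", "c1", "pending", 2),
    ("o3", "c2", "shipped", 3),
])
print(recent_orders_for(conn, "c1"))
```

The agent can’t select arbitrary columns, join arbitrary tables, or fetch unbounded rows; it gets a “button,” not a query console.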
When it comes to agents as a UI layer, I find it helpful to think in terms of collapsing multiple user interactions into one. Instead of filling out five input fields and clicking three buttons, a user can fill out a single field or click a single button to prompt the agent to act, or to accept the outcome of that action. From this standpoint, even though chat is the most common UI pattern for exposing agents to end users, a UI tailored to individual tasks can be a better fit when a clear set of possible actions exists.
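A toy version of that collapsing, with made-up names and hardcoded calendar data: one action stands in for a whole “open calendars, compare, pick a slot, create the event, notify everyone” flow, and it returns an outcome the real user can confirm or reject.

```python
busy = {"ana": {"10:00"}, "bo": set()}  # toy calendar data

def check_availability(people, slot):
    return all(slot not in busy.get(p, set()) for p in people)

def book_meeting(people, preferred_slots):
    """One action in place of several manual steps.
    Returns an outcome the real user can confirm or reject."""
    for slot in preferred_slots:
        if check_availability(people, slot):
            return {"booked": True, "slot": slot, "attendees": people}
    return {"booked": False, "reason": "no common free slot"}

print(book_meeting(["ana", "bo"], ["10:00", "11:00"]))
```

Note that the return value is the feedback layer from earlier: the single button’s result is explicit and inspectable, not buried in a chat transcript.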
In many ways, engineering for agents is engineering for UI, and at the heart of it lies putting the right constraints in the right places for users—humans and LLMs alike—at each point in the journey through our product. We constrain users so they can focus on tasks. We constrain agents both in what’s exposed to them and in how they’re exposed to the end users so that, again, we maintain our focus on tasks.