Lets say you’re writing a Slackbot that plugs into an LLM. When you send the LLM the prompt, instead of only sending the user message to the LLM, you could send all the Slack data associated with that message including the message itself. Then you could give the LLM tool calling access to the Slack API to perform a range of lookups using data from the Slack request, including for example the Slack user ID of the message sender. To continue this example, the LLM might use the API to look up the username and full name of the message sender, so it can have a more natural conversation with that person.
For the moment, forget about the specifics of implementing a Slack user interface. The question here is, do we give the model all the data and let it call tools to do things with that data when it determines that would be helpful in building a response? Or do we nanny the model a bit more (provide more care and supervision) and only give it a prompt that we’ve crafted, without ALL the available metadata?
I’m going to call this The Nanny Scale and suggest that as models continue to get smarter we’ll move more towards increasing model responsibility. It also varies based on how smart the model is you’re using. If it’s an o1 Pro level model with CoT and tool calling capability, maybe you want to give it all the metadata and as many tools as you can related to that metadata, and just let it iterate with the tools and the data until it decides it’s done and has a response for you.
If you’re using a small model and then further quantizing it to fit into available memory, thereby risking reducing it’s IQ even further, you probably want to nanny the model, meaning that you increase the care and supervision and reduce the responsibility the model has, and you pre-parse data, removing anything that can cause confusion but may be potentially useful, and reduce the available tools, if you provide any at all.
It’s clear that a kind of Moore’s Law is emerging with regards to model IQ and tool calling capabilities. Eventually we’re going to have very smart models that are very cheap and that can handle having an entire API thrown at them in terms of available tools. But we’re not there yet. Models are expensive, so we like to use cheaper less capable models when we can, and even the top performers aren’t quite ready for 100% responsibility.
So as we’re building applications we’re going to have to keep this in mind. We’ll launch v1, models will evolve over several months, and for v2 we’re probably going to have to slide the nanny scale down a notch or two or risk shielding our customer from useful cognitive capabilities that are revealed when a model takes on more responsibility.