When an LLM calls a tool, that tool usually returns some sort of value, typically a string containing some info like `"Tell the user that you generated an image"` or `"Search query results: [...]"`.
How do you tell the LLM the output of the tool call?
I know that some models like llama3.1 have a built-in tool "role", which lets you feed the model the result, but not all models have that. Non-tool-tuned models especially don't. So let's find a different approach!
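For reference, in an OpenAI-style chat API the tool result simply becomes its own message with a dedicated role. The exact field names vary by backend; this is just the common shape:

```python
messages = [
    {"role": "user", "content": "look up todays weather in new york"},
    # The model answers with a structured tool call instead of text:
    {"role": "assistant", "tool_calls": [{
        "id": "call_1",
        "type": "function",
        "function": {"name": "web_search",
                     "arguments": '{"query": "weather in new york today"}'},
    }]},
    # The dedicated tool role carries the result back to the model:
    {"role": "tool", "tool_call_id": "call_1",
     "content": '["The temperature is 19° Celsius"]'},
]
```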
## Approaches
### Appending the result to the LLM's message and letting it continue generating
Let's say, for example, a non-tool-tuned model decides to use the `web_search` tool. Some code runs it and returns an array with info. How do I inform the model? Do I just put the info after the user prompt? This is how I do it right now:
```plaintext
System: you have access to tools [...] Use this format [...]
User: look up todays weather in new york
LLM: Okay, let me run a search query
<tool>{"name":"web_search", "args":{"query":"weather in new york today"}}</tool>
<result>Search results: ["The temperature is 19° Celsius"]</result>
Today's temperature in New York is 19° Celsius.
```
Everything in the `<result>` tags is added programmatically; the message after the closing `</result>` tag is generated by the model again. So everything within tags is not shown to the user, but the rest is. I like this way of doing it, but it does feel weird to insert stuff into the LLM's generation like that.
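Mechanically, you can implement this by stopping generation at the tool call, running the tool yourself, splicing the `<result>` block into the assistant's partial message, and letting the model continue from there. Here's a minimal sketch: `generate` is a hypothetical completion-style function (swap in llama.cpp, transformers, or whatever backend you use), and `TOOLS` is a toy registry I made up for the example.

```python
import json

def generate(prompt: str, stop: list[str]) -> str:
    """Hypothetical completion call: returns text generated after `prompt`,
    cut off just before the first occurrence of any stop string."""
    raise NotImplementedError("plug in your inference backend here")

# Toy tool registry; real tools would hit an API, run code, etc.
TOOLS = {
    "web_search": lambda args: '["The temperature is 19° Celsius"]',
}

def run_turn(history: str) -> str:
    """Generate one assistant message, executing <tool> calls as they appear."""
    message = ""
    while True:
        chunk = generate(history + message, stop=["</tool>", "<|endoftext|>"])
        message += chunk
        if "<tool>" not in chunk:
            return message  # no (further) tool call, the message is complete
        # Parse the JSON between <tool> and the point where generation stopped.
        call = json.loads(chunk.rsplit("<tool>", 1)[1])
        result = TOOLS[call["name"]](call["args"])
        # Close the tag, splice in the result, and loop so the model
        # continues generating right after </result>.
        message += f"</tool><result>{result}</result>"
```

A nice side effect of the loop: the model can call several tools in one message, since generation just keeps resuming after each `</result>`.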
Here's the system prompt I use:
```plaintext
You have access to these tools
{
  "web_search":{
    "description":"Performs a web search and returns the results",
    "args":[{"name":"query", "type":"str", "description":"the query to search for online"}]
  },
  "run_code":{
    "description":"Executes the provided python code and returns the results",
    "args":[{"name":"code", "type":"str", "description":"The code to be executed"}],
    "triggers":["run some code which...", "calculate using python"]
  }
}
ONLY use tools when the user specifically requests it. Tools work with the <tool> tag. Write an example output of what the result of the tool call looks like in <result> tags.
Use tools like this:
User: Hey can you calculate the square root of 9?
You: I will run python code to calculate the root!\n<tool>{"name":"run_code", "args":{"code":"from math import sqrt; print(sqrt(9.0))"}}</tool><result>3.0</result>\nThe square root of 9 is 3.
The user can't read the result, you must tell them what the result is after the <result> tag is closed.
```
### Appending the tool result to the user message
Sometimes I go with an option where the LLM has a **multi-step decision process** about the tool calling, then it **optionally actually calls a tool**, and then the **result is appended to the original user message**, without a trace of the actual tool call:
```plaintext
What is the weather like in new york?
<tool_call_info>
You automatically ran a search query, these are the results
[some results here]
Answer the message using these results as the source.
</tool_call_info>
```
This works, but it feels like a hacky way to get to a solution that should be obvious.
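In code, this option is just string templating on the user message before it ever enters the chat history. A quick sketch, where `decide_tool_call` and `web_search` are hypothetical stand-ins for the multi-step decision and the actual tool:

```python
TOOL_CALL_INFO = """{message}
<tool_call_info>
You automatically ran a search query, these are the results
{results}
Answer the message using these results as the source.
</tool_call_info>"""

def decide_tool_call(message: str) -> str | None:
    """Hypothetical decision step: return a search query, or None."""
    return message if "weather" in message.lower() else None

def web_search(query: str) -> list[str]:
    """Hypothetical tool stub; replace with a real search backend."""
    return ["The temperature is 19° Celsius"]

def augment_user_message(message: str) -> str:
    """Optionally run a tool, then fold the result into the user message."""
    query = decide_tool_call(message)
    if query is None:
        return message  # no tool needed, pass the message through untouched
    return TOOL_CALL_INFO.format(message=message, results=web_search(query))
```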
### The lazy option: Custom chat format

Orrrr you just use a custom chat format. Ditch `<|endoftext|>` as your stop token and embrace your new best friend: `"\nUser: "`!
So, the chat template goes something like this:

```plaintext
User: blablabla hey can u help me with this
Assistant Thought: Hmm maybe I should call a tool? Hmm let me think step by step. Hmm i think the user wants me to do a thing. Hmm so i should call a tool. Hmm
Tool: {"name":"some_tool_name", "args":[u get the idea]}
Result: {some results here}
Assistant: blablabla here is what i found
User: blablabla wow u are so great thanks ai
Assistant Thought: Hmm the user talks to me. Hmm I should probably reply. Hmm yes I will just reply. No tool needed
Assistant: yesyes of course, i am super smart and will delete humanity some day, yesyes
[...]
```
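Driving this template is a small loop: generate with `"\nUser: "` and `"\nResult: "` as stop strings, and whenever the model emits a `Tool:` line, execute it and append the `Result:` line yourself. A rough sketch, reusing the hypothetical `generate` stub from earlier:

```python
import json

STOP = ["\nUser: ", "\nResult: "]

def assistant_turn(transcript: str, tools: dict) -> str:
    """Extend the transcript by one assistant turn, running any tool calls."""
    while True:
        chunk = generate(transcript, stop=STOP)  # hypothetical backend call
        transcript += chunk
        last_line = chunk.strip().splitlines()[-1]
        if last_line.startswith("Tool: "):
            # The model asked for a tool; run it and feed back a Result line,
            # then loop so the model writes its "Assistant:" reply.
            call = json.loads(last_line[len("Tool: "):])
            result = tools[call["name"]](call["args"])
            transcript += f"\nResult: {json.dumps(result)}\n"
        else:
            return transcript  # plain reply, the turn is over
```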
Again, this works, but it generally hurts quality, since current instruction-tuned LLMs are, well, tuned on a specific chat template, and this kind of prompting deviates from it. It also requires multi-shot prompting to teach the model how the new template works, and it may still generate some unwanted roles: `Assistant Action: Walks out of compute center and enjoys life`, which can be funny, but is unwanted.
## Conclusion
Eh, I just append the result to the user message with some tags and am done with it.
It's super easy to implement, but I also really like the insert-into-assistant approach, since the model then naturally uses tools in an in-chat way, maybe even calling multiple tools in succession, in an almost agent-like way.
But YOU! Tell me how you approach this problem! Maybe you have come up with a better approach, maybe even while reading this post here.
Please share your thoughts, so we can all have a good CoT about it.