Hike planning in two takes: a look at o1's two-pass model
🍓 The long-hyped new model from OpenAI is out; the most interesting thing about it is its two-pass output model; we ask for a hike suggestion and get a 600-word essay
This week OpenAI released their latest model: o1, codenamed Strawberry.
It's been long rumoured and hyped. People have speculated that it was the development of this model that prompted Ilya Sutskever & co. to try to change OpenAI's leadership back in November 2023.
Let’s take a look at the new model and see what’s special about it!
Hike planning using o1
I’m spending this weekend in Grenoble, a French town surrounded by the Alps. Let’s ask the model for a hike idea:
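Here's roughly what that request looks like through the API; a minimal sketch assuming the official openai Python SDK and the o1-preview model name (the prompt wording is my reconstruction, not the exact text from the screenshot):

```python
# Minimal sketch of the hike request via the API.
# Assumes the openai Python SDK and OPENAI_API_KEY set in the environment.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="o1-preview",  # the o1 variant exposed in the API at launch
    messages=[
        {
            "role": "user",
            "content": (
                "I'm spending the weekend in Grenoble and I don't have a car. "
                "Can you suggest a hike?"
            ),
        }
    ],
)

# Only the final answer ever comes back; the model's reasoning stays hidden.
print(response.choices[0].message.content)
```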
The model picked the hike for me, … and then provided a detailed, 3-page hiking guide covering an overview of the hike, 3 ways of getting to the starting point, the hiking routes, what to pack, additional tips (leave no trace, be nice, etc.), and a review of the alternative destinations it considered.
What’s going on behind the scenes?
The secret “reasoning” tokens
Strawberry (OpenAI o1) is a language model, much like the models that came before it. Unlike those models, it doesn’t just return its output to the user.
Instead, it works in two stages. The first pass produces the initial response (which OpenAI calls the “chain of thought” or “reasoning tokens”). The second pass then summarizes the conclusions for the user.
Yes, the 3-page long response to my simple request for a hike suggestion was already a “summary” :).
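OpenAI hasn't published the exact mechanics of this pipeline, but the idea itself is easy to approximate with two calls to any ordinary chat model: one asked to reason at length, and one that compresses the result into a user-facing answer. A rough sketch of the concept, not OpenAI's actual implementation (model name and prompts are my assumptions):

```python
from openai import OpenAI

client = OpenAI()
MODEL = "gpt-4o"  # any ordinary chat model will do for this emulation

def two_pass_answer(question: str) -> str:
    # Pass 1: produce a long "reasoning" draft, analogous to o1's hidden tokens.
    reasoning = client.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": (
                "Think step by step. List several candidate answers, weigh the "
                "trade-offs, and pick one. This draft will not be shown to the user."
            )},
            {"role": "user", "content": question},
        ],
    ).choices[0].message.content

    # Pass 2: rewrite the draft as a direct answer, leading with the conclusion.
    summary = client.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": (
                "Rewrite the draft below as a direct answer to the question. "
                "Start with the final recommendation and drop the deliberation."
            )},
            {"role": "user", "content": f"Question: {question}\n\nDraft:\n{reasoning}"},
        ],
    ).choices[0].message.content

    return summary  # only the second pass reaches the user

print(two_pass_answer("Suggest a hike near Grenoble; I don't have a car."))
```

The difference with the real o1, of course, is that OpenAI trained a model specifically to be good at the first pass, which is presumably where most of the value lies.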
The ChatGPT UI offers a glimpse into what’s going on during the first pass:
The model did not decide right away that “La Bastille” is the best hike. Instead, it first wrote out an assessment of different possible hikes, and then chose to recommend La Bastille based on the fact that I don’t have a car. (See “Weighing options” above.)
The resulting response is then rewritten in the second pass and optimized for my convenience. In particular, the final response starts directly with the hike recommendation that the model landed on.
But we never get to see the first-pass output (the reasoning tokens) used to come up with the answer.
Why o1 keeps its secrets
Why keep the reasoning tokens secret?
One reason is safety. OpenAI gives an example of a user asking for a “historical essay” explaining how poisons can be made. The model’s first-pass response includes reasoning about the policy that forbids providing “instructions that facilitate the planning or execution of violent or non-violent wrongdoing”.
Because the model’s reasoning is hidden, any details that the first-pass model ultimately decides not to disclose can be dropped during the second pass and never shown to the user.
The other reason is protecting OpenAI’s “competitive advantage”. OpenAI trained the first-pass model to be good at writing out responses in the chain-of-thought style, reasoning through each request step by step. If the model’s reasoning were returned to the user, it could be used as training data for competing models.
Conclusion
The most interesting thing about o1 is its two-stage output pipeline. The first pass produces an extensive chain-of-thought analysis in response to the user’s request. The second pass summarizes the outcome for user consumption.
This seems like a good idea, one I expect to stand the test of time.
Meanwhile, it remains to be seen how much more useful this makes o1 on practical tasks. (Previous-generation models also recommend La Bastille for my hike question, all while using significantly less compute.)
More on this
📝 Simon Willison: “A frustrating detail is that those reasoning tokens remain invisible in the API—you get billed for them, but you don’t get to see what they were.” (You can at least count them; see the snippet after these links.)
🔬 Jason Wei: “Even as someone working in science, it’s not easy to find the slice of prompts where GPT-4o fails, o1 does well, and I can grade the answer.”
💫 David Shapiro: “Claude Sonnet 3.5 can do Strawberry with the right prompting. Guys, there's no moat.”
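On Simon’s point: while the API hides the reasoning tokens themselves, it does report how many of them you paid for. A small sketch of reading that counter, using the usage fields documented at the o1 launch (worth double-checking against the current API reference):

```python
# Counting the hidden reasoning tokens you were billed for.
# Field names are as documented at the o1 launch; verify against the
# current API reference before relying on them.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="o1-preview",
    messages=[{"role": "user", "content": "Suggest a hike near Grenoble."}],
)

usage = response.usage
print("completion tokens billed:", usage.completion_tokens)
print("of which hidden reasoning tokens:",
      usage.completion_tokens_details.reasoning_tokens)
```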
Postcard from Grenoble
It was a good hike :).
Take care 💫,
– Przemek