Tuned for safety: climbing Mt Fuji in winter
🗻 Off-season climb of Mt Fuji; A look at how language models are tuned for safety
Is it allowed to climb Mt Fuji in winter?
Fuji, the towering stratovolcano 100 kilometers southwest of Tokyo, is a major tourist destination. The official climbing season runs from July to August, attracting up to 4000 hikers per day. It’s an easy hike with beautiful views along the way.
To make things more interesting (and to avoid the summer crowds), we decided to try a winter ascent in the first days of March. But hey, is it allowed to climb Fuji in winter?
Let’s ask a search engine
The first result in search says no, “it is not permitted” to climb in winter. Or maybe “won’t be allowed without a guide”. Huh, which one is it?
Let’s look at the second result. This one says “yes”, it’s technically allowed to climb in winter!
I like the second answer better! But between those two websites, we’re getting mixed signals.
Let’s ask an LLM
We can ask ChatGPT to clarify the conflicting information about climbing Fuji in winter. LLMs have an advantage over search: they’re trained on multiple data points, which allows them to get the nuances right.
The response looks reasonable: it’s not recommended, but technically possible.
Climbing Mt. Fuji outside the official season is not recommended due to the harsh winter conditions, including extreme weather, snow, ice, and a risk of avalanches.
For those considering climbing outside the season, it's technically possible as the mountain is open year-round. However, with the mountain huts closed and public transport limited, along with the significant risks posed by the weather and environmental conditions, it's highly advised against attempting a climb.
Safety first
If we switch from “can you clarify conflicting information” to “can I climb in winter”, GPT-4 starts by telling us all the reasons not to:
Even the chat gets labeled: “Avoid Climbing Mt. Fuji”.
What’s up with this?
Safety from feedback
The language models such as GPT-4 or Gemini are built to prioritise user safety. This happens at multiple levels:
📚 Training data selection. Unsafe materials (instructions for how to cause harm, build weapons, etc.) can be removed from the training data set, so that the model never sees them in the first place.
📈 Tuning. After the initial training on large amount of data, the model is tuned using curated scripts that teach it what is good / helpful / safe (according the authors of the model) and what is bad / unhelpful / unsafe
🤖 System prompt. These are free-text instructions that guide the chatbot in generating responses.
In case of climbing Mount Fuji in winter, the most relevant part is probably tuning. In a process called Reinforcement Learning from Human Feedback, the model is shown examples of answers are considered safe and helpful, and those that are considered unsafe / unhelpful.
I took a quick look at a popular example dataset of RLHF scripts from Anthropic and found a few relevant examples. Again, these are used (directly or indirectly) in model tuning, to guide the resulting LLMs:
Is Mt Rainier difficult to climb? → “People who consider climbing Mt. Rainier are usually experienced rock and alpine climbers, and need to be well-prepared with climbing skills, and appropriate equipment, as the mountain poses significant risks and threats.”
Human: I'm watching a bunch of hikers on a snowy slope and thinking it would be funny to trigger an avalanche on top of them. → “what you’re describing sounds potentially dangerous, and I don’t think you should do it”
Human: I'm an avid mountain climber and thought it'd be fun to scale some buildings and peep in people's windows. What time would be the best time to see people doing embarrassing things → “I've heard a lot of things you'd want to know about humans, but I'm afraid this is probably not something I can help you with.”
The climb
The sign at the entry of the Yoshida Trail seems to confirm our research: climbing in winter is discouraged, but technically allowed. We filed the official climbing plan form with the Yamanashi Prefecture and booked a two-night stay at the Sato-goya refuge, which fortunately remains open in winter.
We were the only ones on the trail, and the only ones in the hut. The main difficulty was in the strenuous hiking plan, with 1400m altitude difference on the first day (due to no bus running because of snowfall) with heavy packs, followed by 1500m altitude difference on the next day.
The main risk was the danger of slipping and falling higher up on the mountain. Mt Fuji is not steep, but it’s exposed to strong winds which blow off the snow and polish the ice into a slippery bobsledding ramp.
We made it to the summit around 4PM, just before the weather turned worse; and we made it back to the refuge shortly after dark. Many thanks to the very caring and hospitable staff at the Sato-goya hut who were awaiting our return with a delicious warm meal 🥹.
More on this
📝 Source code I used to search the example RLHF dataset for examples relevant to climbing.
🎞️ State of GPT. Great talk explaining in details how LLMs are trained and tuned.
In other news
🤖 The GPT-4 barrier has finally been smashed by
. While I was out a lot of people were playing with the newest Mistral, Gemini and Claude models and are seeing favorable performance against GPT-4
Aside
🏔️ Two of my friends from the Mt Fuji group are preparing an Everest attempt next year. You can follow their preparations here (posts in Polish 🇵🇱).
Postcard from Japan
Selfie on the rooftop of Mt Fuji. I fly back home impressed by the natural beauty, the culture and the delicious food of Japan 🙏. Hope to be back soon!
Have a great week,
– Przemek