Magical, but not magic: peeking inside a neural network
đĄ Amy, Bob and Clara are neurons, and each of them has a different opinion about the Titanic disaster
Neural networks are the wizardry of the modern age đ«. They can do magical things (ChatGPT, Midjourney, Google Translate, âŠ) and no-one seems quite sure about how they work.
I mean, we sort-of understand how they work: they are made of âdigital neuronsâ, they are trained on large amounts of data. But how do they really work? What sound do they make when you gently squeeze them?
The explanations I could find on the Internet are either very vague (âitâs just like neurons in a brain, but digitalâ) or waaay more complex than necessary: if you Google âThe simplest neural networkâ, the first result talks about sigmoids đ.
So for today I put together what may be the worldâs simplest, distilled-to-the-basics, explainable neural network. It features three characters: Amy, Bob and Clara. They are digital neurons, and each of them has a different opinion about what happened on the Titanic đ«.
Titanic passengers
Weâre going to make a neural network that predicts the survival chances of Titanic passengers. (A topic we saw before.)
To keep the example really simple, weâre going to look at just two pieces of information about each passenger: their age and sex. Each will be a numeric value:
age: a number between 0.0 and 1.0:
1.0 represents the oldest age found among the Titanic passengers. Everyoneâs age has been proportionally scaled to fit the range from 0.0 to 1.0 (thereâs a quick code sketch of this right after the list).
sex: 0.0 for male and 1.0 for female. (Gender is not binary but the relevant data in the Titanic data set is.)
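For the curious, hereâs what that age scaling looks like in code (the ages below are made up for illustration, not taken from the real data set):

```python
# Scale ages proportionally so the oldest passenger becomes 1.0.
# These ages are made up for illustration, not rows from the data set.
raw_ages = [22.0, 38.0, 4.0, 80.0]
oldest = max(raw_ages)

scaled_ages = [age / oldest for age in raw_ages]
print(scaled_ages)  # [0.275, 0.475, 0.05, 1.0]
```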
The panel of experts
The neural network will have three neurons: Amy, Bob and Clara.
You can think of them as a panel of experts we invite to predict survival outcomes of each passenger. For each expert, weâre asking them to provide a number close to 1.0 for passengers that are likely to survive, and 0.0 for passengers that are likely to perish.
Amy thinks that women are more likely to survive, regardless of their age. So she makes her predictions using the formula:
Amyâs prediction: 0*age + 1*sex
This boils down to predicting 1.0 (survival) if the passenger is a woman, and 0.0 (demise) otherwise.
Bob is a hopeless optimist: he predicts that everyone will survive, regardless of the data. Just like the experts we see in the media, our experts are not necessarily very good :).
Bobâs prediction: 0*age + 0*sex + 1
Clara thinks that children and women are more likely to survive and she uses both pieces of data in her predictions, giving them equal weight:
Claraâs prediction: -0.5*age + 0.5*sex + 0.5
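If it helps to see the panel in code, hereâs a minimal sketch of the three experts as plain Python functions (the function names are mine; the formulas are exactly the ones above):

```python
# The panel of experts as plain Python functions. Each takes the two
# inputs (age and sex, both scaled to 0.0-1.0) and returns a prediction:
# close to 1.0 means survival, close to 0.0 means demise.

def amy(age, sex):
    return 0 * age + 1 * sex             # women survive, age is ignored

def bob(age, sex):
    return 0 * age + 0 * sex + 1         # hopeless optimism: everyone survives

def clara(age, sex):
    return -0.5 * age + 0.5 * sex + 0.5  # children and women fare better
```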
Prediction by committee
For the first passenger, Amy thinks he will perish, while Bob predicts survival. Now weâre in a classic real-life situation: we have multiple experts and they donât agree đ€·. We can come up with an estimate of how much we trust each expert, and combine their opinions into a weighted average:
combined prediction = 1/3*amy + 1/3*bob + 1/3*clara
Values of 0.5 or more indicate a prediction of survival; anything less indicates a prediction of demise. We note the resulting prediction in the âOutcomeâ column:
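To make the committee concrete, hereâs the weighted average computed for a made-up passenger: a young man whose scaled age is 0.275 (illustrative numbers, not a real row from the data set):

```python
# One made-up passenger: a young man (sex=0.0) with scaled age 0.275.
age, sex = 0.275, 0.0

amy   = 0 * age + 1 * sex             # 0.0    -> perishes
bob   = 0 * age + 0 * sex + 1         # 1.0    -> survives
clara = -0.5 * age + 0.5 * sex + 0.5  # 0.3625 -> leaning toward perish

combined = (1/3) * amy + (1/3) * bob + (1/3) * clara
print(round(combined, 3))             # 0.454 -> below 0.5: perishes
```

Amy and Clara lean toward demise, Bob toward survival, and the weighted average settles the disagreement.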
Neural network
What we just made is, in fact, a neural network. Amy, Bob and Clara are three digital neurons. Theyâre connected to the input data, apply a transformation to it, and produce a value. At the end we combine their outputs into a single value and use it to make predictions:
Yes, this example is very simple and silly đ€Ą. But now that we see how it works, we can better explain the more interesting aspects of neural networks.
How to train a dragon
In our example, we completely made up the formulas that Amy, Bob and Clara use to make predictions, and also the final formula that combines their predictions.
Hereâs the first fun fact about neural networks: it actually works like this in real life đ«. Training the neural network starts with random values assigned to each parameter (like the numbers â0â and â1â in Amyâs formula). Then, the network is âtrainedâ on data to find better values.
What does it mean to âtrainâ the neural network? Unlike the experts in the media, Amy, Bob and Clara are happy to change their opinions :). To train the neural network, we calculate the outcome for data where we already know the right answer, and then tweak the parameters (i.e. the formulas that Amy, Bob and Clara use) to better match the expected outcome.
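To make âtweak the parametersâ concrete, hereâs a deliberately naive training sketch. Instead of the gradient descent that real networks use (the fancy math we skip below), it just nudges one random parameter at a time and keeps the nudge if accuracy improves; the `data` rows of `(age, sex, survived)` are assumed to be loaded already:

```python
import random

def predict(params, age, sex):
    # Three neurons, each with two weights and a bias (9 parameters)...
    a = params[0] * age + params[1] * sex + params[2]
    b = params[3] * age + params[4] * sex + params[5]
    c = params[6] * age + params[7] * sex + params[8]
    # ...combined with three "trust" weights (3 more parameters).
    return params[9] * a + params[10] * b + params[11] * c

def accuracy(params, data):
    hits = sum((predict(params, age, sex) >= 0.5) == survived
               for age, sex, survived in data)
    return hits / len(data)

def train(data, steps=10_000):
    params = [random.uniform(-1, 1) for _ in range(12)]  # random start
    best = accuracy(params, data)
    for _ in range(steps):
        i = random.randrange(12)                 # pick one parameter
        old = params[i]
        params[i] += random.uniform(-0.1, 0.1)   # nudge it a little
        score = accuracy(params, data)
        if score >= best:
            best = score                         # keep the improvement
        else:
            params[i] = old                      # undo the nudge
    return params, best
```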
When the training goes well, the quality of the predictions improves with time:
In this extremely simple example, I trained Amy, Bob and Clara on data for 700 passengers from the Titanic data set. The neural network quickly learns to just predict âDoesnât surviveâ for almost all passengers, which allows it to reach an accuracy of about 60% (most passengers indeed didnât survive).
From three neurons to ChatGPT
How do you get from a tiny network like this to something that can power ChatGPT?
For one thing, youâre gonna need a bigger network. The size of a neural network is measured by its number of parameters. Remember the formulas each neuron used to predict survival outcomes?
Claraâs prediction: -0.5*age + 0.5*sex + 0.5
Each of those numeric values (-0.5, 0.5 and 0.5) is a parameter. In total, our example network has 12 of them: three per neuron (counting an implicit â+0â at the end of Amyâs formula), plus three for the final output formula.
For comparison, a âsmallâ 2023 state-of-the-art open source model by Mistral AI has 7 billion parameters (583 million times more :)).
The structure of the network matters too. The famous âtransformersâ architecture that powers LLMs like the one behind ChatGPT looks like this when visualized. The matrices represent data flowing through the artificial neurons.
The final difference is a bit of fancy math we skipped here. It makes it possible for the neural network to model the complexities of the problem weâre solving (activation functions) and for us to find the right way to tweak the parameters (gradient descent). But the main point of this post is this: you donât need to understand any of this to get the basic mechanism of what neural networks are.
It really is just Amy, Bob and Clara and us fiddling with the parameters of their formulas to best predict Titanic survival outcomes đ«.
In other news
đïž A 5-minute video showing a slightly more realistic application of a slightly bigger network (with more layers than the single layer of âneuron expertsâ in our example).
đ€ Google DeepMind announced Gemini, a family of multimodal models with state-of-the-art performance. The stunning demo video generated some follow-up discussions. Gary Marcus comments: Friends donât let friends take demos seriously
đČ Meta released Cicero, an AI for playing the board game Diplomacy. I miss playing board games.
Postcard from Paris
Days are short and rainy in Paris. The conditions for reading and writing and staring into the void remain perfect. I think Iâll start a poi spinning group at work to boost the mood đ€č.
Have a great week đ«,
â Przemek