Rest in pieces: end-to-end vs modular deep learning
🚗 How do you program self-driving cars, end-to-end vs modular deep learning, neural networks are software systems
How do you program self-driving cars?
Deep learning is probably part of the answer, but how do you do it?
End-to-end neural network
One approach is to use one big neural network. We call this type of system “end-to-end” deep learning.
➡️ The inputs to the system are the video streams from the cameras and the data from the other sensors in the car.
⬅️ The outputs of the system are driving decisions.
Everything in between happens in one big neural network.
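To make the shape of this concrete, here is a toy sketch in Python. It is not how Tesla or anyone else builds their system: the names EndToEndDriver and Command are made up for illustration, and a single linear layer stands in for the "one big neural network". The point is only the interface: raw sensor values go in, driving commands come out, and nothing in between is exposed.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Command:
    steering: float  # -1.0 (full left) to 1.0 (full right)
    throttle: float  # 0.0 to 1.0


class EndToEndDriver:
    """Hypothetical stand-in for one big network: sensors in, commands out."""

    def __init__(self, weights: List[List[float]]):
        # Two rows of weights (steering, throttle), one column per sensor value.
        # In a real system these would be learned, not hand-written.
        self.weights = weights

    def forward(self, sensor_values: List[float]) -> Command:
        steer_raw, throttle_raw = (
            sum(w * x for w, x in zip(row, sensor_values))
            for row in self.weights
        )
        return Command(
            steering=math.tanh(steer_raw),                  # squash to [-1, 1]
            throttle=1.0 / (1.0 + math.exp(-throttle_raw)),  # squash to [0, 1]
        )


driver = EndToEndDriver(weights=[[0.2, -0.1, 0.0], [0.0, 0.3, 0.1]])
print(driver.forward([0.5, 1.0, -0.5]))
```

Notice that the only thing you can test from the outside is the whole mapping from sensors to commands.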
Andrej Karpathy, who used to lead the self-driving research team at Tesla, explains how their self-driving stack went from a lot of hand-written C++ logic to a progressively bigger and bigger neural net:
The plan from the start was for the neural net to “eat through the stack”. I believe in this approach. I think that in say 10 years, the end-to-end system at Tesla will just be one neural net. The video is streaming in and commands come out.
(transcript edited for brevity)
Modular deep learning systems
Another approach would be to break the system up into pieces.
For example, one component could analyse sensor data and output an abstract representation of the scene. Another component could then make driving decisions based on that representation.
Both components could be neural networks. The point is that there would be well-defined boundaries between them, so each module could be trained and tested separately.
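Here is a toy Python sketch of that split. Everything in it is an assumption made for illustration: the Scene type, the perceive and plan functions, and the 10-metre braking threshold are invented, and simple rules stand in for what would be trained networks. What matters is that Scene is an explicit boundary, so each side can be checked on its own.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Scene:
    """Hypothetical interface between the two modules."""
    obstacle_distance_m: float
    lane_offset_m: float  # positive means the car is right of the lane centre


@dataclass
class Command:
    steering: float  # -1.0 (full left) to 1.0 (full right)
    brake: bool


def perceive(range_readings_m: List[float], lane_offset_m: float) -> Scene:
    # Stand-in for a perception network: summarise raw readings into a Scene.
    return Scene(
        obstacle_distance_m=min(range_readings_m),
        lane_offset_m=lane_offset_m,
    )


def plan(scene: Scene, safe_distance_m: float = 10.0) -> Command:
    # Stand-in for a planning network: it sees only the Scene, never raw sensors.
    steering = max(-1.0, min(1.0, -0.5 * scene.lane_offset_m))
    return Command(
        steering=steering,
        brake=scene.obstacle_distance_m < safe_distance_m,
    )


def drive(range_readings_m: List[float], lane_offset_m: float) -> Command:
    return plan(perceive(range_readings_m, lane_offset_m))


# Each module can be tested in isolation, without the other one.
assert perceive([30.0, 8.0, 12.0], 0.4).obstacle_distance_m == 8.0
assert plan(Scene(obstacle_distance_m=8.0, lane_offset_m=0.0)).brake
assert not plan(Scene(obstacle_distance_m=50.0, lane_offset_m=0.0)).brake
```

If the car brakes when it should not, you can inspect the Scene in the middle and tell whether perception or planning got it wrong.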
Here's how Drago Anguelov, head of research at Waymo, put it in an interview:
The trend has been larger and larger neural nets. Doing more and more, potentially going from neural nets in narrow scope to neural nets in wider scopes.
There is no clarity if a fully end-to-end learned system is actually better. There are trade-offs between different extremes. Whether the answer is several large modules or a single end-to-end thing, I think it's an open question.
Conclusion
The human brain is the original neural network. When we drive, our brain acts as an end-to-end system. It takes signals from our senses and steers the car in response. So, intuitively, it should be possible to train an end-to-end neural network for driving.
That said, neural networks are software systems.
There are powerful reasons why complex software is split into pieces. Smaller modules are easier to test. When they misbehave, they are easier to debug.
For something as important as driving cars, I’d guess that the advantages of testability and explainability will lead to systems that are modular. Let’s check back on this in 10 years :).
Postcard from Paris
All the drivers on the streets of Paris remain human, for now 🤖.
Stay warm,
– Przemek