The Lasagna Query
Wine, lasagna and the holy grail of AI assistants
I want to buy a cheap bottle of wine on the way to my brother's house that goes well with lasagna.
This query was the hero use case for Viv, an AI assistant created by founders of Siri and announced in 2014.
The query is just a single sentence, easy to grasp for a human. But for the AI assistants of 2014 it represented a tough challenge, at two levels: intent understanding, and execution.
Intent understanding is hard, because the short request packs a lot of concepts and relationships that the machine needs to interpret:
intent to buy a bottle of wine
the requirement that it goes well with lasagna
the requirement that it is low-priced
the hardest one: the wine must be available “on the way to my brother’s house” → this requires understanding that the user is at point A now, that the “brother’s house” is point B, and that we want to buy the wine somewhere along the way from A to B
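To make the list above concrete, here is a minimal sketch of what the interpreted request might look like as structured data. The class and field names (PurchaseIntent, RouteConstraint, price_level and so on) are made up for illustration; they are not Viv's actual representation.

```python
from dataclasses import dataclass


@dataclass
class RouteConstraint:
    """Hypothetical: 'somewhere along the way from A to B'."""
    origin: str       # point A: where the user is now
    destination: str  # point B: the brother's house


@dataclass
class PurchaseIntent:
    """Hypothetical structured form of the Lasagna Query."""
    item: str                      # what the user wants to buy
    pairs_with: str                # food pairing requirement
    price_level: str               # price requirement
    along_route: RouteConstraint   # the hardest part: location along a route


lasagna_query = PurchaseIntent(
    item="bottle of wine",
    pairs_with="lasagna",
    price_level="cheap",
    along_route=RouteConstraint(
        origin="current location",
        destination="brother's house",
    ),
)
```

Note that the origin and destination are still unresolved references at this stage: turning "brother's house" into an actual address is part of execution.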
Execution is hard too, because of the variety of information needed to service the request: GPS location of the user, address of the brother’s house, directions from point A to point B, locations and inventory of wine stores, wine pairing information.
These bits of information are typically provided by different apps: the address of the brother’s house may be stored in the Contacts app, directions may be provided by Google Maps, etc.
What Viv Labs was going after (and demoing in presentations) was stunning, because it was demonstrating dynamic composition: a virtual assistant capable of orchestrating different services on the fly, gluing them together to handle the request.
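A toy sketch of what gluing services together could look like. Everything here is an assumption made for illustration: the four service functions are stubs standing in for real apps, the route is simplified to a straight line measured in kilometers, and the store and pairing data are invented. The plan is also hard-wired in one function, whereas the point of dynamic composition is that the assistant would assemble such a plan on the fly.

```python
from typing import NamedTuple, Optional, Tuple


class Store(NamedTuple):
    name: str
    position_km: float  # position along a simplified straight-line route
    inventory: dict     # wine name -> price


# --- Hypothetical stubs standing in for separate apps/services ---

def current_location() -> float:
    """Stand-in for GPS: the user is at the start of the route."""
    return 0.0


def contact_address(contact: str) -> float:
    """Stand-in for the Contacts app: where a contact lives."""
    return {"brother": 10.0}[contact]


def stores_on_route(start: float, end: float) -> list:
    """Stand-in for maps + store listings: stores between start and end."""
    all_stores = [
        Store("Corner Wines", 3.0, {"Chianti": 12.0, "Riesling": 9.0}),
        Store("Grand Cellar", 7.5, {"Barbera": 9.5, "Chianti": 15.0}),
        Store("Far Away Bottles", 25.0, {"Chianti": 5.0}),
    ]
    low, high = min(start, end), max(start, end)
    return [s for s in all_stores if low <= s.position_km <= high]


def wines_pairing_with(dish: str) -> set:
    """Stand-in for a wine pairing service (invented data)."""
    return {"lasagna": {"Chianti", "Barbera"}}.get(dish, set())


# --- The composition: one request, several services ---

def cheapest_pairing_on_route(dish: str, contact: str) -> Optional[Tuple]:
    start = current_location()
    end = contact_address(contact)
    suitable = wines_pairing_with(dish)
    offers = [
        (price, wine, store.name)
        for store in stores_on_route(start, end)
        for wine, price in store.inventory.items()
        if wine in suitable
    ]
    # Tuples compare by price first, so min() picks the cheapest offer.
    return min(offers) if offers else None


print(cheapest_pairing_on_route("lasagna", "brother"))
```

With the invented data above, the cheapest bottle overall sits in a store that is not on the way, so the route constraint changes the answer.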
[note: I work at Google. This section and everything in this post are my personal opinions]
Nine years later, the Lasagna Query remains unsolved: as of 2023, virtual assistants still struggle with composite queries that require dynamic assembly of 3rd-party services. But the quest to build AI systems capable of just that is on, re-energized by the excitement about large language models (LLMs), such as those used in ChatGPT and Bard.
LLMs are interesting in this context, because they can help with the hard step of intent understanding, by breaking down the initial user request into simpler steps:
I want to buy a cheap bottle of wine on the way to my brother's house that goes well with lasagna. Can you break this problem down into a simple steps that I could perform on my smartphone using existing apps ?
ChatGPT: Yes, here are some steps that you could perform on your smartphone using existing apps to help you find a cheap bottle of wine that pairs well with lasagna:
- Open your preferred search engine app on your smartphone.
- Search for "best red wines to pair with lasagna".
- Look for articles or blogs that recommend specific wines that are budget-friendly.
- Check if there are any apps that can help you locate wine shops or supermarkets on the way to your brother's house, such as Google Maps or Waze.
- (.. 5 more steps skipped)
- Enjoy the lasagna and wine together with your brother.
The hard part
Breaking down the initial intent into simpler steps is only part of the challenge. The harder part is mapping those steps to specific actions that a computer can perform and executing them through 3rd-party integrations (oh, and building and scaling the ecosystem of those integrations).
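One way to picture the mapping problem is a registry that ties step names to executable actions, as in the rough sketch below. The registry, the step names and the stubbed results are all hypothetical. In a real system each action would call a 3rd-party service, and the plan would come from the intent-understanding step rather than being typed in by hand.

```python
from typing import Callable, Dict, List

# Hypothetical registry: step name -> function that performs the step.
ACTIONS: Dict[str, Callable[[dict], dict]] = {}


def action(name: str):
    """Decorator that registers a function as the handler for a step name."""
    def register(fn: Callable[[dict], dict]) -> Callable[[dict], dict]:
        ACTIONS[name] = fn
        return fn
    return register


@action("find_pairing")
def find_pairing(state: dict) -> dict:
    # Stub: a real integration would query a wine pairing service.
    return {**state, "wines": ["Chianti"]}


@action("find_stores")
def find_stores(state: dict) -> dict:
    # Stub: a real integration would query maps and store inventories.
    return {**state, "stores": ["Corner Wines"]}


def execute(plan: List[str], state: dict) -> dict:
    """Run each step of the plan in order, passing the state along."""
    for step in plan:
        if step not in ACTIONS:
            # A step with no integration behind it cannot be executed.
            raise KeyError(f"no integration registered for step: {step}")
        state = ACTIONS[step](state)
    return state


result = execute(["find_pairing", "find_stores"], {"dish": "lasagna"})
```

The error branch is where the ecosystem problem shows up: a plan is only as useful as the set of integrations registered behind it.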
Viv itself was acquired by Samsung in 2016 and doesn’t exist as a product today. The vision they pitched remains compelling. With the acceleration of AI development brought by large language models, we may see it come to fruition one day 💫.
In other news
🔑 Passkeys, a new authentication standard, promises “The beginning of the end of passwords”. Interested to see how much adoption it will get. For many websites passwords are already not a thing, because they delegate authentication to 3rd-party identity providers (e.g. sign in with Facebook, sign in with Google)
📝 pnote goes Substack-first! Previous newsletters were short-form versions of pnote.eu blog posts. With this edition, I’m trying the opposite formula: the content is first published via the newsletter, and later transcribed into a (potentially longer-format) blog post. This way, as a newsletter subscriber, you’re the first to see the news :).
Postcard from Amsterdam
Lots of rain in Amsterdam last weekend, with occasional glimpses of sunshine and, as always, very photogenic views of the canals.
Have a great week 💫,