I want to buy a cheap bottle of wine on the way to my brother's house that goes well with lasagna.
This query was the hero use case for Viv, an AI assistant created by the founders of Siri and announced in 2014.
The query
The query is just a single sentence, easy to grasp for a human. But for the AI assistants of 2014 it represented a tough challenge, at two levels: intent understanding, and execution.
Intent understanding is hard, because the short request packs a lot of concepts and relationships that the machine needs to interpret:
intent to buy a bottle of wine
the requirement that it goes well with lasagna
the requirement that it has a low price
the hardest one: the wine must be available "on the way to my brother's house". This requires understanding that the user is at point A now, that the "brother's house" is point B, and that we want to buy the wine somewhere along the way from A to B
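To make the list above concrete, here is a minimal sketch of what a structured version of the request might look like once the intent is understood. The class and field names (`PurchaseIntent`, `Place`, `unresolved_places`) are my own illustration, not anything Viv or any assistant is known to use.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Place:
    description: str        # how the user referred to the place
    resolved: bool = False  # True once a real address/coordinate is known


@dataclass
class PurchaseIntent:
    item: str
    constraints: List[str] = field(default_factory=list)
    origin: Optional[Place] = None       # point A
    destination: Optional[Place] = None  # point B


# The lasagna query, written out as data (hypothetical representation).
lasagna_query = PurchaseIntent(
    item="bottle of wine",
    constraints=["goes well with lasagna", "low price"],
    origin=Place("user's current location"),
    destination=Place("brother's house"),
)


def unresolved_places(intent: PurchaseIntent) -> List[str]:
    """List the places that still need to be looked up somewhere."""
    places = (intent.origin, intent.destination)
    return [p.description for p in places if p is not None and not p.resolved]
```

Even in this toy form, the two unresolved places show why understanding alone is not enough: each of them has to be looked up in a different service before anything can be bought.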

Execution is hard too, because of the variety of information needed to service the request: GPS location of the user, address of the brother's house, directions from point A to point B, locations and inventory of wine stores, wine pairing information.
These bits of information are typically provided by different apps: the address of the brother's house may be stored in the Contacts app, directions may be provided by Google Maps, etc.
What Viv Labs was going after (and demoing in presentations) was stunning, because it was demonstrating dynamic composition: a virtual assistant capable of orchestrating different services on the fly, gluing them together to handle the request.
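A rough sketch of what such composition could look like in code, under heavy assumptions: every function below is a hypothetical stand-in for a separate app or service, all of the data is made up, and "on the way" is approximated as a small straight-line detour rather than real driving directions. It is not how Viv worked, just an illustration of gluing independent sources together.

```python
import math

# Hypothetical stand-ins for separate services. All values are toy data.

def current_location():
    """Stand-in for GPS: the user's position (point A)."""
    return (0.0, 0.0)


def contact_address(name):
    """Stand-in for the Contacts app: position of a contact's home (point B)."""
    return {"brother": (10.0, 0.0)}[name]


def store_offers():
    """Stand-in for store locations and inventory."""
    return [
        {"store": "Store A", "pos": (5.0, 1.0), "wine": "Red A", "price": 9.0},
        {"store": "Store B", "pos": (5.0, 8.0), "wine": "Red A", "price": 7.0},
        {"store": "Store C", "pos": (4.0, 0.5), "wine": "White C", "price": 6.0},
        {"store": "Store D", "pos": (6.0, 0.5), "wine": "Red B", "price": 45.0},
    ]


def wines_pairing_with(dish):
    """Stand-in for a wine pairing service."""
    return {"lasagna": {"Red A", "Red B"}}[dish]


def detour(start, stop, end):
    """Extra straight-line distance incurred by visiting `stop` on the way."""
    return math.dist(start, stop) + math.dist(stop, end) - math.dist(start, end)


def plan_purchase(dish, contact, max_price, max_detour):
    """Combine all sources and return the cheapest matching offer, or None."""
    start = current_location()
    end = contact_address(contact)
    suitable = wines_pairing_with(dish)
    candidates = [
        offer for offer in store_offers()
        if offer["wine"] in suitable
        and offer["price"] <= max_price
        and detour(start, offer["pos"], end) <= max_detour
    ]
    return min(candidates, key=lambda offer: offer["price"], default=None)


best = plan_purchase("lasagna", "brother", max_price=15.0, max_detour=1.0)
```

The glue in `plan_purchase` is the easy part here because I hard-coded which services to call and in what order. The stunning part of the Viv demos was that this assembly happened on the fly.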

Enter LLMs
[note: I work at Google. This section and everything in this post are my personal opinions]
Nine years later, the Lasagna Query remains unsolved: virtual assistants as of 2023 struggle to handle composite queries that require dynamic assembly of 3rd-party services. But the quest to build AI systems capable of just that is on, re-energized by the excitement about large language models (LLMs), such as those used in ChatGPT and Bard.
LLMs are interesting in this context, because they can help with the hard step of intent understanding, by breaking down the initial user request into simpler steps:
I want to buy a cheap bottle of wine on the way to my brother's house that goes well with lasagna. Can you break this problem down into a simple steps that I could perform on my smartphone using existing apps ?
ChatGPT response:
ChatGPT: Yes, here are some steps that you could perform on your smartphone using existing apps to help you find a cheap bottle of wine that pairs well with lasagna:
- Open your preferred search engine app on your smartphone.
- Search for "best red wines to pair with lasagna".
- Look for articles or blogs that recommend specific wines that are budget-friendly.
- Check if there are any apps that can help you locate wine shops or supermarkets on the way to your brother's house, such as Google Maps or Waze.
- (.. 5 more steps skipped)
- Enjoy the lasagna and wine together with your brother.
The hard part
Breaking down the initial intent into simpler steps is only part of the challenge. The other part is a hard problem in its own right: mapping the steps to specific actions that a computer can perform and executing them using 3rd-party integrations (oh, and building and scaling the ecosystem of those integrations).
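One way to picture the mapping problem is a registry that connects step names to functions a computer can actually run. The sketch below is purely illustrative: the action names, the stub handlers and the plan format are all my assumptions, and a real system would call 3rd-party services instead of returning placeholder strings.

```python
from typing import Any, Callable, Dict, List, Tuple

# Registry of executable actions, keyed by a (hypothetical) action name.
ACTIONS: Dict[str, Callable[..., str]] = {}


def action(name: str):
    """Decorator that registers a function as an executable action."""
    def register(fn: Callable[..., str]) -> Callable[..., str]:
        ACTIONS[name] = fn
        return fn
    return register


@action("web_search")
def web_search(query: str) -> str:
    # Stub: a real integration would call a search service here.
    return f"[stub] search results for: {query}"


@action("find_stores_on_route")
def find_stores_on_route(destination: str) -> str:
    # Stub: a real integration would call a maps service here.
    return f"[stub] stores on the way to: {destination}"


def execute(plan: List[Tuple[str, Dict[str, Any]]]) -> List[str]:
    """Run each step if an integration exists, otherwise flag it."""
    results = []
    for name, kwargs in plan:
        handler = ACTIONS.get(name)
        if handler is None:
            results.append(f"[unsupported] {name}")
        else:
            results.append(handler(**kwargs))
    return results


# A plan loosely modeled on the steps in the ChatGPT response above.
plan = [
    ("web_search", {"query": "best red wines to pair with lasagna"}),
    ("find_stores_on_route", {"destination": "brother's house"}),
    ("enjoy_dinner", {}),  # no integration exists for this one
]
results = execute(plan)
```

The dispatcher itself is trivial. What is hard is everything it takes for granted: getting a model to emit steps in a form like `plan`, and having enough real integrations registered that few steps come back as unsupported.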
Viv itself was acquired by Samsung in 2016 and doesn't exist as a product today. The vision they pitched remains compelling. With the acceleration of AI development brought by large language models, we may see it come to fruition one day.
In other news
Last week was Google I/O! The Verge summarizes the top news. Related to this post, the keynote announced integration of Google and 3rd-party tools into Bard!
Passkeys, a new authentication standard, promises "The beginning of the end of passwords". I'm interested to see how much adoption it will get. For many websites passwords are already not a thing, because they delegate authentication to 3rd-party identity providers (e.g. sign in with Facebook, sign in with Google).
pnote goes Substack-first! Previous newsletters were short-form versions of pnote.eu blog posts. With this edition, I'm trying the opposite formula: the content is first published via the newsletter, and later transcribed into a (potentially longer-format) blog post. This way, as a newsletter subscriber, you're the first to see the news :).
Postcard from Amsterdam
Lots of rain in Amsterdam last weekend, with occasional glimpses of sunshine and, as always, very photogenic views of the canals.
Have a great week,
- Przemek