LLM vs 1500 theater shows

Can AI review a festival program as thick as a phone book?

Jul 16, 2023

The biggest theatre festival in the world takes place every July in the south of France. It lasts a whole month, and brings together ~300 000 participants for 1500 daily shows in the beautiful medieval city of Avignon.

The scale of the festival is dazzling. With 1500 shows, the most dedicated attendee won’t see even 10% of the program. I’m here for a week and I won’t see 2%. Choosing is the name of the game, but how do you choose among so many shows?

If only there were a way to explain my preferences to a helpful, tireless advisor and then ask them to review all the shows for me.

Let’s ask GPT

Large Language Model–based tools like ChatGPT and Bard are good at language. We can explain the types of shows we like and dislike, and then ask for their advice about one specific show.

See for example:

I'm at a theatre festival and there are a lot of shows. I'd like to see:

 - elements of improv and audience participation
 - characters in their 30s searching for meaning in life
 - dystopian commentaries on society and technology

I'd like to avoid:

 - mass-appeal comedy
 - shows intended for children or seniors
 
Based on this, estimate whether I will like the show described below. Respond as a probability from 0% (no chance I will like it) to 100% (certain that I will like it).

"""
(description of the show from the official site)
"""

GPT gets it right:

 - For a contemporary show with elements of improv that I loved, it responds: "It falls into the category of contemporary theatre with elements of improv and audience participation. Additionally, it features a character searching for meaning in life, I would estimate a probability of around 90% that you will like this show."
 - For a comedy play I wasn’t interested in, it responds: "Based on your preferences, the probability that you will like the show described in the pitch is 10%."

Note: the descriptions of the shows I used are in French while the rest of the prompt is in English. The LLM doesn’t seem to care at all 💫.
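To make the prompt reusable for more than one show, it can be turned into a template with a slot for the description. The snippet below is a minimal sketch of that idea, not the exact code used for this post: the names PROMPT_TEMPLATE and build_prompt are made up for illustration, and the preferences are simply copied from the prompt above.

```python
# Hypothetical template: the wording is the prompt quoted above,
# with a {description} slot where the show description goes.
PROMPT_TEMPLATE = '''I'm at a theatre festival and there are a lot of shows. I'd like to see:

 - elements of improv and audience participation
 - characters in their 30s searching for meaning in life
 - dystopian commentaries on society and technology

I'd like to avoid:

 - mass-appeal comedy
 - shows intended for children or seniors

Based on this, estimate whether I will like the show described below. Respond as a probability from 0% (no chance I will like it) to 100% (certain that I will like it).

"""
{description}
"""
'''


def build_prompt(description: str) -> str:
    """Insert one show description (in any language) into the template."""
    return PROMPT_TEMPLATE.format(description=description.strip())


if __name__ == "__main__":
    print(build_prompt("Une pièce improvisée avec la participation du public."))
```

The triple double quotes around the description are kept from the original prompt. They give the model a clear boundary between the instructions and the text being judged.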

Let’s crawl

The only remaining problem is that we can only ask ChatGPT or Bard about one show at a time. What we really want is to ask AI to review all 1500 shows and find the ones that look best for my preferences.

Ask and you shall receive doesn’t quite work. If we modify the prompt above:

(...) Based on this, can you review the program at https://www.festivaloffavignon.com/ and recommend some shows ?

then ChatGPT confesses its limitations: Unfortunately, as an AI, I don't have direct access to browse the internet or specific websites like the Festival of Avignon.

To review all the shows, we need to feed them to the LLM one by one ourselves.

To help with this, I wrote a small crawler that went through the official website and recorded the description of each show.
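For the curious, here is a rough sketch of what the extraction step of such a crawler could look like, using only the Python standard library. It is an illustration under assumptions, not the crawler I ran: the CSS class name "description" is a guess (the festival site's real markup isn't shown here), and DescriptionExtractor, extract_description and fetch_description are hypothetical names.

```python
from html.parser import HTMLParser
from urllib.request import urlopen

# HTML elements that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class DescriptionExtractor(HTMLParser):
    """Collects the text inside the first element carrying a given CSS class."""

    def __init__(self, css_class: str):
        super().__init__()
        self.css_class = css_class
        self.depth = 0       # > 0 while we are inside the target element
        self.done = False    # True once the target element has been closed
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS or self.done:
            return
        if self.depth > 0:
            self.depth += 1
        elif self.css_class in (dict(attrs).get("class") or "").split():
            self.depth = 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or self.depth == 0:
            return
        self.depth -= 1
        if self.depth == 0:
            self.done = True

    def handle_data(self, data):
        if self.depth > 0:
            self.chunks.append(data)


def extract_description(html: str, css_class: str = "description") -> str:
    """Return the whitespace-normalised text of the description element."""
    parser = DescriptionExtractor(css_class)
    parser.feed(html)
    return " ".join(" ".join(parser.chunks).split())


def fetch_description(url: str, css_class: str = "description") -> str:
    """Download one show page and extract its description (assumes UTF-8)."""
    with urlopen(url) as response:
        return extract_description(response.read().decode("utf-8"), css_class)
```

A full crawler would also need to discover the URL of each show page and pause between requests to go easy on the site. Both are left out of this sketch.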

I then went through all the show descriptions and sent them to GPT-3.5 for evaluation using the prompt above. Finally, I aggregated and sorted the results by the probability estimated by the LLM.
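The evaluate-and-sort step can be sketched in a few lines. This is a simplified illustration rather than my actual script. The ask argument is a placeholder for whatever function sends the prompt for one description to the model and returns its reply as text. Taking the last percentage in the reply as the answer is a heuristic I'm assuming here, since the model answers in free-form sentences.

```python
import re
from typing import Callable, Optional


def parse_probability(response: str) -> Optional[int]:
    """Return the last percentage between 0 and 100 found in the reply, or None."""
    matches = re.findall(r"(?<!\d)(\d{1,3})\s*%", response)
    values = [int(m) for m in matches if int(m) <= 100]
    return values[-1] if values else None


def rank_shows(shows: dict[str, str],
               ask: Callable[[str], str]) -> list[tuple[str, int]]:
    """Score every show and sort from most to least promising.

    shows maps a show title to its description; ask is a hypothetical
    callable that queries the LLM about one description.
    Shows whose reply contains no usable percentage are skipped.
    """
    scored = []
    for title, description in shows.items():
        probability = parse_probability(ask(description))
        if probability is not None:
            scored.append((title, probability))
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

Passing the model call in as a function keeps the ranking logic testable without network access: a stub that returns canned replies is enough to check the parsing and the sort order.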

Results

The top plays selected by the ranker were (full results here):

Of these four, Penetrator seemed the most intriguing, and that’s the one I went to see.

It turned out to be a Scottish play of the in-yer-face genre (the most famous representative of which is Trainspotting). Two flatmates receive a surprise visit from an old friend who had spent the last few years in the military. Now he’s back, hiding from a mysterious “Penetrator” bent on hunting him down.

It may be a placebo effect, but I loved it.

Takeaways

What have we learned from this exercise?

 - That 1500 shows is a lot of shows.
 - That models like GPT can understand a description of our preferences and evaluate a show description against them.
 - That full end-to-end automation, like crawling a website and evaluating each page of the program against our preferences, is out of reach of the chatbot assistants for now.

For now… because as we’ve seen, the missing glue part of crawling the website and making GPT queries is not hard to automate using a few small programs. AI assistants capable of handling all of this may be on the horizon.

The gotcha is the cost. The ~1500 GPT queries I made to compile my ranking cost me a total of 1 USD. If we had to work with a 100x larger data set, that would be 100 USD. So any AI system capable of doing this kind of bulk work automatically would likely need to come up with a way to inform the user of the cost of computation and charge accordingly.
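The arithmetic behind that estimate is a simple proportion, sketched below. It assumes cost grows linearly with the number of queries, which only holds if the descriptions are of similar length and the pricing stays the same. The function name estimate_cost_usd is made up for illustration.

```python
def estimate_cost_usd(n_queries: int,
                      reference_queries: int = 1500,
                      reference_cost_usd: float = 1.0) -> float:
    """Scale the observed cost (about 1 USD for ~1500 queries) linearly."""
    return n_queries * reference_cost_usd / reference_queries


if __name__ == "__main__":
    # A data set 100x larger than the festival program.
    print(f"{estimate_cost_usd(150_000):.2f} USD")
    # Approximate cost of a single query.
    print(f"{estimate_cost_usd(1):.5f} USD")
```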

Postcard from Avignon

After a few days of binging on theatre I’m looking forward to being back in Paris.

Have a great week 💫,
– Przemek

pnote
