World Cup Shuttle Ride
Probability; Difficulty: medium
World Cup Shuttle Ride
I’ve adapted and extended this puzzle from a Winkler question I came across some time ago. The counter intuitive answer makes this interesting. This submission was featured in the Dec 2nd Riddler Classic with Mexico replaced by the Netherlands since my prediction proved incorrect.
It is USA vs. Mexico in the quarterfinals of the FIFA World Cup in Qatar (a possibility and perhaps a prediction), renewing a famous soccer rivalry. The 11 Americans and 7 Mexicans fans at the Dusit Hotel aware that there’s no beer available at the stadium have spent quite some time at the hotel bar and can’t wait to get to the stadium. They have haphazardly put their room nos. down in no particular order on a big board by the Concierge desk for a hotel shuttle to ferry them to the game.
To avoid the possibility of any brawls between rival fans, shuttle drivers have been instructed to ferry the fans separately. To ensure this, a shuttle pulls up front and calls out room nos. one by one at random from the disorderly list on the board (presumably to create a semblance of following a queue) but only as long as they are fans of the same team. These fans’ room nos. are then crossed out on the board. Once they get a fan of the second team, they send that person back and the shuttle leaves with the fans of the single team aboard. The next shuttle then pulls up and repeats the process calling out room nos at random from those left on the board.
A: What is the probability that the last shuttle ferries American fans?
B: What is the expected number of shuttle buses/trips required to ferry the 18 fans?
*spoiler alert* solution and answer follow below
Solution to part A:
It might be tempting to assume that since there are more Americans than Mexicans, that the probability of the last shuttle carrying Americans is greater than 50% and is \(\displaystyle{\frac{11}{11+7}} = \displaystyle{\frac{11}{18}} \). This would be the right answer if there was just one single random ordering of the 18 fans. The key here is that once a choice results in an opposing team fan being selected, the process effectively starts again—with the smaller group of fans left! Consequently, the probability is the same 50% that the last shuttle ferries Americans as it does Mexicans. Here’s why:
Solution 1: It is clear that we will need 2 shuttles at the minimum. Now as we review any sequence of fans and shuttles, there has to be a point where only two more shuttles are needed and \(a \) American and \(m\) Mexican fans left. At this point, knowing that there are two more shuttles, the possible sequence is \(a\) Americans and then \(m\) Mexicans or vice versa. And since these behave as two blocks, the probability of each is the same. And since any real sequence of fans and shuttles has to reach a point with two shuttles needed, there is equal probability that the last shuttle carries Americans as Mexicans! Not convinced? There’s another proof below.
Solution 2 (Induction): That the probability of the last shuttle ferrying Americans is the same as last being Mexican, is obviously true when you have one of each i.e. \(a = 1\) and \( m = 1 \) . Assume this is true for a particular \(a, m\) and all values smaller than \(a,m >1\). Now consider case \(a+1,m\). What follows easily applies to the case \(a, m+1\).
Either the first shuttle picks up all \(a+1\) Americans or all \(m\) Mexicans resulting in a total of two trips OR the first shuttle picks up fewer, say \(k\) fans and hence more than 2 shuttles are required. In the first case, the probability is clearly 50% for Americans first and then Mexicans or vice versa. In the latter case, we are now re-starting the process with \(a+1-k,m\) OR \(a+1, m-k\) fans. By the induction hypothesis this also results in equal probability that the last shuttle ferries Americans or Mexicans. Hence it must be true for all \(a,m >=1\).
Solution to part B:
The solution to part B follows from recognizing that each time the sequence is interrupted with the random selection of an opposing team fan, we are back to the exact same process but with fewer total fans left than when we started. This suggests a recursive approach:
Let \(X_{a,m}\) represent the number of shuttle trips when we have \(a\) American and \(m\) Mexican fans and \(E_{a,m}\) be the expected number of shuttle trips so \(E_{a,m} = E(X_{a,m}) \). Also let \(A_{k,a,m}\) and \(M_{k,a,m}\) represent the event when \(k\) Americans or \(k\) Mexicans are in the first shuttle. We can now write \(E_{a,m}\) conditioning on the number of fans in the first shuttle:
\begin{align*}
E_{a,m} &= \sum_{k = 1}^{a}E(X_{a,m}|A_{k,a,m} ) P(A_{k,a,m}) \ + \ \sum_{k = 1}^{m}E(X_{a,m}|M_{k,a,m} ) P(M_{k,a,m}) \\
\end{align*}
But if the first shuttle picks up \(k\) American fans, then this means that the situation is the same as having 1 shuttle trip more than when starting with \(a-k,m\) American and Mexican fans and similarly for \(k\) Mexican fans. i.e.
\begin{align*}
E(X_{a,m}|A_{k,a,m}) &= (1\ +\ E_{a-k,m}) \\
E(X_{a,m}|M_{k,a,m}) &= (1\ +\ E_{a,m-k})
\end{align*}
Finally, \(P(A_{k,a,m})\), the probability of getting \(k\) Americans in the first shuttle is just the probability that we pick \(k\) Americans and then a Mexican from \(a\) Americans and \(m\) Mexicans. i.e.
\begin{align*}
P(A_{k,a,m}) &= \frac{a}{a+m}\cdot \frac{a-1}{a+m-1}\cdot \cdot \cdot \frac{a-k+1}{a+m-k+1}\cdot \frac{m}{a+m-k} \\
P(A_{k,a,m}) &= \frac{m}{a+m-k} \prod_{i=0}^{k-1}\frac{a-i}{a+m-i} \\
& \textit{and similarly} \\
P(M_{k,a,m}) &= \frac{a}{a+m-k} \prod_{i=0}^{k-1}\frac{m-i}{a+m-i}
\end{align*}
Where the \(\prod\) sign refers to pi notation for the product of a sequence
Finally, the expected number of shuttle trips when we have only one set of fans is obviously 1. Hence our final recursive relations are:
\begin{align*}
E_{a,m} &= \sum_{k = 1}^{a}\left ((1\ +\ E_{a-k,m})\frac{m}{a+m-k} \prod_{i=0}^{k-1}\frac{a-i}{a+m-i} \right ) \ +\ \sum_{k = 1}^{m}\left ((1\ +\ E_{a,m-k})\frac{a}{a+m-k} \prod_{i=0}^{k-1}\frac{m-i}{a+m-i} \right ) \\
E_{y,0} &= E_{0,y} = 1 \qquad y \geq 1
\end{align*}
The equations above look more formidable than they really are. Writing a simple program to use the recursive relation gets us the required expected number of trips:
\begin{align*}
E_{11,7} &\approx \boldsymbol{{\color{Red} {9.55}}}
\end{align*}
Sample code to calculate this as well as compare with the results of several simulated runs of the shuttles is given below.
import random
target_sims = 100000 #Traget runs for Monte Carlo sim
trip_count = 0
last_american = 0
last_mexican = 0
num_americans = 11
num_mexicans = 7
# Recursive function for Expected number of trips
def exp(a,m):
if a == 0 or m == 0:
return 1
exp_value = 0
for acount in range(1, a+1,1):
exp_value += probfirst(acount,0,a,m)*(1 + exp(a - acount,m))
for mcount in range(1,m+1,1):
exp_value += probfirst(0,mcount,a,m)*(1+ exp(a,m-mcount))
return exp_value
# Calculating the probability product
def probfirst(acount, mcount, num_a, num_m):
prob = 1
if acount > 0:
for i in range(0, acount,1):
prob = prob*((num_a-i)/(num_a + num_m-i))
prob = prob*num_m/(num_a + num_m - acount)
return prob
if mcount > 0:
for i in range(0, mcount,1):
prob = prob*((num_m-i)/(num_a + num_m-i))
prob = prob*num_a/(num_a + num_m - mcount)
return prob
print("theoretical expected trips = ", exp(num_americans, num_mexicans))
# Monte Carlo simulation
for i in range(0, target_sims,1):
usa = num_americans
mex = num_mexicans
while usa > 0 and mex > 0:
trip_count += 1
selection = random.randint(1, usa+mex)
if selection <= usa:
pick = 1
usa = usa - 1
else:
pick = 2
mex = mex - 1
next_pick = pick
while next_pick == pick:
selection = random.randint(1, usa+mex)
if selection <= usa:
next_pick = 1
else:
next_pick = 2
if next_pick == pick and next_pick == 1:
usa = usa -1
if next_pick == pick and next_pick == 2:
mex = mex - 1
trip_count += 1
if next_pick == 1:
last_american +=1
else:
last_mexican +=1
avg_shuttles = trip_count/target_sims
print("simulation avg trips = ", avg_shuttles, " fraction last shuttle has Americans = ", last_american/target_sims)
7 thoughts on “World Cup Shuttle Ride”
“Since there have to be at least two shuttles, consider the case conditioned on there being exactly two trips needed and 𝑎 American and 𝑚 Mexican fans left. Since we know there are only two trips needed (per the conditioning), we know the final sequence is 𝑎 then 𝑚 or vice versa. And these two must be equally likely! This is because these two sequences would have each been listed once in our list of equally likely sequences. ”
I am not sure to follow why they would be equally likely. Conditioned on the event that there are two sequences, if the bus driver picks “A” as a first member, then the first sequence must be all A’s. As I understand they just pick uniformly at random out of the list of room numbers, they do this with probability a / (a + m). Therefore, the sequence of A’s would have probability a / (a + m) of occurring first, conditioning on the fact there are exactly two sequences. What am I missing in thinking this way?
Thanks for the comment, Gabriel. Your description models a scenario where you pick the first person at random and say that is an american, you are then forced to pick all the a’s before picking up all the m’s. In this case, yes, the probability that you pick an a first is a/(a+m). But this is not the model of the puzzle. Here the conditional event is determined as follows: you have two groups (of potentially disparate size) a and m. pick one group. In this case the probability is half. To convince yourself, take a situation where you have 1 a and 2m. The only possible equally likely choices are a-m1-m2 ; a-m2-m1 ; m1-m2-a ; m2-m1-a. As you can see, there are exactly 2 choices each with one of the m’s last as m’s first.
in the case where you don’t re-scramble, the second case has a simple analytic form: 1 + 2ad/(a+d)
Thanks. That is an interesting result. Josh’s work can be seen here: https://joshmaxsilverman.github.io/2022-12-07-soccer-buses/
Sorry, but your proofs for part A are faulty. Let’s say there are 9 Americans and 1 Mexican. It’s obvious that the probability of the last shuttle carrying a Mexican is only 9/10, equal to the probability of the Mexican being picked last, and not 50%.
While all possible transfers involving 2 shuttles have an equal probability for the Mexican or the Americans going first, all transfers with 3 shuttles can only result in some Americans first – 1 Mexican – some Americans last. The last trip has to be with Americans. This disproves your assumption “And since any real sequence of fans and shuttles has to reach a point with two shuttles needed, there is equal probability that the last shuttle carries Americans as Mexicans”
The problem with your induction proof is that the induction hypothesis assumes the 50% probability being true for all values smaller or equal to a, m. But the case a +1, m – k is NOT covered by this assumption. Hence, the proof by induction is not valid.
Thanks for the comment, Wolfgang. It appears you might have misunderstood the question. Every time a shuttle leaves, the process re-starts with all the remaining fans. So it is entirely possible that two consecutive shuttles have Americans. So in the case where 3 shuttles are required, it is possible to have AMA, AAM, or MA. Perhaps re-reading the solution with this info will help. Alternatively you could try the simulation with the included code to convince yourself.
Re. Induction: if it is true for a,m (and all values a,m smaller than a,m) then it is shown to be true for a+1,m or a, m+1. So it is true for a+1,m-k where m is now just m-k.
Sorry, my fault. I had the wording of the Riddler version in mind where it’s not explicitly stated that a fan of a different nation is sent back to the bar, so i assumed he’d be the first on the next shuttle.