Collecting coupons

Every time you buy a pizza from Pizza Hut, they give you a coupon selected at random from a set of {n} different coupons. How many pizzas do you need to buy on average in order to collect all possible coupons? We know from kindergarten that the answer is {\Omega(n\log n)}. Here’s an intuitive explanation why.

First, let’s look at a cool trick.

Suppose a biased coin with probability of showing heads to be {p} is being tossed repeatedly. How many times, on average, does it need to be tossed to get a heads for the first time? I promise this is related, and the relation will be clear soon.

Anyway, this is a standard geometric distribution and thus we can plug things into known formulas and get the value {1/p} for our answer. Intuitively, it does look like that should be the answer, since if we rolled a die with {1/p} faces repeatedly, we would expect to see face #1 after an average of {1/p} rolls. We can also derive the formula ourselves by writing down an infinite series and using standard summation tricks. But here’s a nice trick that can be used to get to the result immediately.

Let {E} be the expected number of tosses need. Then if the first toss shows a heads, we know that {E} is 1, and if the first toss shows a tails, we know that {E} is {1+E}. Writing this “in math” we get {E = p + (1-p)\cdot E}, which solves to {E = p}.

Anyway, back to coupon collection. Suppose we already have {x} coupons. What’s the probability that if we buy a pizza, we get a coupon that we do not already have? Clearly, {\frac{n-x}{n}}. Thus the expected number of pizzas needed to get one new coupon, from the previous calculations, is {\frac{n}{n-x}}. This means the expected number of pizzas required to collect all coupons can just be written as a the sum {\sum_{x=1}^{n-1} \frac{n}{n-x}}, which solves to {\Omega(n\log n)} after using the well-known approximation for the Harmonic number.


Which game should you play?

I give you a choice of two games to play.

In Game #1, I will keep a box in front of you that has 10 red and 10 blue balls. You will have to choose a color and then pick a ball randomly from the box. If the color you chose matches with the color of the ball you pick, I will give you $100; otherwise you will get nothing.

In Game #2, I will adversarially construct a box that will have some number of red and some number of blue balls. The rest of the game is the same. If, for example, I have a hunch that you are going to choose red as your color, I will make sure that the box contains no red balls so that you lose the game.

Which of the games should you pick? Clearly, since I am writing a blog post about this, the answer cannot be Game #1. So the answer is Game #2. But why?

Note that if you pick Game #2 and then choose your color uniformly at random from the set {red, blue}, this game becomes exactly like Game #1, no matter what the ratio of red and blue balls I adversarially choose to put in the box. Thus you can always achieve an expected profit of $50 by using this randomized strategy. In addition, if you know something about me, you can use that information to only enhance the performance. This means you should always pick Game #2.