# Collecting coupons

Every time you buy a pizza from Pizza Hut, they give you a coupon selected at random from a set of ${n}$ different coupons. How many pizzas do you need to buy on average in order to collect all possible coupons? We know from kindergarten that the answer is ${\Omega(n\log n)}$. Here’s an intuitive explanation why.

First, let’s look at a cool trick.

Suppose a biased coin with probability of showing heads to be ${p}$ is being tossed repeatedly. How many times, on average, does it need to be tossed to get a heads for the first time? I promise this is related, and the relation will be clear soon.

Anyway, this is a standard geometric distribution and thus we can plug things into known formulas and get the value ${1/p}$ for our answer. Intuitively, it does look like that should be the answer, since if we rolled a die with ${1/p}$ faces repeatedly, we would expect to see face #1 after an average of ${1/p}$ rolls. We can also derive the formula ourselves by writing down an infinite series and using standard summation tricks. But here’s a nice trick that can be used to get to the result immediately.

Let ${E}$ be the expected number of tosses need. Then if the first toss shows a heads, we know that ${E}$ is 1, and if the first toss shows a tails, we know that ${E}$ is ${1+E}$. Writing this “in math” we get ${E = p + (1-p)\cdot E}$, which solves to ${E = p}$.

Anyway, back to coupon collection. Suppose we already have ${x}$ coupons. What’s the probability that if we buy a pizza, we get a coupon that we do not already have? Clearly, ${\frac{n-x}{n}}$. Thus the expected number of pizzas needed to get one new coupon, from the previous calculations, is ${\frac{n}{n-x}}$. This means the expected number of pizzas required to collect all coupons can just be written as a the sum ${\sum_{x=1}^{n-1} \frac{n}{n-x}}$, which solves to ${\Omega(n\log n)}$ after using the well-known approximation for the Harmonic number.

In Game #1, I will keep a box in front of you that has 10 red and 10 blue balls. You will have to choose a color and then pick a ball randomly from the box. If the color you chose matches with the color of the ball you pick, I will give you $100; otherwise you will get nothing. In Game #2, I will adversarially construct a box that will have some number of red and some number of blue balls. The rest of the game is the same. If, for example, I have a hunch that you are going to choose red as your color, I will make sure that the box contains no red balls so that you lose the game. Which of the games should you pick? Clearly, since I am writing a blog post about this, the answer cannot be Game #1. So the answer is Game #2. But why? Note that if you pick Game #2 and then choose your color uniformly at random from the set {red, blue}, this game becomes exactly like Game #1, no matter what the ratio of red and blue balls I adversarially choose to put in the box. Thus you can always achieve an expected profit of$50 by using this randomized strategy. In addition, if you know something about me, you can use that information to only enhance the performance. This means you should always pick Game #2.