One of the main implications of the theory of evolution is that if a trait is prevalant among the majority of the human population now, it means individuals that possessed the trait during our evolutionary history enjoyed some advantage, in the process of survival and replication, over those who didn’t. This arguably obvious conclusion leads to some very non-obvious insights once we expand the definition of a “trait” to its full potential.

A commonly cited trait is the existence of opposable thumbs whose evolutionary advantages have been well explored in the past. Things become interesting once we start looking at behavioral traits instead of merely the physiological ones. For example, feeling hungry when starved is a behavioral trait found in almost every human being and its evolutionary advantages are clear. Sexual attraction is another easily explained universal trait.

We can also get into the specifics of hunger and explore exactly what kinds of food are coveted the most. In most cultures around the world, the foods most sought-after happen to be either sweet or deep-fried, i.e., high-calorie, high-sugar food. This seems to suggest that individuals that enjoyed high-calorie high-sugar foods in the past had a certain advantage over those who didn’t. And yet, a person in the modern world who survives on a diet of donuts, fried chicken, and pepsi doesn’t survive very long. So what’s going on?

The paleo-diet hypothesis is that fast-food companies have hacked into the hunger-controlling part of our brain in order to maximize their revenues. Evolution has hardwired into us a program that goes: “If host is lacking in macronutrient X, make him want to eat food rich in macronutrient X. Otherwise, feel satisfied about the food situation.” How does the brain judge if something is rich in a certain macronutrient? From its taste and the smell.

If you wanted to make food that would motivate this hunger-controlling program to give you anything in exchange of it, the kind of food you would make would taste and smell exactly like the food the program desired without actually being like that food in terms of the macronutrients it contained. As a result, the body would eat and eat, but never be satisfied. This is exactly the kind of food produced by the fast food companies.

What does all of this mean? It means that fast food is bad because it does not contain the amounts of macronutrients our body desires. But everyone knew that. What it also means is that fast food smells and tastes like food that was supposed to be healthy for us. So if we could somehow ensure that the food we ate was not deceptive, i.e., it contained exactly the kinds of things it made our brain believe it contained through its taste and smell, then healthy food would taste as good as the modern unhealthy fast food. Or, in other words, if we only eat non-deceptive food, then healthy food is the same as tasty food.


The ideal macronutrient ratio

There are reasons to believe that the diet ideal for the long-term health of an average human should consist of: 20 to 35 percent carbohydrates, 50 to 65 percent fats, and about 15 percent proteins (the percentages are by calorie).

Reason #1: That’s what all mammals consume.

Not exactly. Different mammals consume different proportions of macronutrients. However, the differences arise from differences in their digestive systems. Some have a bigger colon and some have a different set of gut bacteria. The difference results in different capabilities to convert one nutrient into another. However what remains constant is the ratio of macronutrients left once the food has gone through all the interconversion. So if some species seems to eat a lot of carbohydrates, it’s because its digestive tract has a more developed carbs-to-fats conversion mechanism. And what’s the ratio of macronutrients left after this preprocessing? 20 to 35 percent carbohydrates, 50 to 65 percent fats, and about 15 percent proteins.

But why should we assume that just because all mammals consume this ratio, this is the right ratio for us? Why should we believe that they have figured out the optimal diet for them when humans, a much more intelligent species, have not?

We do not need to attribute excellent optimization skills to the non-human mammals to conclude that they are doing things right. We only need to attribute the optimization skills to evolution. Any non-domesticated species has consumed pretty much the same kind of diet for the last several millions of years. This means evolution has optimized their body to respond well to the diet. That is, the mammals have not optimized their diet to suit their bodies; evolution has optimized the body to suit the diet! Once we transfer the burden of optimization to natural selection, the argument starts to make sense.

Humans also lived on a similar diet for millions of years in the paleolithic time, but things changed drastically once they invented agriculture.

Reason #2: That’s what the human body is composed of.

The human body (and the bodies of all mammals) consists of 20 to 35 percent carbohydrates, 50 to 65 percent fats, and about 15 percent proteins (see Perfect Health Diet) for details. Why does that mean this is the correct ratio for our diets as well?

Because throughout most of evolution, our diet mostly consisted of our own cells. We spent a lot of time fasting because food was not so readily available. And the only food our body consumed during the fasts was our own cells.

Reason #3: Human breast milk has a similar composition.

Again, not exactly. Breast milk is supposed to be for babies and babies have slightly different nutritional requirements than adults because of the accelerated growth of the brain during the early years of one’s life.

The human brain feeds on less fat than the rest of the body. Fats can be used by pathogens and viruses as carriers and thus if the brain fuelled itself with fat, it would be more susceptible to infection. This has led the brain to evolve means of fuelling itself with glucose and ketones. If we adjust for this difference, the breast milk composition is almost 20 to 35 percent carbohydrates, 50 to 65 percent fats, and about 15 percent proteins.

But why does the composition of breast milk have anything meaningful to say about the ideal composition of an adult diet? Because breast milk has been optimized by evolution to be the ideal diet, at least for babies. Adjust for the differences in the nutritional requirements of babies and adults, and you get a reasonable formula for an ideal adult diet.

The Paleo Diet

The paleo idea is that we should eat what our ancestors ate before the invention of agriculture. It’s based on sound evolutionary reasoning and fair bit of empirical evidence.

The evolutionary argument is that we lived for millions of years on a paleolithic diet and only 10,000 years on an agriculture-based diet. Keep any evolving species in one specific environment for millions of years and it will optimize itself to survive in it. This means our body is optimized to survive in an environment that feeds it a paleolithic diet. It hasn’t had enough time to adapt to the modern diet.

The empirical evidence comes in several forms. First, the fossilized skeletons show that pre-agriculture humans had healthier skeletons than the post-agriculture ones.

The tall stature and strong bones of Paleolithic skeletons indicate that Paleolithic humans were in remarkably good health. Paleolithic humans were tall and slender; cavities and signs of malnutrition or stress in bones were rare; muscle attachments were strong, and there was an absence of skeletal evidence of infections or malignancy.

(From Perfect Health Diet.)

Then, there is stuff we know about animals. For example, elephants that are exposed to a diet resembling the paleolithic era—i.e., wild elephants—live longer than elephants exposed to a modern diet—i.e., zoo elephants. The rate of obesity among pet cats and dogs is much higher than that among wild wolves and tigers. One might be tempted to attribute this to the fact that wild animals are more likely to be malnourished because of the difficulty associated with obtaining food. However, feral rats that live in cities and eat food discarded by humans are obese too.

Notes on Peter Thiel’s “Zero to One”

I just finished reading Peter Thiel’s new book “Zero to One” and I want to write something that can neither be called a book summary nor a book review. I just want to discuss a few ideas from the book that I liked, interspersed with my own commentaries, without meticulously stating which one is what. So I am just going to call it “notes” and accept an adversarial reader who assumes all good ideas to be the author’s and all bad ideas to be mine.

Monopolies are good. Competition is bad.

This was new to me since I always thought of competition as a sign of a healthy economy. Peter Thiel argues that in any sector with high competition, profits are low and all progress is incremental: sign of a stunted economy. The conventional argument against monopolies is the lack of a force that drives progress. If there’s no competition, then the monopoly may be able to do whatever it pleases, which includes not doing anything at all and still mooching money off the public. However, this argument is flawed since there does exist a progress-driving force: profit.

Deciding what to do is way more important than doing it well.

This is not new to me since I have come to this realization myself through experience. In the world of venture capital, a portfolio usually follows a power law: one company in any VC’s portfolio outperforms every other company in the portfolio combined. So if your strategy is to half-blindly diversify hoping to hit the good ones just based on probabilities, you are mostly going to miss the superstar companies. The only strategy that works in the VC world is to meticulously screen for only those companies that have a high chance of outperforming everyone else combined.

Peter Thiel takes this argument to the next level: the options available to you in any decision follow a power law. Thus doing many things just in order to increase the probability of hitting the good ones is never a good strategy. One needs to meticulously fish for the superstar options and devote all resources to them.

Solving the really big problems requires sales skills.

Barring the problems that can be solved by a PhD student in at most a few months, most problems can be solved more efficiently by a strategic allocation of human capital to it. Once a problem becomes big enough, a strategic allocation of human capital is the only way to solve it. And motivating the humans to work on the problem requires sales skills.


Wealth is not money

Money is just a kind of wealth. It is not wealth itself. Realizing this leads to some interesting insight.


If you know that a stock is definitely going to cost $50 more one month from now than it is now, you will buy tonnes of it now, sell it a month later, and make a definite profit. But remember that wealth is not money. Thus if you know that a stock and an apple cost the same right now but are definitely going to cost different in future, you can once again make a definite profit. So if the stock is going to cost double in future, you sell an apple today and buy a stock. A month from now, you sell the stock and buy two apples. You have just converted one apple to two apples in a month.


A country’s total import is always equal to its total export. Why?

If a person decides to, he can of course keep on buying things without ever selling anything. So if a person can do this, why can’t a country?

This one is a more subtle application of the wealth is not money principle. The point is to treat money as just another kind of wealth, i.e., to realize that every trade is a barter trade. When you are buying an apple for a dollar, you are selling a dollar for an apple. So there you go, even a person cannot buy without selling.

How to grow your wealth

So since everything one can own is some form of wealth, how does one grow it? There are two ways of doing it:

  1. Own more things.
  2. Somehow make people want the things you already own.

Similarly, there are two ways of losing wealth:

  1. Give away stuff you own.
  2. Make people hate the things you own.

The second point above restricted to money is called inflation. If you own a currency that people start disliking, you have been a victim of inflation.

This also means that being wealthy is a dynamic state and not a static one. Most people’s formula for being wealthy is to own a lot of money and stash it into a bank account. While this may work to a certain degree, it does not reflect reality. Being wealthy is really about owning lots of things that other people want. This is specially difficult since what other people want is a variable that changes constantly with time. Thus to really be wealthy, you need to constantly look out for people’s changing preferences and perform strategic trades to maintain a valuable portfolio.

Collecting coupons

Every time you buy a pizza from Pizza Hut, they give you a coupon selected at random from a set of {n} different coupons. How many pizzas do you need to buy on average in order to collect all possible coupons? We know from kindergarten that the answer is {\Omega(n\log n)}. Here’s an intuitive explanation why.

First, let’s look at a cool trick.

Suppose a biased coin with probability of showing heads to be {p} is being tossed repeatedly. How many times, on average, does it need to be tossed to get a heads for the first time? I promise this is related, and the relation will be clear soon.

Anyway, this is a standard geometric distribution and thus we can plug things into known formulas and get the value {1/p} for our answer. Intuitively, it does look like that should be the answer, since if we rolled a die with {1/p} faces repeatedly, we would expect to see face #1 after an average of {1/p} rolls. We can also derive the formula ourselves by writing down an infinite series and using standard summation tricks. But here’s a nice trick that can be used to get to the result immediately.

Let {E} be the expected number of tosses need. Then if the first toss shows a heads, we know that {E} is 1, and if the first toss shows a tails, we know that {E} is {1+E}. Writing this “in math” we get {E = p + (1-p)\cdot E}, which solves to {E = p}.

Anyway, back to coupon collection. Suppose we already have {x} coupons. What’s the probability that if we buy a pizza, we get a coupon that we do not already have? Clearly, {\frac{n-x}{n}}. Thus the expected number of pizzas needed to get one new coupon, from the previous calculations, is {\frac{n}{n-x}}. This means the expected number of pizzas required to collect all coupons can just be written as a the sum {\sum_{x=1}^{n-1} \frac{n}{n-x}}, which solves to {\Omega(n\log n)} after using the well-known approximation for the Harmonic number.

Which game should you play?

I give you a choice of two games to play.

In Game #1, I will keep a box in front of you that has 10 red and 10 blue balls. You will have to choose a color and then pick a ball randomly from the box. If the color you chose matches with the color of the ball you pick, I will give you $100; otherwise you will get nothing.

In Game #2, I will adversarially construct a box that will have some number of red and some number of blue balls. The rest of the game is the same. If, for example, I have a hunch that you are going to choose red as your color, I will make sure that the box contains no red balls so that you lose the game.

Which of the games should you pick? Clearly, since I am writing a blog post about this, the answer cannot be Game #1. So the answer is Game #2. But why?

Note that if you pick Game #2 and then choose your color uniformly at random from the set {red, blue}, this game becomes exactly like Game #1, no matter what the ratio of red and blue balls I adversarially choose to put in the box. Thus you can always achieve an expected profit of $50 by using this randomized strategy. In addition, if you know something about me, you can use that information to only enhance the performance. This means you should always pick Game #2.

So who won the debate?

A friend of mine got his driver’s license today. He was worried that he may not pass the driver’s test, but I kept saying he would. So his passing the test gives me a perfect opportunity to go all “I told you so!” on him. But mathematically speaking, am I justified in doing that?

Things are much easier in a deterministic world, or even in a world where all our wagers were deterministic. So let’s talk about that world for a while. Suppose your friend says he will definitely fail a test and you say he will definitely pass the test. Then it is very clear who won the debate once you know the outcome of the test. Of course, you win if your friend passes, he wins if he fails.

But the friend in question is mathematically more sophisticated. When I told him there was no need to worry and that he was going to pass the test, he didn’t say he was definitely going to fail. He said that there was a greater than 25% chance that he was going to fail.

Let’s assume, for simplicity, that I’d claimed his passing to be an absolute certainty. His claim estimated the probability of passing to a modest 75%. Now given that he did pass, who won this debate?

The answer is that it’s complicated. We can’t say that I won, because perhaps the true probability of his passing was indeed 75%, and this specific instance of the test happened to be drawn from the 75% of the instances where he does pass. Can we say that I lost? No, because perhaps the true probability was actually 100%.

The real answer is that in the middle of all these probabilities, we should not expect to have a definitive winner of the debate. Rather, all we should expect to extract from this event is a probability that I was the winner. A mathematically correct arbiter will start with an impartial prior probability about who’s the winner and use the outcome of the test to merely update this probability using the Bayes theorem.

I’m going to meet this friend in about 20 mins. Mathematically speaking, am I justified in saying, “I told you so!”? No. But am I going to do it? Yes.

Closed-form formulas are overrated

tl;dr A closed-form formula is a means of expressing a variable in terms of functions that we have got names for. The set of functions that we have got names for is a pure accident of human history. Thus having a closed-form formula for an object of study is also merely an accident of human history and doesn’t say anything fundamental about the object.

The essence of scientific investigation

Scientists like understanding things. A good test of understanding is the ability to predict. For example, we can claim that we have understood gravity because we can predict with amazing accuracy where the moon, for example, is going to be at any given time in the future.

In the next few paragraphs, I am going to want to make extremely general claims and that will require me to talk about some very abstract concepts. So let me talk about those abstract concepts first.

Most of the things that we have tried to understand in the history of scientific investigation can be thought of as an abstract number crunching device. The moon, for example, is something that we see in the sky at a particular angle at a given time. So we can think of the moon as a device that takes time as input and turns it into a particular position in the sky. We can denote time by a number t and the position by two numbers x and y. Thus moon converts t into x and y.

The number crunching that the moon does is not arbitrary. If one observes the moon for a while, it is easy to start seeing some patterns. Some obvious patterns are immediately visible. For example, there is a certain continuity in the way it moves, i.e., its position in the sky does not change too much in a short period of time. There are other very non-obvious patterns too. These patterns, in fact, required centuries of scientific investigation to uncover.

When we are trying to understand the moon, we are trying to understand this pattern. More precisely, we want to write down a set of rules that perform the same number crunching as the moon does, i.e., if we start with a t and apply those rules on t one by one, we get an x and a y whose values match exactly with the values that the moon’s number crunching would have given us. Now, I am not claiming that understanding the relationship between x, y and t tells us everything about the moon. Of course, it doesn’t say anything about whether there is oil on the moon’s surface. But let me just use “understanding the moon” as a metaphor in the rest of the article for understanding this specific aspect of the moon’s motion.

This is not specific to the moon, by the way. Consider some other subject of investigation. For example, the flu virus. One crude way of modelling the flu virus as a number crunching device is to say that it converts time into the expected number of people infected. That’s a very high level picture and we can make the model more informative by adding some more parameters to the input. For example, say, the average temperature that year, the humidity etc. The output can also be modified. We can, for example, make the output a vector of probabilities, where probability number i tells us how likely is it that person number i will get infected by the flu virus. There could be many ways of understanding the flu virus, but once we have asked one specific question about it, we have essentially modelled it as a number crunching device that converts some set of numbers into another set of numbers.

The main challenge of scientific investigation is that we do not usually have access to the inner workings of the number crunching device under investigation. In this sense, it is a black box. We only get to see the numbers that go in and the numbers that come out. Just by observing a large number of these input-output pairs, we take up the task of figuring out what’s going on inside the black box. We know that we have figured it out if we can replicate it, i.e., once we have constructed a set of our own rules that have the same behavior as the black box.

Things get interesting once we try to understand what kinds of rules we are allowed to write. For example, do we really have to write those rules? Is it fine if I hire a person who knows the rules and when given a time t, always outputs the correct x and y, the x and y that the moon itself would have churned out? Is it still fine if the person I have hired only understands the rules and can replicate the correct input-output behavior but can not explain the rules to me? If that is fine, then how about creating a machine, instead of hiring a person, that manifests the same input-output behavior in some way? For example, may be, the machine is simply a screen with a pointer and a dial so that when you set a specific t on the dial, the pointer moves to the correct x and y coordinates on the screen? Is that fine? Or may be, the machine is just a giant rock revolving around a bigger rock so that when a person standing on the bigger rock looks up at time t, he can see the smaller rock exactly at coordinates described by the corresponding numbers x and y?

I don’t know which of the scenarios above should be considered a “valid” understanding of the moon and which ones should not. But it seems clear that there can be several different ways of “writing” the set of rules. The primitive way of doing this was to write the set of rules as a closed-form formula.

What is a closed-form formula?

x = 2t + 1 is a closed-form formula. So is x = sin(t) + cos(t).

Until high school, I was under the impression that in order to understand the moon, one was required to present some such closed-form formula, i.e., express both x and y as functions of t. But that’s an unnatural constraint.

For example, what if x was a slightly weirder function? Say, x was 2t+1 for t < 1000 and sin(t) + cos(t) for t > 1000? May be we would still accept that, mainly because there exists a conventional way of writing such piecewise functions in math. But what if x was something even more weird? For example, say x was equal to the smallest prime factor of t? Or may be x was something that just cannot be written in one sentence? May be x was just given by a sequence of instructions based on the value of t, so that if you started with a value of t and followed those instructions one by one, you would end up with the value of x?

The punchline of this argument is that sin(t) (or even 2t+1, for that matter) is already such a set of instructions. Just because human beings, at some point, decided to give it a name doesn’t mean it is more fundamental than any other set of instructions for converting t into x. Thus in the process of understanding the moon, one should not worry about coming up with a closed-form formula.

At the same time, it is clear that some ways of writing the rules are better than others. For example, having a moon’s life-size replica revolve around the earth’s life-size replica as your set of rules is a bit inconvenient from the point of view of making predictions.

What, then, is the “correct” way of writing the rules? I want to claim that the answer to this question can be found by understanding computation and, specifically, the area of computational complexity. But I will not make this article any longer.

Euclidean Minimum Weight Matchings

The exact complexity of computing the minimum weight perfect bipartite matching in the Euclidean case is an open problem in computational geometry. This problem fits into the common theme of taking standard optimization problems on general weighted graphs and giving them a geometric flavor by forcing all the edge-weights to be Euclidean distances. Doing this often makes the problem easier to solve than the problem on general weighted graphs. Examples include minimum spanning tree (it’s open whether the Euclidean version can be done in linear time or not; the general version is known to take at least {\Omega(n\log n)} time) and the travelling salesman problem (the general version is hard to approximate, but the Euclidean case has a PTAS).

More formally, consider two sets {A} and {B} of {n} points each in the two dimensional plane. This defines a complete weighted bipartite graph where we create a node for each point in {A\cup B} and an edge {(a, b)} for all {a\in A} and {b\in B}. To each edge {(a, b)}, we assign a weight equal to the Euclidean distance between {a} and {b}. The question, then, is to compute the minimum weight perfect matching in this graph in {o(n^2)} time. Currently, the best known algorithm takes {\tilde{O}(n^2)} time where {\tilde{O}} hides logarithmic factors. If we don’t care about the accuracy, it is possible to reach almost linear time, that is, there exists a near linear time algorithm that finds a {(1+\epsilon)}-factor approximation for any {\epsilon > 0}. Getting a subquadratic approximation algorithm is a good sign because often approximation algorithms can be made exact by setting the {\epsilon} appropriately, if we know something about the solution space. For example, if we know that the set of all possible weights achievable by a perfect matching is an integer in the range {[1..n^2]}, we can get an exact solution by setting {\epsilon} to be something slightly smaller than {1/n^2}. Of course, this approach has obvious caveats, including a) that we do not know anything about the set of possible weights achievable by the perfect matchings and b) that setting {\epsilon} to be a polynomial in {1/n} will blow up the running time.

An interesting special case is when all the points are promised to belong to a {\Delta\times\Delta} integer grid. In this case an *additive* approximation algorithm is known that runs in {\tilde{O}(n^{3/2+\delta})} time, {\delta} being a small positive constant. Here the {\tilde{O}} hides logarithmic factors in {n} and {\Delta} and polynomial factors in {1/\epsilon}. From now on, we will also hide the {n^\delta} in the {\tilde{O}}.

Being on an integer grid has some advantages. For example, the weight of a perfect matching, then, is the sum of square roots of {2n} integers, each in the range {[0..\Delta]}. Sums of square roots of integers are, for many reasons, very interesting for the algorithms community and thus have been studied extensively. It is known, for example, that for any two sets of {n} integers each, the difference between the sum of square roots of the integers in one set and the sum of square roots of the integers in the other set is lower bounded by {1/f(n, \Delta)} where {f(n, \Delta)} is polynomial in {\Delta} but doubly exponential in {n}. That doesn’t quite help us yet, because setting {\epsilon} to be something doubly exponential in {n} is horrible for the running time.

In a recent paper by R. Sharathkumar, this problem was circumvented with a clever trick and a {\tilde{O}(n^{3/2})} time exact algorithm was shown for the case when points lie on a {\Delta\times\Delta} integer grid. The algorithm is really neat and works by combining a few ideas in the right way. One black box it uses is the fact that if instead of a complete bipartite graph in the two dimensional plane, you are given a planar graph, then the minimum weight perfect matching can be found using planar separators in {\tilde{O}(n^{3/2})} time. Thus his main idea is that given the complete bipartite graph, extract from it a subset of edges such that a) the subset is planar and b) it contains the minimum weight perfect matching of the complete bipartite graph. He shows that such a subset can be found in {\tilde{O}(n^{3/2})} time. To do this, he builds up on the additive approximation algorithm and uses the fact that sums of square roots of two sets of integers cannot be arbitrarily close to each other.