# Intelligence and How to Get it

Some time ago, I read a book called Intelligence and How to Get it: Why Schools and Cultures Count, written by Richard E. Nisbett.

That’s when I became intelligent.

The book has some very interesting research about intelligence. It basically tries to argue that intelligence is not completely in the genes.

Anyway, while I was reading the book, I started writing up some interesting stuff that I read and emailing it to some friends. The idea was, of course, to archive things so that, you know, when, 10 years from now, I start writing my New York Times bestseller, I have access to all the material I was reading.

Now, I have realized that the emails can actually be shared online. So here is a PDF that has all the emails in this series. When I was writing those emails, I was not trying to make them into well-organized essays, so the PDF reads like a collection of various independent thoughts. I have separated those independent thoughts with a line made of asterisks.

I did the same thing with some other books that I read. I will post their respective PDFs here as well.

# The New X-Men Movie

It’s annoying when someone who has never taken a Physics 101 course tries to make a sci-fi movie. Case in point: the new X-Men movie, X-Men: First Class. Let me explain.

My first problem is with Magneto. He is the superhero who can turn himself into a magnet and pull or push things made of iron or other magnetic materials at will. All that is acceptable. What’s not acceptable is when he pulls a whole submarine out of the ocean. Even if we assume that he is somehow able to muster enough magnetic strength to significantly move such a heavy object against the strong pull of gravity, my problem is this: what happened to Newton’s third law of motion?

We were taught in high school that forces exist only in pairs called action-reaction pairs. That is to say, you cannot apply a force on another object independently. Whenever you try to do that, the other object will apply an equal and opposite force on you. For example, when the earth pulls us down, we pull the earth up with the same force. It’s just that since our mass is so insignificant compared to the mass of the earth, the force has a visible effect on us but very close to zero effect on the earth. And since we are distributed all over the surface, these already negligible effects cancel each other out to a large extent.

Now if we look at Magneto pulling the submarine out of the ocean with the sheer pull of his magnetic body, Newton’s third law would suggest that the submarine should also pull Magneto downwards with the same force. And since Magneto’s own mass is negligible compared to the mass of the submarine, the effect that you see on the submarine should be negligible compared to the effect you see on Magneto. What I am trying to say is that if Magneto can apply a force enough to pull the submarine out, he himself should fly at an enormous velocity towards the submarine and die within milliseconds of the impact.
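
To put rough numbers on this (the masses below are my own ballpark guesses, not anything from the movie): a small submarine has a mass of roughly $3 \times 10^6$ kg, while Magneto’s is maybe $10^2$ kg. Since the forces are equal and opposite, the accelerations are inversely proportional to the masses, so Magneto would accelerate towards the submarine about $3 \times 10^4$ times faster than the submarine accelerates towards him. Any force strong enough to lift the submarine gently out of the water launches him at it like a cannonball.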

You might argue that, come on, he’s a superhero, so he can do anything. But the problem with this argument is that he actually has clearly defined powers. His power is to turn himself into a magnet. Honestly, if a superhero could violate Newton’s third law, then that power would be way more impressive than being able to turn himself into a magnet. He should have been called the Third Law Violator or something, instead of Magneto.

My second problem is that dude who could fly simply by wearing wing-shaped clothes and screaming in an ultrasonic voice. I can’t fathom a reasonable scenario that could lead someone into thinking that these two things are related. The only thing that comes to mind is this: bats emit ultrasonic sounds and they can fly. But really? Is that all you need to be convinced that you can fly by producing ultrasonic sounds? Human history is full of incidents where someone died attempting to fly without thinking things through. Is this really all they were missing? Screaming at the top of your lungs while trying to fly? I wonder why, in spite of so much advancement in flying technologies, there does not exist a single aircraft that works on this ultrasonic sound principle. It should be way easier than all that stuff the Wright brothers did! All we really need is a loose sweater and some sonographic equipment from the nearest hospital!

# Communication Complexity

Alice has a number $x$ from the set $[1..m]$ and Bob has a number $y$ from the set $[1..n]$. They want to evaluate a function $f(x, y)$ of the two numbers they hold. They communicate as follows: Alice sends one bit to Bob, then Bob sends one bit to Alice, then Alice sends one bit to Bob, and so on. Both of them have unlimited memory and unlimited processing power, so they can do extremely complicated things to decide which bit to send in each step. What is the minimum number of bits they have to exchange in order to evaluate $f$?

For example, say the function $f$ was just $x+y$ mod 2. In that case, all they would need is an exchange of 1 bit. Alice would send $x$ mod 2 and that’s it.
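
The one-bit parity protocol is easy to simulate. Here is a toy sketch (the function names are mine):

```python
def alice_send(x):
    # Alice's entire message: the parity of her number
    return x % 2

def bob_evaluate(y, bit_from_alice):
    # Bob combines the received bit with his own number to obtain
    # (x + y) mod 2 without ever learning anything more about x
    return (bit_from_alice + y) % 2
```

One transmitted bit suffices for every pair $(x, y)$, even though Bob learns almost nothing about $x$ itself.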

The study of these kinds of questions was started by Yao in the paper titled “Some Complexity Questions Related to Distributive Computing.” However, he does mention an earlier paper that explored similar questions in the case when $f$ was a smooth real-valued function as opposed to a discrete one. This post is about Yao’s paper. I am going to list some of the results proved there.

Consider an $m \times n$ matrix $F$ that encodes the values of $f$, i.e., $F[i, j] = f(i, j)$. Now consider any subset of rows and any subset of columns. The elements lying simultaneously in a row of this subset and a column of this subset form a rectangle, in the sense that there exists some permutation of the rows and the columns of the matrix for which these elements would lie inside a rectangular box. Let’s call such a rectangle monochromatic if $f$ has a constant value on all elements of the rectangle. Let $d(f)$ be the minimum number of monochromatic rectangles with which one can cover the whole matrix $F$. Then the first result in the paper says that $d(f)$ lower-bounds the number of bits that must be communicated between Alice and Bob if they want to figure out the value of $f$. In particular, the number of bits that must be communicated is $\Omega(\log d(f))$.
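
These definitions are easy to make concrete in code. Here is a small sketch (the helper names and the equality-function example are mine) that checks whether a given set of rectangles is monochromatic and covers $F$:

```python
def is_monochromatic(F, rows, cols):
    # a rectangle is a subset of rows times a subset of columns;
    # it is monochromatic if f takes a single value on all its cells
    values = {F[i][j] for i in rows for j in cols}
    return len(values) <= 1

def covers(F, rects):
    # rects is a list of (rows, cols) pairs; rectangles may overlap
    covered = {(i, j) for rows, cols in rects for i in rows for j in cols}
    return covered == {(i, j) for i in range(len(F)) for j in range(len(F[0]))}

# the equality function on [1..3]: F[i][j] = 1 iff i == j
F = [[1 if i == j else 0 for j in range(3)] for i in range(3)]
rects = [({i}, {i}) for i in range(3)]                      # the three diagonal 1x1 cells
rects += [({0}, {1, 2}), ({1}, {0, 2}), ({2}, {0, 1})]      # the zero entries, row by row
```

For equality, no monochromatic rectangle can contain two diagonal 1s (it would then also contain a 0), so every diagonal cell needs its own rectangle and $d(f)$ grows with $n$.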

This is not very difficult to see if you understand what a communication protocol essentially is. In a communication protocol, each player basically gives some information in each step to narrow down the possibilities for the row or column they possess. For example, before Alice communicates her first bit, Bob has no idea about the row in which her number lies. With that one bit, she tells him, say, that her row number is either 1, 3, or 5. Next, Bob does the same with his columns, and so on. They know the value of the function once the set of remaining possibilities forms a monochromatic rectangle.

This basically suggests a tree-like structure where each step corresponds to an edge and the leaves correspond to monochromatic rectangles. This means that the number of leaves in the tree is at least the number of monochromatic rectangles required to cover the whole matrix, i.e., at least $d(f)$. And since each step transmits a single bit, every node has at most two children, so the degree of the tree is bounded by a constant. Finally, since the communication complexity is nothing but the depth of the tree, the above result follows.

Next, the paper proves that out of all the functions possible between two numbers in the range $[1..n]$, most of them have $d(f) = \Omega(n)$. This shows that for most functions, the communication complexity is $\Omega (\log n)$.

What about the upper bound? Suppose we find a decomposition of the matrix into $k$ monochromatic rectangles. Does this give us a protocol that uses $O(\log k)$ bits of communication? Theorem 2 from the paper says that it does, as long as the decomposition satisfies the property that all its rectangles are contiguous. For general decompositions, though, all that can be guaranteed is that there exists a protocol that uses $\sqrt{k\log k}$ bits of communication.

Next, the paper gets into probabilistic models for communication complexity, but I will talk about that later.

# Textbook that teaches how to swim

I have been blogging at GoodBlogs. It’s a nice new website that pays you if your posts get upvoted to the main page. Here’s my most recent blog post – http://www.goodblogs.com/view-post/Texbook-that-teaches-how-to-swim .

And here’s the text pasted from the original post –

Learning American history is different from learning how to swim. You can learn American history by reading a textbook about it, but I don’t think there exists someone who read a textbook on swimming, jumped into water and started to swim. Why are some skills so different from others in this respect? Wouldn’t it be awesome if all skills in the world could be learned just by reading a textbook? Can anything be done so that this does happen?

American history can be learned by reading a book because it mostly consists of facts, and facts can be easily described in a book. So any skill that consists of a collection of information can be transferred by encoding the information in some form (in this case, a book) and passing it on. Is swimming a collection of information too? If you think about it, it actually is. If you study in detail exactly what a swimmer does and exactly why he is able to swim, you will realize that his body is following certain rules. For example: if you feel some flow of water in this direction, then push water away in this direction; or, fold your legs a bit at the knees, then apply some force with the muscles in your thighs; and so on. Indeed, the brain is, in the end, an information-processing device. All it does is take information from different sensory organs, use it to decide upon an action, and send instructions to different parts of the body so that the action is executed. Whatever information processing the brain does while swimming is also just a piece of information and, in principle, can be written down in the form of a book and transferred to others.

But people learned swimming way before they understood the exact mechanism by which we swim. If you talk to an expert swimmer, he will most probably not know the exact mechanism either. He just somehow gets it. For him, the rules are all of the form, “when you feel this way, you should move these muscles this way,” where the ‘this’ parts are vaguely defined somewhere deep inside his brain.

So can a textbook for swimming be written? In principle, yes. You just describe the precise set of rules that your brain follows, in excruciating detail, including exactly which muscles it moves by exactly what amount when exactly what happens to the water around you. But this textbook will hardly be useful. If you read it, memorize all the rules, and jump into water, then by the time you understand the force of the water stream and decide which muscles you are going to move, you will already be twenty meters deep, surprised by the fact that you haven’t had any oxygen in a long while.

There are other skills that are similar to swimming in this sense. For example, consider learning to play the piano. Even Bach’s Toccata and Fugue in D minor can be written down on a piece of paper using weird symbols that encode the sequence in which you have to press the keys, for how long, how hard, and so on. In principle, you could just pick up the staff notation, go through it, and know exactly how to play the piece. But once again, by the time you figure out the next key to press, the audience will probably have lost track of which song you are playing and will start leaving for home.

The thing that’s common with these kinds of skills is that even though writing a textbook about them is useless, if you are allowed to slow time down, it might still be possible to learn the skill just by reading the textbook. For example, in the case of swimming, if somehow the laws of physics slowed down and let you think and calculate your next moves before applying gravity on you, you would be able to swim just by reading the textbook. Also, if the audience were a bit patient and did not lose track of the song so easily, it would be possible for you to just read the staff notation and play the song without any practice.

But now consider cooking. Suppose you want to learn enough cooking so that you can design recipes that you like. Can a textbook be written that gives you exactly this skill? The textbook will once again need to specify some rules in detail. For example, what exactly happens when you add ginger to a recipe? How exactly does it change the taste? Or what happens when you bake something in the oven instead of deep frying? The problem with these questions is that the answer just cannot be specified in a precise way. For example, what exactly is the difference between deep frying and baking? How do you explain this to someone who has never tasted food that’s deep fried or baked? How do you explain to someone the taste of anything if that person has not had that thing (or something similar) before? You can’t. You have to tell him that here, this is baked and this one is deep fried; taste and find out the difference. All other methods are just vague and not sufficient to teach cooking to someone. So in this sense cooking is different from both learning American history and learning how to swim.

So this seems to define three different categories of learnable skills: 1. those that can be taught through textbooks (e.g., American history), 2. those that can be taught through textbooks only if one is allowed to slow down time (e.g., swimming), and 3. those that just cannot be taught through textbooks (e.g., cooking).

Reducing the category number of a skill would be good progress for science. For example, bringing some skill from category 3 to category 2 would be quite cool. How would one do that with, say, cooking? Well, we would need to make things precise. We would somehow need to quantify the different tastes and figure out exactly how much of which taste is contained in which ingredient. A typical textbook on cooking would then say things like, “one cubic centimeter of turmeric contains 3 units of spiciness, 4 units of something and 2 units of something else.” There would be tests for understanding what kinds of tastes you like. Then, whenever you want to cook something delicious, you would look at the book for measurements and pick something that brings the taste into the range that you like.

Bringing a skill from category 2 down to category 1 would be quite different. If we are allowed to cheat by using machines, then we are already doing it. We have machines that can swim. We have submarines that can take us underwater and follow our instructions about where we want to go. And to teach these machines, one only needs to give them some information in the form of computer programs. If we are not allowed to cheat, then one will have to figure out some way for our brains to do all the calculations really fast, so that we can decide which muscles to move in real time. We will somehow need to augment our brains, perhaps by building machines inside them, if not outside.

# The Fun Theory

Obeying traffic rules, throwing garbage in the garbage can, recycling stuff, not making too much noise in a residential area and switching off electrical appliances when not using them are all examples of things that will significantly improve the human condition if everyone in the world starts following them. However, not many people do, even the ones who are convinced that following them will improve the human condition in the long run. The reason is that we are all programmed by evolution to not care about the long run. We are all victims of hyperbolic discounting. We make most of our decisions based on what will give us immediate rewards and what won’t. Next time you find yourself swearing at a person who’s driving like an idiot, think about all the hours of your life you have wasted procrastinating. You are very similar to that person, in the sense that both of you are doing something you shouldn’t be doing, just because your brain isn’t capable of considering the big picture every time it makes a decision.

So what’s the solution? You might think that the solution is simple. All that’s needed is that people become intelligent enough to understand what they should do and then do it, and it’s a pity that they are not that intelligent. Unfortunately, this is not so much a solution as it is a restatement of the problem, in the sense that achieving it is as difficult as solving the original problem. If this is all you have to say, then you have made no contribution. A real solution is to accept the fact that people’s (including your own) brains are limited and then design a system where people somehow get immediate rewards for doing good things. It’s not necessary to make the rewards monetary. It can be anything, even a simple pleasure. This is exactly what’s being done at http://www.thefuntheory.com/ and it’s totally awesome. I found out about them just now and I already have a lot of respect for them.

# Minimum enclosing ball

Computational geometry is full of algorithms that demonstrate how randomization very often leads to simplification. One example is the famous randomized incremental algorithm for finding the minimum enclosing ball of a set of points in $d$ dimensions. The idea of the algorithm is very simple. All you need to do is look at the points in a random order and maintain the minimum enclosing ball (MEB) of the points seen so far. When a new point is considered, if it already lies inside the current MEB, then there is no need to do anything; if it lies outside, then the MEB is recomputed, using the fact that the new point must lie on the boundary of the updated ball. With some math it can be shown that the expected running time of this algorithm is linear. This is on par with the best deterministic algorithm known at the time of this algorithm’s discovery. However, the deterministic algorithm was quite complex and not easily implementable.
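
Here is a minimal 2D sketch of this randomized incremental scheme (essentially Welzl-style smallest enclosing circle; the function names are mine, and I assume points in general position):

```python
import random

def circle_from(boundary):
    # exact smallest circle through 0, 1, 2, or 3 prescribed boundary points
    if not boundary:
        return (0.0, 0.0), 0.0
    if len(boundary) == 1:
        return boundary[0], 0.0
    if len(boundary) == 2:
        (ax, ay), (bx, by) = boundary
        cx, cy = (ax + bx) / 2, (ay + by) / 2
        return (cx, cy), ((ax - bx) ** 2 + (ay - by) ** 2) ** 0.5 / 2
    (ax, ay), (bx, by), (cx, cy) = boundary   # circumcircle of a triangle
    d = 2 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    ux = ((ax * ax + ay * ay) * (by - cy) + (bx * bx + by * by) * (cy - ay)
          + (cx * cx + cy * cy) * (ay - by)) / d
    uy = ((ax * ax + ay * ay) * (cx - bx) + (bx * bx + by * by) * (ax - cx)
          + (cx * cx + cy * cy) * (bx - ax)) / d
    return (ux, uy), ((ax - ux) ** 2 + (ay - uy) ** 2) ** 0.5

def inside(circle, p, eps=1e-9):
    (cx, cy), r = circle
    return (p[0] - cx) ** 2 + (p[1] - cy) ** 2 <= (r + eps) ** 2

def min_enclosing_ball(points):
    pts = list(points)
    random.shuffle(pts)              # the random insertion order is the whole trick
    c = circle_from([])
    for i, p in enumerate(pts):
        if inside(c, p):
            continue                 # cheap case: nothing to do
        c = circle_from([p])         # expensive case: p lies on the new boundary
        for j, q in enumerate(pts[:i]):
            if inside(c, q):
                continue
            c = circle_from([p, q])
            for s in pts[:j]:
                if not inside(c, s):
                    c = circle_from([p, q, s])
    return c
```

The expected linear running time comes from backwards analysis: a random $i$-th point lands outside the MEB of the first $i$ points with probability roughly $O(1/i)$, so the expensive branch is rarely taken.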

Here’s a very nicely written paper that describes the algorithm – http://www.springerlink.com/content/ph273841264x5163/

# Minimum enclosing cylinder

One of the most elegant algorithms I have seen in a while is Timothy Chan’s algorithm for finding the minimum enclosing cylinder of a bunch of points in the streaming model. The actual paper can be found here, in case you are interested.

The minimum enclosing cylinder of a set of points is basically the cylinder with the minimum radius that encloses all the points in the set. Equivalently, it amounts to finding the straight line (the axis) that minimizes the distance to the farthest point in the set.

The streaming model is basically a model where you are allowed to look at the input exactly once and you are not allowed to store the whole input in memory, i.e., you are allowed only a sub-linear amount of memory. In this particular case, it means that you are shown the points one by one and exactly once. You are allowed to do computations as and when you feel like. In the end, you must return the minimum enclosing cylinder of all the points that were shown.

Now, this is definitely a difficult task; in fact, it might even be impossible to return the exact minimum enclosing cylinder in such a restrictive model. Therefore, we allow some error. We say that it’s fine if the radius of the cylinder you return is at most some constant (that does not depend on the number of points) times the radius of the actual minimum enclosing cylinder.

Consider the following algorithm. Call the first point $o$ and the second point $v$. Take the line $ov$ as the axis of your cylinder, and whenever a new point is shown, calculate its distance from the line $ov$. If this distance is bigger than the previous maximum, update. Whatever you have in the end is the radius of the cylinder. Is this algorithm good? In other words, can we say that the radius we end up with is at most some constant times the radius of the actual minimum enclosing cylinder? No, we can’t. The reason is that it might turn out that $v$ is really close to $o$, while all the other points are very far away from the line $ov$ but very close to the perpendicular bisector of $ov$. In that case, a cylinder with the perpendicular bisector of $ov$ as its axis would have been a much better choice, and the cylinder this algorithm returns can be very large compared to it. So this algorithm is no good.

Notice, however, that the problem was caused by all those points that were very far away from $o$. So if we can somehow make sure that there are no such points, then we could probably get somewhere. In fact, the paper I am talking about proves a more precise version of this statement. In particular, it proves that if $d(p, ov)$ is the distance between the point $p$ and the line $ov$, and $Rad(\{o, v, p\})$ is the radius of the minimum enclosing cylinder of the points $o$, $v$, and $p$, then

$d(p, ov) \leq 2 ( \frac{\|op\|}{\|ov\|} + 1 ) Rad(\{o, v, p\})$

This says that if we fix the radius of the minimum enclosing cylinder of the points $o, v,$ and $p$ and the ratio $\frac{\|op\|}{\|ov\|}$, then we get an upper bound on the distance between $p$ and $ov$. So if we can somehow make sure that the ratio $\frac{\|op\|}{\|ov\|}$ is small, say bounded by 2, then that immediately tells us that the distance between $p$ and $ov$ is not more than $2(2+1) = 6$ times the radius of the minimum enclosing cylinder of the points $o$, $p$, and $v$. Therefore, the cylinder with $ov$ as the axis won’t be that bad after all.

The only problem is that we don’t have such a bound on the ratio $\frac{\|op\|}{\|ov\|}$. One way out is to cheat and say that we will look at the input twice instead of once. In the first pass, we will find the point $v$ that is farthest from the point $o$ and in the second pass we will use the above algorithm with $ov$ as the axis. This will make sure that $\frac{\|op\|}{\|ov\|} \leq 1$ and therefore we will know that the radius we return will be at most 4 times the radius of the actual minimum enclosing cylinder.
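
Here is what this two-pass version looks like in code (a sketch; the function names are mine, and points are plain 3D tuples):

```python
import math

def dist_to_line(p, o, v):
    # distance from p to the infinite line through o and v,
    # computed as |(v - o) x (p - o)| / |v - o|
    d = tuple(b - a for a, b in zip(o, v))
    w = tuple(b - a for a, b in zip(o, p))
    c = (d[1] * w[2] - d[2] * w[1],
         d[2] * w[0] - d[0] * w[2],
         d[0] * w[1] - d[1] * w[0])
    return math.sqrt(sum(x * x for x in c)) / math.sqrt(sum(x * x for x in d))

def two_pass_cylinder(points):
    o = points[0]
    # pass 1: the farthest point from o becomes v, guaranteeing ||op|| <= ||ov||
    v = max(points[1:], key=lambda p: sum((b - a) ** 2 for a, b in zip(o, p)))
    # pass 2: the radius is the largest distance from any point to the axis ov
    r = max(dist_to_line(p, o, v) for p in points)
    return o, v, r
```

By the inequality above, with $\frac{\|op\|}{\|ov\|} \leq 1$ the returned radius is within a factor of $2(1+1) = 4$ of optimal.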

The other solution is to not cheat and instead use the awesome two-line algorithm from the paper, which, in a way, does both passes in just one pass. The algorithm starts by calling the first point $o$ and the second point $v$, and initializing the radius $r$ to $0$. Then, whenever it encounters a new point $p$, it does the following.

1. Find $Rad(\{o, v, p\})$. If it’s more than $r$, then update $r$, else do nothing.
2. If $\|op\| \geq 2\|ov\|$, then change $v$ to $p$, else do nothing.

The claim is that the final $ov$ is a good axis to choose for the cylinder. To see why this is true, let’s say that $v_1, v_2, \ldots , v_f$ are the different values that $v$ takes and $r_f$ is the final value of $r$. The points that came after $v$ became $v_f$ were obviously close to $ov_f$ (when compared to $r_f$), because for all those points, the ratio $\frac{\|op\|}{\|ov_f\|}$ was small. The points that came between $v_{f-1}$ and $v_f$ were close to $ov_{f-1}$ for the same reason. Also, since $\frac{\|ov_{f-1}\|}{\|ov_f\|}$ is also not more than 2, we know that $v_{f-1}$ is close to $ov_f$ (when compared to $r_f$). Therefore, using some geometry, one can prove that the points that came between $v_{f-1}$ and $v_f$ were not just close to $ov_{f-1}$, but also to $ov_f$. Similarly, the points that came between $v_1$ and $v_2$ were close to $ov_1$, and therefore also close to $ov_2$, $ov_3$, and so on up to $ov_f$. In fact, the paper proves that all the points are within a distance of 16 times $r_f$ from $ov_f$. Therefore, the cylinder with $ov_f$ as its axis and $16r_f$ as its radius contains all the points. The reason why it is a good cylinder to return is that the minimum enclosing cylinder has radius at least as large as $r_f$.
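
Here is a sketch of the one-pass algorithm (my own code, not from the paper; in particular, the formula in `rad3`, half the minimum width of the triangle, i.e., area divided by the longest edge, is my own derivation of $Rad$ for three points and should be double-checked):

```python
import math

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def norm(a):
    return math.sqrt(sum(x * x for x in a))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def rad3(o, v, p):
    # radius of the minimum enclosing cylinder of three points: assuming the
    # optimal axis lies in the plane of the triangle, the radius is half the
    # triangle's minimum width, i.e., area / longest edge
    edges = [norm(sub(v, o)), norm(sub(p, v)), norm(sub(p, o))]
    area = norm(cross(sub(v, o), sub(p, o))) / 2
    longest = max(edges)
    return area / longest if longest > 0 else 0.0

def stream_cylinder(points):
    it = iter(points)
    o, v = next(it), next(it)
    r = 0.0
    for p in it:
        r = max(r, rad3(o, v, p))                     # rule 1: grow the radius estimate
        if norm(sub(p, o)) >= 2 * norm(sub(v, o)):    # rule 2: p is far out, make it the new v
            v = p
    return o, v, r    # every point is within O(1) * r of the line ov
```

Note how little state survives between points: just $o$, $v$, and $r$, which is what makes this a streaming algorithm.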

# Curves

If you have been given a bunch of points on a plane, finding the convex hull takes $\Omega(n\log n)$ time. However, if you have been given, in addition, a non self-intersecting curve passing through all the points and the order in which the points occur along the curve, then the convex hull can be found in linear time. I find it a bit surprising that a non self-intersecting curve passing through all the points actually contains sufficient information to remove the $\log n$ factor from the run-time of an algorithm. What are some other places where this information helps?
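
The standard way to exploit the curve is Melkman’s online convex hull algorithm, which processes the vertices of the simple polyline in their given order using a deque. Here is a sketch (assuming general position, i.e., no three collinear points):

```python
from collections import deque

def turn(o, a, b):
    # positive if o -> a -> b makes a left (counter-clockwise) turn
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def melkman(pts):
    # convex hull of a simple (non-self-intersecting) polyline, in linear time;
    # the hull is kept in a deque whose first and last entries coincide
    if turn(pts[0], pts[1], pts[2]) > 0:
        dq = deque([pts[2], pts[0], pts[1], pts[2]])
    else:
        dq = deque([pts[2], pts[1], pts[0], pts[2]])
    for p in pts[3:]:
        # p inside the current hull: simplicity of the curve lets us skip it
        if turn(dq[-2], dq[-1], p) > 0 and turn(dq[0], dq[1], p) > 0:
            continue
        while turn(dq[-2], dq[-1], p) <= 0:   # restore convexity at the back
            dq.pop()
        dq.append(p)
        while turn(dq[0], dq[1], p) <= 0:     # restore convexity at the front
            dq.popleft()
        dq.appendleft(p)
    return list(dq)[:-1]   # drop the duplicated endpoint
```

Each vertex is pushed and popped a constant number of times, so the whole thing runs in linear time; the sorting that forces the $\Omega(n\log n)$ bound in the general case is exactly what the curve order gives you for free.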

# The Chuck Norris of Computational Geometry

(Note – This blog post is posted under the category TCS. If you are not working in computer science or mathematics, this post will not make sense to you. The same is going to apply to all future posts in this category)

I recently read a paper with the following abstract –

> A number of seemingly unrelated problems involving the proximity of $N$ points in the plane are studied, such as finding a Euclidean minimum spanning tree, the smallest circle enclosing the set, $k$ nearest and farthest neighbors, the two closest points, and a proper straight-line triangulation. For most of the problems considered a lower bound of $\Omega (N\log N)$ is shown. For all of them the best currently-known upper bound is $O(N^2)$ or worse. The purpose of this paper is to introduce a single geometric structure, called the Voronoi diagram, which can be constructed rapidly and contains all of the relevant proximity information in only linear space. The Voronoi diagram is used to obtain $O(N\log N)$ algorithms for all of the problems.

This feels a lot like a roundhouse kick.

PS: Note that the $\Omega(N\log N)$ lower bound applies only to most of the problems. In particular, the smallest enclosing circle problem has a linear time algorithm.