Perfect matchings with a high stabbing number

Once upon a time, an idiosyncratic king set up a peculiar system for settling marriages in his kingdom. Once every year, he would invite all the couples that wished to tie the knot to a grand ceremony. Upon arrival, the couples would be taken to a large open area with chairs fixed to the ground all around, and asked to get seated. The arrangement would be made so that the chairs were neither in surplus nor in shortage: each individual would get exactly one chair and no more.

Finally, the king himself would arrive, examine the seated guests, draw one long line on the ground and stand on one of its two sides. That’s when the marriages would be decided. Couples where both partners were seated on the side of the line the king stood on would get married and couples separated by the holy line would be forbidden from seeing each other ever again.

The king wanted to slow down the recent exponential growth in population in his kingdom and so he wanted as few couples to be married as possible. Since he had complete knowledge of who wanted to get married to whom, he could, in principle, devise an evil arrangement of chairs and draw one really mean line that would separate most of the couples at the ceremony. On the other hand, the couples were allowed to collude with each other upon seeing the arrangement of chairs and decide who got to sit where. Thus perhaps they could formulate a clever strategy that would let most of them be on the same side as the king no matter what line he chose to draw?

Year after year passed by and the king, drawing upon the wisdom of the entire royal ministry, managed to hoodwink his people and successfully stalled most of the romance in his kingdom. The lack of expertise in computational geometry among the general public proved to be detrimental to them. The grand ceremony, having the flavor of a gripping puzzle, got the king addicted and very soon, by developing progressively sophisticated and elaborate strategies, he unknowingly brought his own kingdom to what could be described as extinction.

Centuries later, in the year 1989, two researchers, trying to design an efficient data structure to perform range searching queries on a point-set, proved an interesting theorem. They weren’t aware that the theorem held the key to a centuries-old conundrum that could have saved an entire kingdom from going extinct. What they proved essentially amounted to this:

“No matter what the arrangement of chairs, the couples can always collude with each other and compute an assignment of chairs to each individual, so that no matter what line the king draws and no matter what side he stands on, at least a polynomial number of them get married.”

In fact, they proved something even stronger. Their theorem does not so much depend on the fact that the shape the king draws is a line. Other geometric shapes, such as circles, rectangles, squares, triangles, can all be plugged into the theorem in place of “line” and the statement will still hold true.

As long as the shape satisfies the property that its dual shatter function is polynomial, the theorem works. The dual shatter function for a shape is the maximum number of cells one can get in a Venn diagram obtained by drawing n of those shapes. For example, for the case of halfplanes (i.e., a line and one of its sides), one can easily show using induction on the number of lines that the dual shatter function is polynomial. Notice that when incrementally adding a halfplane to a partially built Venn diagram of halfplanes, the number of new cells created is equal to the number of cells the new halfplane’s boundary passes through. Since a new line can intersect each old line at most once, the number of cells it passes through is at most one more than the number of lines already present. Thus the dual shatter function is O(n^2). Similarly, for any shape that satisfies the property that boundaries of two instances of the shape always intersect in a constant number of places, the dual shatter function is bounded from above by a polynomial.
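To spell out the count: if {c_k} denotes the number of cells formed by {k} halfplanes, then adding the {(k+1)}-st halfplane creates at most {k+1} new cells, so

c_{k+1} \leq c_k + (k+1), \qquad c_n \leq 1 + \sum_{k=0}^{n-1}(k+1) = 1 + \frac{n(n+1)}{2} = O(n^2).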

Actually, the theorem does not just hold in the geometric setting. It holds for general set systems. Thus if the ceremony were organized in interstellar space with chairs occupying co-ordinates in three dimensions, or in some bizarre abstract space, a polynomial number of marriages could be saved as long as the shape chosen by the king had a polynomial dual shatter function.

(Bonus points if you can correctly guess the definition of stabbing number without looking it up.)


Another theorem of Turán

A graph with {n} isolated vertices has a maximum independent set of size {n} and a complete graph has a maximum independent set of size 1. As you increase the number of edges, you should get smaller and smaller maximum independent sets.

This intuition is quantified by a theorem by Turán that says that a graph with {n} vertices and {e} edges has a maximum independent set of size at least {\frac{n^2}{2e+n}}.

In particular, graphs with a linear number of edges, for example planar graphs or graphs with maximum degree bounded by a constant, are guaranteed to have a linear-sized independent set.

Note that the theorem only says that a small number of edges guarantees a large independent set. The converse is not true, i.e., a large independent set does not imply a small number of edges. Example: complete bipartite graphs. They have {\left(\frac{n}{2}\right)^2 = \frac{n^2}{4}} edges and an independent set of size {n/2}.

Also, the theorem is constructive. So you can actually find the independent set in question in polynomial time.
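The bound can in fact be achieved by a very simple greedy procedure: repeatedly pick a vertex of minimum degree, add it to the independent set, and delete it together with its neighbors. Here is a minimal Python sketch of that greedy (my own code; this is one standard way to make the bound constructive, not necessarily Turán's original argument):

```python
def greedy_independent_set(adj):
    """adj: dict mapping each vertex to the set of its neighbors."""
    adj = {v: set(nbrs) for v, nbrs in adj.items()}
    independent = []
    while adj:
        v = min(adj, key=lambda u: len(adj[u]))  # a minimum-degree vertex
        independent.append(v)
        removed = adj[v] | {v}
        for u in removed:                        # delete v and its neighbors
            adj.pop(u, None)
        for nbrs in adj.values():
            nbrs -= removed
    return independent

# A 4-cycle: n = 4, e = 4, so the bound guarantees size at least 16/12 > 1.
print(greedy_independent_set({0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}))
```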

Opinions and How They Change

An individual forms opinions based on how he himself assimilates the facts around him and also based on what opinions his friends and the other people he interacts with hold. This makes things complicated and intriguing enough that it has been an active area of research for decades.

One question is, can we formulate simple enough models that match with the data we get from real life experiments? If we could, then we would get some insight into human behavior and a tool for making useful predictions.

The simplest model that has been studied is this:

An opinion is just a real number. Each person starts with an initial opinion. Next, in each time step, he looks at the opinions held by his friends and updates his own opinion to the average of his old opinion and the opinions of his friends. It doesn’t have to be a simple average. A person may have different levels of trust in different friends, and thus he might want to take a weighted average instead. However, the model doesn’t allow individuals to change the weights at any step. The weights chosen in the beginning have to be the weights always.

Using simple linear algebra tricks and borrowing known results from the Markov Chain literature, it can be shown that this kind of system converges to an equilibrium in most natural cases. An equilibrium here means a set of opinions for which the averaging step doesn’t lead to any change, i.e., for all individuals, the new opinion remains the same as the old one. In fact, it can be proved that the equilibrium that’s reached is a consensus, i.e., every individual has the same opinion.
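Here is a minimal numerical sketch of the process (the trust matrix {W} below is made up; {W[i][j]} is the weight person {i} puts on person {j}, and each row sums to 1):

```python
import numpy as np

def degroot_step(W, z):
    """One round: everyone replaces their opinion with a weighted average
    of the opinions of the people they listen to (including themselves)."""
    return W @ z

W = np.array([[0.5, 0.3, 0.2],   # fixed trust weights, rows sum to 1
              [0.2, 0.6, 0.2],
              [0.3, 0.3, 0.4]])
z = np.array([0.0, 1.0, 0.5])    # initial opinions
for _ in range(200):
    z = degroot_step(W, z)
print(z)  # all three entries are (numerically) the same: a consensus
```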

An objection to this model that one might have is the simple representation of an opinion. Can it really be represented by just a single real number?

Anyway, DeGroot, the person who introduced this model, also showed that the same thing happens if the opinions are drawn from any vector space and in each time step a person updates his opinion to some convex combination of the opinions of his friends (including himself).

That’s something.

The only issue is that in real life, people don’t reach consensus. So what’s going on?

Of course, the model seems too simple to resemble real life accurately. For one, the weights (or trust) we assign to people change over time depending on various factors. For example, if a person seems to be changing his mind every minute, we will probably assign a lower weight to his opinion.

Also, even though this process of repeated averaging has been shown to always converge to a consensus, we don’t really know how long it takes to get there. From what I know by quickly glancing through Bernard Chazelle’s new work on bird flocking, the time taken by a community the size of a country’s population to reach a consensus is probably way more than the age of the universe.

Anyway. Friedkin and Johnsen modified this model a bit to make it more realistic. In their model, an individual has a fixed internal opinion that doesn’t change with time, and during an averaging step, he takes a weighted average of the opinions of his friends (including himself) and this fixed internal opinion. Because the internal opinion can be different for different people, this system will obviously not always reach a consensus.

The system does have an equilibrium though and Friedkin and Johnsen proved that the equilibrium is almost always reached.
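A sketch of an update of this kind (the particular mixing convention and the parameter alpha below are my own simplification, not necessarily the exact form used by Friedkin and Johnsen):

```python
import numpy as np

def fj_step(W, s, z, alpha=0.5):
    """One round: a weighted average of the current opinions, mixed with
    the fixed internal opinions s."""
    return alpha * (W @ z) + (1 - alpha) * s

W = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.6, 0.2],
              [0.3, 0.3, 0.4]])
s = np.array([0.0, 1.0, 0.5])    # internal opinions, fixed forever
z = s.copy()
for _ in range(200):
    z = fj_step(W, s, z)
print(z)  # converges, but the entries stay different: no consensus
```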

However, their model is different from DeGroot’s simpler model in a fundamental way. Let me explain.

Given a set of numbers {a_1, \ldots, a_n}, the mean is the number that minimizes the function {(z-a_1)^2+\ldots +(z-a_n)^2}. Thus the averaging step above can be seen as a step where a person is trying to minimize the cost incurred with respect to the cost function {\sum_{j\in N(i)} (z_i-z_j)^2}. Here {N(i)} represents the neighborhood of {i}, i.e., the set of friends of {i}.
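(To see why the mean is the minimizer, set the derivative to zero:

\frac{d}{dz}\sum_{k=1}^{n}(z-a_k)^2 = 2\sum_{k=1}^{n}(z-a_k) = 0 \quad\Longrightarrow\quad z = \frac{a_1+\ldots+a_n}{n}.

The same computation applied to the cost above shows that person {i}’s best response is the plain average {\frac{1}{|N(i)|}\sum_{j\in N(i)} z_j}; with trust weights it becomes the corresponding weighted average.)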

With the above definition of cost, we can measure the quality of a certain opinion vector. For example, we can say that the sum of costs incurred by each person is the social cost of the whole group. And then given an opinion vector, we can decide how good it is by measuring how far it is from the opinion vector that minimizes the social cost. In particular, we can measure the quality of the opinion vector that the group converges to in equilibrium.

The fundamental difference between DeGroot’s model and Friedkin and Johnsen’s model is that in DeGroot’s model, the equilibrium reached also minimizes the total social cost, but in Friedkin and Johnsen’s model, it does not necessarily do so.

David Bindel, Jon Kleinberg and Sigal Oren prove in their FOCS ’11 paper that the situation is not that bad. Even though the total cost at equilibrium may not be the minimum possible social cost, it is worse by at most a factor of 9/8. That’s pretty cool.

Two simple questions

In how many ways can you distribute {n} distinct objects into {k} different bins?

Of course, for each object, there are {k} possible bins, so the total number of ways is {k^n}.

Now, consider the same question, except, the objects are not distinct any more. Thus all that matters is how many of them go into each bin and not which ones in particular.

The answer to this can be slightly complicated to calculate, but if we simplify things a bit and look for a loose upper bound, we can argue as follows. The problem is equivalent to assigning one number between 0 and {n} to each bin so that the sum of all {k} assigned numbers is exactly {n}. If we completely ignore the fact that the sum must be {n}, then we get that the total number of ways of assigning numbers is {(n+1)^k}.
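A quick sanity check in code (the values {n = 20}, {k = 3} are arbitrary; the exact count for identical objects is the stars-and-bars value {\binom{n+k-1}{k-1}}):

```python
from math import comb

n, k = 20, 3
print(k ** n)                  # distinct objects: 3**20 = 3486784401, exponential in n
print(comb(n + k - 1, k - 1))  # identical objects, exact count: C(22, 2) = 231
print((n + 1) ** k)            # the loose upper bound above: 21**3 = 9261, polynomial in n
```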

I am certain that I knew both these things back in high school, but it was quite recently that I realized that the first one is exponential in {n} and the second one is polynomial in {n}. That’s a huge difference. Just making the objects distinct changes the total search space from polynomial to exponential.

How hard is NP-hard?

Assuming P{\neq} NP, an NP-hard problem cannot be solved in polynomial time. This means that there cannot exist an algorithm that, for all possible inputs, computes the corresponding output in polynomial time. However, NP-hardness doesn’t prohibit the existence of an efficient algorithm that works for only a subset of the possible inputs. For example, for any problem, there is a constant-time algorithm that solves it on any fixed, constant-sized set of instances. The algorithm just has a lookup table where the outputs for all of those instances are stored.

But this is an extreme case. Can there exist an algorithm for an NP-hard problem that solves it in polynomial time for a significantly large number of input instances? Or, more precisely, does there exist an algorithm, that, for all {n}, solves the problem in polynomial time for {f(n)} inputs of size {n}, where {f(n)} is some fast growing function of {n}?

How fast growing should {f(n)} be? One interesting choice for {f(n)} is something that makes the average case complexity of the algorithm polynomial, i.e., if inputs are chosen uniformly at random from the set of all inputs, then the algorithm takes polynomial time in expectation. Of course, {f(n)} will have to be fairly large for this to happen.

The interesting fact is that this is possible. For example, if you pick a graph on {n} vertices randomly, with a distribution that is uniform over all graphs with {n} vertices, then there exists an algorithm that decides whether the graph is Hamiltonian or not in expected polynomial time.

Hamiltonian cycles in maximal planar graphs

Whitney proved long ago that every maximal planar graph with no separating triangles is Hamiltonian. However, the converse is not necessarily true, that is, there exist maximal planar graphs that are Hamiltonian and have separating triangles. So what happens when a maximal planar graph has separating triangles? Is there still some condition under which we can say with certainty that it will be Hamiltonian?

One theorem of this kind was shown by Jackson and Yu.

Consider a maximal planar graph G with a separating triangle T. G can be decomposed into two graphs G_1 and G_2 such that G_1 \cup G_2 = G and G_1 \cap G_2 = T. If G_1 or G_2 still has separating triangles, then decompose them similarly. Eventually, we will be left with a collection S of maximal planar graphs without any separating triangles. Now, consider a graph whose nodes represent elements of the set S and which has an edge between two nodes if they share a separating triangle in G. Let’s call this graph D. The theorem by Jackson and Yu states that if D has maximum degree 3, then G is Hamiltonian.
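To make the notion concrete, here is a brute-force sketch for finding the separating triangles that drive this decomposition (my own code using networkx, nothing like the efficient machinery used in the literature); it uses the fact that a triangle is separating exactly when deleting its three vertices disconnects the graph:

```python
import networkx as nx

def separating_triangles(G):
    """All triangles of G whose vertex set disconnects G when removed."""
    triangles = set()
    for u, v in G.edges():
        for w in set(G[u]) & set(G[v]):   # a common neighbor closes a triangle
            triangles.add(frozenset((u, v, w)))
    separating = []
    for t in triangles:
        H = G.copy()
        H.remove_nodes_from(t)
        if H.number_of_nodes() > 0 and not nx.is_connected(H):
            separating.append(tuple(t))
    return separating
```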

This way of decomposing maximal planar graphs has been popular. It was first used by William Cunningham and Jack Edmonds in 1980. They proved several interesting results about the decomposition graph D.

Communication Complexity

Alice has a number x from the set [1..m] and Bob has a number y from the set [1..n]. They want to evaluate a function f(x, y) of the two numbers they have. The way they communicate with each other is as follows. Alice sends one bit to Bob, then Bob sends one bit to Alice, then Alice sends one bit to Bob and so on. Both of them have unlimited memory and unlimited processing power, so they can do extremely complicated things for deciding what bit they should send in each step. What’s the number of bits they will have to exchange in order to be able to evaluate f?

For example, say the function f was just x+y mod 2. In that case, all they would need is an exchange of 1 bit. Alice would send x mod 2 and that’s it.

The study of these kinds of questions was started by Yao in the paper titled “Some Complexity Questions Related to Distributive Computing.” However, he does mention an earlier paper that explored similar questions in the case when f was a smooth real-valued function as opposed to a discrete one. This post is about Yao’s paper. I am going to list some of the results proved there.

Consider an m \times n matrix F that encodes the values of f, i.e., F[i, j] = f(i, j). Now consider any subset of rows and any subset of columns. The elements lying simultaneously in a row of this subset and a column of this subset form a rectangle, in the sense that there exists some permutation of the rows and the columns of the matrix for which the elements would lie inside a rectangular box. Let’s call such a rectangle monochromatic if f has a constant value for all elements of the rectangle. Let d(f) be the minimum number of monochromatic rectangles with which one can cover the whole matrix F. Then the first result in the paper says that d(f) affects the number of bits that must be communicated between Alice and Bob if they want to figure out the value of f. In particular, the number of bits that must be communicated is \Omega(\log d(f)).

This is not very difficult to see if you understand what a communication protocol essentially is. In a communication protocol, each player basically gives some information in each step to narrow down the possibilities for what row or column he possesses. For example, before the first bit communicated by Alice, Bob has no idea about the row in which her number lies. But using that one bit of communication, she tells him, say, that her row number is either 1 or 3 or 5. Next, Bob does the same with his columns and so on. Finally, they know the value of the function once the remaining set of possibilities forms a monochromatic rectangle.

This basically suggests a tree-like structure where each step corresponds to an edge and the leaves correspond to monochromatic rectangles. This means that the number of leaves in the tree is at least the number of monochromatic rectangles required to cover the whole matrix. And since at any step there are a limited number of possible choices, the degree of the tree is bounded by a constant. Finally, since the communication complexity is nothing but the depth of the tree, the above result follows.
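Written out: a protocol that always finishes within {c} bits is a binary tree of depth at most {c} (each node branches on the next bit sent), so it has at most {2^c} leaves. The inputs reaching a leaf form a rectangle, which must be monochromatic for the players to know the answer, and these rectangles cover the matrix F, so

2^c \geq d(f), \quad\text{i.e.,}\quad c \geq \log_2 d(f).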

Next, the paper proves that out of all the functions possible between two numbers in the range [1..n], most of them have d(f) = \Omega(n). This shows that for most functions, the communication complexity is \Omega (\log n).

What about the upper bound? Suppose we find a decomposition of the matrix into k monochromatic rectangles; does this give us a protocol that uses O(\log k) bits of communication? Theorem 2 from the paper says that it does, as long as the decomposition satisfies the property that all its rectangles are contiguous. For general decompositions though, all that can be guaranteed is that there exists a protocol that uses \sqrt{k\log k} bits of communication.

Next, the paper gets into probabilistic models for communication complexity, but I will talk about that later.

Minimum enclosing cylinder

One of the most elegant algorithms I have seen in a while is Timothy Chan’s algorithm for finding the minimum enclosing cylinder of a bunch of points in the streaming model. The actual paper can be found here, in case you are interested.

The minimum enclosing cylinder of a set of points is basically the cylinder with the minimum radius that encloses all the points in the set. Finding it is equivalent to finding the straight line that minimizes the distance to the point of the set farthest from it.

The streaming model is basically a model where you are allowed to look at the input exactly once and you are not allowed to store the whole input in your memory, i.e., you are allowed only a sub-linear amount of memory. So in this particular case, it means that you are shown the points one by one and exactly once. You are allowed to do computations as and when you feel like. In the end, you must return the minimum enclosing cylinder of all the points that were shown.

Now, this definitely is a difficult task; in fact, it might even be impossible to return the exact minimum enclosing cylinder in such a restrictive model. Therefore, we allow some error. We say that it’s fine even if the radius of the cylinder that you return is at most some constant (that does not depend on the number of points) times the radius of the actual minimum enclosing cylinder.

Consider the following algorithm. Call the first point o and the second point v. Assume that the line ov is the axis of your cylinder, and whenever a new point is shown, calculate the distance of that point from the line ov. If this distance is bigger than the previous maximum, then update. Whatever you have in the end is the radius of the cylinder. Is this algorithm good? In other words, can we say that the radius that we have in the end is at most some constant times the radius of the actual minimum enclosing cylinder? No, we can’t. The reason is that it might turn out that v is really close to o while all the other points are really far away from ov, yet very close to the perpendicular bisector of ov. In that case, the cylinder with the perpendicular bisector of ov as its axis would have been a much better choice, and the cylinder that this algorithm returns can be very large compared to that one. So this algorithm is no good.

Notice, however, that the problem was caused by all those points that were very far away from o. So if we can somehow make sure that there are no such points, then we could probably get somewhere. In fact, the paper I am talking about proves a more precise version of the statement above. In particular, it proves that if d(p, ov) is the distance between the point p and the line ov and Rad(\{o, v, p\}) is the radius of the minimum enclosing cylinder of the points o, v, and p, then

d(p, ov) \leq 2 ( \frac{\|op\|}{\|ov\|} + 1 ) Rad(\{o, v, p\})

This says that if we fix the radius of the minimum enclosing cylinder of the points o, v, and p and the ratio  \frac{\|op\|}{\|ov\|}, then that gives an upper bound on the distance between p and ov. So if we somehow make sure that the ratio \frac{\|op\|}{\|ov\|} is small, say, bounded by 2, then that immediately tells us that the distance between p and ov is not more than 2(2+1) = 6 times the radius of the minimum enclosing cylinder of the points o, p, and v. Therefore the cylinder with ov as the axis won’t be that bad after all.

The only problem is that we don’t have such a bound on the ratio \frac{\|op\|}{\|ov\|}. One way out is to cheat and say that we will look at the input twice instead of once. In the first pass, we will find the point v that is farthest from the point o and in the second pass we will use the above algorithm with ov as the axis. This will make sure that \frac{\|op\|}{\|ov\|} \leq 1 and therefore we will know that the radius we return will be at most 4 times the radius of the actual minimum enclosing cylinder.

The other solution is to not cheat, and use the awesome two-line algorithm given in the paper that, in a way, does both passes in just one pass. The algorithm calls the first point o and the second point v, and starts with the radius r set to 0. Then, whenever it encounters a new point p, it does the following two steps (a code sketch appears after the list).

  1. Find Rad(\{o, v, p\}). If it’s more than r, then update r, else do nothing.
  2. If \|op\| \geq 2\|ov\|, then change v to p, else do nothing.
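Here is how this might look in code (a sketch with my own helper functions; Rad of three points is computed as half the smallest width of their triangle, i.e., its area divided by its longest side, which I believe is the right formula; degenerate inputs such as coincident points are ignored):

```python
import math

def sub(a, b): return tuple(x - y for x, y in zip(a, b))
def norm(a): return math.sqrt(sum(x * x for x in a))
def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def rad3(o, v, p):
    """Radius of the minimum enclosing cylinder of three points:
    triangle area divided by the longest side."""
    area = norm(cross(sub(v, o), sub(p, o))) / 2.0
    longest = max(norm(sub(o, v)), norm(sub(o, p)), norm(sub(v, p)))
    return area / longest

def stream_cylinder(points):
    """points: an iterable of 3D points, seen one at a time."""
    it = iter(points)
    o, v = next(it), next(it)
    r = 0.0
    for p in it:
        r = max(r, rad3(o, v, p))                    # step 1
        if norm(sub(o, p)) >= 2 * norm(sub(o, v)):   # step 2
            v = p
    # By the analysis below, the cylinder with axis ov and radius 16r
    # contains every point, while the optimum has radius at least r.
    return o, v, r
```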

The claim is that the final ov is a good axis to choose for the cylinder. To see why this is true, let’s say that v_1, v_2, \ldots , v_f are the different values that v takes and r_f is the final value of r. The points that came after v became v_f were obviously close to ov_f (when compared to r_f), because for all those points the ratio \frac{\|op\|}{\|ov_f\|} was small. The points that came between v_{f-1} and v_f were close to ov_{f-1} for the same reason. Also, since \frac{\|ov_{f-1}\|}{\|ov_f\|} is at most 1/2, we know that v_{f-1} itself is close to ov_f (when compared to r_f). Therefore, using some geometry, one can prove that the points that came between v_{f-1} and v_f were not just close to ov_{f-1}, but also to ov_f. Similarly, the points that came between v_1 and v_2 were close to ov_1, hence also to ov_2, and so on up the chain to ov_f. In fact, the paper proves that all the points are within a distance of 16 times r_f from ov_f. Therefore, the cylinder with ov_f as its axis and 16r_f as its radius contains all the points. The reason why this is a good cylinder to return is that the minimum enclosing cylinder has radius at least r_f.

Curves

If you have been given a bunch of points on a plane, finding the convex hull takes \Omega(n\log n) time. However, if you have been given, in addition, a non self-intersecting curve passing through all the points and the order in which the points occur along the curve, then the convex hull can be found in linear time. I find it a bit surprising that a non self-intersecting curve passing through all the points actually contains sufficient information to remove the \log n factor from the run-time of an algorithm. What are some other places where this information helps?
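As for how the linear-time bound is achieved, one algorithm of this kind is Melkman’s deque algorithm. Here is a sketch (assuming the points are given in the order they occur along the curve and are in general position, i.e., no three of them are collinear):

```python
from collections import deque

def turn(o, a, b):
    """Positive if o -> a -> b is a counterclockwise (left) turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def hull_of_simple_chain(pts):
    """Convex hull of points listed in order along a simple curve, in O(n)."""
    if len(pts) < 3:
        return list(pts)
    a, b, c = pts[0], pts[1], pts[2]
    d = deque([c, a, b, c] if turn(a, b, c) > 0 else [c, b, a, c])
    for p in pts[3:]:
        if turn(d[0], d[1], p) > 0 and turn(d[-2], d[-1], p) > 0:
            continue                       # p lies inside the current hull
        while turn(d[0], d[1], p) <= 0:    # walk the bottom end to its tangent
            d.popleft()
        d.appendleft(p)
        while turn(d[-2], d[-1], p) <= 0:  # walk the top end to its other tangent
            d.pop()
        d.append(p)
    return list(d)  # hull in counterclockwise order; first and last coincide
```

Nothing ever gets sorted, and each point is pushed and popped only a constant number of times, which is exactly where the missing {\log n} factor goes.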

The Chuck Norris of Computational Geometry

(Note – This blog post is posted under the category TCS. If you are not working in computer science or mathematics, this post will not make sense to you. The same is going to apply to all future posts in this category)

I recently read a paper with the following abstract –

A number of seemingly unrelated problems involving the proximity of N points in the plane are studied, such as finding a Euclidean minimum spanning tree, the smallest circle enclosing the set, k nearest and farthest neighbors, the two closest points, and a proper straight-line triangulation. For most of the problems considered a lower bound of \Omega (N\log N) is shown. For all of them the best currently-known upper bound is O(N^2) or worse. The purpose of this paper is to introduce a single geometric structure, called the Voronoi diagram, which can be constructed rapidly and contains all of the relevant proximity information in only linear space. The Voronoi diagram is used to obtain O(N\log N) algorithms for all of the problems.

This feels a lot like a roundhouse kick.

PS: Note that the \Omega(N\log N) lower bound applies only to most of the problems. In particular, the smallest enclosing circle problem has a linear time algorithm.