One of the most elegant algorithms I have seen in a while is Timothy Chan’s algorithm for finding the minimum enclosing cylinder of a bunch of points in the streaming model. The actual paper can be found here, in case you are interested.
The minimum enclosing cylinder of a set of points is basically the cylinder with minimum radius that encloses all the points in the set. It is equivalent to finding the straight line for which the most distant point in the set is the closest.
Streaming model is basically a model where you are allowed to look at the input exactly once and you are not allowed to store the input in your memory, i.e., you are allowed only sub-linear amount of memory. So in this particular case, it means that you are shown the points one by one and exactly once. You are allowed to do computations as and when you feel like. In the end, you must return the minimum enclosing cylinder of all the points that were shown.
Now, this definitely is a difficult task, in fact, it might even be impossible to return the exact minimum enclosing cylinder in such a restrictive model. Therefore, we allow some error. We say that it’s fine even if the radius of the cylinder that you return is at most some constant (that does not depend on the number of points) times the radius of the actual minimum enclosing cylinder.
Consider the following algorithm. Call the first point and the second point . Assume that this is the axis of your cylinder and whenever a new point is shown, calculate the distance of that point from the line . If this distance is bigger than the previous maximum, then update. Whatever you have in the end is the radius of the cylinder. Is this algorithm good? Or in other words, can we say that the radius that we have in the end is at most some constant times the radius of the actual minimum enclosing cylinder? No, we can’t. The reason is that it might turn out that is really close to and all the other points that we get are really really far away from , but are very close to the perpendicular bisector of . In that case, the cylinder with the perpendicular bisector of as its axis would have been a much better choice, and the cylinder that this algorithm is returning can be very large compared to this cylinder. So this algorithm is no good.
Notice however, that the problem was caused by all those points that were very far away from . So if we can somehow make sure that there are no such points, then we could probably get somewhere. In fact, the paper I am talking about proves a more precise version of the statement above. In particular, it proves that if is the distance between the point and the line and if be the radius of the minimum enclosing cylinder of the points and , then
This says that if we fix the radius of the minimum enclosing cylinder of the points and and the ratio , then that gives an upper bound on the distance between and . So if we somehow make sure that the ratio is small, say, bounded by 2, then that immediately tells us that the distance between and is not more than times the radius of the minimum enclosing cylinder of the points and . Therefore the cylinder with as the axis won’t be that bad after all.
The only problem is that we don’t have such a bound on the ratio . One way out is to cheat and say that we will look at the input twice instead of once. In the first pass, we will find the point that is farthest from the point and in the second pass we will use the above algorithm with $ov$ as the axis. This will make sure that and therefore we will know that the radius we return will be at most 4 times the radius of the actual minimum enclosing cylinder.
The other solution is to not cheat, and use the awesome two line algorithm given in the paper that, in a way, does both passes in just one pass. The algorithm starts with calling the first point , the second point , and assigning the value to the radius initially. Next, whenever it encounters a new point , it does the following.
- Find . If it’s more than , then update , else do nothing.
- If , then change to , else do nothing.
The claim is that the final is a good axis to choose for the cylinder. To see why this is true, let’s say that are the different values that gets and is the final value of . The points that came after became were obviously close to (when compared to ), because for all those points, the ratio was small. The points that came between and were close to for the same reason. Also, since is also not more than 2, we know that is close to (when compared to ). Therefore, using some geometry, one can prove that points that came between and were not just close to , but also to . Similarly, points that came between and were close to and therefore were also close to and so on till . In fact, the paper proves that all the points are within a distance of 16 times from . Therefore, the cylinder with as its axis and as its radius contains all the points. The reason why it is a good cylinder to return is that the minimum enclosing cylinder has its radius at least as large as .