One of the most elegant algorithms I have seen in a while is Timothy Chan’s algorithm for finding the minimum enclosing cylinder of a bunch of points in the streaming model. The actual paper can be found here, in case you are interested.
The minimum enclosing cylinder of a set of points is basically the cylinder with minimum radius that encloses all the points in the set. It is equivalent to finding the straight line for which the most distant point in the set is the closest.
Streaming model is basically a model where you are allowed to look at the input exactly once and you are not allowed to store the input in your memory, i.e., you are allowed only sub-linear amount of memory. So in this particular case, it means that you are shown the points one by one and exactly once. You are allowed to do computations as and when you feel like. In the end, you must return the minimum enclosing cylinder of all the points that were shown.
Now, this definitely is a difficult task, in fact, it might even be impossible to return the exact minimum enclosing cylinder in such a restrictive model. Therefore, we allow some error. We say that it’s fine even if the radius of the cylinder that you return is at most some constant (that does not depend on the number of points) times the radius of the actual minimum enclosing cylinder.
Consider the following algorithm. Call the first point and the second point
. Assume that this is the axis of your cylinder and whenever a new point is shown, calculate the distance of that point from the line
. If this distance is bigger than the previous maximum, then update. Whatever you have in the end is the radius of the cylinder. Is this algorithm good? Or in other words, can we say that the radius that we have in the end is at most some constant times the radius of the actual minimum enclosing cylinder? No, we can’t. The reason is that it might turn out that
is really close to
and all the other points that we get are really really far away from
, but are very close to the perpendicular bisector of
. In that case, the cylinder with the perpendicular bisector of
as its axis would have been a much better choice, and the cylinder that this algorithm is returning can be very large compared to this cylinder. So this algorithm is no good.
Notice however, that the problem was caused by all those points that were very far away from . So if we can somehow make sure that there are no such points, then we could probably get somewhere. In fact, the paper I am talking about proves a more precise version of the statement above. In particular, it proves that if
is the distance between the point
and the line
and if
be the radius of the minimum enclosing cylinder of the points
and
, then
This says that if we fix the radius of the minimum enclosing cylinder of the points and
and the ratio
, then that gives an upper bound on the distance between
and
. So if we somehow make sure that the ratio
is small, say, bounded by 2, then that immediately tells us that the distance between
and
is not more than
times the radius of the minimum enclosing cylinder of the points
and
. Therefore the cylinder with
as the axis won’t be that bad after all.
The only problem is that we don’t have such a bound on the ratio . One way out is to cheat and say that we will look at the input twice instead of once. In the first pass, we will find the point
that is farthest from the point
and in the second pass we will use the above algorithm with $ov$ as the axis. This will make sure that
and therefore we will know that the radius we return will be at most 4 times the radius of the actual minimum enclosing cylinder.
The other solution is to not cheat, and use the awesome two line algorithm given in the paper that, in a way, does both passes in just one pass. The algorithm starts with calling the first point , the second point
, and assigning the value
to the radius
initially. Next, whenever it encounters a new point
, it does the following.
- Find
. If it’s more than
, then update
, else do nothing.
- If
, then change
to
, else do nothing.
The claim is that the final is a good axis to choose for the cylinder. To see why this is true, let’s say that
are the different values that
gets and
is the final value of
. The points that came after
became
were obviously close to
(when compared to
), because for all those points, the ratio
was small. The points that came between
and
were close to
for the same reason. Also, since
is also not more than 2, we know that
is close to
(when compared to
). Therefore, using some geometry, one can prove that points that came between
and
were not just close to
, but also to
. Similarly, points that came between
and
were close to
and therefore were also close to
and so on till
. In fact, the paper proves that all the points are within a distance of 16 times
from
. Therefore, the cylinder with
as its axis and
as its radius contains all the points. The reason why it is a good cylinder to return is that the minimum enclosing cylinder has its radius at least as large as
.