Wednesday, November 12, 2008

Analytical Approach To Solving Programming Problems

In the six months that passed since I updated this blog I've been working on various web application projects, learning a lot about ASP.Net Ajax and Web Client Software Factory. Nevertheless, this posting isn't about any particular technology. In my opinion, software developers already have way too many technologies, frameworks, programming languages, and APIs available to us. It's a challenge just to keep up with all the new stuff that comes out. What I want to discuss instead are the benefits of the analytical approach to programming problems.

Here is a sample problem. Imagine there is a virus spreading through the cells of a very large two-dimensional matrix. We start with a relatively healthy matrix with only 10 random cells infected. The virus is spreading by infecting 4 adjacent cells every minute. For example, if "." represents a healthy cell, this is how the epidemic will progress:







Start

.........
.........
....0....
.........
.........


After first minute

.........
....1....
...101...
....1....
.........


After second minute

.........
....2....
...212...
..21012..
...212...
....2....
.........


Of course, the virus starts spreading from 10 different places on the surface, so depending on where these cells are, the time it takes to infect entire matrix can vary. Our task is to find that time given 10 initial locations.

It may be tempting to rely on a raw processing power of modern computers and concoct a solution that looks like this:

while (!matrix_is_fully_infected)
{
infect_next_set_of_cells();
}

The model above simply recreates the behavior of the virus. The obvious drawback here is the sheer inefficiency of the algorithm: we end up scanning entire matrix an unknown number of times. As matrix size increase, the inefficiency will be more evident. Still, this may be a valid approach in some cases, where there is no easy analytical solution. Fortunately, our virus has a primitive DNA and yields itself to mathematical definition.

For simplicity, let's assume that we begin with a single infected cell with coordinates (a,b). The number of minutes it takes to infect an arbitrary cell (x,y) can be expressed with this simple formula: |a-x|+|b-y|. Now let's assume we had a second infected cell at the beginning: (c,d). We could use a similar formula to find out how many minutes it will need to infect our arbitrary cell (x,y): |c-x|+|d-y|.

Depending on whether (a,b) or (c,d) is located closer to (x,y), one of the above expressions will produce a smaller number of minutes. This will be the answer to the question "how long it takes to infect a single arbitrary cell". As we go from 2 infected cells to the original 10, we can write the answer as a function of (x,y):

min(|ai - x| + |bi - y|), where 1 <= i <= 10

Of course, our job is not done yet - the virus doesn't stop until all cells are infected. What we need to find out is how many minutes it will take to infect the last cell. Evidently, this will be the maximum time across the matrix, so our solution will be to take the maximum of the above function:

max( min(|ai - x| + |bi - y|) ), where x and y vary across matrix dimensions

As you can easily see, analytical approach provided significant performance improvement - we now only need to scan the matrix once.