2005-11-12

An exercise

Here is a little problem. Given an array A of N integers, find a sub-array (determined by a pair of indices within A) such that sum of elements in that sub-array is maximal among all sub-arrays of A.

Sounds pretty easy? It is, but just to set a little standard for solutions: there is a way (shown to me by 15-year-old boy) to do this in one pass over A and without using any additional storage except for few integer variables.

So much for dynamic programming...

update

That exercise is interesting, because while superfluously similar to the well-known tasks like finding longest common substring, etc. it can be solved in one pass. Indeed Matti presented exactly the same solution that I had in mind. And it is possible (and instructive) to get a rigorous proof that this algorithm solves the problem.
Definition 0
Let A = (i, j) be a sub-array (i <= j), then S(A) = S(i, j) denotes a sum of elements in A. S(A) will be called „sum of A

Definition 1
Let A = (i, j) be a sub-array (i <= j), then for any
0 <= p < i < q < j < r < N
sub-array (p, i - 1) is called left outfix of A, (i, q) --- left infix of A, (q + 1, j) --- right infix of A, and (j + 1, r) --- right outfix of A.

Definition 2
A sub-array is called optimal if all its infixes have positive sums, and all its outfixes have negative sums.

It's easy to prove the following:
Statement 0
A sub-array with the maximal sum is optimal.

Indeed, any non-optimal sub-array by definition has either negative infix, or positive prefix, and, hence, can be shrunken or expanded to another sub-array that has larger sum. As an obvious corollary we get:
Statement 1
To find a sub-array with the maximal sum it's enough to find a sub-array with the maximal sun among all optimal sub-arrays.

The key fact that explains why our problem can be solved in one pass is captured in the following:
Statement 2
Optimal sub-arrays are not overlapping.

Also easily proved by observing that when two sub-arrays A and B overlap there always is an sub-array C that is an infix of A and an outfix of B. And, now the only thing left to do is to prove that Matti's algorithm calculates sums of (at least) all optimal sub-arrays. Which is easy to do:
  • first prove that possibly except for the initial value, new_start always points to the index at which at least one (and, therefore, exactly one) optimal array starts. This is easily proved by induction.
  • then prove that once particular value of new_start has been reached, new_start is not modified until i reaches upper index of optimal array starting at new_start. This follows directly from the definition of optimal sub-array, because all its left infixes have positive sums.
  • this means that for any optimal sub-array (p, q) there is an iteration of the loop at which
    new_start == p && i == q
    As algorithm finds maximal sum among all (new_start, i) sub-arrays, it finds maximal sum among all optimal arrays, and, by Statement 1, maximal sum among all sub-arrays.
--Tags:-[]------------

12 comments:

  1. The idea is to keep track of the biggest sum and its index, and to compute the new sum by removing adding the difference between the first item of the last sub-array and the last of the next sub-array. Is that right?

    N elements, more or less 2*N operations
    5 temporary variables.

    bsh code...

    int n = 3;
    int[] a = { 1, 2, 3, 2, 3, 1, 3, 2, 3, 1, 2, 3, 3, 3, 1, 1, 2, 3 };



    int sum = 0;
    int i;
    for (i = 0; i < n; i++) {
    sum += a[i];
    }

    int index_max_sum = 0;
    int last_sum = sum;
    int max_sum = last_sum;
    for (i = 0; i < a.length - n - 1; i++) {
    last_sum = last_sum - a[i] + a[i + n];
    if (last_sum > max_sum) {
    index_max_sum = i + 1;
    max_sum = last_sum;
    }
    }

    System.out.println("max sum " + max_sum);
    System.out.println("index start sub-array " + index_max_sum);

    Did I miss something?

    ReplyDelete
  2. off by one...
    s/a.length - n - 1/a.length - n/

    ReplyDelete
  3. Did I miss something?

    Your solution only considers sub-arrays of length n (3 in your code). Original question was to find maximum among all possible sub-arrays, sorry if that wasn't clear enough.

    ReplyDelete
  4. C code:
    ----------

    void find_largest_sub(int *a, unsigned int len)
    {
    unsigned int i, start = 0, end = 0, new_start = 0;
    int max = -2147483648, sum = 0;

    for (i = 0; i < len; i++) {
    sum += a[i];

    if (sum >= max) {
    start = new_start;
    end = i;
    max = sum;
    }

    if (sum < 0) {
    sum = 0;
    new_start = i + 1;
    }
    }
    printf("Subarray ranges from %d to %d\n", start, end);
    }

    Seemed easy and worked in my few tests, but did I miss something?

    If there is need to sum big integers one could of course just use longer integers for the counters.

    ReplyDelete
  5. Of all possible sub arrays, there is one and only one of size N, and it seems to have the biggest sum, if no negative numbers are in the array to start with...

    subarray = (1,N)

    And if I remember properly, it's O(1), no need to pass over the array's elements.
    :P

    Or maybe I'm full of it, and didn't translate correctly the exercise to my native language.

    ReplyDelete
  6. to VinceLeg:

    if no negative numbers are in the array to start with

    that's a big "if" :-)

    ReplyDelete
  7. Not sure why I tried to make them with a fixed size...

    Matti's solution seems the right one.

    The algorythm idea comes from the observation that:

    with a[i, k] a sub-array of a[0, n] and sum(a[i, k]) the sum of the elements between index i and k included

    for all i, j, k € [0, n-1] such as i<=j<=k

    sum(a[i, k]) = sum(a[i, j - 1]) + sum(a[j, k])

    if sum(a[i, j - 1]) < 0
    then sum(a[j, k]) > sum(a[i, k]).

    [So many years I haven't done theorical maths, so feel free to correct me :) ]

    ReplyDelete
  8. Well, basically I came up with that solution by imagining the array as a xy-plot with the indexes running on the x-axis and the values on the y-axis. Then I noticed that if the current sum drops below zero, it is no use to keep summing from the previous "start index" as we would just start out with a negative value. Thus, we can just instead start a new sum from the next index. And so on until we find a positive integer again.

    Of course, we must still keep the previous "start index" in case we don't find any ranges with bigger sums later on.

    If all the integers in the array are negative, the range with the biggest sum consists of only the largest individual element.

    Not sure if Jerome's interpretation is correct since I'm too lazy to decipher it ;p

    ReplyDelete
  9. An update (with a proof of correctness) posted.

    ReplyDelete
  10. I was thinking on a simpler proof:

    Given a sub-array A(i,j) of A(0,n-1), we may prove that:

    S(0,n-1) = S(0,i-1) + S(i,j) + S(j+1, n-1)

    So, just find 'i' that minimizes S(0,i-1) (can be solved with one pass), and the j that minimizes S(j+1, n-1) (can be solved with one pass, too (by maximizing S(0,j))).

    It seems simpler to me, so we didn't need all the subarray-overlapping and outfix/infix logic.

    ReplyDelete
  11. Eduardo,

    your proof is correct (and elegant too), but it is a correctness proof for a different algorithm. Can you construct it? :-)

    ReplyDelete
  12. When trying to implement the algorith for my proof, I've noticed that I've not considered that i <= j, so the algorithm won't be that simple (I haven't found a way to assure i <= j without needing most parts of the original proof you presented).

    ReplyDelete