CS 221 Fall 2011 -- Extra Credit Problem 1

Due: 23:59:59 Sunday, December 11
This problem is worth up to 3 percentage points on your overall grade.

Finding Duplicate Data

For this problem you are to implement a function finddups() that takes one parameter, an array, and returns a vector containing each value that is present in every column of the array. You may assume that the input array has its columns sorted in increasing order, and that the values in each column are unique (i.e., no value is repeated within any column).

For example, given the input array A:

A =
     1     0     4
     2     3     5
     3     4     6
     4    10     7

finddups(A) would return the 1x1 array (scalar) 4.

For another example, if A is this array:

A =
     1     1     1     1
     3     2     4     2
     5     3     5     3
     7     4    10     4
     9     8    11    10
    10    10    12    11

finddups(A) would return the vector [ 1 10 ].

The finddups() function must not print anything. It must work for any nonempty array argument. In particular, if the argument has only one column, the function should return its argument.
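One way to check your answers while testing is to compute the expected result with the built-in intersect() function. The script below is only a sketch of such a check, using the second example array; the variable name common is made up for illustration. It is a reference for testing, not one of the efficient approaches described in this handout.

```matlab
% Reference check (for testing only): fold intersect() across the columns.
A = [ 1  1  1  1
      3  2  4  2
      5  3  5  3
      7  4 10  4
      9  8 11 10
     10 10 12 11];

common = A(:, 1);                      % start with everything in column 1
for c = 2:size(A, 2)
    common = intersect(common, A(:, c));   % keep values also found in column c
end
common = common(:).';                  % force a row vector for comparison
% common is now [1 10], matching the expected result for this example
```

You could then compare this value with what your own finddups(A) returns, for example with isequal().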

For maximum credit (3 points), your solution should be as efficient as possible. For example, a naive solution compares each element in the first column to the elements in all the other columns of the array, and outputs those that are found in every column. This leads to on the order of (i.e., within a constant factor of) N² comparisons, where N is the total number of elements in the array. Such a solution will be worth at most 1 point. There are two more efficient approaches. One uses binary search to look for each value in the first column in all the other columns; it takes on the order of N log N comparisons, and would be worth at most 2 points.
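To make the binary search idea concrete, here is a minimal sketch. The function names finddups_bsearch and bsearch_col are made up for illustration (your submission must still be named finddups), and returning the matches as a row vector is an assumption based on the second example above.

```matlab
function dups = finddups_bsearch(A)
% Sketch of the binary-search approach (hypothetical name).
% Assumes each column of A is sorted in increasing order with no repeats.
    [nrows, ncols] = size(A);
    if ncols == 1
        dups = A;                  % one column: return the argument itself
        return;
    end
    dups = zeros(1, 0);            % empty row vector to collect matches
    for i = 1:nrows
        target = A(i, 1);          % candidate taken from the first column
        inall = true;
        for c = 2:ncols
            if ~bsearch_col(A(:, c), target)
                inall = false;     % missing from column c, stop early
                break;
            end
        end
        if inall
            dups(end + 1) = target;
        end
    end
end

function found = bsearch_col(col, target)
% Standard binary search on a sorted column vector (hypothetical helper).
    lo = 1;
    hi = numel(col);
    found = false;
    while lo <= hi
        mid = floor((lo + hi) / 2);
        if col(mid) == target
            found = true;
            return;
        elseif col(mid) < target
            lo = mid + 1;          % target can only be in the upper half
        else
            hi = mid - 1;          % target can only be in the lower half
        end
    end
end
```

Each lookup halves the remaining range of a column, which is where the log factor in the comparison count comes from.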

The best approach (a correct implementation would be worth 3 points) looks at each element in the array at most once (order N comparisons), by keeping track of which values have already been examined in each column.

For example, in the first example above, the first value considered is 1. The algorithm looks at the first value in the second column; it is 0, which is less than the value sought, so the next value in the second column (i.e., 3) would be considered. This is bigger than 1, so we can conclude that 1 is not in every column. Note that we can also forget about 0, since it was not in column 1. So we "remember" that we have already looked at 1 in the first column and 0 in the second, and begin checking 3. We mark column 2 as the starting column and look in column 3. Its first element, 4, is already bigger than 3, so we conclude that 3 is not in every column. We advance our "position" in column 2, and start looking for 4. We "wrap around" and start in column 1 again, with 2. It is too small, so we consider 3, and so on. This solution makes only one pass over each column.
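The single-pass idea can be sketched as follows. This is one possible reading of the description above, not the only way to organize it: it keeps one position per column and, whenever a column's next value is larger than the value sought, adopts that larger value as the new value to look for. The name finddups_linear and the variables pos, target, and matched are made up for illustration.

```matlab
function dups = finddups_linear(A)
% Sketch of the single-pass approach (hypothetical name).
% Assumes each column of A is sorted in increasing order with no repeats.
    [nrows, ncols] = size(A);
    if ncols == 1
        dups = A;                      % one column: return the argument itself
        return;
    end
    pos = ones(1, ncols);              % current position in each column
    dups = zeros(1, 0);                % empty row vector to collect matches
    c = 1;                             % column that supplied the current target
    target = A(1, 1);                  % value currently being sought
    matched = 1;                       % consecutive columns known to hold target
    while true
        c = mod(c, ncols) + 1;         % move to the next column, wrapping around
        % Skip values too small to match; they are never examined again.
        while pos(c) <= nrows && A(pos(c), c) < target
            pos(c) = pos(c) + 1;
        end
        if pos(c) > nrows
            break;                     % column exhausted: nothing more can match
        end
        if A(pos(c), c) == target
            matched = matched + 1;
            if matched == ncols        % target was found in every column
                dups(end + 1) = target;
                pos(c) = pos(c) + 1;   % move past the recorded value
                if pos(c) > nrows
                    break;
                end
                target = A(pos(c), c); % next candidate comes from this column
                matched = 1;
            end
        else
            target = A(pos(c), c);     % larger value becomes the new candidate
            matched = 1;
        end
    end
end
```

Because each position in pos only ever moves forward, no element is skipped over more than once, which is what keeps the total work on the order of N.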

Hint: The function mod(x,y) may come in handy for "wrapping around" the column index from the last column to the first.
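As a small illustration of the hint, the script below steps a column index around a 3-column array. Since MATLAB indices start at 1, the expression mod(c, ncols) + 1 is one common way to get the "next" column; the variable names here are made up for the example.

```matlab
% Cycle a 1-based column index: 1, 2, 3, then back to 1.
ncols = 3;
c = 1;
visited = zeros(1, 7);
for k = 1:7
    visited(k) = c;            % record the column visited at step k
    c = mod(c, ncols) + 1;     % last column wraps around to column 1
end
% visited is now [1 2 3 1 2 3 1]
```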

Submit the m-file finddups.m containing your function via the CS Portal.

Very Important: save the number the submission system gives you after you upload your submission. It is the only acceptable proof that you uploaded your file.