CS 221 Fall 2011 -- Extra Credit Problem 2

Due: 23:59:59 Sunday, December 11
This problem is worth a maximum of 2 points (2%) of your final overall grade.

Automated Curve-Fitting

For this problem you will write a function dofits(x,y,k) that automatically tests several forms of function to see which best fits a given set of data. Here x and y are column vectors of the same length, where x holds the independent values and y holds the dependent values. k is the maximum degree of polynomial that should be fitted to the data. The functions fitted to the data are polynomials of each degree from 1 through k, an exponential curve, and a power-law curve (the last two each with parameters α and β). For each of these types of curve, the function does the following:
  1. If necessary, transform the data by taking logarithms, so the form is a polynomial. For example, to fit to an exponential curve, the y values must be replaced by their natural logarithms.
  2. Call polyfit() on the (possibly transformed) data to determine the coefficients of the polynomial.
  3. Call polyfit() again on the measured and predicted data to determine the goodness of fit by comparing the (measured, predicted) pairs with the line y = x. For example, if the data is fitted to a polynomial and the coefficients returned by the first call to polyfit() are in c, you would call polyfit(y,polyval(c,x),1) to get the slope and intercept of the best-fitting line. (Recall that "goodness of fit" can be quantified by the slope and intercept of the measured vs. predicted line, which should be close to 1 and 0, respectively.)
  4. Print the computed parameters of each fitted curve (i.e., the coefficients for the polynomials, α and β for the exponential and power-law), along with the goodness-of-fit parameters (i.e., the coefficients returned by polyfit()) for each.
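
Steps 1 through 3 can be sketched for the exponential case as follows. This is only a sketch under stated assumptions, not a complete dofits(): it assumes the exponential model has the form y = alpha*exp(beta*x) (the exact form is not spelled out above), it uses made-up noise-free data rather than a course data set, and it assumes predictions are converted back to the original units before the goodness-of-fit call. All variable names are illustrative.

```matlab
% Made-up, noise-free data from y = 3*exp(0.5*x), for illustration only.
x = (1:10)';
y = 3 * exp(0.5 * x);

% Step 1: take natural logs of y so the assumed model becomes linear in x:
%   log(y) = log(alpha) + beta*x
% Step 2: fit a degree-1 polynomial to the transformed data.
c = polyfit(x, log(y), 1);
beta  = c(1);        % slope of the fitted line
alpha = exp(c(2));   % the intercept is log(alpha)

% Step 3: compare measured and predicted values with the line y = x.
% Assumption: predictions are mapped back to the original units first.
predicted = alpha * exp(beta * x);
g = polyfit(y, predicted, 1);   % g(1) is the slope, g(2) the intercept

fprintf('exponential: beta=%f, alpha=%f;\n', beta, alpha);
fprintf('   goodness: %f, %f\n', g(1), g(2));
```

Because the data here contains no noise, the goodness-of-fit slope and intercept should come out at 1 and 0 up to rounding error; with measured data they will only be close to those values.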
Requirements:

Note: You may find the slides from this lecture helpful.

Example: if you load this data set, set x = ecdata(:,1) and y = ecdata(:,2), and then call dofits(x,y,3), it should print:

degree 1: [ 0.199847 5.037114 ]
   goodness: 0.998814  0.053474

degree 2: [ -0.000004 0.201369 4.935181 ]
   goodness: 0.998818  0.053301

degree 3: [ -0.000000 0.000003 0.200215 4.973975 ]
   goodness: 0.998819  0.053283

exponential: beta=0.005606, alpha=12.162348;
   goodness: 1.185886, -7.713574

power law: beta=0.689275, alpha=1.191440;
   goodness: 0.841387, 5.983828
(The actual curve here is a line y = 0.2x + 5; the fit is not exact because noise has been added to the measurements. Note that when coefficients of higher powers in the polynomial are close to zero, a lower-degree polynomial is indicated, even if the "goodness of fit" is slightly better for the higher-degree one.)

For this data set, it should print:

degree 1: [ 17174.793999 -390883.931651 ]
   goodness: 0.831667  80200.898188

degree 2: [ 294.516124 -12571.334535 114800.253429 ]
   goodness: 0.994642  2552.792031

degree 3: [ 1.813078 19.834749 -1418.908110 18606.836039 ]
   goodness: 0.998609  662.759383

exponential: beta=0.060602, alpha=5035.903742;
   goodness: 1.337127, -111070.266422

power law: beta=1.715680, alpha=-66.611871;
   goodness: 0.449575, 92871.666940
(The actual equation for the above data is a cubic polynomial with parameters 2, -10, 8, and 100.)

For this data set, you should get:

degree 1: [ 78.547616 -259.912497 ]
   goodness: 0.991321  11.719844

degree 2: [ 0.677427 50.773115 -62.822336 ]
   goodness: 0.998946  1.423064

degree 3: [ -0.010890 1.347143 39.578267 -20.960165 ]
   goodness: 0.999143  1.157854

exponential: beta=0.086567, alpha=156.203270;
   goodness: 1.376870, -421.973310

power law: beta=1.282841, alpha=26.245491;
   goodness: 0.982327, 13.865161
Note that the "goodness of fit" measure is not foolproof: the actual data in this case was generated by a power law with α = 25 and β = 1.3 (y = 25x^1.3), but the parabola and cubic polynomial both have much better goodness of fit.
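
The power-law fit uses the same logarithm trick as step 1, applied to both variables: if y = alpha*x^beta, then log(y) = log(alpha) + beta*log(x), which is a line in log(x). A minimal sketch, using noise-free data generated from the curve stated above (not the course data set, and with an x range chosen arbitrarily for illustration):

```matlab
% Noise-free data from y = 25*x^1.3; the x range is an arbitrary choice.
x = (1:20)';
y = 25 * x .^ 1.3;

% Taking logs of both x and y turns the power law into a straight line.
c = polyfit(log(x), log(y), 1);
beta  = c(1);        % slope is the exponent
alpha = exp(c(2));   % intercept is log(alpha)

fprintf('power law: beta=%f, alpha=%f;\n', beta, alpha);
```

With noise-free data the fit recovers the generating parameters up to rounding error. Note that x and y must be strictly positive for the logarithms to be real.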