Isolated Repository for Multivariate Polynomial Regression
This is one of the codes that can have a much broader functionality than the specific area I usually work on. This will also serve as a test run before publishing more elaborate public repos. Leave feedback and open issues at the Github or Matlab File Exchange pages.
X is your Data matrix. 500 data point with 5 dimensions. Another way to look at this is 500 samples of 5 independent variables. Y is your observation vector 500 by 1. You want to find a good polynomial fit of columns of X to Y. Lets say you decided fit a 2nd degree polynomial to all 5 independent variables. And you are for the moment, interested in fitting the standard polynomial basis without further meddling with the terms.
load Example.mat reg=MultiPolyRegress(X,Y,2) % Gives you your fit.
reg = FitParameters: '-----------------' PowerMatrix: [21x5 double] Scores: [500x21 double] PolynomialExpression: [21x2 table] Coefficients: [21x1 double] yhat: [500x1 double] Residuals: [500x1 double] GoodnessOfFit: '-----------------' RSquare: 0.9392 MAE: 0.0334 MAESTD: 0.0481 Normalization: '1-to-1 (Default)' LOOCVGoodnessOfFit: '-----------------' CVRSquare: 0.9280 CVMAE: 0.0366 CVMAESTD: 0.0590 CVNormalization: '1-to-1 (Default)'
Different error definition ONLY in the calculation of MAE, MAESTD, CVMAE and CVMAESTD. Does not effect the fit.
reg = FitParameters: '-----------------' PowerMatrix: [21x5 double] Scores: [500x21 double] PolynomialExpression: [21x2 table] Coefficients: [21x1 double] yhat: [500x1 double] Residuals: [500x1 double] GoodnessOfFit: '-----------------' RSquare: 0.9392 MAE: 0.0293 MAESTD: 0.0313 Normalization: 'Range' LOOCVGoodnessOfFit: '-----------------' CVRSquare: 0.9280 CVMAE: 0.0313 CVMAESTD: 0.0346 CVNormalization: 'Range'
You would like to see a scatter plot of your fit.
You would like to limit the observed powers of certain terms in your polynomial. For example, you do not want the 1st and 4th Independent Variables (x1 and x4) to have second order terms (x1^2 or x4^2). Notice you have to explicitly write how high each term can go in powers, so I would also state I am fine with (x2 x3 and x5) having 2nd order terms.
reg=MultiPolyRegress(X,Y,2,[1 2 2 1 2]); PolynomialFormula=reg.PolynomialExpression
PolynomialFormula = Coefficient Term ___________ _______ 0.0022642 'x5' 0.0058919 'x4' -0.00049119 'x4.x5' 0.01644 'x3' -0.00098813 'x3.x5' 7.6129e-05 'x3.x4' 0.014969 'x2' -0.0023337 'x2.x5' 0.0028077 'x2.x4' -0.00012646 'x2.x3' -0.027613 'x1' -0.00036617 'x1.x5' -0.00043459 'x1.x4' -0.00011518 'x1.x3' -0.0009348 'x1.x2' 3.9964 '' 0.0004941 'x2^2' 0.00014775 'x3^2' 0.010017 'x5^2'
You have a new data point you would like to evaluate using the computed fit. Lets assume for the sake of argument that the 250th row of X is in fact a new data point.
Unless you have a stake in deeply understanding this code, don't try to make sense of the NewScores matrix, or what follows. I sometimes have to stare at it for a couple minutes to figure it out myself. I am happy discuss this in detail upon specific request.
You have to repeat this procedure for every new data point. It might be time saving to write a function that does this automatically, however I never needed this functionality, so I wouldn't count on me writing that.
NewDataPoint=X(250,:); NewScores=repmat(NewDataPoint,[length(reg.PowerMatrix) 1]).^reg.PowerMatrix; EvalScores=ones(length(reg.PowerMatrix),1); for ii=1:size(reg.PowerMatrix,2) EvalScores=EvalScores.*NewScores(:,ii); end yhatNew=reg.Coefficients'*EvalScores % The estimate for the new data point.
yhatNew = 5.2877
Unless you have a stake in deeply understanding this code, don't try to make sense of the Scores matrix, chances are you won't ever need to use it.
You would like to see the actual formula of the fit,
PolynomialFormula = Coefficient Term ___________ _______ 0.0052679 'x5' 0.0073888 'x4' -8.7941e-05 'x4.x5' 0.016723 'x3' -0.00097694 'x3.x5' 8.3902e-05 'x3.x4' 0.015417 'x2' -0.0025415 'x2.x5' 0.002392 'x2.x4' -0.00018939 'x2.x3' -0.028576 'x1' -0.00045571 'x1.x5' -0.00037732 'x1.x4' -0.00010521 'x1.x3' -0.00047654 'x1.x2' 4.0108 '' -4.9811e-05 'x1^2' -0.00026949 'x2^2' 0.00014575 'x3^2' -6.5765e-05 'x4^2' 0.011599 'x5^2'
This was shown earlier at the input examples.
This was shown earlier at the input examples.
This is the vector of estimates using your new fit. The scatter plot you see when you use the 'figure' option is generated using scatter(yhat,y).
This is defined as y-yhat. Can be used for a residual plot to see if ordinary least squares assumptions hold true.
These are useful not only in assesing the accuracy of your fit, but also comparing different candidates. For example, lets see how different powers compare for the same fit for the above dataset. I personally would like to use CVMAE as my comparative error measure, since it is more sensitive to overfitting.
It turns out, the second degree polynomial is the best option. One way to interpret this number is saying the fit makes in average a 3.66% error with respect to the original Y when estimating.
for ii=1:5 reg=MultiPolyRegress(X,Y,ii); CVMAE(ii)=reg.CVMAE; end CVMAE
CVMAE = 0.0383 0.0366 0.0416 0.1255 3.7904