Model Browser User's Guide

Univariate Model Building Process Overview

For each response feature,

Begin by conducting a stepwise search.

You can do this automatically or by using the Stepwise window.

The goal of the stepwise search is to minimize PRESS. The precise nature of this process is discussed in later sections. What matters about the output from this step is that usually not one but several candidate models arise per response feature, each with a very similar PRESS R². The predictive capability of a model with a PRESS R² of 0.91 cannot be assumed superior, in any meaningful engineering sense, to that of a model with a PRESS R² of 0.909. Further, the "improvement" in PRESS R² offered by the last few terms added during model building is often very small. Consequently, several candidate models can arise. You can store each candidate model and its associated diagnostic information separately for subsequent review. Do this by making a selection of child nodes for the response feature.
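
As an illustration of the statistic being minimized, the following is a minimal sketch of how PRESS and PRESS R² can be computed for an ordinary least-squares model, assuming the usual leave-one-out definition (each residual divided by one minus its leverage). The function name `press_statistics` and the synthetic data are hypothetical and are not part of the Model Browser; the sketch is not a description of the toolbox implementation.

```python
import numpy as np


def press_statistics(X, y):
    """Return (PRESS, PRESS R^2) for a least-squares fit of y on X.

    Hypothetical helper for illustration only.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Least-squares coefficients and ordinary residuals.
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ beta
    # Leverages: diagonal of the hat matrix, obtained from a thin QR factorization.
    Q, _ = np.linalg.qr(X)
    leverage = np.sum(Q**2, axis=1)
    # Leave-one-out prediction errors, without refitting the model N times.
    loo_errors = residuals / (1.0 - leverage)
    press = float(np.sum(loo_errors**2))
    # Total sum of squares about the mean of the response.
    sst = float(np.sum((y - y.mean()) ** 2))
    return press, 1.0 - press / sst


# Synthetic data (assumed): a quadratic trend plus a little noise.
rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 30)
y = 1.0 + 2.0 * x - 1.5 * x**2 + rng.normal(scale=0.1, size=x.size)

# Two candidate models that differ only by one high-order term.
X_quad = np.column_stack([np.ones_like(x), x, x**2])
X_cubic = np.column_stack([X_quad, x**3])

for name, X in [("quadratic", X_quad), ("cubic", X_cubic)]:
    press, press_r2 = press_statistics(X, y)
    print(f"{name}: PRESS = {press:.4f}, PRESS R^2 = {press_r2:.4f}")
```

Comparing the two printed PRESS R² values for candidates like these shows how small the difference contributed by a final term can be, which is why several candidates are usually worth keeping.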

Experience has shown, however, that a model with a PRESS R² of less than 0.8, say, is of little use as a predictive tool for engine mapping purposes. This criterion must be viewed with caution. Low PRESS R² values can result from a poor choice of the original factors, but also from the presence of outlying or influential points in the data set. Rather than relying on PRESS R² alone, a safer strategy is to study the model diagnostic information in order to discern the nature of any fundamental issues and then take appropriate corrective action.

Once the stepwise process is complete, the diagnostic data should be reviewed for each candidate model.

These data alone might be sufficient to select a single model. This is the case if one model clearly exhibits more ideal behavior than the others. Remember that the interpretation of diagnostic plots is subjective.

You should also remove outlying data at this stage, using the mouse to select the offending point. You can set criteria for detecting outlying data. The default criterion is any case where the absolute value of the external studentized residual is greater than 3.
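
The default criterion can be sketched as follows, assuming the standard definition of the external (deleted) studentized residual for a least-squares fit. The function names and the synthetic data are hypothetical; this is an illustration of the rule, not the Model Browser's own code.

```python
import numpy as np


def external_studentized_residuals(X, y):
    """External studentized residuals for a least-squares fit of y on X.

    Hypothetical helper for illustration only.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ beta
    # Leverages from a thin QR factorization of X.
    Q, _ = np.linalg.qr(X)
    h = np.sum(Q**2, axis=1)
    rss = np.sum(e**2)
    # Residual variance estimated with case i deleted from the fit.
    s2_deleted = (rss - e**2 / (1.0 - h)) / (n - p - 1)
    return e / np.sqrt(s2_deleted * (1.0 - h))


def flag_outliers(X, y, threshold=3.0):
    """Indices of cases whose |external studentized residual| exceeds threshold."""
    t = external_studentized_residuals(X, y)
    return np.flatnonzero(np.abs(t) > threshold)


# Synthetic data (assumed): a straight line with one grossly shifted point.
rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 25)
y = 2.0 + 3.0 * x + rng.normal(scale=0.05, size=x.size)
y[12] += 1.0  # deliberately corrupted case

X = np.column_stack([np.ones_like(x), x])
flagged = flag_outliers(X, y)
print("Flagged cases:", flagged)
```

Because the variance estimate for each case excludes that case, a single gross outlier does not inflate its own yardstick, which is what makes the external form more sensitive than the internal one.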
Once outlying data has been removed, you might want to continue the model building process in an attempt to remove further terms.

This seems reasonable because high-order terms might have been retained in the model in an attempt to follow the outlying data. Even after removing outlying data, there is no guarantee that the diagnostic data will suggest that a suitable candidate model has been found. Under these circumstances, a transform of the response feature might prove beneficial.

A useful set of transformations is provided by the Box and Cox family, which is discussed in the next section. Note that the Box-Cox algorithm is model dependent and as such is always carried out using the (N×q) regression matrix X.
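
The model dependence can be seen in a minimal sketch of a Box-Cox search, assuming the textbook approach: apply the geometric-mean-normalized transform for each trial λ, fit the transformed response on the same regression matrix X, and pick the λ with the smallest residual sum of squares. The function names, the λ grid, and the synthetic data are hypothetical, and the details of the Model Browser's own algorithm may differ.

```python
import numpy as np


def box_cox(y, lam, gm):
    """Geometric-mean-normalized Box-Cox transform of a positive response."""
    if abs(lam) < 1e-12:
        return gm * np.log(y)
    return (y**lam - 1.0) / (lam * gm ** (lam - 1.0))


def select_box_cox_lambda(X, y, lambdas=None):
    """Return (best lambda, RSS per lambda) for the model defined by X.

    Hypothetical helper for illustration only.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    if np.any(y <= 0):
        raise ValueError("Box-Cox requires a strictly positive response")
    if lambdas is None:
        lambdas = np.linspace(-2.0, 2.0, 41)  # assumed search grid
    gm = np.exp(np.mean(np.log(y)))  # geometric mean of the response
    rss = np.empty(len(lambdas))
    for k, lam in enumerate(lambdas):
        z = box_cox(y, lam, gm)
        # The same regression matrix X is refitted for every trial lambda.
        beta, *_ = np.linalg.lstsq(X, z, rcond=None)
        rss[k] = np.sum((z - X @ beta) ** 2)
    return float(lambdas[np.argmin(rss)]), rss


# Synthetic data (assumed): the response is linear in x on the log scale.
rng = np.random.default_rng(2)
x = np.linspace(0.0, 2.0, 50)
y = np.exp(1.0 + 0.8 * x + rng.normal(scale=0.03, size=x.size))

X = np.column_stack([np.ones_like(x), x])
best_lambda, rss = select_box_cox_lambda(X, y)
print("Selected lambda:", best_lambda)
```

Normalizing by the geometric mean puts the residual sums of squares for different λ on a comparable scale, so they can be compared directly. Changing the columns of X changes the residuals and can therefore change the selected λ.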

After you select a transform, you should repeat the stepwise PRESS search and select a suitable subset of candidate models.
After this you should analyze the respective diagnostic data for each model in the usual manner.

At this juncture it might not be apparent why the original stepwise search was carried out in the natural metric. Why not proceed directly to taking a transformation? Doing so might seem sensible, given that the Box-Cox algorithm often, but not always, suggests that a contractive transform such as the square root or log be applied. There are two main reasons for starting in the natural metric:
- The primary reason for selecting response features is that they possess a natural engineering interpretation. It is unlikely that the behavior of a transformed version of a response feature is as intuitively easy to understand.
- Outlying data can strongly influence the type of transformation selected. Applying a transformation to allow the model to fit bad data well does not seem like a prudent strategy. Here, "bad" data means data that is truly abnormal and for which a reason has been discovered as to why it is outlying; for example, "The emission analyser was purging while the results were taken."

Finally, if you cannot find a suitable candidate model on completion of the stepwise search with the transformed metric, then a serious problem exists either with the data or with the current level of engineering knowledge of the system. Model augmentation or an alternative experimental or modeling strategy should be applied in these circumstances.

After these steps it is most useful to validate your model against other data (if any is available). See Model Evaluation Window.
