Model Browser User's Guide    

Outlier Selection Criteria

You can select outliers as those satisfying a condition on the value of some statistic (for example, residual>3), or by selecting those points that fall in a region of the distribution of values of that statistic. For example, assume that residuals are normally distributed and select those with p-value>0.9. You can also select outliers using the values of model input factors.
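As a minimal sketch of the first approach, a condition on the value of a statistic is simply a logical comparison applied to every data point. The residual values and the variable names below are invented for illustration; this is not the Model Browser's internal code.

```matlab
% Hypothetical residuals for eight data points (illustrative values only).
residuals = [0.4; -1.2; 3.6; 0.1; -3.3; 2.9; 0.8; -0.5];

% Condition on the value of the statistic: residual > 3.
threshold = 3;
isOutlier = residuals > threshold;      % logical column, one entry per point

% Row numbers of the points flagged as outliers.
outlierRows = find(isOutlier);
```

Here only the third point is flagged. The point at -3.3 is not, because the comparison is signed.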

The drop-down menu labeled Select using contains all the available criteria, shown in the following example.

The options available in this menu change depending on the type of model currently selected. The options are exactly the same as those found in the drop-down menus for the x- and y-axis factors of the scatter plots in the Model Browser (local level and global level views).

In the preceding example, the selected model is the knot response feature, so knot and Predicted knot appear in the criteria list along with the global input factors. Because it is a linear, non-MLE model, Cook's Distance and Leverage are also available.

The range of the selected criterion (for the current data) is shown above the Value edit box to indicate suitable values. You can type directly in the edit box, or use the up/down buttons on the box to change the value in steps of about 10% of the range.
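To give a feel for the button step size, the following sketch computes the range of a criterion and a step of 10% of that range. The leverage values and starting value are invented, and taking the step as exactly 10% is an assumption, since the guide only says "about 10%".

```matlab
% Hypothetical values of the selected criterion (illustrative values only).
leverage = [0.05; 0.12; 0.30; 0.08; 0.45];

% Range of the criterion for the current data.
lo = min(leverage);
hi = max(leverage);

% Assumed step size: exactly 10% of the range.
step = 0.1 * (hi - lo);

% One click up from a starting value of 0.2.
value = 0.2 + step;
```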

Distribution

You can use the Distribution drop-down menu to remove a proportion of the tail ends of the normal or t distribution. For example, you can select the residuals found in the tails of the distribution that make up 10% of the total area. If you had a very large data set, approximately 10% of the residuals would be selected as outliers.

As shown, residuals found beyond the critical value of the distribution are selected as outliers. The value α is a measure of significance; that is, the probability of finding residuals beyond the critical value is less than 10%. The absolute value (the modulus) is used, so outliers are selected in both tails of the distribution.
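The relationship between the tail area and the cutoff can be sketched for the normal case using only base MATLAB. The names alpha and zCrit, the sample values, and the assumption that the residuals are already standardized are all choices made for this illustration, not details taken from the Model Browser.

```matlab
% Total area in both tails, as a fraction (10% in the example above).
alpha = 0.10;

% Two-sided critical value of the standard normal distribution:
% P(|Z| > zCrit) = erfc(zCrit/sqrt(2)) = alpha.
zCrit = sqrt(2) * erfcinv(alpha);       % approximately 1.645

% Hypothetical standardized residuals (illustrative values only).
residuals = [0.3; -1.9; 1.2; 2.4; -0.7; 0.1; -1.5; 1.7];

% The absolute value is used, so points in both tails are selected.
isOutlier = abs(residuals) > zCrit;
```

With these values the points at -1.9, 2.4, and 1.7 are selected. The same idea applies to the t distribution with a t quantile in place of zCrit, for example from tinv if Statistics and Machine Learning Toolbox is available.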

The t distribution is used for limited degrees of freedom.

If you select None in the Distribution drop-down menu, you select outliers using the actual values rather than a distribution, and you can choose whether or not to use the absolute value. Using the absolute value lets you select on magnitude only, without taking sign into account (so both the positive and negative ranges are included). Select No if you are interested in only one direction: positive or negative values, above or below the value entered. For example, you can select only values of speed below 2000 rpm.
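The difference between selecting with and without the absolute value can be sketched as follows. The data values, the limit of 3, and the variable names are invented for illustration.

```matlab
% Hypothetical values of a signed statistic (illustrative values only).
values = [-4.1; 0.5; 3.2; -0.8; 2.1];
limit  = 3;

% Absolute value on: magnitude only, so both -4.1 and +3.2 are selected.
byMagnitude = abs(values) > limit;

% Absolute value off: one direction only, so just +3.2 is selected.
aboveLimit = values > limit;

% One-directional selection on an input factor: speed below 2000 rpm.
speed = [1500; 2500; 1800; 3200; 2000];
lowSpeed = speed < 2000;                % a speed of exactly 2000 is not selected
```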

The Select using custom m-file check box enables the adjacent edit box, where you can specify an M-file that selects outliers. Type the path and name of the file into the edit box, or use the browse button.

In this M-file you define a MATLAB function of the form:

function outIndices = funcname(Model, Data, Names)

Model is the current MBC model.

Data is the data used in the scatter plots. For example, if there are currently 10 items in the drop-down menus on the scatter plot and 70 data points, the data make up a 70 x 10 array.

Names is a cell array containing the strings from the drop-down menus on the scatter plot. These label the columns in the data (for example, spark, residuals, leverage, and so on).

The output, outIndices, must be an array of logical indices, the same size as one column in the input Data, so that it contains one index for each data point. Those points where index = 1 in outIndices are highlighted as outliers; the remainder are not highlighted.
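A custom selection function of this form might look like the sketch below. The function name selectLargeResiduals, the column label 'residuals', and the threshold of 3 are hypothetical; substitute the strings that actually appear in your scatter plot drop-down menus.

```matlab
function outIndices = selectLargeResiduals(Model, Data, Names)
% selectLargeResiduals  Hypothetical custom outlier selector.
%   Flags points whose value in the column labeled 'residuals' exceeds 3
%   in magnitude. The label and the threshold are assumptions made for
%   this sketch. Model is accepted to match the required form but unused.

% Locate the column by its label (case-insensitive match).
col = find(strcmpi(Names, 'residuals'), 1);

if isempty(col)
    % Label not found: select nothing rather than raising an error.
    outIndices = false(size(Data, 1), 1);
else
    % Logical column with one entry per data point.
    outIndices = abs(Data(:, col)) > 3;
end
end
```

Looking the column up by name, rather than hard-coding its position, keeps the function working when the list of available criteria changes with the selected model.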

