Model

The Cubist model class (cubist.Cubist or cubist.cubist.Cubist) has eleven parameters and eleven attributes available. Their use is demonstrated below and the class is documented in the API Docs.

A simple use of Cubist with no added configuration is as follows:

>>> from sklearn.datasets import load_iris
>>> from sklearn.model_selection import train_test_split
>>> from cubist import Cubist
>>> X, y = load_iris(return_X_y=True, as_frame=True)
>>> X_train, X_test, y_train, y_test = train_test_split(X, y,
...                                                     random_state=42,
...                                                     test_size=0.05)
>>> model = Cubist()
>>> model.fit(X_train, y_train)
Cubist()
>>> model.score(X_train, y_train)
0.9656775005204449
>>> model.score(X_test, y_test)
0.9955073453292975

Parameters

These are the values passed to the model at initialization.

Display Options

These parameters control whether and how Cubist prints its model report.

target_label

The printed report includes a name for the target/output (y) value. Set target_label to use a name other than the default, "outcome".

verbose

The verbose parameter indicates whether Cubist should print the generated model, summary, and training performance to the console. Either an integer or a Python boolean is accepted.

Model Construction

These parameters control the model structure.

n_rules

Varying the n_rules parameter changes the maximum number of rules Cubist will generate for a model. Recall the definition of a rule from the Introduction.
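
To make the idea of a rule concrete, here is a minimal sketch, assuming a rule is represented as a condition on the inputs plus a linear model, and that when several rules cover the same case their linear-model predictions are averaged. The Rule class and predict function are hypothetical illustrations, not part of the cubist package; n_rules would bound how many entries the rules list may contain.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Case = Dict[str, float]


@dataclass
class Rule:
    """Hypothetical rule: a condition on the inputs plus a linear model."""

    condition: Callable[[Case], bool]
    intercept: float
    coefficients: Dict[str, float]

    def covers(self, case: Case) -> bool:
        return self.condition(case)

    def linear_model(self, case: Case) -> float:
        return self.intercept + sum(
            coef * case[name] for name, coef in self.coefficients.items()
        )


def predict(rules: List[Rule], case: Case) -> float:
    """Average the linear models of every rule whose condition covers `case`."""
    covering = [rule for rule in rules if rule.covers(case)]
    if not covering:
        raise ValueError("no rule covers this case")
    return sum(rule.linear_model(case) for rule in covering) / len(covering)


# Two overlapping rules; n_rules would cap how long this list may grow.
rules = [
    Rule(lambda c: c["x"] <= 5.0, intercept=1.0, coefficients={"x": 2.0}),
    Rule(lambda c: c["x"] > 3.0, intercept=0.0, coefficients={"x": 3.0}),
]
print(predict(rules, {"x": 4.0}))  # both rules apply: (9.0 + 12.0) / 2 = 10.5
```

A case with x = 1.0 is covered only by the first rule, so its prediction comes from that rule's linear model alone.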

n_committees

Varying the n_committees parameter changes the number of models (called committees) Cubist will generate. Recall the definition of a committee from the Introduction.
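
The final prediction of a committee is the average of its members' predictions. The sketch below shows only that combination step, assuming each member is a plain callable; how Cubist builds each successive member is not reproduced here, and committee_predict is a hypothetical name.

```python
from statistics import fmean
from typing import Callable, List, Sequence

# Hypothetical: each committee member is any callable mapping a feature
# vector to a numeric prediction.
Model = Callable[[Sequence[float]], float]


def committee_predict(committee: List[Model], x: Sequence[float]) -> float:
    """Combine committee members by averaging their predictions."""
    if not committee:
        raise ValueError("committee must contain at least one model")
    return fmean(member(x) for member in committee)


committee: List[Model] = [
    lambda x: 2.0 * x[0],
    lambda x: 2.0 * x[0] + 1.0,
    lambda x: 2.0 * x[0] + 2.0,
]
print(committee_predict(committee, [2.0]))  # (4.0 + 5.0 + 6.0) / 3 = 5.0
```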

neighbors

Varying the neighbors parameter changes the number of nearest neighbors (training cases) Cubist uses to correct the rule-based prediction. Using this feature may improve accuracy at the cost of interpretability, because predictions no longer follow the rules' linear models exactly. Additionally, the training dataset is cached in the model so that it is available for future predictions.
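
A simplified sketch of an instance-based correction in the spirit of a composite model: each of the k nearest training cases contributes its actual target, shifted by the difference between the model's prediction for the new case and for that neighbor. This sketch uses a plain, unweighted average and Euclidean distance for simplicity, which may differ from Cubist's exact scheme; composite_predict is a hypothetical name. It also shows why the training data has to be kept alongside the model.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]


def composite_predict(
    model: Callable[[Vector], float],
    train: List[Tuple[Vector, float]],
    x: Vector,
    neighbors: int,
) -> float:
    """Correct model(x) using the `neighbors` closest cached training cases."""
    # Nearest training cases by (unweighted) Euclidean distance.
    nearest = sorted(train, key=lambda case: math.dist(case[0], x))[:neighbors]
    model_x = model(x)
    # Each neighbor votes for its actual target, shifted by the change in the
    # model's prediction between that neighbor and x.
    adjusted = [y_n + (model_x - model(x_n)) for x_n, y_n in nearest]
    return sum(adjusted) / len(adjusted)


def model(v: Vector) -> float:
    """Stand-in for the rule-based prediction."""
    return 2.0 * v[0]


train = [([1.0], 2.5), ([2.0], 4.5), ([10.0], 0.0)]
print(composite_predict(model, train, [1.5], neighbors=2))  # 3.5
```

Here the model underpredicts both nearby training cases by 0.5, so the corrected prediction for x = 1.5 is 3.0 + 0.5 = 3.5.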

unbiased

Toggling unbiased determines whether the mean predicted value for the training cases covered by a rule must equal their mean actual value. By default, Cubist instead minimizes the average absolute error, so the two means are allowed to differ.
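
To see why the two means can differ, consider the simplest model for the cases covered by one rule: a constant. The constant that minimizes the average absolute error is the median of the targets, not their mean. The sketch then applies a plain mean shift to satisfy the unbiased condition; unbias is a hypothetical illustration of the constraint, not how Cubist actually fits unbiased rules.

```python
from statistics import fmean, median
from typing import List


def unbias(predictions: List[float], targets: List[float]) -> List[float]:
    """Shift predictions so their mean equals the mean of the targets."""
    offset = fmean(targets) - fmean(predictions)
    return [p + offset for p in predictions]


targets = [1.0, 2.0, 9.0]  # cases covered by one rule
# The constant that minimizes average absolute error is the median (2.0),
# so its mean prediction (2.0) differs from the mean target (4.0).
mae_fit = [median(targets)] * len(targets)
print(unbias(mae_fit, targets))  # [4.0, 4.0, 4.0]
```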

extrapolation

The extrapolation parameter sets how far, as a percentage, Cubist's predictions may extend beyond the range of output values seen in the training dataset.
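
A minimal sketch of the idea, assuming extrapolation is expressed as a fraction (0.05 meaning 5%) of the range of training target values and added on each side of that range. The exact formula Cubist applies is not given here, so clamp_prediction and its semantics are assumptions.

```python
from typing import Sequence


def clamp_prediction(
    prediction: float,
    train_targets: Sequence[float],
    extrapolation: float = 0.05,
) -> float:
    """Limit a prediction to the training target range, widened on each side
    by `extrapolation` times that range (assumed semantics)."""
    low, high = min(train_targets), max(train_targets)
    margin = extrapolation * (high - low)
    return max(low - margin, min(high + margin, prediction))


train_targets = [0.0, 4.0, 10.0]
print(clamp_prediction(12.0, train_targets))  # 10.5, capped at 10.0 + 0.5
print(clamp_prediction(5.0, train_targets))   # 5.0, already in range
```

Setting extrapolation to 0.0 in this sketch confines predictions strictly to the observed target range.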

Alternative Modes

These parameters change the mode in which the model is trained. The standard behavior is to train the model using the supplied tuning settings or, where none are given, their defaults.

auto

Cubist can be allowed to decide whether to add instance-based corrections (a composite model) by leaving neighbors unset and setting auto=True. This may increase training time and may produce a warning regarding Cubist’s recommendation. It is a useful first experiment for judging whether a composite model is worthwhile.

sample

When training on a large dataset, Cubist can build the model from a random subsample of the training data, with the percentage of cases to use set by the sample parameter.
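
Subsampling amounts to drawing a random subset of the training cases without replacement. A sketch, assuming sample is given as a fraction strictly between 0 and 1 (so 0.1 keeps about 10% of the rows); subsample is a hypothetical helper built on Python's random module, not Cubist's own sampler.

```python
import random
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def subsample(
    cases: Sequence[T], sample: float, random_state: Optional[int] = None
) -> List[T]:
    """Draw a random subset holding roughly `sample` of the cases, without
    replacement (assumed semantics: 0 < sample < 1)."""
    if not 0.0 < sample < 1.0:
        raise ValueError("sample must lie strictly between 0 and 1")
    rng = random.Random(random_state)
    k = max(1, round(sample * len(cases)))
    return rng.sample(list(cases), k)


training_rows = list(range(1000))
subset = subsample(training_rows, sample=0.1, random_state=42)
print(len(subset))  # 100
```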

cv

The cv parameter enables cross-validation: set it to the number of folds Cubist should use.

random_state

Setting a value for random_state sets the random seed for Cubist to enable repeatable cross-validation and sampling.
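
Seeding matters because both cross-validation and sampling draw random numbers. A sketch of the principle, using a hypothetical assign_folds helper built on Python's random module (not Cubist's internal generator): the same random_state always yields the same fold assignment.

```python
import random
from collections import Counter
from typing import List, Optional


def assign_folds(
    n_cases: int, n_folds: int, random_state: Optional[int] = None
) -> List[int]:
    """Randomly assign each case to one of `n_folds` near-equal folds."""
    folds = [i % n_folds for i in range(n_cases)]
    random.Random(random_state).shuffle(folds)
    return folds


first = assign_folds(20, n_folds=5, random_state=42)
second = assign_folds(20, n_folds=5, random_state=42)
print(first == second)                   # True: same seed, same folds
print(sorted(Counter(first).values()))   # [4, 4, 4, 4, 4]
```

Leaving random_state as None seeds the generator from system entropy, so repeated runs would generally produce different folds.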