Model

The Cubist model class (cubist.Cubist or cubist.cubist.Cubist) exposes eleven parameters and eleven attributes. Their use is demonstrated below, and the class is fully documented in the API Docs.

A simple use of Cubist with no added configuration is as follows:

>>> from sklearn.datasets import load_iris
>>> from sklearn.model_selection import train_test_split
>>> from cubist import Cubist
>>> X, y = load_iris(return_X_y=True, as_frame=True)
>>> X_train, X_test, y_train, y_test = train_test_split(X, y,
...                                                     random_state=42,
...                                                     test_size=0.05)
>>> model = Cubist()
>>> model.fit(X_train, y_train)
Cubist()
>>> model.score(X_train, y_train)
0.9656775005204449
>>> model.score(X_test, y_test)
0.9955073453292975

Parameters

These are the values passed to the model at initialization.

Display Options

These parameters enable and configure the printing of Cubist's model report.

target_label

The printed report includes a name for the target (output, y) variable. Set target_label to replace the default name, outcome, with something more descriptive.

verbose

The verbose parameter controls whether Cubist prints the generated model, a summary, and the training performance to the console. Either an integer or a Python boolean is accepted.

Sample Verbose Output with Custom Target Label
>>> from sklearn.datasets import load_iris
>>> from sklearn.model_selection import train_test_split
>>> from cubist import Cubist
>>> X, y = load_iris(return_X_y=True, as_frame=True)
>>> X_train, X_test, y_train, y_test = train_test_split(X, y,
...                                                     random_state=42,
...                                                     test_size=0.05)
>>> model = Cubist(n_rules=2, verbose=True,
...                target_label="custom_output")
>>> model.fit(X_train, y_train)

Cubist [Release 2.07 GPL Edition]  ...
---------------------------------

    Target attribute `custom_output'

Read 142 cases (5 attributes)

Model:

  Rule 1: [48 cases, mean 0.0, range 0 to 0, est err 0.0]

    if
        petal width (cm) <= 0.6
    then
        custom_output = 0

  Rule 2: [94 cases, mean 1.5, range 1 to 2, est err 0.2]

    if
        petal width (cm) > 0.6
    then
        custom_output = 0.2 + 0.76 petal width (cm) + 0.271 petal length (cm)
                      - 0.45 sepal width (cm)


Evaluation on training data (142 cases):

    Average  |error|                0.1
    Relative |error|               0.16
    Correlation coefficient        0.98


        Attribute usage:
          Conds  Model

          100%    66%    petal width (cm)
                  66%    sepal width (cm)
                  66%    petal length (cm)


Time: 0.0 secs

Cubist(n_rules=2, target_label='custom_output', verbose=True)

Model Construction

These parameters control the model structure.

n_rules

Varying the n_rules parameter changes the maximum number of rules Cubist will generate for a model. Recall the definition of a rule from the Introduction.
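To make the idea of a rule concrete, the two rules printed in the verbose sample output above (built with n_rules=2) can be transcribed by hand into plain Python. This is a sketch, not code exported by the cubist package: the coefficients are the rounded values from the report, so results will differ slightly from model.predict(), and the names RULES and predict are hypothetical. When several rules cover a case, the sketch averages their predictions, which is how Cubist combines overlapping rules.

```python
# Hand transcription of the verbose report (rounded coefficients).
# RULES and predict are hypothetical names, not part of the cubist API.
RULES = [
    # Rule 1: if petal width (cm) <= 0.6 then custom_output = 0
    (lambda c: c["petal width (cm)"] <= 0.6,
     lambda c: 0.0),
    # Rule 2: if petal width (cm) > 0.6 then custom_output =
    #   0.2 + 0.76 petal width + 0.271 petal length - 0.45 sepal width
    (lambda c: c["petal width (cm)"] > 0.6,
     lambda c: 0.2 + 0.76 * c["petal width (cm)"]
               + 0.271 * c["petal length (cm)"]
               - 0.45 * c["sepal width (cm)"]),
]


def predict(case):
    """Average the linear models of every rule whose condition the case meets."""
    outputs = [model(case) for applies, model in RULES if applies(case)]
    return sum(outputs) / len(outputs)


case = {"petal width (cm)": 1.0, "petal length (cm)": 4.0, "sepal width (cm)": 3.0}
print(predict(case))  # only Rule 2 applies: 0.2 + 0.76 + 1.084 - 1.35, about 0.694
```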

n_committees

Varying the n_committees parameter changes the number of models (called committees) Cubist will generate. Recall the definition of a committee from the Introduction.
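Committee members are commonly described as being built sequentially, in a boosting-like fashion: each new member is trained on targets pushed in the direction opposite to the previous member's errors, y* = y - (ŷ - y), and the final prediction is the average of all members' predictions. The sketch below shows that loop under a simplifying assumption: a one-split regression stump stands in for Cubist's rule-based models. fit_stump and fit_committees are hypothetical names.

```python
from statistics import mean


def fit_stump(xs, ys):
    """Stand-in base learner (an assumption): one split on a single feature,
    predicting the mean target on each side of the threshold."""
    best = None
    for t in sorted(set(xs))[:-1]:
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        lm, rm = mean(left), mean(right)
        sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    if best is None:  # all xs identical, so fall back to a constant model
        c = mean(ys)
        return lambda x: c
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm


def fit_committees(xs, ys, n_committees):
    """Fit members sequentially; predict with the average of all members."""
    members = []
    targets = list(ys)
    for _ in range(n_committees):
        member = fit_stump(xs, targets)
        members.append(member)
        # Next member's targets move opposite to this member's errors:
        # y* = y - (y_hat - y), always computed from the original targets y.
        targets = [y - (member(x) - y) for x, y in zip(xs, ys)]
    return lambda x: mean(m(x) for m in members)


committee = fit_committees([1, 2, 3, 4], [1, 1, 3, 3], n_committees=3)
print(committee(1.5), committee(3.5))
```

With a single committee the result is just the base model; additional members only change predictions where earlier members left systematic errors.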

neighbors

Varying the neighbors parameter changes the number of nearest neighbors Cubist uses to correct the rule-based prediction. Using this feature may improve accuracy at the cost of interpretability, because predictions no longer follow the linear models exactly. Additionally, the training dataset is cached in the fitted model so that it is available for future predictions.
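The correction follows the composite scheme Quinlan described for combining instance-based and model-based learning: each of the k nearest training cases contributes its known target, shifted by the difference between the rule-based predictions for the new case and for that neighbor, and the contributions are averaged. The sketch below is a simplification that gives every neighbor equal weight; composite_predict is a hypothetical helper, and model stands for any fitted rule-based predictor. Because it needs train_X and train_y at prediction time, it also shows why the training data has to be cached.

```python
import math


def composite_predict(x, train_X, train_y, model, k):
    """Unweighted sketch of a rule-based prediction corrected by k neighbors.

    model is any callable mapping a feature vector to a prediction; here it
    stands in for a fitted rule-based model.
    """
    # The k training cases closest to x in Euclidean distance.
    nearest = sorted(zip(train_X, train_y),
                     key=lambda pair: math.dist(x, pair[0]))[:k]
    base = model(x)
    # Each neighbor's known target, shifted by how much the model's prediction
    # changes between that neighbor and the new case.
    adjusted = [y_n + (base - model(x_n)) for x_n, y_n in nearest]
    return sum(adjusted) / len(adjusted)


train_X = [[0.0], [1.0], [10.0]]
train_y = [1.0, 3.0, 21.0]
rule_model = lambda v: 2 * v[0]  # hypothetical model that is off by +1 everywhere
print(composite_predict([5.0], train_X, train_y, rule_model, k=1))  # 11.0
```

In the usage example the rule model predicts 10.0, but the nearest neighbor reveals that the model runs one unit low, so the corrected prediction is 11.0.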

unbiased

Setting unbiased=True requires each rule's model to be unbiased: the mean predicted value for the training cases covered by a rule must equal their mean observed value. With the default, the two are allowed to differ, and Cubist instead minimizes the average absolute error.
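The trade-off is easiest to see with the simplest possible rule model, a constant. For a rule covering the made-up targets below, the constant that minimizes the average absolute error is the median, whereas the unbiased choice, whose mean prediction matches the mean target, is the mean. This is only an illustration under that simplifying assumption; Cubist fits linear models within each rule.

```python
from statistics import mean, median

# Made-up targets for the training cases covered by one rule.
covered = [1.0, 2.0, 3.0, 10.0]


def mean_abs_error(c, ys):
    """Average absolute error of predicting the constant c for every case."""
    return sum(abs(y - c) for y in ys) / len(ys)


# Default behaviour: minimize average absolute error, giving the median.
lae_constant = median(covered)
# Unbiased: mean prediction equals the mean target, giving the mean.
unbiased_constant = mean(covered)

print(lae_constant, mean_abs_error(lae_constant, covered))            # 2.5 2.5
print(unbiased_constant, mean_abs_error(unbiased_constant, covered))  # 4.0 3.0
```

The unbiased constant pays a higher average absolute error (3.0 versus 2.5) in exchange for matching the mean of the covered targets.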

extrapolation

Varying the extrapolation parameter changes how far, as a percentage, Cubist's predictions may extend beyond the range of output values seen in the training dataset.
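A minimal sketch of such a bound, assuming for illustration that the extrapolation value is a fraction of the training target range allowed on either side of that range. clamp_prediction is a hypothetical helper, not part of the cubist API, and Cubist's internal rule may differ in detail.

```python
def clamp_prediction(pred, train_y, extrapolation):
    """Hypothetical helper: limit pred to the training target range, widened on
    each side by `extrapolation` (assumed to be a fraction of that range)."""
    lo, hi = min(train_y), max(train_y)
    margin = extrapolation * (hi - lo)
    return min(max(pred, lo - margin), hi + margin)


train_y = [0, 1, 2]  # e.g. the iris class codes used as targets above
print(clamp_prediction(3.0, train_y, 0.05))   # capped at 2.1
print(clamp_prediction(-1.0, train_y, 0.05))  # capped at -0.1
print(clamp_prediction(1.5, train_y, 0.05))   # inside the range: 1.5
```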

Alternative Modes

These parameters change the mode in which the model is used. By default, Cubist simply trains the model using the model construction settings above or their defaults.

auto

Cubist can decide for itself whether to add instance-based corrections (a composite model) if neighbors is left unset and auto=True. This feature may increase training time and may produce a warning with regard to Cubist's recommendation. It is a useful first experiment for judging whether a composite model is worthwhile.

sample

When training on a large dataset, Cubist can build the model from a random subsample of the training data; the sample parameter sets the percentage of the training dataset to use.
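Conceptually, subsampling amounts to fitting on a random subset of the rows. The sketch below assumes sample is a fraction between 0 and 1 and uses Python's random module; subsample is a hypothetical helper and does not reproduce how the cubist package draws its sample internally.

```python
import random


def subsample(rows, sample, seed=None):
    """Hypothetical helper: draw a random `sample` fraction (0 to 1) of rows."""
    rng = random.Random(seed)
    n = max(1, round(sample * len(rows)))
    return rng.sample(rows, n)


rows = list(range(1000))  # stand-in for 1000 training cases
subset = subsample(rows, 0.1, seed=0)
print(len(subset))  # 100
```

Passing a seed makes the draw repeatable, which is the role random_state plays for Cubist's own sampling.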

cv

Setting cv to an integer number of folds makes Cubist evaluate the model with cross-validation.

random_state

Setting a value for random_state sets the random seed for Cubist to enable repeatable cross-validation and sampling.
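The sketch below shows why a fixed seed makes cross-validation repeatable: the same seed shuffles the cases into the same folds every time. kfold_indices is a hypothetical helper that deals shuffled indices round-robin into cv folds; it is not the partitioning scheme Cubist uses internally.

```python
import random


def kfold_indices(n_cases, cv, random_state=None):
    """Hypothetical helper: shuffle case indices with a seed, then deal them
    round-robin into cv folds."""
    indices = list(range(n_cases))
    random.Random(random_state).shuffle(indices)
    return [indices[i::cv] for i in range(cv)]


first = kfold_indices(142, 10, random_state=42)
second = kfold_indices(142, 10, random_state=42)
print(first == second)  # True: same seed, same folds
```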