This is onelearn’s documentation

onelearn stands for ONE-shot LEARNing. It is a small Python package for online learning. It provides:

  • online (or one-shot) learning algorithms: each sample is processed only once, so a single pass is performed over the data
  • multi-class classification and regression algorithms
  • for now, only ensemble methods, namely random forests

Usage

onelearn follows the scikit-learn API: you call partial_fit (instead of fit) each time a new batch of data is available, and use predict_proba or predict whenever you need predictions.

from onelearn import AMFClassifier

amf = AMFClassifier(n_classes=2)
amf.partial_fit(X_train, y_train)
y_pred = amf.predict_proba(X_test)[:, 1]

Each time you call partial_fit, the algorithm updates its decision function using the new data, as illustrated in the next figure.

[Figure: evolution of the decision function over successive calls to partial_fit]
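
For instance, a minimal streaming loop looks as follows (the simulated batches here are just placeholders for a real data stream):

import numpy as np
from onelearn import AMFClassifier

amf = AMFClassifier(n_classes=2)

# Simulated stream: replace these batches with your own data source
for _ in range(10):
    X_batch = np.random.randn(32, 3)         # 32 samples with 3 features
    y_batch = np.random.randint(2, size=32)  # binary labels in {0, 1}
    amf.partial_fit(X_batch, y_batch)        # one pass over each new batch

# Predictions can be produced at any time during the stream
X_new = np.random.randn(5, 3)
print(amf.predict_proba(X_new))              # array of shape (5, 2)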

Installation

The easiest way to install onelearn is using pip:

pip install onelearn

But you can also use the latest development version directly from GitHub with

pip install git+https://github.com/onelearn/onelearn.git

Where to go from here?

To know more about onelearn, check out our example gallery or browse through the module reference using the left navigation bar.

Classification

For now, onelearn mainly provides the AMFClassifier class for multi-class classification. Its usage follows the scikit-learn API, namely partial_fit, predict_proba and predict methods to respectively fit the model, predict class probabilities and predict labels. An AMFClassifier with default parameters is created using

from onelearn import AMFClassifier

amf = AMFClassifier(n_classes=2)

where n_classes must be provided to construct the object. A baseline dummy classifier is also provided by OnlineDummyClassifier, see below.

onelearn.AMFClassifier(n_classes[, …]) Aggregated Mondrian Forest classifier for online learning.
onelearn.OnlineDummyClassifier(n_classes[, …]) A dummy online classifier only using past frequencies of the labels.
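
As a quick sanity check, the dummy baseline can be run side by side with AMFClassifier on the same data. Here is a minimal sketch with simulated data, assuming OnlineDummyClassifier exposes the same partial_fit / predict_proba / predict methods:

import numpy as np
from onelearn import AMFClassifier, OnlineDummyClassifier

X = np.random.randn(200, 2)
y = np.random.randint(2, size=200)

amf = AMFClassifier(n_classes=2)
dummy = OnlineDummyClassifier(n_classes=2)
amf.partial_fit(X, y)
dummy.partial_fit(X, y)

# The dummy classifier ignores the features: it only tracks label frequencies
print(amf.predict_proba(X[:3]))    # class probabilities, shape (3, 2)
print(amf.predict(X[:3]))          # predicted labels
print(dummy.predict_proba(X[:3]))  # identical rows: past label frequencies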

About AMFClassifier

The AMFClassifier class implements the “Aggregated Mondrian Forest” classifier for online learning (see reference below). This algorithm is truly online, in the sense that a single pass is performed, and that predictions can be produced anytime.

For multi-class classification with \(C\) classes, we observe, before time \(t\), pairs of features and labels \((x_1, y_1), \ldots, (x_{t-1}, y_{t-1})\) where \(y_s \in \{ 1, \ldots, C \}\) for each \(s = 1, \ldots, t-1\).

Each node in a tree predicts according to the distribution of the labels it contains. This distribution is regularized using a Dirichlet prior (the “Jeffreys” prior when \(\alpha = 1/2\)) with parameter \(\alpha > 0\), which corresponds to the dirichlet parameter in AMFClassifier. Each node \(\mathbf v\) of a tree predicts, before time \(t\), the probability of class \(c\) as

\[\widehat y_{\mathbf v, t} (c) = \frac{n_{{\mathbf v}, t} (c) + \alpha}{n_{\mathbf v, t} + C \alpha}\]

for any \(c = 1, \ldots, C\), where \(n_{{\mathbf v}, t}(c)\) is the number of samples of class \(c\) in node \(\mathbf v\) before time \(t\), and \(n_{\mathbf v, t} = \sum_{c=1}^C n_{{\mathbf v}, t}(c)\) is the total number of samples in node \(\mathbf v\) before time \(t\). This formula is therefore simply a regularized version of the class frequencies.
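
As an illustration of this formula, the regularized frequencies of a single node can be computed with numpy as follows (a standalone sketch, not onelearn internals):

import numpy as np

alpha = 0.5                    # Dirichlet parameter (0.5 is the Jeffreys prior)
counts = np.array([3, 0, 7])   # n_v,t(c) for C = 3 classes in node v
C = counts.size
n_total = counts.sum()         # n_v,t: total number of samples in the node

# (n_v,t(c) + alpha) / (n_v,t + C * alpha): smoothed class frequencies
proba = (counts + alpha) / (n_total + C * alpha)
print(proba, proba.sum())      # the probabilities sum to 1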

Each node \(\mathbf v\) in the tree corresponds to a cell denoted \(\mathrm{cell}({\mathbf v})\), a hyper-rectangular subset of the feature space. The predictions of a node, before time \(t\), are evaluated by computing its cumulative loss as

\[L_{\mathbf v, t} = \sum_{1 \leq s \leq t \,:\, x_s \in \mathrm{cell}(\mathbf v)} \ell (\widehat y_{\mathbf v, s}, y_s),\]

which is the sum of the prediction losses of all the samples whose features belong to \(\mathrm{cell}({\mathbf v})\). By default, we consider, for multi-class classification, the logarithmic loss \(\ell (\widehat y, y) = - \log (\widehat y(y))\) for \(y \in \{ 1, \ldots, C \}\). The loss can be changed using the loss parameter from AMFClassifier (however only loss="log" is supported for now).
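
Under the log loss, accumulating the loss of a node over the samples falling in its cell can be sketched as follows (hypothetical predictions and labels, for illustration only):

import numpy as np

def log_loss(y_hat, y):
    # \ell(y_hat, y) = -log(y_hat(y)): minus the log-probability of the true class
    return -np.log(y_hat[y])

# Hypothetical node predictions and true labels for samples falling in cell(v)
predictions = [np.array([0.7, 0.3]), np.array([0.4, 0.6]), np.array([0.8, 0.2])]
labels = [0, 1, 0]

L_v = sum(log_loss(y_hat, y) for y_hat, y in zip(predictions, labels))
print(L_v)  # cumulative loss of node v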

Given a vector of features \(x\) and any subtree \(\mathcal T\) of the current tree, we define \(\mathbf v_{\mathcal T}(x)\) as the leaf of \(\mathcal T\) containing \(x\) (namely \(x\) belongs to its cell). The prediction at time \(t\) of the subtree \(\mathcal T\) for \(x\) is given by

\[{\widehat y}_{\mathcal T, t} (x) = {\widehat y}_{\mathbf v_{\mathcal T} (x), t},\]

namely the prediction of \(\mathcal T\) is simply the prediction of the leaf of \(\mathcal T\) containing \(x\). We also define the cumulative loss of a subtree \(\mathcal T\) at time \(t\) as

\[L_{t} (\mathcal T) = \sum_{s=1}^t \ell ({\widehat y}_{\mathcal T, s} (x_s), y_s).\]

When use_aggregation is True (the highly recommended default), the prediction function of a single tree in AMFClassifier is given, at step \(t\), by

\[\widehat {f_t}(x) = \frac{\sum_{\mathcal T} \pi (\mathcal T) e^{-\eta L_{t-1} (\mathcal T)} \widehat y_{\mathcal T, t} (x)}{\sum_{\mathcal T} \pi (\mathcal T) e^{-\eta L_{t-1} (\mathcal T)}},\]

where the sum is over all subtrees \(\mathcal T\) of the current tree, and where the prior \(\pi\) on subtrees is the probability distribution defined by

\[\pi (\mathcal T) = 2^{- | \mathcal T |},\]

where \(|\mathcal T|\) is the number of nodes in \(\mathcal T\), and \(\eta > 0\) is the learning rate, which can be tuned using the step parameter in AMFClassifier (theoretically, the default value step=1.0 is the best, and it usually performs just fine).

Note that \(\pi\) is the distribution of the branching process with branching probability \(1 / 2\) at each node of the complete binary tree, with exactly two children when it branches. This aggregation procedure is a non-greedy way to prune trees: the weights do not depend only on the quality of one single split but rather on the performance of each subsequent split.
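
As an illustration, the following toy computation applies this weighting to three hypothetical subtrees (a real tree has exponentially many subtrees, so this sketch only shows the formula at work, not the actual algorithm):

import numpy as np

eta = 1.0                            # learning rate (the step parameter)
sizes = np.array([1, 3, 5])          # |T|: number of nodes of each subtree
losses = np.array([2.3, 1.1, 0.9])   # cumulative losses L_{t-1}(T)
preds = np.array([0.5, 0.7, 0.8])    # predictions yhat_{T,t}(x) for one class

prior = 2.0 ** (-sizes)              # pi(T) = 2^{-|T|}
weights = prior * np.exp(-eta * losses)
f_t = np.sum(weights * preds) / np.sum(weights)
print(f_t)                           # the aggregated prediction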

The computation of \(\widehat {f_t}(x)\) can seem computationally infeasible, since it involves a sum over all subtrees of the current tree, whose number is exponentially large. Besides, it seems to require keeping in memory one weight \(e^{-\eta L_{t-1} (\mathcal T)}\) for each subtree \(\mathcal T\), which looks exponentially prohibitive as well!

This is precisely where the magic of AMFClassifier resides: it turns out that \(\widehat {f_t}(x)\) can be computed exactly and very efficiently, thanks to the choice of the prior \(\pi\) together with an adaptation of the Context Tree Weighting algorithm. More technical details, together with the details of the online tree construction, which is based on the Mondrian process and Mondrian Forests, can be found in the paper cited below.

Finally, we use \(M\) trees in the forest, all of which follow the same randomized construction. The prediction, for a vector \(x\), of each tree \(m = 1, \ldots, M\) is denoted \(\widehat {f_t}^{(m)}(x)\). The prediction of the forest is simply the average given by

\[\frac 1 M \sum_{m=1}^M \widehat {f_t}^{(m)}(x).\]

The number of trees \(M\) in the forest can be tuned with the n_estimators parameter from AMFClassifier. The default value is 10, but the larger the better (at the cost of more computation and memory).
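
In practice this only requires passing n_estimators to the constructor, for instance (parameter values chosen for illustration):

from onelearn import AMFClassifier

# 50 trees instead of the default 10: usually more accurate, but slower
amf = AMFClassifier(n_classes=2, n_estimators=50, step=1.0, use_aggregation=True)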

Note

When creating a classifier instance, such as an AMFClassifier object, the number of classes n_classes must be provided to the constructor.

Note

All the parameters of AMFClassifier become read-only after the first call to partial_fit.

References

@article{mourtada2019amf,
  title={AMF: Aggregated Mondrian Forests for Online Learning},
  author={Mourtada, Jaouad and Ga{\"\i}ffas, St{\'e}phane and Scornet, Erwan},
  journal={arXiv preprint arXiv:1906.10529},
  year={2019}
}

Regression

Experiments

This page explains how you can reproduce all the experiments from the AMF paper cited in the references above (Mourtada, Gaïffas and Scornet, 2019).

Running the experiments requires the installation of scikit-garden, for a comparison with the Mondrian forests algorithm. This can be done as follows:

git clone https://github.com/scikit-garden/scikit-garden.git && \
    cd scikit-garden && \
    python setup.py build install

in order to get the latest version. All the scripts used to produce the figures from the paper are available in the examples folder of the onelearn repository. Clone the repository using

git clone https://github.com/onelearn/onelearn.git

and go to the onelearn folder. Now, running the following scripts reproduces all the figures from the paper:

  • python examples/plot_iterations.py
  • python examples/plot_decisions.py
  • python examples/plot_forest_effect.py
  • python examples/run_regrets_experiments.py
  • python examples/run_online_vs_batch.py
  • python examples/run_n_trees_sensitivity.py

Note that the run_* scripts can take a while to run, in particular run_regrets_experiments.py.

Playgrounds

Two “playgrounds” are provided in onelearn, in order to help understand the AMFClassifier algorithm. The playgrounds require streamlit, bokeh and matplotlib to run. If you pip installed onelearn, you can simply use

from onelearn import run_playground_decision

run_playground_decision()

to run the decision function playground, and use

from onelearn import run_playground_tree

run_playground_tree()

to run the tree playground. If you git cloned onelearn, you can run streamlit directly using

streamlit run examples/playground_decision.py

or

streamlit run examples/playground_tree.py

For the playground_decision playground, the following webpage should automatically open in your web browser:

[Screenshot: the playground_decision webpage]