This is onelearn's documentation¶
onelearn stands for ONE-shot LEARNing. It is a small Python package for online learning. It provides:
- online (or one-shot) learning algorithms: each sample is processed only once, in a single pass over the data
- multi-class classification and regression algorithms
- for now, only ensemble methods, namely random forests
Usage¶
onelearn follows the scikit-learn API: you call partial_fit instead of fit each time a new batch of data is available, and use predict_proba or predict whenever you need predictions.
from onelearn import AMFClassifier
amf = AMFClassifier(n_classes=2)
amf.partial_fit(X_train, y_train)
y_pred = amf.predict_proba(X_test)[:, 1]
Each time you call partial_fit, the algorithm updates its decision function using the new data.
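For instance, a minimal streaming sketch (the toy data, generated here with scikit-learn's make_moons, and the batch sizes are only illustrative) looks as follows:

import numpy as np
from sklearn.datasets import make_moons
from onelearn import AMFClassifier

X, y = make_moons(n_samples=300, noise=0.2, random_state=42)
amf = AMFClassifier(n_classes=2)

# Feed the data in small batches: each sample is seen exactly once.
for X_batch, y_batch in zip(np.array_split(X, 10), np.array_split(y, 10)):
    amf.partial_fit(X_batch, y_batch)
    # Predictions can be produced at any time, even between batches.
    probas = amf.predict_proba(X_batch)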
Installation¶
The easiest way to install onelearn is using pip:
pip install onelearn
But you can also install the latest development version directly from GitHub with
pip install git+https://github.com/onelearn/onelearn.git
Where to go from here?¶
To know more about onelearn, check out our example gallery or browse through the module reference using the left navigation bar.
Classification¶
For now, onelearn mainly provides the AMFClassifier class for multi-class classification. Its usage follows the scikit-learn API, namely partial_fit, predict_proba and predict methods to respectively fit, predict class probabilities, and predict labels.
The AMFClassifier with default parameters is created using
from onelearn import AMFClassifier
amf = AMFClassifier(n_classes=2)
where n_classes must be provided for the construction of the object. Also, a baseline dummy classifier is provided by OnlineDummyClassifier, see below.
onelearn.AMFClassifier(n_classes[, …]): Aggregated Mondrian Forest classifier for online learning.
onelearn.OnlineDummyClassifier(n_classes[, …]): A dummy online classifier only using past frequencies of the labels.
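As a quick sanity check, both classifiers can be fitted on the same data. This is only an illustrative sketch, assuming OnlineDummyClassifier exposes the same partial_fit and predict_proba methods as AMFClassifier (the toy data below uses scikit-learn's make_moons):

from sklearn.datasets import make_moons
from onelearn import AMFClassifier, OnlineDummyClassifier

X, y = make_moons(n_samples=200, noise=0.2, random_state=0)
amf = AMFClassifier(n_classes=2)
dummy = OnlineDummyClassifier(n_classes=2)
amf.partial_fit(X, y)
dummy.partial_fit(X, y)

# The dummy baseline ignores the features and only tracks label frequencies,
# so its predicted probabilities are the same for every input.
print(amf.predict_proba(X[:3]))
print(dummy.predict_proba(X[:3]))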
About AMFClassifier¶
The AMFClassifier class implements the “Aggregated Mondrian Forest” classifier for online learning (see the reference below). This algorithm is truly online, in the sense that a single pass is performed over the data, and that predictions can be produced at any time.
For multi-class classification with \(C\) classes, we observe, before time \(t\), pairs of features and labels \((x_1, y_1), \ldots, (x_{t-1}, y_{t-1})\) where \(y_s \in \{ 1, \ldots, C \}\) for each \(s = 1, \ldots, t-1\).
Each node in a tree predicts according to the distribution of the labels it contains. This distribution is regularized using a Dirichlet (a.k.a. “Jeffreys”) prior with parameter \(\alpha > 0\), which corresponds to the dirichlet parameter in AMFClassifier.
Each node \(\mathbf v\) of a tree predicts, before time \(t\), the probability of class \(c\) as

\[\widehat{y}_{\mathbf{v}, t}(c) = \frac{n_{\mathbf{v}, t}(c) + \alpha}{n_{\mathbf{v}, t} + \alpha C}\]

for any \(c = 1, \ldots, C\), where \(n_{{\mathbf v}, t}(c)\) is the number of samples of class \(c\) in node \(\mathbf v\) before time \(t\) and \(n_{{\mathbf v}, t} = \sum_{c=1}^C n_{{\mathbf v}, t}(c)\). This formula is therefore simply a regularized version of the class frequencies.
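As an illustration only (not onelearn's internal code), the node prediction above amounts to the following regularized frequency computation, sketched here with NumPy:

import numpy as np

def node_predict(label_counts, alpha=0.5):
    # label_counts[c] = number of samples of class c seen by the node so far;
    # alpha is the Dirichlet prior, i.e. the `dirichlet` parameter.
    counts = np.asarray(label_counts, dtype=float)
    C = counts.size
    return (counts + alpha) / (counts.sum() + alpha * C)

print(node_predict([3, 1]))  # [0.7, 0.3] with alpha=0.5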
Each node \(\mathbf v\) in the tree corresponds to a cell denoted \(\mathrm{cell}({\mathbf v})\), which is a hyper-rectangular subset of the feature space. The predictions of a node, before time \(t\), are evaluated by computing its cumulative loss

\[L_{t-1}({\mathbf v}) = \sum_{s = 1, \ldots, t-1 \,:\, x_s \in \mathrm{cell}({\mathbf v})} \ell\big( \widehat{y}_{{\mathbf v}, s}, y_s \big),\]

which is the sum of the prediction losses of all the samples whose features belong to \(\mathrm{cell}({\mathbf v})\).
By default, we consider, for multi-class classification, the logarithmic loss \(\ell (\widehat y, y) = - \log (\widehat y(y))\) for \(y \in \{ 1, \ldots, C \}\). The loss can be changed using the loss parameter of AMFClassifier (however, only loss="log" is supported for now).
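For illustration, here is a minimal sketch (not onelearn's internal code) of the logarithmic loss and of the cumulative loss of a node, using made-up node predictions and labels (classes are 0-indexed in the code):

import numpy as np

def log_loss(y_hat, y):
    # Logarithmic loss of a predicted distribution y_hat for the true label y.
    return -np.log(y_hat[y])

# Cumulative loss of a node: sum of the losses of its past predictions
# on the samples that fell into its cell (toy values below).
predictions = [np.array([0.7, 0.3]), np.array([0.6, 0.4])]
labels = [0, 1]
cumulative_loss = sum(log_loss(p, y) for p, y in zip(predictions, labels))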
Given a vector of features \(x\) and any subtree \(\mathcal T\) of the current tree, we define \(\mathbf v_{\mathcal T}(x)\) as the leaf of \(\mathcal T\) containing \(x\) (namely, \(x\) belongs to its cell). The prediction at time \(t\) of the subtree \(\mathcal T\) for \(x\) is given by

\[\widehat{y}_{\mathcal T, t}(x) = \widehat{y}_{\mathbf v_{\mathcal T}(x), t},\]

namely the prediction of \(\mathcal T\) is simply the prediction of the leaf of \(\mathcal T\) containing \(x\). We also define the cumulative loss of a subtree \(\mathcal T\) at time \(t\) as

\[L_{t}(\mathcal T) = \sum_{s = 1, \ldots, t} \ell\big( \widehat{y}_{\mathcal T, s}(x_s), y_s \big).\]
When use_aggregation is True (the highly recommended default), the prediction function of a single tree in AMFClassifier is given, at step \(t\), by

\[\widehat{f}_t(x) = \frac{\sum_{\mathcal T} \pi(\mathcal T) \, e^{-\eta L_{t-1}(\mathcal T)} \, \widehat{y}_{\mathcal T, t}(x)}{\sum_{\mathcal T} \pi(\mathcal T) \, e^{-\eta L_{t-1}(\mathcal T)}},\]

where the sum is over all subtrees \(\mathcal T\) of the current tree, and where the prior \(\pi\) on subtrees is the probability distribution defined by

\[\pi(\mathcal T) = 2^{-|\mathcal T|},\]
where \(|\mathcal T|\) is the number of nodes in \(\mathcal T\), and \(\eta > 0\) is the learning rate that can be tuned using the step parameter in AMFClassifier (theoretically, the default value step=1.0 is the best, and usually performs just fine).
Note that \(\pi\) is the distribution of the branching process with branching probability \(1 / 2\) at each node of the complete binary tree, with exactly two children when it branches. This aggregation procedure is a non-greedy way to prune trees: the weights do not depend only on the quality of one single split but rather on the performance of each subsequent split.
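To build intuition only (the actual implementation never enumerates subtrees, see below), here is a toy sketch that aggregates a handful of hypothetical subtrees with the prior \(\pi(\mathcal T) = 2^{-|\mathcal T|}\) and exponential weights; the node counts, losses and predictions are made up:

import numpy as np

# Toy aggregation over a few hypothetical subtrees, each described by its
# number of nodes, its cumulative loss so far, and its current prediction
# for some fixed x.
subtrees = [
    {"n_nodes": 1, "loss": 3.2, "pred": np.array([0.6, 0.4])},
    {"n_nodes": 3, "loss": 2.1, "pred": np.array([0.8, 0.2])},
    {"n_nodes": 5, "loss": 2.5, "pred": np.array([0.9, 0.1])},
]
eta = 1.0  # plays the role of the `step` parameter

# Weight of each subtree: prior 2**(-number of nodes) times exp(-eta * loss).
weights = np.array(
    [2.0 ** (-t["n_nodes"]) * np.exp(-eta * t["loss"]) for t in subtrees]
)
weights /= weights.sum()
prediction = sum(w * t["pred"] for w, t in zip(weights, subtrees))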
The computation of \(\widehat {f_t}(x)\) can seem computationally infeasible, since it involves a sum over all possible subtrees of the current tree, which is exponentially large. Besides, it requires keeping in memory one weight \(e^{-\eta L_{t-1} (\mathcal T)}\) for each subtree \(\mathcal T\), which seems exponentially prohibitive as well!
This is precisely where the magic of AMFClassifier resides: it turns out that \(\widehat {f_t}(x)\) can be computed exactly and very efficiently, thanks to the choice of the prior \(\pi\) together with an adaptation of the Context Tree Weighting algorithm; more technical details are provided in the paper cited below. The interested reader can also find there the details of the online tree construction, which is based on the Mondrian process and Mondrian Forests.
Finally, we use \(M\) trees in the forest, all of which follow the same randomized construction. The predictions, for a vector \(x\), of each tree \(m = 1, \ldots, M\), are denoted \(\widehat {f_t}^{(m)}(x)\). The prediction of the forest is simply the average

\[\widehat{f}_t(x) = \frac{1}{M} \sum_{m=1}^{M} \widehat{f}_t^{(m)}(x).\]
The number of trees \(M\) in the forest can be tuned with the n_estimators parameter of AMFClassifier. The default value is 10; larger values give better predictions, but require more computations and memory.
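The per-tree predictions can be inspected with predict_proba_tree (also used in the gallery examples below). Assuming amf is an AMFClassifier already fitted with partial_fit and X_test is a 2D array of features, a sketch of the forest average is:

import numpy as np

# Stack the per-tree class probabilities and average them over the trees.
per_tree = np.stack(
    [amf.predict_proba_tree(X_test, m) for m in range(amf.n_estimators)]
)
forest_pred = per_tree.mean(axis=0)
# This coincides (up to numerical precision) with amf.predict_proba(X_test).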
Note
When creating a classifier instance, such as an AMFClassifier object, the number of classes n_classes must be provided to the constructor.
Note
All the parameters of AMFClassifier become read-only after the first call to partial_fit.
References¶
@article{mourtada2019amf,
title={AMF: Aggregated Mondrian Forests for Online Learning},
author={Mourtada, Jaouad and Ga{\"\i}ffas, St{\'e}phane and Scornet, Erwan},
journal={arXiv preprint arXiv:1906.10529},
year={2019}
}
Regression¶
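For regression, onelearn provides the AMFRegressor class, whose usage mirrors the classification API. A minimal sketch, using the helpers from onelearn.datasets that appear in the weighted-depths gallery example (the signal name, sample sizes and parameters below are only illustrative):

import numpy as np
from onelearn import AMFRegressor
from onelearn.datasets import make_regression

# Simulate a noisy 1D regression signal (as in the gallery example).
X_train, y_train = make_regression(
    n_samples=1000, signal="doppler", noise=0.03, random_state=42
)
X_test = np.linspace(0, 1, num=200).reshape(-1, 1)

reg = AMFRegressor(n_estimators=10, random_state=42)
reg.partial_fit(X_train.reshape(-1, 1), y_train)
y_pred = reg.predict(X_test)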
Experiments¶
This page explains how you can reproduce all the experiments from the paper
@article{mourtada2019amf,
title={AMF: Aggregated Mondrian Forests for Online Learning},
author={Mourtada, Jaouad and Ga{\"\i}ffas, St{\'e}phane and Scornet, Erwan},
journal={arXiv preprint arXiv:1906.10529},
year={2019}
}
Running the experiments requires the installation of scikit-garden, for a comparison with the Mondrian forests algorithm. This can be done as follows:
git clone https://github.com/scikit-garden/scikit-garden.git && \
cd scikit-garden && \
python setup.py build install
in order to get the latest version. All the scripts used to produce the figures from the paper are available in the examples folder of the onelearn repository.
Clone the repository using
git clone https://github.com/onelearn/onelearn.git
and go to the onelearn folder. Now, running the following scripts reproduces all the figures from the paper:
python examples/plot_iterations.py
python examples/plot_decisions.py
python examples/plot_forest_effect.py
python examples/run_regrets_experiments.py
python examples/run_online_vs_batch.py
python examples/run_n_trees_sensitivity.py
Note that the run_* scripts can take a while to run, in particular run_regrets_experiments.py.
Playgrounds¶
Two “playgrounds” are proposed in onelearn, in order to help understand the AMFClassifier algorithm. The playgrounds require streamlit, bokeh and matplotlib to run.
If you pip installed onelearn, you can simply use
from onelearn import run_playground_decision
run_playground_decision()
to run the decision function playground, and use
from onelearn import run_playground_tree
run_playground_tree()
to run the tree playground. If you git cloned onelearn, you can run streamlit directly using
streamlit run examples/playground_decision.py
or
streamlit run examples/playground_tree.py
For the playground_decision playground, a webpage should automatically open in your web browser.

Gallery of examples¶
You can find below a bunch of examples illustrating onelearn.
More interactive examples can be found in the Playgrounds page.
Illustration of the forest effect¶
In this example we show that the decision function of a forest is the average of the decision functions of independent trees, and that this averaging produces smooth decision functions.

Out:
2020-04-30 21:08:19 Building the graph...
2020-04-30 21:08:28 Saved the forest effect plot in forest_effect.pdf
import sys
import warnings
warnings.filterwarnings("ignore")
import logging
import matplotlib.pyplot as plt
from sklearn.datasets import make_moons
sys.path.extend([".", ".."])
from onelearn import AMFClassifier
from experiments.plot import (
plot_contour_binary_classif,
plot_scatter_binary_classif,
get_mesh,
)
logging.basicConfig(
level=logging.INFO, format="%(asctime)s %(message)s", datefmt="%Y-%m-%d %H:%M:%S"
)
norm = plt.Normalize(vmin=0.0, vmax=1.0)
levels = 30
def plot_forest_effect(forest, dataset):
n_estimators = forest.n_estimators
_ = plt.figure(figsize=(2 * (n_estimators / 2 + 1), 4))
X, y = dataset
xx, yy, X_mesh = get_mesh(X)
# Plot the training points
ax = plt.subplot(2, n_estimators // 2 + 1, 1)
plot_scatter_binary_classif(ax, xx, yy, X, y, title="Input data")
forest.partial_fit(X, y)
for idx_tree in range(n_estimators):
ax = plt.subplot(2, n_estimators // 2 + 1, idx_tree + 2)
Z = forest.predict_proba_tree(X_mesh, idx_tree)[:, 1].reshape(xx.shape)
plot_contour_binary_classif(
ax, xx, yy, Z, title="Tree #%d" % (idx_tree + 1), norm=norm, levels=levels
)
ax = plt.subplot(2, n_estimators // 2 + 1, n_estimators + 2)
Z = forest.predict_proba(X_mesh)[:, 1].reshape(xx.shape)
plot_contour_binary_classif(ax, xx, yy, Z, title="Forest", norm=norm, levels=levels)
plt.tight_layout()
n_samples = 100
n_features = 2
n_classes = 2
random_state = 42
dataset = make_moons(n_samples=n_samples, noise=0.15, random_state=random_state)
n_estimators = 10
amf = AMFClassifier(
n_classes=n_classes,
n_estimators=n_estimators,
random_state=random_state,
use_aggregation=True,
split_pure=True,
)
logging.info("Building the graph...")
plot_forest_effect(amf, dataset)
plt.savefig("forest_effect.pdf")
logging.info("Saved the forest effect plot in forest_effect.pdf")
Total running time of the script: ( 0 minutes 9.546 seconds)
Logo example¶
This is a small example that produces the logo of the onelearn library.

Out:
Plotting iterations: 0%| | 0/4 [00:00<?, ?it/s]
Plotting iterations: 25%|##5 | 1/4 [00:02<00:06, 2.04s/it]
Plotting iterations: 50%|##### | 2/4 [00:04<00:04, 2.11s/it]
Plotting iterations: 75%|#######5 | 3/4 [00:07<00:02, 2.31s/it]
Plotting iterations: 100%|##########| 4/4 [00:09<00:00, 2.42s/it]
Plotting iterations: 100%|##########| 4/4 [00:09<00:00, 2.44s/it]
import sys
import logging
import matplotlib.pyplot as plt
from tqdm import trange
from sklearn.datasets import make_moons
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
sys.path.extend([".", ".."])
from onelearn import AMFClassifier
from experiments.plot import (
get_mesh,
plot_contour_binary_classif,
plot_scatter_binary_classif,
)
logging.basicConfig(
format="%(asctime)s %(message)s", datefmt="%Y/%m/%d %H:%M:%S", level=logging.INFO
)
n_samples = 100
n_features = 2
n_classes = 2
random_state = 123
save_iterations = [5, 10, 30, 70]
logging.info("Simulation of the data")
X, y = make_moons(n_samples=n_samples, noise=0.25, random_state=random_state)
logging.info("Train/Test splitting")
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.5, random_state=random_state
)
logging.info("Computation of the meshgrid")
xx, yy, X_mesh = get_mesh(X)
clf = AMFClassifier(
n_classes=n_classes,
n_estimators=100,
random_state=random_state,
split_pure=True,
use_aggregation=True,
)
n_plots = len(save_iterations)
n_fig = 0
save_iterations = [0, *save_iterations]
plt.figure(figsize=(3, 3))
logging.info("Launching iterations")
bar = trange(n_plots, desc="Plotting iterations", leave=True)
norm = plt.Normalize(vmin=0.0, vmax=1.0)
for start, end in zip(save_iterations[:-1], save_iterations[1:]):
X_iter = X_train[start:end]
y_iter = y_train[start:end]
clf.partial_fit(X_iter, y_iter)
n_fig += 1
Z = clf.predict_proba(X_mesh)[:, 1].reshape(xx.shape)
ax = plt.subplot(2, 2, n_fig)
score = roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1])
plot_contour_binary_classif(ax, xx, yy, Z, levels=5, norm=norm)
plot_scatter_binary_classif(
ax, xx, yy, X_train[:end], y_train[:end], s=15, norm=norm
)
bar.update(1)
bar.close()
plt.subplots_adjust(wspace=0, hspace=0)
plt.savefig("logo.png", transparent=True)
logging.info("Saved logo in file logo.png")
Total running time of the script: ( 0 minutes 9.866 seconds)
Plot iterations of AMFClassifier¶
In this example we illustrate the evolution of the decision function produced by AMFClassifier along iterations (repeated calls to partial_fit).

Out:
Plotting iterations: 0%| | 0/6 [00:00<?, ?it/s]
Plotting iterations: 17%|#6 | 1/6 [00:01<00:09, 1.83s/it]
Plotting iterations: 33%|###3 | 2/6 [00:03<00:07, 1.90s/it]
Plotting iterations: 50%|##### | 3/6 [00:06<00:06, 2.02s/it]
Plotting iterations: 67%|######6 | 4/6 [00:08<00:04, 2.22s/it]
Plotting iterations: 83%|########3 | 5/6 [00:11<00:02, 2.46s/it]
Plotting iterations: 100%|##########| 6/6 [00:15<00:00, 2.72s/it]
Plotting iterations: 100%|##########| 6/6 [00:15<00:00, 2.54s/it]
import sys
import logging
import matplotlib.pyplot as plt
from tqdm import trange
from sklearn.datasets import make_moons
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
sys.path.extend([".", ".."])
from onelearn import AMFClassifier
from experiments.plot import (
get_mesh,
plot_contour_binary_classif,
plot_scatter_binary_classif,
)
logging.basicConfig(
format="%(asctime)s %(message)s", datefmt="%Y/%m/%d %H:%M:%S", level=logging.INFO
)
norm = plt.Normalize(vmin=0.0, vmax=1.0)
n_samples = 400
n_features = 2
n_classes = 2
seed = 123
random_state = 42
levels = 30
save_iterations = [5, 10, 20, 50, 100, 200]
output_filename = "iterations.pdf"
logging.info("Simulation of the data")
X, y = make_moons(n_samples=n_samples, noise=0.2, random_state=random_state)
logging.info("Train/Test splitting")
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.5, random_state=random_state
)
logging.info("Computation of the meshgrid")
xx, yy, X_mesh = get_mesh(X)
clf = AMFClassifier(
n_classes=n_classes,
n_estimators=100,
random_state=random_state,
split_pure=True,
use_aggregation=True,
)
n_plots = len(save_iterations)
n_fig = 0
save_iterations = [0, *save_iterations]
fig, axes = plt.subplots(nrows=2, ncols=n_plots, figsize=(3 * n_plots, 6))
logging.info("Launching iterations")
bar = trange(n_plots, desc="Plotting iterations", leave=True)
for start, end in zip(save_iterations[:-1], save_iterations[1:]):
X_iter = X_train[start:end]
y_iter = y_train[start:end]
clf.partial_fit(X_iter, y_iter)
ax = axes[0, n_fig]
plot_scatter_binary_classif(
ax,
xx,
yy,
X_train[:end],
y_train[:end],
s=50,
title="t = %d" % end,
fontsize=20,
noaxes=False,
)
Z = clf.predict_proba(X_mesh)[:, 1].reshape(xx.shape)
score = roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1])
ax = axes[1, n_fig]
plot_contour_binary_classif(ax, xx, yy, Z, score=score, levels=levels, norm=norm)
n_fig += 1
bar.update(1)
bar.close()
plt.tight_layout()
plt.savefig(output_filename)
logging.info("Saved result in file %s" % output_filename)
Total running time of the script: ( 0 minutes 16.269 seconds)
Weighted depths of AMFRegressor on several 1D signals.¶
The example below illustrates the weighted depth learned internally by the AMF algorithm to estimate 1D regression functions. We observe that AMF automatically adapts to the local regularity of the signal, by putting more emphasis on deeper trees where the regression function is less smooth.
import sys
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.cm import get_cmap
import logging
sys.path.extend([".", ".."])
from onelearn import AMFRegressor
from onelearn.datasets import get_signal, make_regression
logging.basicConfig(
level=logging.INFO, format="%(asctime)s %(message)s", datefmt="%Y-%m-%d %H:%M:%S"
)
colormap = get_cmap("tab20")
n_samples_train = 5000
n_samples_test = 1000
random_state = 42
noise = 0.03
use_aggregation = True
split_pure = True
n_estimators = 100
step = 10.0
signals = ["heavisine", "bumps", "blocks", "doppler"]
def plot_weighted_depth(signal):
X_train, y_train = make_regression(
n_samples=n_samples_train, signal=signal, noise=noise, random_state=random_state
)
X_test = np.linspace(0, 1, num=n_samples_test)
amf = AMFRegressor(
random_state=random_state,
use_aggregation=use_aggregation,
n_estimators=n_estimators,
split_pure=split_pure,
step=step,
)
amf.partial_fit(X_train.reshape(n_samples_train, 1), y_train)
y_pred = amf.predict(X_test.reshape(n_samples_test, 1))
weighted_depths = amf.weighted_depth(X_test.reshape(n_samples_test, 1))
fig, (ax1, ax2, ax3) = plt.subplots(nrows=3, ncols=1, sharex=True, figsize=(6, 5))
plot_samples = ax1.plot(
X_train, y_train, color=colormap.colors[1], lw=2, label="Samples"
)[0]
plot_signal = ax1.plot(
X_test,
get_signal(X_test, signal),
lw=2,
color=colormap.colors[0],
label="Signal",
)[0]
plot_prediction = ax2.plot(
X_test.ravel(), y_pred, lw=2, color=colormap.colors[2], label="Prediction"
)[0]
ax3.plot(
X_test,
weighted_depths[:, 1:],
lw=1,
color=colormap.colors[5],
alpha=0.2,
label="Weighted depths",
)
plot_weighted_depths = ax3.plot(
X_test, weighted_depths[:, 0], lw=1, color=colormap.colors[5], alpha=0.2
)[0]
plot_mean_weighted_depths = ax3.plot(
X_test,
weighted_depths.mean(axis=1),
lw=2,
color=colormap.colors[4],
label="Mean weighted depth",
)[0]
filename = "weighted_depths_%s.pdf" % signal
fig.subplots_adjust(hspace=0.1)
fig.legend(
(
plot_signal,
plot_samples,
plot_mean_weighted_depths,
plot_weighted_depths,
plot_prediction,
),
(
"Signal",
"Samples",
"Average weighted depths",
"Weighted depths",
"Prediction",
),
fontsize=12,
loc="upper center",
bbox_to_anchor=(0.5, 1.0),
ncol=3,
)
plt.savefig(filename)
logging.info("Saved the decision functions in '%s'" % filename)
for signal in signals:
plot_weighted_depth(signal)
Total running time of the script: ( 0 minutes 20.976 seconds)
Comparisons of decision functions¶
This example compares the decision functions of several types of random forest estimators. The following classifiers are used:
- AMF stands for AMFClassifier from onelearn
- MF stands for MondrianForestClassifier from scikit-garden
- RF stands for RandomForestClassifier from scikit-learn
- ET stands for ExtraTreesClassifier from scikit-learn

import sys
import numpy as np
import matplotlib.pyplot as plt
import logging
from sklearn.preprocessing import MinMaxScaler
from sklearn.ensemble import RandomForestClassifier, ExtraTreesClassifier
from sklearn.datasets import make_moons, make_classification, make_circles
from sklearn.model_selection import train_test_split
from skgarden import MondrianForestClassifier
sys.path.extend([".", ".."])
from onelearn import AMFClassifier
from experiments import (
get_mesh,
plot_contour_binary_classif,
plot_scatter_binary_classif,
)
logging.basicConfig(
level=logging.INFO, format="%(asctime)s %(message)s", datefmt="%Y-%m-%d %H:%M:%S"
)
np.set_printoptions(precision=2)
n_samples = 1000
random_state = 42
h = 0.01
levels = 20
use_aggregation = True
split_pure = True
n_estimators = 100
step = 1.0
dirichlet = 0.5
norm = plt.Normalize(vmin=0.0, vmax=1.0)
def simulate_data(dataset="moons"):
if dataset == "moons":
X, y = make_moons(n_samples=n_samples, noise=0.2, random_state=random_state)
elif dataset == "circles":
X, y = make_circles(
n_samples=n_samples, noise=0.1, factor=0.5, random_state=random_state
)
elif dataset == "linear":
X, y = make_classification(
n_samples=n_samples,
n_features=2,
n_redundant=0,
n_informative=2,
random_state=random_state,
n_clusters_per_class=1,
flip_y=0.001,
class_sep=2.0,
)
rng = np.random.RandomState(random_state)
X += 2 * rng.uniform(size=X.shape)
else:
X, y = make_moons(n_samples=n_samples, noise=0.2, random_state=random_state)
X = MinMaxScaler().fit_transform(X)
return X, y
datasets = [simulate_data("moons"), simulate_data("circles"), simulate_data("linear")]
n_classifiers = 5
n_datasets = 3
_ = plt.figure(figsize=(2 * (n_classifiers + 1), 2 * n_datasets))
def get_classifiers():
return [
(
"AMF",
AMFClassifier(
n_classes=2,
n_estimators=n_estimators,
random_state=random_state,
use_aggregation=True,
split_pure=True,
),
),
(
"AMF(no agg)",
AMFClassifier(
n_classes=2,
n_estimators=n_estimators,
random_state=random_state,
use_aggregation=False,
split_pure=True,
),
),
(
"MF",
MondrianForestClassifier(
n_estimators=n_estimators, random_state=random_state
),
),
(
"RF",
RandomForestClassifier(
n_estimators=n_estimators, random_state=random_state
),
),
(
"ET",
ExtraTreesClassifier(n_estimators=n_estimators, random_state=random_state),
),
]
i = 1
for ds_cnt, ds in enumerate(datasets):
X, y = ds
xx, yy, X_mesh = get_mesh(X, h=h, padding=0.2)
ax = plt.subplot(n_datasets, n_classifiers + 1, i)
if ds_cnt == 0:
title = "Input data"
else:
title = None
plot_scatter_binary_classif(ax, xx, yy, X, y, s=10, title=title)
i += 1
classifiers = get_classifiers()
for name, clf in classifiers:
ax = plt.subplot(n_datasets, n_classifiers + 1, i)
if hasattr(clf, "clear"):
clf.clear()
if hasattr(clf, "partial_fit"):
clf.partial_fit(X, y)
else:
clf.fit(X, y)
Z = clf.predict_proba(X_mesh)[:, 1].reshape(xx.shape)
if ds_cnt == 0:
plot_contour_binary_classif(
ax, xx, yy, Z, levels=levels, title=name, norm=norm
)
else:
plot_contour_binary_classif(ax, xx, yy, Z, levels=levels, norm=norm)
i += 1
plt.tight_layout()
plt.savefig("decisions.pdf")
logging.info("Saved the decision functions in 'decision.pdf")
Total running time of the script: ( 0 minutes 19.482 seconds)