Follow the Yellow Brick Road

naweed ahmed
4 min read · Oct 3, 2020

The easy way to run machine learning visualisations

‘It’s always best to start at the beginning — and all you do is follow the Yellow Brick Road.’ Glinda, the Good Witch — Wizard of Oz.

Yellowbrick is a visualisation and diagnostic tool that slots easily into the machine learning workflow. The library uses scikit-learn to run its analysis on the data and matplotlib to create the visuals.

At its core is an API object, called a Visualizer, that analyses the data to produce a visualisation. A Visualizer is a scikit-learn estimator, so it implements a familiar set of methods, and using one follows the same workflow as a scikit-learn model: import the visualiser, instantiate it, call its fit() method on the data, and then call its show() method to render the visualisation.

Yellowbrick guides the user through the model selection process, building an understanding of feature engineering, algorithm selection and hyperparameter tuning. It can help to identify common problems around model complexity, as well as issues such as bias, heteroscedasticity, over/underfitting and class imbalance.

This Python package combines the power of scikit-learn with the capabilities of matplotlib to generate intuitive visualizations of your models.

  • Model Selection Visualization
  • Regression Visualization
  • Classification Visualization
  • Feature Visualization
  • Clustering Visualization
  • Text Visualization

This blog will show how easy it is to conduct machine learning diagnostics and visualisation using this package.

Regression Residual Plots —

This section will look at how easily we can draw residual plots using Yellowbrick.

We will be using the mtcars.csv data set for this.

We save the data set in a dataframe called cars.

From here, we will assign the target variable as the mpg column, and will use disp, hp, drat, wt, and qsec as the predictor variables.
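The setup might look like this; the path mtcars.csv is assumed to point at a local copy of the data, and the inline fallback (a handful of well-known mtcars rows) is only there so the sketch runs standalone:

```python
import pandas as pd

try:
    # assumes mtcars.csv sits in the working directory
    cars = pd.read_csv("mtcars.csv")
except FileNotFoundError:
    # a few rows of the classic mtcars data, so the sketch runs without the file
    cars = pd.DataFrame({
        "mpg":  [21.0, 21.0, 22.8, 21.4, 18.7, 18.1],
        "disp": [160.0, 160.0, 108.0, 258.0, 360.0, 225.0],
        "hp":   [110, 110, 93, 110, 175, 105],
        "drat": [3.90, 3.90, 3.85, 3.08, 3.15, 2.76],
        "wt":   [2.620, 2.875, 2.320, 3.215, 3.440, 3.460],
        "qsec": [16.46, 17.02, 18.61, 19.44, 17.02, 20.22],
    })

y = cars["mpg"]                                 # target variable
X = cars[["disp", "hp", "drat", "wt", "qsec"]]  # predictor variables
```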

From here, we need only 2 lines of code to fit the model and plot the residuals plot, as shown below.

We will now look at plotting some classifier visualisations to further illustrate the ease of using this library.

Confusion Matrix —

In order to do this, we will use the iris dataset. The target variable will be the ‘Species’ column, while the predictors will be the columns ‘Sepal Length’ , ‘Sepal Width’ , ‘Petal Length’ and ‘Petal Width.’
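The post loads iris from a file with those column names; as a stand-in, an equivalent dataframe can be built from scikit-learn's bundled copy (the column renaming below is an assumption made to match the post's naming):

```python
from sklearn.datasets import load_iris

iris = load_iris(as_frame=True)
df = iris.frame.rename(columns={
    "sepal length (cm)": "Sepal Length",
    "sepal width (cm)":  "Sepal Width",
    "petal length (cm)": "Petal Length",
    "petal width (cm)":  "Petal Width",
    "target":            "Species",
})

y = df["Species"]
X = df[["Sepal Length", "Sepal Width", "Petal Length", "Petal Width"]]
```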

Again, we only need 3 lines of code to make a confusion matrix, whereas with matplotlib it would require some hefty coding.

Classification Report —

Using the same variables, a classification report follows the same pattern.

ROC-AUC Curves —

And an ROC-AUC curve needs only 1 line of code.

With these few examples, we have illustrated how easy it is to put together diagnostic visualisations using Yellowbrick.
