# How to quickly find the best bins for your histogram

Wednesday, February 6, 2019

Data exploration is a critical step in every data science project and it usually starts with looking at the distribution of single variables. This is where histograms shine.

Histograms are great for visualising the distribution of columns, which helps to understand important aspects of the data. By simply looking at a histogram, we can for example immediately identify outliers or even errors in our data (e.g. negative values in a column containing the age of patients).

When working with histograms, we almost always end up adjusting the bin width, which is a critical parameter as it determines how much and what kind of information we can extract from the plot.
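To make the effect of the bin width concrete, here is a minimal sketch in plain Python, without any plotting library. The helper `bin_counts` and the sample values are invented for illustration and are not part of the NYCflights13 data. It counts the same values once with narrow bins and once with wide bins.

```python
from collections import Counter


def bin_counts(values, bin_width):
    """Count values per bin; each bin is keyed by its left edge."""
    counts = Counter((value // bin_width) * bin_width for value in values)
    return dict(sorted(counts.items()))


# Hypothetical air times in minutes, invented for illustration.
air_times = [32, 35, 38, 41, 44, 120, 124, 127, 131, 135, 139, 340]

narrow = bin_counts(air_times, 5)    # many bins, fine detail
wide = bin_counts(air_times, 100)    # few bins, coarse overview

print(narrow)
print(wide)
```

With narrow bins, the two clusters and the single long flight show up as separate groups of bins. With wide bins, the same data collapses into three bars.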

In this article, I will show you how you can quickly find your optimal bin width by creating an interactive histogram that you can rebin on the fly using plotly and ipywidgets in Jupyter Notebook or JupyterLab.

Even though I show interactive rebinning with plotly, you can apply the logic I’m illustrating to any plotting library, such as seaborn and matplotlib.

For the visualization, I will display the air time in minutes of more than 300,000 flights that departed NYC in 2013 (NYCflights13 data). You can find the full code for this article as a Jupyter Notebook on GitHub.

# Histogram with interactive binning

In this graphic you can see the end result. If we change the bin width through a slider, the plotly graph adjusts automatically.

To implement this behavior, we combine `plotly.graph_objs` (which creates the plotly graph) with an `ipywidgets.FloatSlider`.

This is the code for creating the rebinnable histogram.

``````
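import plotly.graph_objs as go
import ipywidgets as widgets


def rebinnable_interactive_histogram(series, initial_bin_width=10):
    # Figure with a single histogram trace and an explicit bin width
    figure_widget = go.FigureWidget(
        data=[go.Histogram(x=series, xbins={"size": initial_bin_width})]
    )

    # Slider that controls the bin width
    bin_slider = widgets.FloatSlider(
        value=initial_bin_width,
        min=1,
        max=30,
        step=1,
        description="Bin width:",
    )

    # figure_widget.data is a tuple of traces; the histogram is the first one
    histogram_object = figure_widget.data[0]

    def set_bin_size(change):
        # change["new"] holds the slider's new value
        histogram_object.xbins = {"size": change["new"]}

    # Call set_bin_size whenever the slider's value changes
    bin_slider.observe(set_bin_size, names="value")

    # Stack the figure on top of the slider
    output_widget = widgets.VBox([figure_widget, bin_slider])
    return output_widget


# df is the flights DataFrame; pass the air_time column as a Series
rebinnable_interactive_histogram(df["air_time"])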
``````

Let’s go through it line by line.

# Explaining the code line by line

## 0. Function signature

``````
def rebinnable_interactive_histogram(series, initial_bin_width=10):
``````

Note that our function takes two arguments: `series`, a `pandas.Series`, and `initial_bin_width`, the bin width we want to have as a default in our plot. In our case, it’s a 10-minute air time window.

## 1. Creating the figure

``````
    figure_widget = go.FigureWidget(
        data=[go.Histogram(x=series, xbins={"size": initial_bin_width})]
    )
``````

We generate a new FigureWidget instance. The FigureWidget object is the new “magic object” of plotly. You can display it within Jupyter Notebook or JupyterLab like any normal plotly figure. However, this approach has some advantages:

• FigureWidgets can be combined with ipywidgets in order to create more powerful constructs (in fact, that’s what FigureWidgets are designed for)
• you can manipulate the `FigureWidget` in various ways from Python
• you can also listen for some events and
• when an event is triggered, you can execute more Python code

The `FigureWidget` receives the argument `data`, a list of all the traces (read: visualizations) that we want to show. In our case, we only want to show a single histogram. The x values for the histogram come from the `series`. We set the bin width by passing a dictionary to `xbins`. If we set `"size"` to `None` in that dictionary, plotly chooses a bin width for us.

## 2. Creating the slider

``````
    bin_slider = widgets.FloatSlider(
        value=initial_bin_width,
        min=1,
        max=30,
        step=1,
        description="Bin width:",
    )
``````

We generate a `FloatSlider` using the `ipywidgets` library. Via this slider, we will later be able to manipulate our histogram.

## 3. Saving a reference to the histogram

``````
    histogram_object = figure_widget.data[0]
``````

We save a reference to the histogram because we want to manipulate it in the last step. `figure_widget.data` is a tuple of all traces in the figure, so we take its first (and only) element. In particular, we will change the `xbins` attribute of our object, which we can access via `histogram_object.xbins`.

## 4. Write and use the callback

``````
    def set_bin_size(change):
        histogram_object.xbins = {"size": change["new"]}

    bin_slider.observe(set_bin_size, names="value")
``````

The `FloatSlider` comes with some magic: every time its value changes (i.e. we move the slider), it triggers an event that we can use to update the bin width of our histogram. To do that, we call the `observe` method on the slider, pass it the function we want to call (`set_bin_size` in our case) and tell it when to call that function (`names="value"` means that the function is called whenever the `value` of the slider changes). From then on, whenever the slider’s value changes, it calls `set_bin_size`. `set_bin_size` has access to the slider’s value through its argument `change`, a dictionary containing data about the event triggered by `bin_slider`. For example, `change["new"]` contains the new value of the slider, and `change["old"]` contains its previous value. Note that you don’t have to use the argument name `change`. You can give it any name you want.
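To see what the callback receives, here is a minimal sketch that observes a slider and records the old and new value of every change. The helper `record_change` and the list `events` are invented for illustration. Setting `slider.value` from Python stands in for dragging the slider in the notebook.

```python
import ipywidgets as widgets

slider = widgets.FloatSlider(value=10, min=1, max=30, step=1)

# Collected (old, new) pairs, one per change event.
events = []


def record_change(change):
    # change is a dictionary describing the event.
    events.append((change["old"], change["new"]))


# Only react to changes of the slider's "value" attribute.
slider.observe(record_change, names="value")

# Assigning a new value triggers the same event as moving the slider.
slider.value = 15
slider.value = 20

print(events)
```

Each assignment fires the callback once, so `events` ends up with one entry per change.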

Inside the callback function `set_bin_size`, we simply use the reference `histogram_object` to update the `FigureWidget`’s bin settings (i.e. change the bin width) by overwriting `xbins`.

When we put all the pieces from above together, we have our first prototype for a nice interactive histogram.

# Conclusion

Histograms are a great way to get started exploring single columns of a data set. With plotly, we can create powerful interactive visualizations which can further be enhanced with ipywidgets.

In this article, I have shown you how you can interactively and quickly find the (subjectively) optimal bin width for a histogram when working in Jupyter Notebook or JupyterLab using plotly and ipywidgets.

At 8080 Labs, we use the rebinning feature in our Python tool bamboolib. Together with many other interactive features, it helps our users get insights faster.