License: GPL3
ubuntu2004

Kernel: Python 3 (system-wide)

Introduction to Bokeh

Bokeh is another visualization library. It is not built on Matplotlib; instead, it renders plots directly as HTML5 and JavaScript. Bokeh also offers interactive capabilities, and its plots can be served from a web server or embedded in a web framework such as Django.
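Because Bokeh's output is HTML5 and JavaScript, a figure can also be written to a standalone HTML page. The sketch below uses bokeh.embed.file_html with the CDN resources, so the page loads BokehJS from the web. The file name bokeh_export_sketch.html and the toy data are illustrative assumptions. The call is expected to work on both Bokeh 2.x and 3.x.

```python
# Sketch: render a Bokeh figure to a standalone HTML document (file name is hypothetical).
from bokeh.embed import file_html
from bokeh.plotting import figure
from bokeh.resources import CDN

fig = figure(width=300, height=300, title="Standalone export")
fig.line([1, 2, 3], [4, 6, 5])  # a simple line glyph with toy data

# file_html returns a complete HTML page as a string; CDN loads BokehJS from the web
html = file_html(fig, CDN, "Bokeh export sketch")

with open("bokeh_export_sketch.html", "w", encoding="utf-8") as f:
    f.write(html)
```

Opening the saved file in any browser shows the interactive plot without a Python process running.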

Let's start by importing some of Bokeh's core functions.

In [1]:

from bokeh.plotting import figure
from bokeh.io import output_notebook, show
output_notebook()

Let's generate some sample data to plot: 100 evenly spaced x values from -10 to 10, along with x cubed, x squared, and x itself.

In [2]:

from numpy import linspace
x = linspace(-10,10,100)
y = x**3
y1 = x**2
y2 = x

Unlike Matplotlib's pyplot interface, Bokeh requires us to create a figure object explicitly by calling the figure function. From there we can call any of the many glyph methods (circle, square, triangle, and so on) that Bokeh provides.

Reference: https://ainfographics.wordpress.com/2017/12/01/python-bokeh-cheat-sheet/
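The next cell uses one dedicated method per marker shape. Recent Bokeh 3.x releases deprecate those marker methods in favor of the generic scatter method with a marker argument. The sketch below draws the same three series that way. It assumes scatter is available, which it is in Bokeh 2.x and 3.x.

```python
# Sketch: the same three series drawn with figure.scatter() and a marker name.
from numpy import linspace
from bokeh.plotting import figure

x = linspace(-10, 10, 100)

fig = figure(width=500, height=500)
# One scatter() call per series; marker= selects the shape instead of a dedicated method
for y, marker, color in [(x**3, "circle", "red"),
                         (x**2, "square", "green"),
                         (x, "triangle", "purple")]:
    fig.scatter(x, y, marker=marker, size=6, color=color, alpha=0.5)
```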

In [3]:

fig = figure(width=500,height=500) 
fig.circle(x,y,size=6,color="red",alpha=0.5) #Here we will call the circle glyph
fig.square(x,y1,size=6,color="green",alpha=0.5)
fig.triangle(x,y2,size=6,color="purple",alpha=0.5)
show(fig)

With the basics out of the way, let's walk through a demo adapted from the Bokeh quickstart. First, import the autompg dataset from Bokeh's sample data.

https://github.com/bokeh/bokeh/blob/branch-2.3/bokeh/sampledata/_data/auto-mpg.csv

In [4]:

from bokeh.sampledata.autompg import autompg
print(autompg)

Out[4]:

      mpg  cyl  displ   hp  weight  accel  yr  origin  \
  18.0    8  307.0  130    3504   12.0  70       1   
  15.0    8  350.0  165    3693   11.5  70       1   
  18.0    8  318.0  150    3436   11.0  70       1   
  16.0    8  304.0  150    3433   12.0  70       1   
  17.0    8  302.0  140    3449   10.5  70       1   
..    ...  ...    ...  ...     ...    ...  ..     ...   
27.0    4  140.0   86    2790   15.6  82       1   
44.0    4   97.0   52    2130   24.6  82       2   
32.0    4  135.0   84    2295   11.6  82       1   
28.0    4  120.0   79    2625   18.6  82       1   
31.0    4  119.0   82    2720   19.4  82       1   

                          name  
  chevrolet chevelle malibu  
          buick skylark 320  
         plymouth satellite  
              amc rebel sst  
                ford torino  
..                         ...  
          ford mustang gl  
                vw pickup  
            dodge rampage  
              ford ranger  
               chevy s-10  

[392 rows x 9 columns]

Break down the AutoMPG data

With the data imported, let's group it by yr (model year) and compute some aggregate statistics. Note that the sample data is loaded as a pandas DataFrame, so the standard pandas groupby methods apply.
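The next cell computes the mean and standard deviation as two separate Series. For comparison, pandas can return both statistics in one DataFrame with agg. The sketch below shows this on a tiny hypothetical DataFrame, not the real autompg data.

```python
# Sketch on a tiny hypothetical DataFrame (values are illustrative, not from autompg).
import pandas as pd

cars = pd.DataFrame({
    "yr":  [70, 70, 71, 71, 71],
    "mpg": [18.0, 15.0, 27.0, 25.0, 31.0],
})

# groupby() gives a DataFrameGroupBy; agg() computes both statistics per year
stats = cars.groupby("yr")["mpg"].agg(["mean", "std"])
print(stats)
```

The std column is the sample standard deviation (ddof=1), matching what mpg.std() returns in the cell below.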

In [5]:

grouped = autompg.groupby("yr") #DataFrameGroupBy: rows grouped by model year
mpg = grouped.mpg #SeriesGroupBy for the mpg column
avg,std = mpg.mean(),mpg.std() #Each result is a pandas Series indexed by yr
print("Average mpg: {} / Std Dev mpg: {}".format(avg,std))

Out[5]:

Average mpg: yr
  17.689655
  21.111111
  18.714286
  17.100000
  22.769231
  20.266667
  21.573529
  23.375000
  24.061111
  25.093103
  33.803704
  30.185714
  32.000000
Name: mpg, dtype: float64 / Std Dev mpg: yr
  5.339231
  6.675635
  5.435529
  4.700245
  6.537937
  4.940566
  5.889297
  6.675862
  6.898044
  6.794217
  6.885854
  5.635319
  5.232524
Name: mpg, dtype: float64

Separate the data into groups

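With the aggregation complete, let's extract the list of model years from the groups and split the cars by region of origin. In this dataset, origin 1 is the USA, 2 is Europe, and 3 is Japan. The "german" subset below therefore contains European makes in general (Volkswagen, BMW, Peugeot, Volvo, and others).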
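An alternative to three separate boolean-mask slices is to map the origin codes to readable region names and group by them. The sketch below assumes the 1 = USA, 2 = Europe, 3 = Japan coding described above and uses a few hypothetical rows.

```python
# Sketch: label origin codes and split by region (rows are hypothetical examples).
import pandas as pd

cars = pd.DataFrame({
    "name":   ["ford torino", "datsun pl510", "bmw 2002", "volvo diesel"],
    "origin": [1, 3, 2, 2],
})

# Assumed Auto MPG coding: 1 = USA, 2 = Europe, 3 = Japan
regions = {1: "USA", 2: "Europe", 3: "Japan"}
cars["region"] = cars["origin"].map(regions)

# One DataFrame per region, equivalent to slicing with autompg["origin"] == code
by_region = {region: group for region, group in cars.groupby("region")}
print(by_region["Europe"]["name"].tolist())
```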

In [6]:

years = list(grouped.groups)
print(years)

american = autompg[autompg["origin"]==1]
japanese = autompg[autompg["origin"]==3]
german = autompg[autompg["origin"]==2] #origin 2 is Europe as a whole (e.g. VW, BMW, Peugeot, Volvo), not only Germany

print(american)
print(japanese)
print(german)

Out[6]:

[70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82]
      mpg  cyl  displ   hp  weight  accel  yr  origin  \
  18.0    8  307.0  130    3504   12.0  70       1   
  15.0    8  350.0  165    3693   11.5  70       1   
  18.0    8  318.0  150    3436   11.0  70       1   
  16.0    8  304.0  150    3433   12.0  70       1   
  17.0    8  302.0  140    3449   10.5  70       1   
..    ...  ...    ...  ...     ...    ...  ..     ...   
27.0    4  151.0   90    2950   17.3  82       1   
27.0    4  140.0   86    2790   15.6  82       1   
32.0    4  135.0   84    2295   11.6  82       1   
28.0    4  120.0   79    2625   18.6  82       1   
31.0    4  119.0   82    2720   19.4  82       1   

                          name  
  chevrolet chevelle malibu  
          buick skylark 320  
         plymouth satellite  
              amc rebel sst  
                ford torino  
..                         ...  
         chevrolet camaro  
          ford mustang gl  
            dodge rampage  
              ford ranger  
               chevy s-10  

[245 rows x 9 columns]
      mpg  cyl  displ  hp  weight  accel  yr  origin                   name
 24.0    4  113.0  95    2372   15.0  70       3  toyota corona mark ii
 27.0    4   97.0  88    2130   14.5  70       3           datsun pl510
 27.0    4   97.0  88    2130   14.5  71       3           datsun pl510
 25.0    4  113.0  95    2228   14.0  71       3          toyota corona
 31.0    4   71.0  65    1773   19.0  71       3    toyota corolla 1200
..    ...  ...    ...  ..     ...    ...  ..     ...                    ...
34.0    4  108.0  70    2245   16.9  82       3         toyota corolla
38.0    4   91.0  67    1965   15.0  82       3            honda civic
32.0    4   91.0  67    1965   15.7  82       3     honda civic (auto)
38.0    4   91.0  67    1995   16.2  82       3          datsun 310 gx
32.0    4  144.0  96    2665   13.9  82       3       toyota celica gt

[79 rows x 9 columns]
      mpg  cyl  displ   hp  weight  accel  yr  origin  \
 26.0    4   97.0   46    1835   20.5  70       2   
 25.0    4  110.0   87    2672   17.5  70       2   
 24.0    4  107.0   90    2430   14.5  70       2   
 25.0    4  104.0   95    2375   17.5  70       2   
 26.0    4  121.0  113    2234   12.5  70       2   
..    ...  ...    ...  ...     ...    ...  ..     ...   
33.0    4  105.0   74    2190   14.2  81       2   
28.1    4  141.0   80    3230   20.4  81       2   
30.7    6  145.0   76    3160   19.6  81       2   
36.0    4  105.0   74    1980   15.3  82       2   
44.0    4   97.0   52    2130   24.6  82       2   

                             name  
 volkswagen 1131 deluxe sedan  
                  peugeot 504  
                  audi 100 ls  
                     saab 99e  
                     bmw 2002  
..                            ...  
            volkswagen jetta  
   peugeot 505s turbo diesel  
                volvo diesel  
         volkswagen rabbit l  
                   vw pickup  

[68 rows x 9 columns]

With the data prepared, let's plot it using more of Bokeh's glyphs. We draw a vertical bar (vbar) for each year whose bottom and top are the mean minus and plus one standard deviation. We also draw three marker glyphs (square, diamond, circle) for the individual cars from each region.
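The next cell passes the avg-std and avg+std Series straight to vbar. The same band can also be stored as named columns in a ColumnDataSource, which lets the vbar refer to them by name. The sketch below does this with hypothetical yearly values rather than the autompg data.

```python
# Sketch: the +/-1 standard deviation band as named columns (data are hypothetical).
import pandas as pd
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure

cars = pd.DataFrame({"yr": [70, 70, 71, 71], "mpg": [18.0, 16.0, 27.0, 25.0]})
stats = cars.groupby("yr")["mpg"].agg(["mean", "std"])

# bottom/top are the yearly mean minus/plus one sample standard deviation
band = ColumnDataSource(data=dict(
    yr=stats.index.tolist(),
    bottom=(stats["mean"] - stats["std"]).tolist(),
    top=(stats["mean"] + stats["std"]).tolist(),
))

fig = figure(title="MPG +/- 1 std (sketch)")
fig.vbar(x="yr", bottom="bottom", top="top", width=0.8,
         fill_alpha=0.2, line_color=None, source=band)
```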

In [7]:

fig = figure(title="MPG by Year (US, Europe, Japan)")

fig.vbar(x=years,bottom=avg-std, top=avg+std,width=0.8,fill_alpha=0.2,line_color=None,legend_label="MPG +/- 1 Stddev")

fig.square(x=japanese["yr"],y=japanese["mpg"],size=10,alpha=0.5,color="green",legend_label="Japanese")
fig.diamond(x=american["yr"],y=american["mpg"],size=10,alpha=0.5,color="red",legend_label="American")
fig.circle(x=german["yr"],y=german["mpg"],size=10,alpha=0.5,color="blue",legend_label="European")

show(fig)

Introduce ColumnDataSource

A ColumnDataSource stores data as a dictionary that maps column names to sequences of values (for example lists, NumPy arrays, or pandas Series). This is the format Bokeh's glyphs consume. Every column in a ColumnDataSource must have the same number of elements. The ColumnDataSource is also what Bokeh sends to the browser (HTML5 and JavaScript), so glyphs can refer to its columns by name.
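A ColumnDataSource can be inspected and extended after it is created. The sketch below uses the same toy data as the next cell. It lists the column names and then appends two rows with stream(). stream() raises a ValueError when the new columns have different lengths, which enforces the equal-length rule at that point.

```python
# Sketch: inspecting and appending to a ColumnDataSource (same toy data as the next cell).
from bokeh.models import ColumnDataSource

source = ColumnDataSource(data={"x_values": [1, 2, 3, 4, 5],
                                "y_values": [6, 7, 2, 3, 6]})
print(source.column_names)  # ['x_values', 'y_values']

# stream() appends rows; every column must receive the same number of new values
source.stream({"x_values": [6, 7], "y_values": [4, 5]})
print(len(source.data["x_values"]))  # 7
```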

In [8]:

from bokeh.models import ColumnDataSource

data = {'x_values': [1, 2, 3, 4, 5],
        'y_values': [6, 7, 2, 3, 6]} #A dictionary mapping string column names to equal-length lists

source = ColumnDataSource(data=data) #The dictionary is converted into a ColumnDataSource

p = figure() 
p.circle(x='x_values', y='y_values', source=source)
show(p)

Reference: https://docs.bokeh.org/en/latest/docs/user_guide/data.html

With a better understanding of how ColumnDataSource works, we can convert a pandas DataFrame into one simply by passing the DataFrame to the ColumnDataSource constructor. The same source can then feed every figure in a gridplot (similar to facets in ggplot and grids in Seaborn). The convenience of a ColumnDataSource is that you pass it to a glyph method with the source= argument and then refer to columns by their header names.
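The conversion can be seen on a small hypothetical DataFrame. Each DataFrame column becomes a column of the source, and the unnamed DataFrame index is added as an extra column called "index". Glyphs then refer to the columns by name through source=.

```python
# Sketch: DataFrame -> ColumnDataSource, using a small hypothetical DataFrame.
import pandas as pd
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure

df = pd.DataFrame({"mpg": [18.0, 15.0], "hp": [130, 165]})
source = ColumnDataSource(df)

# DataFrame columns become CDS columns; the unnamed index is added as "index"
print(sorted(source.column_names))  # ['hp', 'index', 'mpg']

# Glyphs reference columns by header name via source=
fig = figure(width=300, height=300)
fig.scatter("hp", "mpg", source=source)
```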

In [10]:

from bokeh.layouts import gridplot #Import the gridplot layout function
print(type(autompg))

source = ColumnDataSource(autompg) #Convert the autompg dataframe into a Column Data Source
print(type(source))

Out[10]:

<class 'pandas.core.frame.DataFrame'>
<class 'bokeh.models.sources.ColumnDataSource'>
ColumnDataSource(id='1503', ...)

Now that we have a sense of the data types, let's move onto using ColumnDataSource in a visualization.

Bokeh Styling Guide Reference: https://docs.bokeh.org/en/latest/docs/user_guide/styling.html
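The next cell gives all three figures the same settings by unpacking a dictionary with **options. Below is a minimal plain-Python sketch of that mechanism. The make_plot function is a hypothetical stand-in for Bokeh's figure.

```python
# Sketch of * and ** argument unpacking; make_plot is a hypothetical stand-in for figure().
def make_plot(title, width=600, height=600, tools=""):
    return {"title": title, "width": width, "height": height, "tools": tools}

options = dict(width=300, height=300, tools="pan,wheel_zoom")

# **options expands to width=300, height=300, tools="pan,wheel_zoom"
a = make_plot("MPG by Year", **options)
b = make_plot("MPG by Year", width=300, height=300, tools="pan,wheel_zoom")
print(a == b)  # True

# *args expands a tuple into positional arguments
args = ("HP vs. Displacement",)
c = make_plot(*args, **options)
```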

In [13]:

options = dict(width=300, height=300,
               tools="pan,wheel_zoom,box_zoom,box_select,lasso_select") #width/height replace plot_width/plot_height, which Bokeh 3.x removed

fig1 = figure(title="MPG by Year", **options)  #Create the first figure
fig2 = figure(title="HP vs. Displacement", **options) #Create the second figure
fig3 = figure(title="MPG vs. Displacement", **options) #Create the third figure

#Single and double asterisks unpack positional arguments (*) and keyword arguments (**) into a function call
#Reference: https://stackoverflow.com/questions/36901/what-does-double-star-asterisk-and-star-asterisk-do-for-parameters

fig1.circle("yr", "mpg", color="blue", source=source) #Circle plot first figure
fig2.circle("hp", "displ", color="green", source=source) #Circle plot second figure
fig3.circle("mpg", "displ", size="cyl", line_color="red", fill_color=None, source=source) #Modified circle plot third figure

fig = gridplot([[ fig1, fig2, fig3]], toolbar_location="right") #Place figures into the grid and locate the toolbar

show(fig)

Reference: Bokeh Quickstart

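Bokeh also provides a way to filter the data before it is visualized: a CDSView combined with a filter such as IndexFilter. When creating the CDSView, pass an IndexFilter, which holds a list of row indices, in its filters list. Glyphs that use the view then draw only the rows at those indices. This cell uses the Bokeh 2.x signature. In Bokeh 3.x, CDSView no longer takes a source and accepts a single filter argument instead of a filters list.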
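Conceptually, an IndexFilter keeps only the rows at the listed positions. The plain-Python sketch below mimics that selection for the data used in the next cell. It illustrates the semantics only and is not Bokeh's implementation.

```python
# Plain-Python sketch of what IndexFilter([0, 2, 4, 6]) keeps (not Bokeh's implementation).
x = list(range(1, 11))
y = list(range(2, 22, 2))
indices = [0, 2, 4, 6]

# Only rows at the listed positions are drawn by a glyph that uses the view
filtered = [(x[i], y[i]) for i in indices]
print(filtered)  # [(1, 2), (3, 6), (5, 10), (7, 14)]
```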

In [14]:

from bokeh.models import ColumnDataSource, CDSView, IndexFilter
from bokeh.plotting import figure, output_file, show

source = ColumnDataSource(data = dict(x = list(range(1,11)), y = list(range(2,22,2)))) #Generate data

view = CDSView(source=source, filters = [IndexFilter([0, 2, 4,6])]) #Adjust the view of the data applied on the figure

fig = figure(title = 'Line Plot example', x_axis_label = 'x', y_axis_label = 'y')
fig.circle(x = "x", y = "y", size = 10, source = source, view = view, legend_label = 'filtered') #Apply the view
fig.line(x="x", y="y", source=source, legend_label='unfiltered') #The view is not applied here
show(fig)

Reference: https://www.tutorialspoint.com/bokeh/bokeh_filtering_data.htm

Instead of filtering by index, you can also use a BooleanFilter. It takes one True/False value per row and keeps only the rows marked True.

In [15]:

from bokeh.models import ColumnDataSource, CDSView, BooleanFilter
from bokeh.plotting import figure, show
from bokeh.sampledata.unemployment1948 import data

source = ColumnDataSource(data) #Convert the unemployment DataFrame to a ColumnDataSource

booleans = [int(year) >= 1980 for year in source.data['Year']] #The comparison already yields True or False

print(booleans)

Out[15]:

[False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True]

With the boolean list built, we can wrap it in a BooleanFilter and pass that to the CDSView. Only the data points of interest are then plotted: in this case, years greater than or equal to 1980.
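A BooleanFilter keeps exactly the rows whose mask entry is True. The plain-Python sketch below reproduces that with itertools.compress. The years and rates are hypothetical values, not the real unemployment dataset.

```python
# Plain-Python sketch of BooleanFilter semantics; values are hypothetical.
from itertools import compress

years = [1978, 1979, 1980, 1981]
annual = [6.1, 5.8, 7.1, 7.6]  # hypothetical unemployment rates

# One boolean per row, built the same way as the cell above
mask = [year >= 1980 for year in years]

# compress() keeps only the (year, rate) pairs whose mask entry is True
kept = list(compress(zip(years, annual), mask))
print(kept)  # [(1980, 7.1), (1981, 7.6)]
```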

In [0]:

view1 = CDSView(source = source, filters=[BooleanFilter(booleans)]) #Use the BooleanFilter instead of the IndexFilter

p = figure(title = "Unemployment data", x_axis_label = 'Year', y_axis_label='Percentage')# x_range = (1980,2020)
p.circle(x = 'Year', y = 'Annual', source = source, view = view1, color = 'red')
p.line(x = 'Year', y = 'Annual', source = source, color = 'blue')
show(p)

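Reference: https://www.tutorialspoint.com/bokeh/bokeh_filtering_data.htm

Note: The linked example applies the view to a line plot. However, the Bokeh version used here does not allow a filtered CDSView on connected glyphs such as line, so the view is applied only to the circle glyph.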

Finally, here is an example of adding a Bokeh widget to a chart. The key is to write a callback that updates the plot's data. Here it is a CustomJS callback, which runs as JavaScript in the browser. You attach it to the widget with js_on_change and name the property to watch, in this case value. Inside the callback, cb_obj refers to the widget that triggered it, so cb_obj.value is the slider's current value. Calling source.change.emit() tells Bokeh that the data was modified in place, so the figure re-renders.
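When a widget value only needs to be copied into another model's property, Bokeh's js_link method generates the JavaScript callback for you. The sketch below links a slider to a line's width. The data and slider range are illustrative assumptions. It is a simpler alternative to the custom callback in the next cell, not a replacement for it.

```python
# Sketch: link a slider directly to a glyph property with js_link (toy data).
from bokeh.layouts import column
from bokeh.models import Slider
from bokeh.plotting import figure

plot = figure(width=400, height=400)
line = plot.line([0, 1, 2], [0, 1, 4], line_width=3)

slider = Slider(start=1, end=10, value=3, step=1, title="line width")
# In the browser, copies slider.value into the line glyph's line_width on every change
slider.js_link("value", line.glyph, "line_width")

layout = column(plot, slider)
```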

In [16]:

from bokeh.layouts import column
from bokeh.models import ColumnDataSource, CustomJS, Slider
from bokeh.plotting import figure, output_file, show

output_file("js_on_change.html")

x = [i*0.005 for i in range(0, 200)] #200 points from 0 to 0.995
y = x

source = ColumnDataSource(data=dict(x=x, y=y))

plot = figure(width=400, height=400) #figure/width/height also work in Bokeh 3.x, unlike Figure/plot_width/plot_height
plot.line('x', 'y', source=source, line_width=3, line_alpha=0.6)


callback = CustomJS(args=dict(source=source), code="""
    var data = source.data;    
    var f = cb_obj.value; 
    console.log(cb_data);
    console.log(cb_obj);
    var x = data['x'];
    var y = data['y'];
    for (var i = 0; i < x.length; i++) {
        y[i] = Math.pow(x[i], f);
    }
    source.change.emit();
""")

slider = Slider(start=0.1, end=4, value=1, step=.1, title="power")
slider.js_on_change('value', callback)

layout = column(plot,slider)

show(layout)
