License: GPL3
ubuntu2004
Kernel: Python 3 (system-wide)

Introduction to Bokeh

Bokeh is another visualization library. It is not built on Matplotlib and instead outputs its plots directly as HTML5 and JavaScript. Bokeh also offers interactive capabilities that can be run from a web server or embedded in a framework such as Django.

Let's start by importing some of the core Bokeh functions into our code.

```python
from bokeh.plotting import figure
from bokeh.io import output_notebook, show

output_notebook()  # display plots inline in the notebook
```

Let's generate some sample data so that we can plot it.

```python
from numpy import linspace

x = linspace(-10, 10, 100)  # 100 evenly spaced points from -10 to 10
y = x**3
y1 = x**2
y2 = x
```
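`linspace(start, stop, num)` returns `num` evenly spaced values with both endpoints included, so the spacing here is 20/99 rather than 0.2. A minimal pure-Python sketch of the same computation, useful for seeing exactly what the three curves are built from (the helper `linspace_py` is hypothetical, for illustration only):

```python
def linspace_py(start, stop, num):
    """Hypothetical helper: `num` evenly spaced floats from start to stop, endpoints included."""
    if num == 1:
        return [float(start)]
    step = (stop - start) / (num - 1)
    # Build all but the last point from the step, then pin the last point to `stop`
    return [start + i * step for i in range(num - 1)] + [float(stop)]

x = linspace_py(-10, 10, 100)
y = [v**3 for v in x]   # cubic curve, like y = x**3
y1 = [v**2 for v in x]  # parabola, like y1 = x**2
y2 = list(x)            # straight line, like y2 = x

print(len(x), x[0], x[-1])  # 100 -10.0 10.0
```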

Unlike Matplotlib, we need to instantiate a Bokeh figure object by calling the figure() function. From there we can call any of the many glyph methods available in Bokeh, such as circle, square and triangle.

Reference: https://ainfographics.wordpress.com/2017/12/01/python-bokeh-cheat-sheet/

```python
fig = figure(width=500, height=500)
fig.circle(x, y, size=6, color="red", alpha=0.5)  # Here we call the circle glyph
fig.square(x, y1, size=6, color="green", alpha=0.5)
fig.triangle(x, y2, size=6, color="purple", alpha=0.5)
show(fig)
```

With the basics out of the way, let's work through a demo adapted from the Bokeh quickstart. First, import the autompg data set from Bokeh's sample data.

https://github.com/bokeh/bokeh/blob/branch-2.3/bokeh/sampledata/_data/auto-mpg.csv

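```python
# If the sample data is not installed yet (Bokeh 2.x), run once:
#   import bokeh.sampledata; bokeh.sampledata.download()
from bokeh.sampledata.autompg import autompg

print(autompg)
```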
```
      mpg  cyl  displ   hp  weight  accel  yr  origin  \
0    18.0    8  307.0  130    3504   12.0  70       1
1    15.0    8  350.0  165    3693   11.5  70       1
2    18.0    8  318.0  150    3436   11.0  70       1
3    16.0    8  304.0  150    3433   12.0  70       1
4    17.0    8  302.0  140    3449   10.5  70       1
..    ...  ...    ...  ...     ...    ...  ..     ...
387  27.0    4  140.0   86    2790   15.6  82       1
388  44.0    4   97.0   52    2130   24.6  82       2
389  32.0    4  135.0   84    2295   11.6  82       1
390  28.0    4  120.0   79    2625   18.6  82       1
391  31.0    4  119.0   82    2720   19.4  82       1

                          name
0    chevrolet chevelle malibu
1            buick skylark 320
2           plymouth satellite
3                amc rebel sst
4                  ford torino
..                         ...
387            ford mustang gl
388                  vw pickup
389              dodge rampage
390                ford ranger
391                 chevy s-10

[392 rows x 9 columns]
```

Break down the AutoMPG data

With the data imported, let's group it by model year (yr) and compute some aggregates for each group. Note that the sample data is loaded as a pandas DataFrame.

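```python
grouped = autompg.groupby("yr")   # pandas DataFrameGroupBy: one group per model year
mpg = grouped.mpg                 # pandas SeriesGroupBy for the mpg column
avg, std = mpg.mean(), mpg.std()  # two pandas Series indexed by yr
print("Average mpg: {} / Std Dev mpg: {}".format(avg, std))
```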
```
Average mpg: yr
70    17.689655
71    21.111111
72    18.714286
73    17.100000
74    22.769231
75    20.266667
76    21.573529
77    23.375000
78    24.061111
79    25.093103
80    33.803704
81    30.185714
82    32.000000
Name: mpg, dtype: float64 / Std Dev mpg: yr
70    5.339231
71    6.675635
72    5.435529
73    4.700245
74    6.537937
75    4.940566
76    5.889297
77    6.675862
78    6.898044
79    6.794217
80    6.885854
81    5.635319
82    5.232524
Name: mpg, dtype: float64
```
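To make the group-then-aggregate step concrete, here is a pure-Python sketch of what `groupby("yr").mpg.mean()` and `.std()` compute. It uses only the ten (yr, mpg) pairs visible in the printout above, so its numbers differ from the full-data results. It assumes pandas' default `std()`, which is the sample standard deviation (n - 1 denominator), the same definition `statistics.stdev` uses.

```python
from collections import defaultdict
from statistics import mean, stdev

# (yr, mpg) pairs copied from the first and last five rows of the autompg printout
rows = [
    (70, 18.0), (70, 15.0), (70, 18.0), (70, 16.0), (70, 17.0),
    (82, 27.0), (82, 44.0), (82, 32.0), (82, 28.0), (82, 31.0),
]

# Group the mpg values by year, like autompg.groupby("yr").mpg
groups = defaultdict(list)
for yr, mpg in rows:
    groups[yr].append(mpg)

# One aggregate value per group, keyed by year (like a Series indexed by yr)
avg = {yr: mean(vals) for yr, vals in groups.items()}
std = {yr: stdev(vals) for yr, vals in groups.items()}  # sample std, n - 1 denominator

for yr in sorted(groups):
    print(yr, round(avg[yr], 3), round(std[yr], 3))
```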

Separate the data into groups

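With the aggregation complete, let's extract the list of years from the groups and split the cars by origin. In this data set, origin 1 is the US, 3 is Japan and 2 is Europe; as the output below shows, the group stored in german also contains other European makes such as Peugeot, Saab and Volvo.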

```python
years = list(grouped.groups)  # the group keys: model years 70 through 82
print(years)

american = autompg[autompg["origin"] == 1]
japanese = autompg[autompg["origin"] == 3]
german = autompg[autompg["origin"] == 2]

print(american)
print(japanese)
print(german)
```
```
[70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82]
      mpg  cyl  displ   hp  weight  accel  yr  origin  \
0    18.0    8  307.0  130    3504   12.0  70       1
1    15.0    8  350.0  165    3693   11.5  70       1
2    18.0    8  318.0  150    3436   11.0  70       1
3    16.0    8  304.0  150    3433   12.0  70       1
4    17.0    8  302.0  140    3449   10.5  70       1
..    ...  ...    ...  ...     ...    ...  ..     ...
386  27.0    4  151.0   90    2950   17.3  82       1
387  27.0    4  140.0   86    2790   15.6  82       1
389  32.0    4  135.0   84    2295   11.6  82       1
390  28.0    4  120.0   79    2625   18.6  82       1
391  31.0    4  119.0   82    2720   19.4  82       1

                          name
0    chevrolet chevelle malibu
1            buick skylark 320
2           plymouth satellite
3                amc rebel sst
4                  ford torino
..                         ...
386           chevrolet camaro
387            ford mustang gl
389              dodge rampage
390                ford ranger
391                 chevy s-10

[245 rows x 9 columns]
      mpg  cyl  displ  hp  weight  accel  yr  origin                   name
14   24.0    4  113.0  95    2372   15.0  70       3  toyota corona mark ii
18   27.0    4   97.0  88    2130   14.5  70       3           datsun pl510
29   27.0    4   97.0  88    2130   14.5  71       3           datsun pl510
31   25.0    4  113.0  95    2228   14.0  71       3          toyota corona
52   31.0    4   71.0  65    1773   19.0  71       3    toyota corolla 1200
..    ...  ...    ...  ..     ...    ...  ..     ...                    ...
376  34.0    4  108.0  70    2245   16.9  82       3         toyota corolla
377  38.0    4   91.0  67    1965   15.0  82       3            honda civic
378  32.0    4   91.0  67    1965   15.7  82       3     honda civic (auto)
379  38.0    4   91.0  67    1995   16.2  82       3          datsun 310 gx
384  32.0    4  144.0  96    2665   13.9  82       3       toyota celica gt

[79 rows x 9 columns]
      mpg  cyl  displ   hp  weight  accel  yr  origin  \
19   26.0    4   97.0   46    1835   20.5  70       2
20   25.0    4  110.0   87    2672   17.5  70       2
21   24.0    4  107.0   90    2430   14.5  70       2
22   25.0    4  104.0   95    2375   17.5  70       2
23   26.0    4  121.0  113    2234   12.5  70       2
..    ...  ...    ...  ...     ...    ...  ..     ...
349  33.0    4  105.0   74    2190   14.2  81       2
354  28.1    4  141.0   80    3230   20.4  81       2
355  30.7    6  145.0   76    3160   19.6  81       2
369  36.0    4  105.0   74    1980   15.3  82       2
388  44.0    4   97.0   52    2130   24.6  82       2

                             name
19   volkswagen 1131 deluxe sedan
20                    peugeot 504
21                    audi 100 ls
22                       saab 99e
23                       bmw 2002
..                            ...
349              volkswagen jetta
354     peugeot 505s turbo diesel
355                  volvo diesel
369           volkswagen rabbit l
388                     vw pickup

[68 rows x 9 columns]
```
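Each line like `autompg[autompg["origin"] == 1]` first builds a boolean mask with one True/False per row, then keeps the rows where the mask is True. A pure-Python sketch of the same idea on a few rows copied from the output above; the `cars` list and the `split_by_origin` helper are illustrative, not part of pandas.

```python
# A handful of rows copied from the printouts above: (name, yr, mpg, origin)
cars = [
    ("chevrolet chevelle malibu", 70, 18.0, 1),
    ("buick skylark 320", 70, 15.0, 1),
    ("toyota corona mark ii", 70, 24.0, 3),
    ("datsun pl510", 70, 27.0, 3),
    ("volkswagen 1131 deluxe sedan", 70, 26.0, 2),
    ("peugeot 504", 70, 25.0, 2),
]

def split_by_origin(rows, code):
    """Hypothetical helper mirroring autompg[autompg["origin"] == code]."""
    mask = [origin == code for _, _, _, origin in rows]  # one boolean per row
    return [row for row, keep in zip(rows, mask) if keep]

american = split_by_origin(cars, 1)
japanese = split_by_origin(cars, 3)
european = split_by_origin(cars, 2)  # the group the notebook stores in `german`

# Like list(grouped.groups): the distinct years in sorted order
years = sorted({yr for _, yr, _, _ in cars})
```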

With the data prepared, let's plot it using more of the glyphs available in Bokeh. Below you will see a vertical bar (vbar), whose bottom and top are set one standard deviation below and above the yearly mean, and three marker glyphs (square, diamond and circle).

```python
fig = figure(title="MPG by Year (US, Germany, Japan)")
fig.vbar(x=years, bottom=avg-std, top=avg+std, width=0.8,
         fill_alpha=0.2, line_color=None, legend_label="MPG +/- 1 Stddev")
fig.square(x=japanese["yr"], y=japanese["mpg"], size=10, alpha=0.5,
           color="green", legend_label="Japanese")
fig.diamond(x=american["yr"], y=american["mpg"], size=10, alpha=0.5,
            color="red", legend_label="American")
fig.circle(x=german["yr"], y=german["mpg"], size=10, alpha=0.5,
           color="blue", legend_label="German")
show(fig)
```
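Because avg and std are Series that share the same yr index, `avg - std` and `avg + std` are computed year by year, giving one bar per year. The sketch below works through that arithmetic for three years, using the mean and standard deviation values copied from the earlier printout; it is only a check of the numbers, not Bokeh code.

```python
# Mean and sample standard deviation of mpg for three model years, copied from the printout
avg = {70: 17.689655, 76: 21.573529, 80: 33.803704}
std = {70: 5.339231, 76: 5.889297, 80: 6.885854}

# vbar(bottom=avg - std, top=avg + std) draws one bar per year spanning this range
bands = {yr: (avg[yr] - std[yr], avg[yr] + std[yr]) for yr in avg}

for yr, (bottom, top) in sorted(bands.items()):
    print(f"{yr}: bottom={bottom:.3f} top={top:.3f} height={top - bottom:.3f}")
```

Each bar's height is always twice that year's standard deviation, so taller bars mark years with more spread in fuel economy.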

Introduce ColumnDataSource

ColumnDataSource maps data into a dictionary-like structure of named columns, which makes it easy for Bokeh to process. Every column in a ColumnDataSource must contain the same number of elements. Bokeh is optimized to consume ColumnDataSource objects when drawing visuals for viewing in web browsers (HTML5 and JS).

```python
from bokeh.models import ColumnDataSource

# The data is a dictionary whose keys are string column names
data = {'x_values': [1, 2, 3, 4, 5],
        'y_values': [6, 7, 2, 3, 6]}

source = ColumnDataSource(data=data)  # The dictionary is converted into a ColumnDataSource

p = figure()
p.circle(x='x_values', y='y_values', source=source)  # Columns are referenced by name
show(p)
```

Reference: https://docs.bokeh.org/en/latest/docs/user_guide/data.html
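The equal-length rule exists because row i of the source is formed from the i-th entry of every column. A minimal sketch of that check on the same dictionary; `validate_columns` is a hypothetical helper, and it raises an exception purely for illustration (Bokeh itself reports mismatched column lengths as a validation warning rather than through this function).

```python
def validate_columns(data):
    """Hypothetical helper: return the shared column length of a dict of lists,
    or raise ValueError if the columns differ in length."""
    lengths = {name: len(values) for name, values in data.items()}
    if len(set(lengths.values())) > 1:
        raise ValueError(f"columns have unequal lengths: {lengths}")
    return next(iter(lengths.values()), 0)

data = {'x_values': [1, 2, 3, 4, 5], 'y_values': [6, 7, 2, 3, 6]}
n_rows = validate_columns(data)

# Row i pairs the i-th entry of every column, which only works if the lengths match
rows = [dict(zip(data, values)) for values in zip(*data.values())]
```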

With a better understanding of how ColumnDataSource works, we can easily convert a pandas DataFrame into a ColumnDataSource by passing the DataFrame to the ColumnDataSource constructor. What makes ColumnDataSource convenient in Bokeh is that you can then identify the data with the source= argument and refer to columns by their header names in each glyph call. The same source can also feed several plots arranged with gridplot (similar to facets in ggplot and grid plots in Seaborn).

```python
from bokeh.layouts import gridplot  # Import gridplot for the next example

print(type(autompg))
source = ColumnDataSource(autompg)  # Convert the autompg DataFrame into a ColumnDataSource
print(type(source))
```
```
<class 'pandas.core.frame.DataFrame'>
<class 'bokeh.models.sources.ColumnDataSource'>
ColumnDataSource(id='1503', ...)
```
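Converting a DataFrame this way turns its rows into a column-oriented layout: one list per column name, as in the dictionary example earlier. A pure-Python sketch of that reshaping on three rows from the autompg printout; `records_to_columns` is a hypothetical helper for illustration, not how Bokeh performs the conversion internally.

```python
def records_to_columns(records):
    """Hypothetical helper: turn a list of row dicts into a dict of column lists."""
    columns = {}
    for row in records:
        for key, value in row.items():
            columns.setdefault(key, []).append(value)
    return columns

# Three rows (and three of the nine columns) copied from the autompg printout
rows = [
    {"mpg": 18.0, "cyl": 8, "name": "chevrolet chevelle malibu"},
    {"mpg": 15.0, "cyl": 8, "name": "buick skylark 320"},
    {"mpg": 44.0, "cyl": 4, "name": "vw pickup"},
]

cols = records_to_columns(rows)
print(cols["mpg"])
```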

Now that we have a sense of the data types, let's move on to using the ColumnDataSource in a visualization.

```python
options = dict(width=300, height=300,
               tools="pan,wheel_zoom,box_zoom,box_select,lasso_select")

# Single and double asterisks unpack positional arguments (*) or keyword=argument pairs (**)
# Reference: https://stackoverflow.com/questions/36901/what-does-double-star-asterisk-and-star-asterisk-do-for-parameters
fig1 = figure(title="MPG by Year", **options)           # Create the first figure
fig2 = figure(title="HP vs. Displacement", **options)   # Create the second figure
fig3 = figure(title="MPG vs. Displacement", **options)  # Create the third figure

fig1.circle("yr", "mpg", color="blue", source=source)     # Circle plot, first figure
fig2.circle("hp", "displ", color="green", source=source)  # Circle plot, second figure
fig3.circle("mpg", "displ", size="cyl", line_color="red",
            fill_color=None, source=source)              # Marker size taken from the cyl column

fig = gridplot([[fig1, fig2, fig3]], toolbar_location="right")  # Place figures in one row, toolbar on the right
show(fig)
```

Reference: Bokeh Quickstart
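The `**options` trick used above expands a dictionary into keyword arguments, so all three figures share the same size and tools without repeating them. A small sketch of the mechanism; `make_figure_kwargs` is a hypothetical stand-in for `figure()` that simply returns what it received.

```python
def make_figure_kwargs(title, width=600, height=600, tools=None):
    """Hypothetical stand-in for figure(): returns the arguments it was called with."""
    return {"title": title, "width": width, "height": height, "tools": tools}

options = dict(width=300, height=300,
               tools="pan,wheel_zoom,box_zoom,box_select,lasso_select")

# **options expands to width=300, height=300, tools="pan,..." in the call
fig1_args = make_figure_kwargs(title="MPG by Year", **options)

# *args does the same for positional arguments: here it supplies `title`
args = ("HP vs. Displacement",)
fig2_args = make_figure_kwargs(*args, **options)
```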

Bokeh also provides a way to filter the data before it is visualized, using CDSView together with IndexFilter. When you create a CDSView, pass an IndexFilter into its filters argument; only the rows at the listed indexes are then used by glyphs that are given that view.

```python
from bokeh.models import ColumnDataSource, CDSView, IndexFilter
from bokeh.plotting import figure, output_file, show

# Generate data: x = 1..10, y = 2, 4, ..., 20
source = ColumnDataSource(data=dict(x=list(range(1, 11)), y=list(range(2, 22, 2))))

# Keep only rows 0, 2, 4 and 6 (Bokeh 2.x signature; Bokeh 3.x drops the
# source argument and takes a single filter= instead of filters=[...])
view = CDSView(source=source, filters=[IndexFilter([0, 2, 4, 6])])

fig = figure(title='Line Plot example', x_axis_label='x', y_axis_label='y')
fig.circle(x="x", y="y", size=10, source=source, view=view, legend_label='filtered')  # Apply the view
fig.line(x="x", y="y", source=source, legend_label='unfiltered')  # The view is not applied here
show(fig)
```

Reference: https://www.tutorialspoint.com/bokeh/bokeh_filtering_data.htm
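Conceptually, an IndexFilter keeps the rows whose positions it lists. A pure-Python sketch of which points the filtered circle glyph receives in the example above; this emulates the selection only and does not use Bokeh.

```python
x = list(range(1, 11))     # 1, 2, ..., 10
y = list(range(2, 22, 2))  # 2, 4, ..., 20
indices = [0, 2, 4, 6]     # the same indexes passed to IndexFilter

# Keep only the rows at the listed positions, as the filtered view does
filtered_x = [x[i] for i in indices]
filtered_y = [y[i] for i in indices]
print(list(zip(filtered_x, filtered_y)))
```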

Instead of filtering by index, you can also use a BooleanFilter to mark which data entries should be plotted.

```python
from bokeh.models import ColumnDataSource, CDSView, BooleanFilter
from bokeh.plotting import figure, show
from bokeh.sampledata.unemployment1948 import data

source = ColumnDataSource(data)  # Convert the unemployment DataFrame to a ColumnDataSource

# One True/False per row: True for years from 1980 onward
booleans = [int(year) >= 1980 for year in source.data['Year']]
print(booleans)
```
[False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True]

With the boolean list built, we can wrap it in a BooleanFilter inside the CDSView so that only the data points of interest are plotted, in this case the years from 1980 onward.

```python
view1 = CDSView(source=source, filters=[BooleanFilter(booleans)])  # Use a BooleanFilter instead of an IndexFilter

p = figure(title="Unemployment data", x_axis_label='Year', y_axis_label='Percentage')  # x_range = (1980,2020)
p.circle(x='Year', y='Annual', source=source, view=view1, color='red')  # Filtered points
p.line(x='Year', y='Annual', source=source, color='blue')  # Unfiltered line
show(p)
```

Reference: https://www.tutorialspoint.com/bokeh/bokeh_filtering_data.htm

Note: The example at the link above uses a line plot. CDSView filtering does not work with connected (contiguous) glyphs such as lines, so here the view is applied only to the circle glyph.
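A BooleanFilter keeps the rows whose flag is True. The sketch below emulates that selection in pure Python, assuming the year column runs from 1948 to 2016 (69 years, the same length as the boolean list printed above). It also shows that the comparison `year >= 1980` already yields True or False, so the `True if ... else False` pattern is not needed.

```python
years = list(range(1948, 2017))          # assumed year range: 1948..2016
mask = [year >= 1980 for year in years]  # the comparison itself is the boolean

# Keep the entries whose flag is True, as the BooleanFilter view does
selected = [year for year, keep in zip(years, mask) if keep]
print(len(mask), sum(mask), selected[0], selected[-1])
```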

Finally, below is an example of how to use Bokeh widgets with a chart. The key with these widgets is to write a callback (here a CustomJS callback) that adjusts the visualization, and then attach it to the widget with js_on_change so that it runs whenever the chosen property, in this case value, changes. Inside the callback, cb_obj refers to the widget that triggered it, so cb_obj.value reads the slider's current value. Calling source.change.emit() then tells Bokeh that the data has changed so the figure updates.

```python
from bokeh.layouts import column
from bokeh.models import ColumnDataSource, CustomJS, Slider
from bokeh.plotting import figure, output_file, show

output_file("js_on_change.html")

x = [i*0.005 for i in range(0, 200)]
y = x

source = ColumnDataSource(data=dict(x=x, y=y))

plot = figure(width=400, height=400)
plot.line('x', 'y', source=source, line_width=3, line_alpha=0.6)

# JavaScript that runs in the browser whenever the slider value changes
callback = CustomJS(args=dict(source=source), code="""
    var data = source.data;
    var f = cb_obj.value;      // cb_obj is the Slider that triggered the callback
    console.log(cb_data);
    console.log(cb_obj);
    var x = data['x'];
    var y = data['y'];
    for (var i = 0; i < x.length; i++) {
        y[i] = Math.pow(x[i], f);
    }
    source.change.emit();      // notify Bokeh that the data changed so the plot redraws
""")

slider = Slider(start=0.1, end=4, value=1, step=.1, title="power")
slider.js_on_change('value', callback)

layout = column(plot, slider)
show(layout)
```
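The loop inside the CustomJS callback sets each y value to x raised to the slider's power; `Math.pow(x[i], f)` in JavaScript corresponds to `x ** f` in Python. A pure-Python sketch of what the callback does to the y column for a given slider value, which can help check expected values outside the browser; `apply_power` is a hypothetical name, not part of Bokeh.

```python
def apply_power(x, f):
    """Hypothetical helper: Python version of the CustomJS loop y[i] = Math.pow(x[i], f)."""
    return [xi ** f for xi in x]

x = [i * 0.005 for i in range(0, 200)]  # same x values as the example

y_initial = apply_power(x, 1)  # slider at its starting value 1: a straight line
y_squared = apply_power(x, 2)  # slider moved to 2: a parabola
```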