% Project: SD70
\documentclass{beamer}
\usepackage{graphicx}
\usepackage{url}
\usepackage{booktabs}

\mode<presentation> {
\usetheme{Madrid}
}

\title[SageMathCloud]{Using RethinkDB in Production for SageMathCloud}
\author{William Stein}
\institute[SMC]
{
University of Washington \\
\medskip
SageMath, Inc. \\
\medskip
\url{https://cloud.sagemath.com/}
}
\date{\today}

\begin{document}

\begin{frame}
\titlepage
\end{frame}

\begin{frame}
\frametitle{What is Sage?}

\begin{block}{SageMath}
\begin{itemize}

\item SageMath: big open source math software I started in 2004
\end{itemize}
\end{block}

\begin{block}{SageMathCloud (SMC)}
\begin{itemize}
\item {\bf Launched:} 2013
\item {\bf Real-time editing like Google Docs:} \LaTeX, IPython/Jupyter notebooks, Sage, Terminals, Teaching, etc.
\item {\bf Tech Stack:} RethinkDB, Linux, React.js, Node.js, SageMath/Python, CodeMirror, CoffeeScript
\item {\bf Users:} 4000+ daily active; nearly 1000 simultaneous
\item {\bf Production:} anger when it doesn't work -- ``my homework is gone!''
\item {\bf Open source:} 100\% open source, GPL 3, etc.
\end{itemize}
\end{block}

\end{frame}

\begin{frame}
\frametitle{Hi From Sage Days 70}
\includegraphics[keepaspectratio=true,width=.9\paperwidth]{sagedays70}
\end{frame}



\begin{frame}
\frametitle{RethinkDB and SMC}

{\bf Switched from Cassandra} to RethinkDB this summer.

\begin{block}{SMC Uses RethinkDB Heavily...}
\begin{itemize}
\item {\bf Setup:}
\begin{itemize}
\item 6 Google Compute Engine nodes (quad-core n1-standard-4)
\item About 23 tables storing about 5 million documents
\item Replication factor 3, sharding of 3
\item Storage on persistent (network-mounted) SSD
\end{itemize}
\item 5K--10K simultaneous changefeeds.
\end{itemize}
\end{block}

\begin{block}{Operations}
\begin{itemize}
\item {\bf Backups:} periodic dump of most tables to JSON on a compressed filesystem, snapshot via bup
(=git+more), rsync to Google Cloud Storage and encrypted off-site USB drives.
\end{itemize}
\end{block}

The RethinkDB team was {\bf amazing} at addressing every issue I encountered.

\end{frame}


\begin{frame}
\frametitle{SMC Demo}

\begin{block}{Show how SMC uses RethinkDB}
\begin{enumerate}
\item Change a user name and see the change in another browser.
\item Change a project title and see it appear in another browser.
\item Draw a 3D plot in a Sage worksheet.
\item Open a Jupyter notebook -- demo sync and history.
\item No REST/API calls: the client sets entries in a table, the back-end sees the change and updates the table, and all parts of all front-ends see the update simultaneously (demo: project restart).
\end{enumerate}
\end{block}

\end{frame}

\begin{frame}
\frametitle{SMC Demo: Change username}
\includegraphics[keepaspectratio=true,width=.9\paperwidth]{smc-name}
\end{frame}

\begin{frame}
\frametitle{SMC Demo: Change Project Title}
\includegraphics[keepaspectratio=true,width=.9\paperwidth]{smc-title}
\end{frame}

\begin{frame}
\frametitle{SMC Demo: 3D Plot}
\includegraphics[keepaspectratio=true,width=.9\paperwidth]{smc-plot}
\end{frame}

\begin{frame}
\frametitle{SMC Demo: Jupyter Notebook}
\includegraphics[keepaspectratio=true,width=.9\paperwidth]{smc-jupyter}
\end{frame}


\begin{frame}
\frametitle{How SageMathCloud uses Changefeeds}

\begin{block}{Motivation}
\begin{itemize}
\item Make front-end development easier
\item Simplify the code connecting the front-end to the back-end (one declaration instead of messages flying all over)
\end{itemize}
\end{block}

\begin{block}{Inspiration}
\begin{itemize}
\item Facebook's GraphQL -- but simpler
\end{itemize}
\end{block}

\begin{block}{Goal}
\begin{itemize}
\item Have declarative client-side queries and database schema
\item Instant notifications about changes.
\end{itemize}
\end{block}
\end{frame}

\begin{frame}[fragile]
\frametitle{GraphQL-like API on RethinkDB}
\vfill
\begin{center}
\Large
Building a GraphQL-like API on RethinkDB and Node.js
\end{center}
\vfill

\begin{block}{(do not look at this)}
\tiny
Browser (or iOS/Android at some point) client query:
\begin{itemize}
\item JSON object that describes what the result should look like;
\verb|null| values get filled in.
\verb|{table:{foo:bar, stuff:null}}| gets one record in table where \verb|foo="bar"| and \verb|{table:[{foo:bar, stuff:null}]}| gets them all.
\item If \verb|changes=true|, then any time the RethinkDB table changes, the client gets updates, and any time the client makes changes, they get pushed through the back-end to RethinkDB.
\item Tables can be ``virtual'' and not correspond to actual RethinkDB tables, e.g., different permissions, or involving multiple tables (so joins, technically; they also have a killfeed).
\end{itemize}

\begin{itemize}
\item Show {\tt schema.coffee}.

\item Text editing: describe algorithm based on the above, which isn't deployed yet.
\end{itemize}
\end{block}
\end{frame}

%\begin{frame}
%\frametitle{Running RethinkDB in production}
%\begin{block}{Setup}
%\begin{itemize}
%\item Until mid-Oct we used 6 dual-core n1-highcpu-2, but had to switch to 6 quad-core n1-standard-4.
%
%\item We use 6 GCE nodes, with about 20+ tables, about 5 Million documents, replication factor 3, sharding of 3, persistent (network-mounted) SSD.
%\end{itemize}
%\end{block}
%\end{frame}
%
%\begin{frame}
%\frametitle{Running RethinkDB in production /2}
%\begin{block}{Experiences}
%\begin{itemize}
%\item Often have around 5000 changefeeds.
%
%\item Had some trouble with automatic failover (I test in production to be sure it is working!).
%
%\item The RethinkDB team was \textbf{amazing} in fixing absolutely all bugs I found.
%
%\item Backups by dumping most tables frequently and using bup
%(=git+more, \url{bup.github.io}) to backup,
%then rsync to cloud storage and offsite encrypted USB drive.
%
%\end{itemize}
%\end{block}
%\end{frame}

\begin{frame}
\frametitle{Instrumentation data in production}
\begin{block}{Example: Server Overload}
About 3 weeks of data for November 2015 across 6 nodes.
At one point (with 6 n1-highcpu-2 nodes), we hit a threshold (with around 850 simultaneous users) and the back-end collapsed.
\end{block}
\begin{block}{Solution}
A new node had to be added (Tue 27th).
\end{block}
\end{frame}

\begin{frame}
\frametitle{Memory usage across database nodes}
\includegraphics[keepaspectratio=true,width=.9\paperwidth]{rethinkdb-memory.png}
\end{frame}

\begin{frame}
\frametitle{CPU Load (1 min) across database nodes}
\includegraphics[keepaspectratio=true,width=.9\paperwidth]{rethinkdb-load-1min.png}
\end{frame}

\begin{frame}
\frametitle{TCP connections across database nodes}
\includegraphics[keepaspectratio=true,width=.9\paperwidth]{rethinkdb-tcp-connections.png}
\end{frame}

\begin{frame}
\frametitle{Thanks!}
\begin{block}{Sign up today!}
\medskip
{\LARGE \url{https://cloud.sagemath.com/}}\\
\medskip
\end{block}
\end{frame}

\end{document}