%\documentclass[11pt]{proc}
\documentclass[preprint,10pt]{sigplanconf}

\usepackage{amsmath}
\usepackage{url}
\usepackage{graphicx}

\begin{document}

\title{Emscripten: An LLVM-to-JavaScript Compiler}
\conferenceinfo{Splash '11}{??-2011, Portland.}
\copyrightyear{2011}
\copyrightdata{[to be supplied]}
\titlebanner{}        % These are ignored unless
\preprintfooter{}     % 'preprint' option specified.
\authorinfo{Alon Zakai}
           {Mozilla}
           {azakai@mozilla.com}

\maketitle

%\title{Emscripten: An LLVM-to-JavaScript Compiler}
%\subtitle{}
%\authorinfo{Alon Zakai}
%           {Mozilla}
%           {azakai@mozilla.com}
%\author{Alon Zakai \\ Mozilla \\ \url{azakai@mozilla.com}}
%\maketitle

\begin{abstract}
We present Emscripten, a compiler from LLVM (Low Level Virtual Machine) assembly to JavaScript. This
opens up two avenues for running code written
in languages other than JavaScript on the web: (1) Compile code directly into LLVM assembly, and
then compile that into JavaScript using Emscripten, or (2) Compile
a language's entire runtime into LLVM and then JavaScript, as in the previous
approach, and then use the compiled runtime to run code written in that language. For example, the
former approach can work for C and C++, while the latter can work for Python; all three
examples open up new opportunities for running code on the web.

Emscripten itself is written in JavaScript and is available under the MIT
license (a permissive open source license), at \url{http://www.emscripten.org}.
As a compiler from LLVM to JavaScript, the challenges in designing
Emscripten are somewhat the reverse of the norm -- one must go from a low-level
assembly into a high-level language, and recreate parts of the original
high-level structure of the code that were lost in the compilation to
low-level LLVM. We detail the methods used in
Emscripten to deal with those challenges, and in particular present and prove
the validity of Emscripten's Relooper
algorithm, which recreates high-level loop structures from low-level
branching data.
\end{abstract}

%\category{CR-number}{subcategory}{third-level}

%\terms
%term1, term2

%\keywords
%keyword1, keyword2

\bigskip

%\copyright 2011 Alon Zakai. License: Creative Commons Attribution-ShareAlike (CC BY-SA), \url{http://creativecommons.org/licenses/by-sa/3.0/}

\section{Introduction}

Since the mid 1990's, JavaScript~\cite{js} has been present in most web browsers (sometimes
with minor variations and under slightly different names, e.g., JScript in Internet
Explorer), and today it is
well-supported on essentially all web browsers, from desktop browsers like
Internet Explorer, Firefox, Chrome and Safari, to mobile browsers on smartphones
and tablets. Together with HTML and CSS, JavaScript forms the standards-based
foundation of the web.

Running other programming languages on the web has been suggested many times,
and browser plugins have allowed doing so, e.g., via the Java
and Flash plugins. However, plugins must be manually installed and do not integrate
seamlessly with the surrounding HTML. Perhaps more problematic is that they cannot run
at all on some platforms, for example, Java and Flash cannot run on iOS devices such as the iPhone
and iPad. For those reasons, JavaScript remains
the primary programming language of the web.

There are, however, reasonable motivations for running code from
other programming languages on the web, for example, if one has a large
amount of existing code already written in another language, or if one
simply has a strong preference for another language and perhaps is
more productive in it. As a consequence, there has been work on tools to compile languages
\textbf{into} JavaScript. Since JavaScript is present in essentially all web
browsers, by compiling one's language of choice into JavaScript, one
can still generate content that will run practically everywhere.

Examples of the approach of compiling into JavaScript include
the Google Web Toolkit~\cite{gwt}, which compiles Java into JavaScript;
Pyjamas\footnote{\url{http://pyjs.org/}}, which compiles Python into JavaScript;
SCM2JS \cite{hop}, which compiles Scheme to JavaScript;
Links \cite{links}, which compiles an ML-like language into JavaScript;
and AFAX \cite{afax}, which compiles F\# to JavaScript;
see also \cite{ashkenas} for additional examples.
While useful, such tools usually only allow a subset of the original language to
be compiled. For example, multithreaded code (with shared memory) is
not possible on the web, so such code cannot be compiled directly.
There are also often limitations of the conversion
process, for example, Pyjamas compiles Python to JavaScript in a nearly
1-to-1 manner, and as a consequence the underlying semantics are those of JavaScript,
not Python, so for example division of integers can yield unexpected results
(it should yield an integer in Python 2.x,
but in JavaScript and in Pyjamas a floating-point number can be generated).

In this paper we present another project along those lines: \textbf{Emscripten},
which compiles LLVM (Low Level Virtual Machine\footnote{\url{http://llvm.org/}}) assembly into JavaScript.
LLVM is a compiler project primarily focused on C, C++ and
Objective-C. It compiles those languages through a \emph{frontend} (the
main ones of which are Clang and LLVM-GCC) into the
LLVM intermediate representation (which can be machine-readable
bitcode, or human-readable assembly), and then passes it
through a \emph{backend} which generates actual machine code for a particular
architecture. Emscripten plays the role of a backend which targets JavaScript.

By using Emscripten, potentially many languages can be
run on the web, using one of the following methods:
\begin{itemize}
\item Compile \textbf{code} in a language recognized by one of the existing LLVM frontends
into LLVM, and then compile that
into JavaScript using Emscripten. Frontends for various languages
exist, including many of the most popular programming languages such as C and
C++, and also various new and emerging languages (e.g., Rust\footnote{\url{https://github.com/graydon/rust/}}).
\item Compile the \textbf{runtime} used to parse and execute code in
a particular language into LLVM, then compile that into JavaScript using
Emscripten. It is then possible to run code in that runtime on the web.
This is a useful approach if
a language's runtime is written in a language for which an LLVM
frontend exists, but the language itself has no such frontend. For
example, there is currently no frontend for Python; however,
it is possible to compile CPython -- the standard implementation of
Python, written in C -- into JavaScript, and run Python code on that
(see Section~\ref{sec:examples}).
\end{itemize}

From a technical standpoint, one challenge in designing and implementing
Emscripten is that it compiles a low-level language -- LLVM assembly -- into
a high-level one -- JavaScript. This is somewhat the reverse of the usual
situation one is in when building a compiler, and leads to some unique
difficulties. For example, to get good performance in JavaScript one must
use natural JavaScript code flow structures, like loops and ifs, but
those structures do not exist in LLVM assembly (instead, what is present
there is a `soup of code fragments': blocks of code with branching information
but no high-level structure).
Emscripten must therefore reconstruct a high-level
representation from the low-level data it receives.

In theory that issue could have been avoided by compiling a higher-level
language into JavaScript. For example, if compiling Java into JavaScript
(as the Google Web Toolkit does), then one can benefit from the fact
that Java's loops, ifs and so forth generally have a very direct parallel
in JavaScript. But of course the downside of that approach is that it yields a
compiler only for Java. In Section~\ref{sec:relooper}
we present the `Relooper' algorithm, which generates high-level loop structures from the low-level
branching data present in LLVM assembly. It is similar to loop recovery algorithms used in decompilation
(see, for example, \cite{Cifuentes98assemblyto}, \cite{pro97}).
The main difference between the Relooper and standard loop recovery algorithms
is that the Relooper generates loops in a different language than that which was compiled originally, whereas
decompilers generally assume they are returning to the original language. The Relooper's
goal is not to accurately recreate the original source code, but rather to generate
native JavaScript control flow structures, which can then be implemented
efficiently in modern JavaScript engines.

Another challenge in Emscripten is to maintain accuracy (that is, to
keep the results of the compiled code the same as the original)
while not sacrificing performance. LLVM assembly
is an abstraction of how modern CPUs are programmed, and its basic
operations are not all directly possible in JavaScript. For example, if in
LLVM we are to add two unsigned 8-bit numbers $x$ and $y$, with wraparound on overflow (e.g., 255
plus 1 should give 0), then there is no single operation in JavaScript which
can do this -- we cannot just write $x+y$, as that would use the normal JavaScript
semantics. It is possible to emulate a CPU in JavaScript; however, doing so
is very slow. Emscripten's approach is to allow such emulation, but to try to
use it as little as possible, and to provide tools that help one find out
which parts of the compiled code actually need such full emulation.
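
As a minimal illustration of the difference (a sketch for exposition, not code that
Emscripten emits), the two possible translations of an unsigned 8-bit addition are:
\begin{verbatim}
// Illustrative sketch only.
function add8_fast(x, y) {
  return x + y;          // plain JavaScript: 255 + 1 gives 256
}
function add8_accurate(x, y) {
  return (x + y) & 255;  // wraps around like the CPU: 255 + 1 gives 0
}
console.log(add8_fast(255, 1), add8_accurate(255, 1)); // 256 0
\end{verbatim}
The fast form maps to a single JavaScript addition, while the accurate form pays for an
extra masking operation; Section~\ref{sssec:realworldcode} describes how Emscripten decides
where the accurate form is actually needed.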

We conclude this introduction with a list of this paper's main contributions:
\begin{itemize}
\item We describe Emscripten itself, detailing
its approach to compiling LLVM into JavaScript.
\item We give details of Emscripten's Relooper algorithm, mentioned earlier, which generates
high-level loop structures from low-level branching data, and prove
its validity.
\end{itemize}
In addition, the following are the main contributions of Emscripten
itself, which to our knowledge were not previously possible:
\begin{itemize}
\item It allows compiling a very large subset of C and C++ code into
JavaScript, which can then be run on the web.
\item By compiling their runtimes, it allows running languages such as Python
on the web (with their normal semantics).
\end{itemize}

The remainder of this paper is structured as follows. In Section~\ref{sec:compapp} we
describe the approach Emscripten takes to compiling LLVM assembly into JavaScript,
and show some benchmark data.
In Section~\ref{sec:emarch} we describe Emscripten's internal design and in
particular elaborate on the Relooper algorithm.
In Section~\ref{sec:examples} we give several example uses of
Emscripten. In Section~\ref{sec:summary} we summarize and give directions for future
work.

\section{Compilation Approach}
\label{sec:compapp}

Let us begin by considering the challenge we face when we want to compile LLVM assembly
into JavaScript. Assume we are given the
following simple example of a C program:
\begin{verbatim}
#include <stdio.h>
int main()
{
  int sum = 0;
  for (int i = 1; i <= 100; i++)
    sum += i;
  printf("1+...+100=%d\n", sum);
  return 0;
}
\end{verbatim}
This program calculates the sum of the integers from 1 to 100. When
compiled by Clang, the generated LLVM
assembly code includes the following:
\label{code:examplellvm}
\begin{verbatim}
@.str = private constant [14 x i8]
        c"1+...+100=%d\0A\00"

define i32 @main() {
  %1 = alloca i32, align 4
  %sum = alloca i32, align 4
  %i = alloca i32, align 4
  store i32 0, i32* %1
  store i32 0, i32* %sum, align 4
  store i32 1, i32* %i, align 4
  br label %2

; <label>:2
  %3 = load i32* %i, align 4
  %4 = icmp sle i32 %3, 100
  br i1 %4, label %5, label %12

; <label>:5
  %6 = load i32* %i, align 4
  %7 = load i32* %sum, align 4
  %8 = add nsw i32 %7, %6
  store i32 %8, i32* %sum, align 4
  br label %9

; <label>:9
  %10 = load i32* %i, align 4
  %11 = add nsw i32 %10, 1
  store i32 %11, i32* %i, align 4
  br label %2

; <label>:12
  %13 = load i32* %sum, align 4
  %14 = call i32 (i8*, ...)*
    @printf(i8* getelementptr inbounds
    ([14 x i8]* @.str, i32 0, i32 0),
    i32 %13)
  ret i32 0
}
\end{verbatim}
At first glance, this may look more difficult to translate into
JavaScript than the original C++. However, compiling C++ in
general would require writing code to handle preprocessing,
classes, templates, and all the idiosyncrasies and complexities
of C++. LLVM assembly, while more verbose in this example, is
lower-level and simpler to work on. Compiling it also has the benefit we
mentioned earlier, which
is one of the main goals of Emscripten: it allows many languages, and
not just C++, to be compiled into LLVM.

A detailed overview of LLVM assembly is beyond our scope here (see \url{http://llvm.org/docs/LangRef.html}). Briefly,
though, the example assembly above can be seen to define a
function main(), then allocate some values on the stack (alloca),
then load and store various values (load and store). We do not have
the high-level code structure as we had in C++ (with a loop); instead
we have labeled code fragments, called LLVM basic blocks, and code flow moves
from one to another by branch (br) instructions. (Label 2 is the
condition check in the loop, label 5 is the body, label 9 is the
increment, and label 12 is the final part of the function, outside
of the loop.)
Conditional branches
can depend on calculations, for example the results of comparing
two values (icmp). Other numerical operations include addition (add).
Finally, printf is called (call). The challenge, then, is to convert
this and things like it into JavaScript.

In general, Emscripten's main approach is to translate each line of LLVM
assembly into JavaScript, 1 to 1, producing `normal' JavaScript
as much as possible. So, for example, an \emph{add} operation becomes
a normal JavaScript addition, a function call becomes a JavaScript
function call, etc. This 1 to 1 translation generates JavaScript
that resembles the original assembly code, for example, the LLVM assembly code shown
before for main() would be compiled into the following:
\label{code:example}
\begin{verbatim}
function _main() {
  var __stackBase__ = STACKTOP;
  STACKTOP += 12;
  var __label__ = -1;
  while(1) switch(__label__) {
    case -1:
      var $1 = __stackBase__;
      var $sum = __stackBase__+4;
      var $i = __stackBase__+8;
      HEAP[$1] = 0;
      HEAP[$sum] = 0;
      HEAP[$i] = 0;
      __label__ = 0; break;
    case 0:
      var $3 = HEAP[$i];
      var $4 = $3 <= 100;
      if ($4) { __label__ = 1; break; }
      else { __label__ = 2; break; }
    case 1:
      var $6 = HEAP[$i];
      var $7 = HEAP[$sum];
      var $8 = $7 + $6;
      HEAP[$sum] = $8;
      __label__ = 3; break;
    case 3:
      var $10 = HEAP[$i];
      var $11 = $10 + 1;
      HEAP[$i] = $11;
      __label__ = 0; break;
    case 2:
      var $13 = HEAP[$sum];
      var $14 = _printf(__str, $13);
      STACKTOP = __stackBase__;
      return 0;
  }
}
\end{verbatim}
Some things
to take notice of:
\begin{itemize}
\item A switch-in-a-loop construction is used in order to let the flow
of execution move between basic blocks of code in an arbitrary manner: We set
\emph{\_\_label\_\_} to the (numerical representation of the) label of
the basic block we want to reach, and do a break, which leads to the proper
basic block being reached. Inside each basic block, every line of code corresponds to a line of
LLVM assembly, generally in a very straightforward manner.
\item Memory is implemented by \emph{HEAP}, a JavaScript array. Reading from
memory is a read from that array, and writing to memory is a write.
\emph{STACKTOP} is the current position of the stack. (Note that we
allocate 4 memory locations for 32-bit integers on the stack, but only
write to 1 of them. See Section~\ref{sec:lsc} for why.) A minimal sketch
of the runtime support this code relies on appears after this list.
\item LLVM assembly functions become JavaScript functions, and function calls
are normal JavaScript function calls. In general, we attempt to generate
JavaScript that is as `normal' as possible.
\item We implemented the LLVM \emph{add} operation using simple addition in JavaScript.
As mentioned earlier, the semantics of that code are not entirely identical to
those of the original LLVM assembly code (in this case, overflows will have very
different effects). We will explain Emscripten's approach to that problem in
Section~\ref{sssec:realworldcode}.
\end{itemize}
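
To make the generated code above easier to follow, here is a minimal, hypothetical
sketch of the runtime support it relies on: the \emph{HEAP} array, the \emph{STACKTOP}
pointer, a trivial \emph{\_printf}, and the \emph{stackAlloc} helper used in later
fragments. Emscripten's actual runtime library is considerably more elaborate; this
is only an illustration of the idea.
\begin{verbatim}
// Hypothetical, simplified runtime harness (not Emscripten's
// actual runtime library).
var HEAP = new Array(1024); // the `memory space', one value per address
var STACKTOP = 0;           // current position of the stack
var __str = 0;              // address of the format string (placeholder)

function stackAlloc(size) { // used by later code fragments
  var addr = STACKTOP;
  STACKTOP += size;
  return addr;
}

function _printf(format, value) {
  // Minimal stand-in for the C library's printf.
  console.log("1+...+100=" + value);
  return 0;
}
\end{verbatim}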

\subsection{Performance}
\label{sec:perf}

In this section we will deal with several topics regarding
Emscripten's approach to generating high-performance JavaScript code.

\subsubsection{Load-Store Consistency (LSC)}
\label{sec:lsc}

We saw before that Emscripten allocates the usual number
of bytes on the stack for variables (4 bytes for a 32-bit integer, etc.).
However, we only wrote values into the first of those locations, which may appear odd.
We will now see the reason for that.

To get there, we must first step back, and note that
Emscripten does not aim to achieve perfect compatibility with all possible
LLVM assembly (and correspondingly, with all possible C or C++ code, etc.);
instead, Emscripten targets a large subset of LLVM assembly code, which is portable
and does not make crucial assumptions about the underlying CPU architecture
on which the code is meant to run. That subset is meant to encompass the
vast majority of real-world code that would be compiled into LLVM,
while also being compilable into very
performant JavaScript.

More specifically, Emscripten assumes that the LLVM assembly code it is
compiling has \textbf{Load-Store Consistency} (LSC), which is the requirement that
after a value with a specific type is written to a memory location, loads from
that memory location will be of the same type (until a value with a different
type is written there).
Normal C and C++
code generally has this property: If $x$ is a variable containing a 32-bit floating
point number, then both loads and stores of $x$ will be of 32-bit floating
point values, and not 16-bit unsigned integers or anything else.

To see why this is important for performance, consider the following
C code fragment, which does \emph{not} have LSC:
\begin{verbatim}
int x = 12345;
printf("first byte: %d\n", *((char*)&x));
\end{verbatim}
Assuming an architecture whose int type is wider than 8 bits, this code will read
the first byte of \emph{x}. (This might, for example, be used to detect the
endianness of the CPU.) To compile this into JavaScript in
a way that will run properly, we must do more than a single operation
for either the read or the write; for example, we could do this:
\begin{verbatim}
var x_value = 12345;
var x_addr = stackAlloc(4);
HEAP[x_addr] = (x_value >> 0) & 255;
HEAP[x_addr+1] = (x_value >> 8) & 255;
HEAP[x_addr+2] = (x_value >> 16) & 255;
HEAP[x_addr+3] = (x_value >> 24) & 255;
[...]
printf("first byte: %d\n", HEAP[x_addr]);
\end{verbatim}
Here we allocate space for the value of \emph{x} on the stack, and
store that address in \emph{x\_addr}. The stack itself is part of
the `memory space', which is the array \emph{HEAP}. In order for
the read on the final line to give the proper value, we must go to
the effort of doing 4 store operations, each storing the value of a
particular byte. In other words, \emph{HEAP} is an array of bytes,
and for each store into memory, we must deconstruct the value into
bytes.\footnote{Note that we can use JavaScript typed arrays with a shared memory
buffer, which would work as expected, assuming (1) we are running
in a JavaScript engine which supports typed arrays, and (2) we
are running on a CPU with the same architecture as we expect. This
is therefore dangerous as the generated code may run differently on
different JavaScript engines and different CPUs.
Emscripten currently has optional experimental support for typed arrays.}

Alternatively, we can store the value in a single operation, and
deconstruct into bytes as we load. This will be faster in some
cases and slower in others, but generally speaking it is still more overhead
than we would like -- for if the code \textbf{does} have
LSC, then we can translate that code fragment into
the far more optimal
\begin{verbatim}
var x_value = 12345;
var x_addr = stackAlloc(4);
HEAP[x_addr] = x_value;
[...]
printf("first byte: %d\n", HEAP[x_addr]);
\end{verbatim}
(Note that even this can be optimized further -- we can store
\emph{x} in a normal JavaScript variable. We will discuss such
optimizations in Section \ref{sec:codeopt}; for now we are
just clarifying why it is useful to assume we are compiling code
that has LSC.)

In practice the vast majority of C and C++ code does have LSC. Exceptions
do exist, however, for example:
\begin{itemize}
\item Code that detects CPU features like endianness, the behavior of floats, etc. In general such code can be disabled
before running it through Emscripten, as it is not actually needed.
\item \emph{memset} and related functions typically work on values of one kind,
regardless of the underlying values. For example, memset may write 64-bit
values on a 64-bit CPU since that is usually faster than writing individual
bytes. This tends to
not be a problem, as with \emph{memset} the most common case is setting to
0, and with \emph{memcpy}, the values end up copied properly anyhow (with
a proper implementation of \emph{memcpy} in Emscripten's generated code).
\item Even LSC-obeying C or C++ code may turn into LLVM assembly that does not,
after being optimized. For example, when storing two 32-bit integer constants into
adjoining locations in a structure, the optimizer may generate a single
64-bit store of an appropriate constant. In other words, optimization can
generate nonportable code, which runs faster on the current CPU, but
nowhere else. Emscripten currently assumes that optimizations of this form
are not being used.
\end{itemize}
In practice it may be hard to know if code has LSC or not, and requiring
a time-consuming code audit is obviously impractical. Emscripten therefore has
a compilation option, SAFE\_HEAP, which generates code that checks that LSC holds, and warns if it
doesn't. It also warns about other memory-related issues like
reading from memory before a value was written (somewhat similarly to tools
like Valgrind\footnote{\url{http://valgrind.org/}}). When such problems are detected, possible solutions are to ignore the issue (if it has no actual
consequences), or alter the source code.
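
As an illustration of the kind of checking involved (a hypothetical sketch,
not Emscripten's actual SAFE\_HEAP implementation), a checked build could route
every memory access through helpers that remember the type last stored at each
address, building on the \emph{HEAP} array from the earlier sketch:
\begin{verbatim}
// Hypothetical sketch of SAFE_HEAP-style checking.
var HEAP_TYPES = {};

function SAFE_HEAP_STORE(addr, value, type) {
  HEAP_TYPES[addr] = type;
  HEAP[addr] = value;
}

function SAFE_HEAP_LOAD(addr, type) {
  if (!(addr in HEAP_TYPES)) {
    console.log("Warning: load from " + addr + " before any store");
  } else if (HEAP_TYPES[addr] !== type) {
    console.log("Warning: LSC violation at " + addr + ": stored " +
                HEAP_TYPES[addr] + ", loaded as " + type);
  }
  return HEAP[addr];
}
\end{verbatim}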

Note that it is somewhat wasteful to allocate 4 memory locations for
a 32-bit integer, and use only one of them. It is possible to change
that behavior with the QUANTUM\_SIZE parameter to Emscripten; however,
the difficulty is that LLVM assembly has hardcoded values that depend on
the usual memory sizes being used. We are looking into modifications
to LLVM itself to remedy that.

\subsubsection{Emulating Code Semantics}
\label{sssec:realworldcode}

As mentioned in the introduction, the semantics of LLVM assembly and JavaScript are not identical: The former
is very close to that of a modern CPU, while the latter is a high-level
dynamic language. Both are of course Turing-complete, so it is possible to
precisely emulate each in the other, but doing so with good performance is
more challenging. For example, if we want to convert
\begin{verbatim}
add i8 %1, %2
\end{verbatim}
(add two 8-bit integers) to JavaScript, then to be completely accurate we must emulate the
exact same behavior; in particular, we must handle overflows properly, which would not be the case if we just implemented
this as $\%1 + \%2$ in JavaScript. For example, with inputs of $255$ and $1$, the
correct output is 0, but simple addition in JavaScript will give us 256. We
can of course emulate the proper behavior by adding additional code.
This, however, significantly degrades performance,
because modern JavaScript engines can often translate something like $z = x + y$ into
native code containing a single instruction (or very close to that), but if instead we had
something like $z = (x + y)\&255$ (in order to correct overflows), the JavaScript engine
would need to generate additional code to perform the AND operation.\footnote{
In theory, the JavaScript engine could determine that we are implicitly working
on 8-bit values here, and generate machine code that no longer needs the AND operation.
However, most or all modern JavaScript engines have just two internal numeric types, doubles and
32-bit integers. This is so because they are tuned for `normal' JavaScript code
on the web, which in most cases is served well by just those two types.

In addition, even if JavaScript engines did analyze code containing $\&255$, etc.,
in order to deduce that a variable can be implemented
as an 8-bit integer, there is a cost to including all the necessary $\&255$ text
in the script, because code size is a significant factor on the web. Adding even
a few characters for every single arithmetic operation, in a large JavaScript file,
could add up to a significant increase in download size.}

Emscripten's approach to this problem is to allow the generation of both accurate code,
that is identical in behavior to LLVM assembly, and inaccurate code which is
faster. In practice, most addition operations in LLVM do not overflow,
and can simply be translated into $\%1 + \%2$. Emscripten
provides tools that make it straightforward to find which locations do require
the slower, more accurate code, and to generate that code only there, as follows:
\begin{itemize}
\item Compile the code using Emscripten with special options that generate runtime checking.
CHECK\_OVERFLOWS adds runtime checks for integer overflows, CHECK\_SIGNS
checks for signing issues (the behavior of signed and unsigned integers can
be different, and JavaScript does not natively support that difference), and
CHECK\_ROUNDINGS checks for rounding issues (in C and C++, the convention is
to round towards 0, while in JavaScript there is no simple operation that does
the same). Illustrative examples of such corrections appear after this list.
\item Run the compiled code on a representative sample of inputs, and note which
lines the runtime checks warn about.
\item Recompile the code, telling Emscripten to add corrections (using CORRECT\_SIGNS, CORRECT\_OVERFLOWS
or CORRECT\_ROUNDINGS) only on the specific lines that actually need it.
\end{itemize}
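
To give a flavor of what such corrections look like (these are illustrative
translations, not Emscripten's literal output), consider a comparison of unsigned
values and a signed integer division, each in a fast and a corrected form:
\begin{verbatim}
// Illustrative sketch of `fast' versus `corrected' translations.

// Unsigned comparison of two 32-bit values:
function ult_fast(a, b)      { return a < b; }
function ult_corrected(a, b) { return (a >>> 0) < (b >>> 0); }

// Signed integer division (C rounds towards 0):
function sdiv_fast(a, b)      { return a / b; }
function sdiv_corrected(a, b) { return (a / b) | 0; }

console.log(ult_fast(-1, 1), ult_corrected(-1, 1));  // true false
console.log(sdiv_fast(7, 2), sdiv_corrected(7, 2));  // 3.5 3
\end{verbatim}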

This method is not guaranteed to work: if we do not run on a truly representative
sample of possible inputs, we may not compile with all the necessary corrections. It is
of course possible to compile with all corrections applied to all the code, to make
sure things will work properly (this is the default compilation setting); however, in
practice the procedure above works quite well, and results in code that is significantly faster.

\subsubsection{Emscripten Code Optimizations}
\label{sec:codeopt}

Looking back at the example program from page~\pageref{code:example},
the generated code is fairly complicated
and cumbersome, and unsurprisingly it performs quite poorly. There
are two main reasons for that: First, the code is simply
unoptimized -- there are many variables declared when fewer could
suffice, for example -- and second, the code does not use `normal'
JavaScript, which JavaScript engines are optimized for -- it
stores all variables in an array (not normal JavaScript variables),
and it controls the flow of execution using a switch-in-a-loop, not
normal JavaScript loops and ifs.

Emscripten's approach to generating fast-performing code is as
follows. Emscripten does not perform any
optimizations that can be done by other tools:
LLVM can be used to perform optimizations before Emscripten, and
the Closure Compiler\footnote{\url{http://code.google.com/closure/compiler/}}
can perform optimizations on the generated JavaScript afterwards. Those
tools will perform standard useful optimizations like removing unneeded variables, dead code elimination,
function inlining, etc.
That leaves two major optimizations for Emscripten
itself to perform:
\begin{itemize}
\item \textbf{Variable nativization}: Convert variables
that are on the stack -- which is implemented using addresses in the \emph{HEAP} array
as mentioned earlier -- into native JavaScript variables (that is to say, \emph{var x;} and so forth). In general,
a variable will be nativized unless it is used
outside that function, e.g., if its address is taken and stored somewhere
or passed to another function. When optimizing, Emscripten tries to nativize
as many variables as possible.
\item \textbf{Relooping}: Recreate high-level loop and if structures
from the low-level code block data that appears in LLVM assembly.
We describe Emscripten's Relooper algorithm in Section~\ref{sec:relooper}.
\end{itemize}

When run with Emscripten's optimizations, the code on page \pageref{code:example} looks
like this:
\begin{verbatim}
function _main() {
  var __label__;
  var $1;
  var $sum;
  var $i;
  $1 = 0;
  $sum = 0;
  $i = 0;
  $2$2: while(1) {
    var $3 = $i;
    var $4 = $3 <= 100;
    if (!($4)) { __label__ = 2; break $2$2; }
    var $6 = $i;
    var $7 = $sum;
    var $8 = $7 + $6;
    $sum = $8;
    var $10 = $i;
    var $11 = $10 + 1;
    $i = $11;
    __label__ = 0; continue $2$2;
  }
  var $13 = $sum;
  var $14 = _printf(__str, $13);
  return 0;
}
\end{verbatim}
If in addition the Closure Compiler is run on that output, we get
\begin{verbatim}
function K() {
  var a, b;
  b = a = 0;
  a:for(;;) {
    if(!(b <= 100)) {
      break a
    }
    a += b;
    b += 1;
  }
  _printf(J, a);
  return 0;
}
\end{verbatim}
which is fairly close to the original C++ (the main difference, that
the loop's condition is checked inside the loop body instead of inside
the for() expression at the top, is not important to performance). Thus, it is possible
to recreate the original high-level structure of the code that
was compiled into LLVM assembly.

\subsection{Benchmarks}
\label{sec:benchmarks}

We will now take a look at some performance benchmarks:

\bigskip

\begin{tabular}{ l | c | c | c || c }
\hline
\textbf{benchmark} & \textbf{SM} & \textbf{V8} & \textbf{gcc} & \textbf{ratio} \\
\hline
fannkuch (10) & 1.158 & \textbf{0.931} & 0.231 & 4.04 \\
fasta (2100000) & \textbf{1.115} & 1.128 & 0.452 & 2.47 \\
primes & \textbf{1.443} & 3.194 & 0.438 & 3.29 \\
raytrace (7,256) & \textbf{1.930} & 2.944 & 0.228 & 8.46 \\
dlmalloc (400,400) & 5.050 & \textbf{1.880} & 0.315 & 5.97 \\
\hline
\end{tabular}

\bigskip

The first column is the name of the benchmark, and in parentheses any
parameters used in running it. The source code to all the benchmarks
can be found at \url{https://github.com/kripken/emscripten/tree/main/tests}
(each in a separate file with its name, except for `primes', which is
embedded inside runner.py in the function test\_primes). A brief summary of
the benchmarks is as follows:
\begin{itemize}
\item \textbf{fannkuch} and \textbf{fasta} are well-known benchmarks, appearing for example
on the Computer Language Benchmarks Game\footnote{\url{http://shootout.alioth.debian.org/}}.
They use a mix of arithmetic operations (integer in the former, floating-point in the latter) and memory access.
\item \textbf{primes} is the simplest benchmark in terms of code. It is basically just a tiny loop that calculates prime numbers.
\item \textbf{raytrace} is real-world code, from the sphereflake raytracer\footnote{\url{http://ompf.org/ray/sphereflake/}}. This benchmark has a combination of memory access and floating-point math.
\item \textbf{dlmalloc} (Doug Lea's malloc\footnote{\url{http://en.wikipedia.org/wiki/Malloc#dlmalloc_and_its_derivatives}}) is a well-known real-world implementation of malloc and free. This benchmark makes a large number of calls to malloc and free in an intermixed way, which tests memory access and integer calculations.
\end{itemize}

Returning to the table of results, the second
column is the elapsed time (in seconds) when running the compiled code (generated using all Emscripten and LLVM
optimizations as well as the Closure Compiler) in the SpiderMonkey JavaScript
engine (specifically the JaegerMonkey branch, checked out June 15th, 2011).
The third column is the elapsed time when running the same JavaScript code in the V8 JavaScript engine
(checked out June 15th, 2011). In both the second and third columns lower values
are better; the best of the two is in bold.
The fourth column is the elapsed time when running the original code compiled with \emph{gcc -O3},
using GCC 4.4.4. The last column is the ratio, that is, how much slower the JavaScript code
(running in the faster of the two engines for that test) is
when compared to gcc. All the tests were run on a MacBook Pro with
an Intel i7 CPU clocked at 2.66GHz, running on Ubuntu 10.04.

Clearly the results vary greatly by benchmark, with the generated JavaScript running from 2.47 to 8.46 times
slower. There are also significant differences between the two JavaScript engines, with each better
at some of the benchmarks.
It appears that code that does simple numerical operations -- like
the primes test -- can run fairly fast, while code that has a lot of memory
accesses, for example due to using structures -- like the raytrace test --
will be slower. (The main issue with structures is that Emscripten does not
`nativize' them yet, as it does for simple local variables.)

Being 2.47 to 8.46 times slower than the most-optimized C++ code
is a significant slowdown, but it is still more than fast enough for
many purposes, and the main point of course is that the code can run
anywhere the web can be accessed. Further work on Emscripten is expected to
improve the speed as well, as are improvements to LLVM, the Closure
Compiler, and JavaScript engines themselves; see further discussion
in the Summary.

\subsection{Limitations}

Emscripten's compilation approach, as described in this section so far,
is to generate `natural' JavaScript, as close as possible to normal JavaScript
on the web, so that modern JavaScript engines perform well on it. In particular,
we try to generate `normal' JavaScript operations, like regular addition and
multiplication and so forth. This is a very
different approach from, say, emulating a CPU on a low level, or for the case
of LLVM, writing an LLVM bitcode interpreter in JavaScript. The latter approach
has the benefit of being able to run virtually any compiled code, at the cost
of speed, whereas Emscripten makes a tradeoff in the other direction. We will
now give a summary of some of the limitations of Emscripten's approach.

\begin{itemize}
\item \textbf{64-bit Integers}: JavaScript numbers are all 64-bit doubles, with engines
typically implementing them as 32-bit integers where possible for speed.
A consequence of this is that it is impossible to directly implement
64-bit integers in JavaScript, as integer values larger than 32 bits will become doubles,
with only 53 bits for the significand. Thus, when Emscripten uses normal
JavaScript addition and so forth for 64-bit integers, it runs the risk of
rounding effects. This could be solved by emulating 64-bit integers,
but it would be much slower than native code. (A small example of the
precision issue appears after this list.)
\item \textbf{Multithreading}: JavaScript has Web Workers, which are additional
threads (or processes) that communicate via message passing. There is no
shared state in this model, which means that it is not directly possible
to compile multithreaded code in C++ into JavaScript. A partial solution
could be to emulate threads, without Workers, by manually controlling
which blocks of code run (a variation on the switch-in-a-loop construction
mentioned earlier) and manually switching between threads every so often.
However, in that case there would not be any utilization
of additional CPU cores, and furthermore performance would be slow due to not
using normal JavaScript loops.
\end{itemize}
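
As a small, concrete example of the precision issue: integers above $2^{53}$ cannot
all be represented exactly as JavaScript numbers, so nearby 64-bit values collapse
to the same double.
\begin{verbatim}
var big = Math.pow(2, 53);    // 9007199254740992
console.log(big + 1 === big); // true: the added 1 is lost
\end{verbatim}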

After seeing these limitations, it is worth noting that some advanced LLVM instructions turn out to be
surprisingly easy to implement. For example, C++ exceptions are represented in
LLVM by \emph{invoke} and \emph{unwind}, where \emph{invoke} is a call to a function that will
potentially trigger an \emph{unwind}, and \emph{unwind} returns to the earliest invoke.
If one were to implement those
in a typical compiler, doing so would require careful work. In Emscripten, however,
it is possible to do so using JavaScript exceptions in a straightforward manner:
\emph{invoke} becomes a function call wrapped in a \emph{try} block, and \emph{unwind}
becomes \emph{throw}. This is a case where compiling to a high-level language turns
out to be quite convenient.
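
As an illustration of the idea (a sketch for exposition, not Emscripten's actual
library code), an \emph{invoke} can be compiled into a wrapper of roughly this shape,
where the hypothetical labels 10 and 20 stand for the normal continuation and the
landing pad:
\begin{verbatim}
var __label__; // block-dispatch variable, as in the code shown earlier

function invoke_example(func, arg) {
  try {
    func(arg);      // the `invoke': the call may trigger an `unwind'
    __label__ = 10; // hypothetical label of the normal continuation
  } catch (e) {
    __label__ = 20; // hypothetical label of the landing pad
  }
}
// An `unwind' in the callee simply becomes:  throw "unwind";
\end{verbatim}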

\section{Emscripten's Architecture}
\label{sec:emarch}

In the previous section we saw a general overview of Emscripten's approach
to compiling LLVM assembly code into JavaScript. We will now go into more detail
about how Emscripten itself is implemented.

Emscripten is written in JavaScript. The primary reason for that decision
was convenience: Two simple examples of the benefits of that approach are that (1)
the compiler can create JavaScript objects that represent constant structures from the original
assembly code, and convert them to a string using JSON.stringify()
in a trivial manner,
and (2) the compiler can simplify numerical operations by simply
eval()ing the code (so ``1+2'' would become ``3'', etc.). In both examples,
the development of Emscripten was made simpler by having the exact same environment
during compilation as the executing code will have. This also helps in more
complex ways, for example when the same code needs to be run at compile time
and at runtime, and makes various dynamic compilation techniques possible in the future.
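
The following is a minimal sketch of those two conveniences, with hypothetical helper
names (this is not Emscripten's actual code):
\begin{verbatim}
// (1) Emit a constant structure by serializing a JavaScript
//     object that represents it.
function emitConstant(structure) {
  return JSON.stringify(structure); // e.g. '{"x":1,"y":2}'
}

// (2) Fold a constant numerical expression by evaluating it in
//     the compiler's own JavaScript environment.
function foldConstant(expression) {
  return String(eval(expression));  // "1+2" becomes "3"
}

console.log(emitConstant({ x: 1, y: 2 }));
console.log(foldConstant("1+2"));
\end{verbatim}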

Emscripten's compilation has three main phases:
\begin{itemize}
\item The \textbf{intertyper}, which converts from LLVM assembly into
Emscripten's internal representation.
\item The \textbf{analyzer}, which inspects the internal representation
and generates various useful information for the final phase,
including type and variable information, stack usage analysis,
optional data for optimizations
(variable nativization and relooping), etc.
\item The \textbf{jsifier}, which does the final conversion of the
internal representation plus additional analyzed data into JavaScript.
\end{itemize}

\subsection{The Runtime Environment}
\label{sec:runtime}

Code generated from Emscripten is meant to run in a JavaScript engine,
typically in a web browser. This has implications for the kind of
runtime environment we can generate for it; for example, there is no
direct access to the local filesystem.

Emscripten comes with a partial implementation of a C library,
mostly written from scratch in JavaScript, with parts compiled from an
existing C library\footnote{newlib, \url{http://sourceware.org/newlib/}}. Some aspects of the runtime environment, as
implemented in that C library, are:
\begin{itemize}
\item An emulated filesystem is available, with files stored in memory.
\item Emscripten allows writing pixel data to an HTML5 canvas element,
using a subset of the SDL API. That is, one can write an application in C or C++ using
SDL, and that same application can be compiled normally and run
locally, or compiled using Emscripten and run on the web. See, for
example, Emscripten's raytracing demo at \url{http://syntensity.com/static/raytrace.html}.
\item \emph{sbrk()} is implemented using the \emph{HEAP} array which
was mentioned previously. This allows a normal \emph{malloc()}
implementation written in C to be compiled to JavaScript. (A sketch of
such an \emph{sbrk()} appears after this list.)
\end{itemize}
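
As an illustration (a hypothetical sketch, not Emscripten's actual library code), an
\emph{sbrk()} built on the \emph{HEAP} array from the earlier sketch only needs to move
a `program break' index forward:
\begin{verbatim}
// Hypothetical sketch of sbrk() on top of the HEAP array.
var HEAPBREAK = 1024; // hypothetical address where the C heap begins

function _sbrk(increment) {
  var previous = HEAPBREAK;
  HEAPBREAK += increment;
  while (HEAP.length < HEAPBREAK) HEAP.push(0); // make room
  return previous; // the old break, as sbrk() returns in C
}
\end{verbatim}
A \emph{malloc()} implementation compiled from C can then manage the memory it obtains
from this \emph{sbrk()} in the usual way.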

\subsection{The Relooper: Recreating high-level loop structures}
\label{sec:relooper}

The Relooper is the most complex module in Emscripten. It receives
a `soup of blocks', which is a set of labeled fragments of code, each
ending with a branch operation, and the goal is to generate normal
high-level JavaScript code flow structures such as loops and ifs.
Generating such code structures is essential to producing good-performing code,
since JavaScript engines are tuned to run such code very quickly (for
example, a tracing JIT as in SpiderMonkey will only trace normal loops).

Returning to the LLVM assembly code on page~\pageref{code:examplellvm}, it
has the following structure (where arrows denote potential paths of execution):

\includegraphics[width=80mm]{graph.png}

In this simple example, it is fairly straightforward to see that a natural way to implement it
using normal loop structures is
\newpage
\begin{verbatim}
ENTRY
while (true) do
  2
  if (condition) break
  5
  9
12
\end{verbatim}
In general though, this is not always easy or even practical -- there may
not be a straightforward high-level loop structure corresponding to the low-level one, if
for example the original C code relied heavily on \emph{goto} instructions.
In practice, however, almost all real-world C and C++ code tends to
be amenable to loop recreation.

We now begin to describe the Relooper algorithm. As mentioned before, it takes as input a `soup of labeled LLVM blocks' as described above,
and generates a structured set of Emscripten code blocks, which are each a set of LLVM blocks
with some logical structure. For simplicity we call LLVM blocks `labels' and Emscripten
blocks `blocks' in the following.

There are three types of Emscripten blocks:
\begin{itemize}
\item \textbf{Simple block}: A block with
\begin{itemize}
\item One \textbf{Internal} label, and
\item a \textbf{Next}
block, which the internal label branches to. The block is later
translated simply into the code for that label, and the Next
block appears right after it.
\end{itemize}
\item \textbf{Loop}: A block that represents a basic loop, comprised of
two internal sub-blocks:
\begin{itemize}
\item \textbf{Inner}: A block that will appear inside
the loop, i.e., when execution reaches the end of that block,
flow will return to the beginning. Typically a loop will contain
a conditional \emph{break} defining where it is exited. When we
exit, we reach the Next block, below.
\item \textbf{Next}: A block that will appear just outside
the loop, in other words, that will be reached when the loop is exited.
\end{itemize}
\item \textbf{Multiple}: A block that represents a divergence into several
possible branches that eventually rejoin. A Multiple block can
implement an `if', an `if-else', a `switch', etc. It is comprised of:
\begin{itemize}
\item \textbf{Handled blocks}: A set of blocks which execution can
enter. When we reach the Multiple block, we check which of them
should execute, and go there. When execution of that block is
complete, or if none of the handled blocks was selected for
execution, we proceed to the Next block, below.
\item \textbf{Next}: A block that will appear just after the Handled blocks,
in other words, that will be reached after code flow
exits the Handled blocks.
\end{itemize}
\end{itemize}
To clarify these definitions, the example LLVM assembly code we have
been working with could be represented in a natural way as
\begin{verbatim}
Simple
  entry
  Loop
    Simple
      2
      Simple
        5
        Simple
          9
          null
    Simple
      12
      null
\end{verbatim}
where the first indented line in a Simple block is the Internal label in that Simple block,
the second indented line is its Next block, and so forth.

Continuing to describe the Relooper algorithm, we will use the term `entry' to
mean a label that can be reached immediately in a block. In other
words, a block consists of labels $l_1,\ldots,l_n$, and the entries
are a subset of those labels, specifically the ones that execution
can directly reach when we reach that block. With that
definition, the Relooper algorithm can
then be written as follows:

\begin{itemize}
\item \textbf{Receive a set of labels and which of them are entry points.}
We wish to create a block comprised of all those labels.
\item \textbf{Calculate, for each label, which other labels it \emph{can}
reach}, i.e., which labels we are able to reach if we start
at the current label and follow one of the possible paths
of execution. (A small sketch of this computation appears after this list.)
\item \textbf{If we have a single entry, and cannot return to it (by some other
label later on branching to it) then create a Simple block}, with the entry
as its internal label, and the Next block comprised of all
the other labels. The entries for the Next block are the entries
to which the internal label can branch.
\item \textbf{If we can return to all of the entries, create a
Loop block}, whose Inner block is comprised of all labels that
can reach one of the entries, and whose Next block is
comprised of all the others. The entry labels for the current
block become entry labels for the Inner block (note that
they must be in the Inner block by definition, as each one can reach
itself). The Next block's entry labels are all the labels
in the Next block that can be reached by the Inner block.
\item \textbf{If we have more than one entry, try to create a Multiple block}: For each entry, find all
the labels it reaches that cannot be reached by any other
entry. If at least one entry has such labels, return a
Multiple block, whose Handled blocks are blocks for those
labels (and whose entries are those labels), and whose Next block is all the rest.
Entries for the Next block are entries that did not become part of the Handled
blocks, and also labels that can be reached from the Handled blocks.
\item \textbf{If we could not create a Multiple block, then create a Loop block as described above}
(see proof below of why creating a Loop block is possible, i.e., why the labels contain a loop).
\end{itemize}
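
As an illustration of the reachability step mentioned above, the following small sketch
(with a hypothetical data format: a map from each label to the labels it branches to
directly) computes the set of labels reachable from a given label:
\begin{verbatim}
// Hypothetical sketch of the reachability calculation.
function reachableFrom(start, branches) {
  var seen = {};
  var queue = (branches[start] || []).slice();
  while (queue.length > 0) {
    var label = queue.pop();
    if (seen[label]) continue;
    seen[label] = true;
    (branches[label] || []).forEach(function (next) {
      queue.push(next);
    });
  }
  return Object.keys(seen);
}

// The branching structure of the LLVM example shown earlier:
var branches = { entry: ['2'], '2': ['5', '12'],
                 '5': ['9'], '9': ['2'], '12': [] };
console.log(reachableFrom('2', branches)); // [ '2', '5', '9', '12' ]
\end{verbatim}
Label 2 can reach itself (through 5 and 9), which is what makes the Loop block in the
representation shown earlier necessary.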
Note that we first create a Loop only if we must, then try to create a
Multiple, then create a Loop if we have no other choice. We could have slightly simplified this in
various ways, but the algorithm as presented above has given overall better
results in practice, in terms of the `niceness' of the shape of the
generated code, both subjectively and at least in some simple benchmarks.

Additional details of the algorithm include
\begin{itemize}
\item The technical mechanism by which execution flow is controlled in the generated code involves
the \emph{\_\_label\_\_} variable, mentioned earlier. Whenever we enter a block with more than one
entry, we set \emph{\_\_label\_\_} before we branch into it, and we
check its value when we enter that block. So, for example, when we
create a Loop block, its Next block can have multiple entries -- any
label to which we branch out from the loop. By creating a Multiple
block after the loop, we can enter the proper label when the loop is
exited. (Having a \emph{\_\_label\_\_} variable does add some overhead,
but it greatly simplifies the problem that the Relooper needs to solve
and allows us to only need three kinds of blocks as described above.
Of course, it is possible to optimize
away writes and reads of \emph{\_\_label\_\_} in many or even most cases.)
A sketch of this output shape appears after this list.
\item As the Relooper processes labels, it replaces branch
instructions accordingly. For example, when we create a Loop
block, then all branch instructions to the outside of the loop are
converted into \emph{break} commands (since a break instruction in JavaScript
will indeed get us outside of the loop), and all branch
instructions to the beginning of the loop are converted into
\emph{continue} commands, etc. Those commands are then
ignored when the Relooper is called recursively to generate the Inner block (that is,
the \emph{break} and \emph{continue}
commands are guaranteed, by the semantics of JavaScript, to get us to
where we need to go -- they do not need any further work
for them to work properly).
\item Emscripten also does an additional pass after the one that has been
described thus far, which constitutes the first pass. That first pass is guaranteed to produce valid output (see below), while
the second pass takes that valid output and optimizes it, by
making minor changes such as removing
\emph{continue} commands that occur at the very end of loops
(where they are not needed), etc.
\end{itemize}
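
To make the \emph{\_\_label\_\_} mechanism concrete, the following is a hypothetical
shape of the output for a Loop block whose Next block is a Multiple block with two
entries (the labels, condition and values here are made up for illustration):
\begin{verbatim}
// Hypothetical shape of Relooper output: a Loop block whose Next
// block is a Multiple block dispatching on __label__.
function loopThenMultiple(n) {
  var __label__;
  L: while (1) {
    // ... Inner block ...
    if (n < 0)   { __label__ = 5;  break L; } // branch out to label 5
    if (n > 100) { __label__ = 12; break L; } // branch out to label 12
    n = n + 1;
    continue L;
  }
  // Next block: a Multiple block that enters the proper label.
  if (__label__ == 5) {
    return "reached label 5";
  } else if (__label__ == 12) {
    return "reached label 12";
  }
}
console.log(loopThenMultiple(3)); // "reached label 12"
\end{verbatim}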

We now turn to an analysis of the Relooper algorithm. It is straightforward to see that the output of the algorithm,
assuming it completes successfully -- that is, that it finishes in finite time, and does
not run into an error in the last part (where it is claimed that
if we reach it we can return to at least one of the entry labels) --
is correct in the sense of code execution being carried out
as in the original data. We will now prove that the algorithm must
in fact complete successfully.

First, note that if we
successfully create a block, then we simplify the remaining
problem, where the `complexity' of the problem for our purposes
here is the number of labels plus the number of branching operations:
\begin{itemize}
\item This is trivial for Simple blocks (since we now have a Next block
which is strictly smaller).
\item It is true for Loop blocks simply by removing branching
operations (there must be a branching back to an entry, which
becomes a \emph{continue}).
\item For Multiple blocks, if the Next block is non-empty then we have split into strictly
smaller blocks (in number of labels) than before. If the Next block
is empty, then since we built the Multiple block from a set of labels
with more than one entry, the Handled blocks are strictly smaller
than the current one.
\end{itemize}
We have seen that whenever we successfully create a block, we simplify the remaining problem
as defined above, which means that we must eventually halt successfully (since
we strictly decrease a nonnegative integer).
The remaining issue is whether we can reach a situation where we \emph{cannot}
successfully create a block, which can only happen if we reach the final part of the Relooper algorithm but cannot create a
Loop block there. For that to occur, we must not be able
to return to any of the entries (or else we would create a Loop
block). Assume that indeed we cannot return to any of the entries. But if that is so, then no entry
label can be reached from any other label, and in particular each entry has at least one label -- itself --
that no other entry can reach. We could therefore have created a Multiple block with Handled blocks
that each include one of the entries (and possibly additional labels as well), contradicting
the assumption that we were unable to create a Multiple block,
and concluding the proof.

(We have not, of course, proven that the shape of the blocks is optimal
in any sense. However, even if it is possible to optimize them further, the Relooper
already gives a very substantial speedup due to the move from the switch-in-a-loop
construction to more natural JavaScript code flow structures.)
\section{Example Uses}
\label{sec:examples}

Emscripten has been run successfully on several real-world codebases. We present
some examples here to give an idea of the various opportunities made possible
by Emscripten.
\begin{itemize}
\item \textbf{Python}: It is possible to run variants of Python on
the web in various ways, including Pyjamas, IronPython on Silverlight and
Jython in Java. However, all of these are slightly nonstandard in the
Python code they run, while the latter two also require plugins to be
installed. With Emscripten, on the other hand, it is possible to compile
CPython itself -- the standard, reference implementation of Python -- and
run that on the web, which allows running standard Python code. An online
demo is available at \url{http://syntensity.com/static/python.html}.
(Another example of a language runtime that Emscripten can convert to run
on the web is Lua; an online demo is available at \url{http://syntensity.com/static/lua.html}.)
\item \textbf{Poppler and FreeType}: Poppler\footnote{\url{http://poppler.freedesktop.org/}} is an open source PDF
rendering library. In combination with FreeType\footnote{\url{http://www.freetype.org/}}, an open source font
engine, it can be used to render PDF files. By compiling them with Emscripten,
PDF files can be viewed on the web, without the use of plugins or external
applications. An online demo is available at \url{http://syntensity.com/static/poppler.html}.
\item \textbf{Bullet}: The Bullet Physics library\footnote{\url{http://bulletphysics.org/wordpress/}} is
an open source physics engine, used in many open source and proprietary applications. An online
demo is available at \url{http://syntensity.com/static/bullet.html}, showing a physics
simulation of falling blocks that uses Bullet compiled to JavaScript. Bullet has in the
past been ported to JavaScript\footnote{\url{http://pl4n3.blogspot.com/2010/07/bulletjs-javascript-physics-engine.html}}, by porting JBullet (a port of Bullet to Java). The main difference in the approaches is that with Emscripten, there is no need for
time-consuming manual conversion of C++ to Java and then to JavaScript, and consequently,
the latest Bullet code can be run in JavaScript and not an earlier version (JBullet lags
several versions behind the latest Bullet release).
\end{itemize}
\section{Summary}
\label{sec:summary}

We presented Emscripten, an LLVM-to-JavaScript compiler, which opens up
numerous opportunities for running code written in languages other
than JavaScript on the web, including some not previously possible.
Emscripten can be used to, among other
things, compile real-world C and C++ code and run that on the web. In
addition, by compiling the runtimes of languages which are implemented in C and C++,
we can run them on the web as well, for example Python and Lua.

Perhaps the largest future goal of Emscripten is to improve the performance of
the generated code. As we have seen, speeds of around $1/10$th that of
code compiled with GCC are possible, which is already good enough for many purposes, but
can be improved much more. The code Emscripten generates will become faster
`for free' as JavaScript engines get
faster, and also by improvements in the optimizations done by LLVM and the Closure
Compiler. However, there is also a lot of room for additional optimizations in
Emscripten itself, in particular in how it nativizes variables and structures,
which can potentially lead to very significant speedups.

When we compile a language's entire runtime into JavaScript, as mentioned
before, there is another way to improve performance.
Assume that we are compiling a C or C++ runtime of a language
into JavaScript, and that that runtime uses JIT compilation to generate machine code. Typically
code generators for JITs are written for the main CPU architectures, today
x86, x86\_64 and ARM. However, it would be possible for a JIT to
generate JavaScript instead. Thus, the runtime would be compiled using
Emscripten, and at runtime it would pass the JIT-generated JavaScript to
\emph{eval}. In this
scenario, JavaScript is used as a low-level intermediate representation in
the runtime, and the final conversion to machine code is left to the underlying
JavaScript engine. This approach can potentially allow languages that
greatly benefit from a JIT (such as Java, Lua, etc.) to be run on the web
efficiently.

Getting back to the issue of high-performing code in general, it is worth comparing
Emscripten to Portable Native Client (\cite{pnacl},~\cite{nacl}), a project in development which aims
to allow an LLVM-like format to be distributed and run securely
on the web, with speed comparable to native code.

Both Emscripten
and PNaCl aim to allow code written in languages like C and C++ to
be run on the web, but in very different ways: Emscripten compiles code into JavaScript, and
PNaCl compiles into an LLVM-like format which is then
run in a special PNaCl runtime. As a consequence, Emscripten's generated code can run on all web browsers,
since it is standard JavaScript, while PNaCl's generated code
requires the PNaCl runtime to be installed; another major
difference is that JavaScript engines do not yet run code at
near-native speeds, while PNaCl does. In broad terms, Emscripten's
approach allows the code to be run in more places, while PNaCl's
allows the code to run faster.

However, as mentioned earlier, improvements in JavaScript engines and compiler
technology may narrow the speed
gap. Also, when considering the speed of JavaScript engines, for purposes of Emscripten we do not need to
care about \emph{all} JavaScript, but only the kind generated by
Emscripten. Such code is \textbf{implicitly statically typed}, that is,
types are not mixed, despite JavaScript in general allowing assigning, e.g., an
integer to a variable and later a floating point value or even an object to that same variable. Implicitly statically
typed code can be statically analyzed and converted into
machine code that has no runtime type checks at all. While such
static analysis can be time-consuming, there are practical ways of
achieving similar results quickly, such as tracing and type inference, which
would help on such code very significantly, and are already in use
or being worked on in mainstream JavaScript engines (e.g., SpiderMonkey). As
a consequence, it may soon be possible to run code written in languages such as
C and C++ on the web with near-native speed.

%\appendix
%\section{Appendix Title}
%This is the text of the appendix, if you need one.

\acks

We thank the following people for their contributions to Emscripten: David LaPalomento, Daniel Heres, Brian Crowder, Brian McKenna, dglead and tuba.

\bibliographystyle{abbrvnat}

% The bibliography should be embedded for final submission.

\begin{thebibliography}{}
\softraggedright

\bibitem[Ashkenas (2011)]{ashkenas} J. Ashkenas.
List of languages that compile into JavaScript. Available at \url{https://github.com/jashkenas/coffee-script/wiki/List-of-languages-that-compile-to-JS}.
Retrieved April 2011.

\bibitem[Cifuentes et~al. (1998)]{Cifuentes98assemblyto} C. Cifuentes, D. Simon and A. Fraboulet.
Assembly to High-Level Language Translation. In Int. Conf. on Softw. Maint., pp. 228--237, IEEE-CS Press, 1998.

\bibitem[Cooper et~al. (2006)]{links} E. Cooper, S. Lindley, P. Wadler and J. Yallop.
Links: Web programming without tiers. In 5th International Symposium on Formal Methods for Components and Objects (FMCO), 2006.

\bibitem[Donovan et~al. (2010)]{pnacl} A. Donovan, R. Muth, B. Chen and D. Sehr.
PNaCl: Portable Native Client Executables. Available at
\url{http://nativeclient.googlecode.com/svn/data/site/pnacl.pdf}. Retrieved April 2011.

\bibitem[Flanagan (2006)]{js} D. Flanagan.
JavaScript: The Definitive Guide. O'Reilly Media, 2006.

\bibitem[Loitsch et~al. (2008)]{hop} F. Loitsch and M. Serrano.
Hop Client-Side Compilation. In Trends in Functional Programming, vol. 8, pp. 141--158, Seton Hall University, Intellect Bristol, 2008.

\bibitem[Petříček et~al. (2011)]{afax} T. Petříček and D. Syme.
AFAX: Rich client/server web applications in F\#. Draft. Available at \url{http://tomasp.net/academic/fswebtools.aspx}. Retrieved April 2011.

\bibitem[Prabhakar (2007)]{gwt} C. Prabhakar.
Google Web Toolkit: GWT Java Ajax Programming. Packt Publishing, 2007.

\bibitem[Proebsting et~al. (1997)]{pro97} T. A. Proebsting and S. A. Watterson.
Krakatoa: Decompilation in Java (Does Bytecode Reveal Source?). In Third USENIX Conference on Object-Oriented Technologies and Systems (COOTS), 1997.

\bibitem[Yee et~al. (2009)]{nacl} B. Yee, D. Sehr, G. Dardyk, J. B. Chen, R. Muth, T. Ormandy, S.
Okasaka, N. Narula, and N. Fullagar. Native Client: A Sandbox for
Portable, Untrusted x86 Native Code. In IEEE Symposium on
Security and Privacy, May 2009.

\end{thebibliography}

\end{document}