Title: What I Learned At JuliaCon
Author: bnewbold
Date: 2016-07-12
Tags: tech, recurse, julia

*Note: It looks like videos of the JuliaCon talks were uploaded [to
Youtube][youtube] the day this post was finally published!*

[youtube]: https://www.youtube.com/playlist?list=PLP8iPy9hna6SQPwZUDtAM59-wPzCPyD_S

I was in Cambridge, MA for a few days the other week at [JuliaCon][], a small
conference for the Julia programming language. Julia is a young language
(first released publicly in 2012 and currently pre-1.0) oriented towards fast numerical
computation: matrix manipulation, simulation, optimization, signal analysis,
etc. I've done a fair amount of such programming over the years, and it has
never felt as elegant or coherent as it could be. The available tools and
languages are generally either:

[JuliaCon]: http://juliacon.org

<div class="sidebar">
<img src="/static/fig/julia_logo.png" width="180px" alt="julia logo" />
</div>

1. stuck in the 1980s in terms of programming language features for safety,
   productivity, and collaboration (eg, Fortran and Matlab)
1. expensive proprietary closed-source packages (eg, Matlab and Mathematica)
1. general-purpose languages with numerical features either hacked on or in the
   form of libraries (eg, Python)

There is a lot to be excited about in Julia. It's already pretty fast
(leveraging pre-existing JIT tools, hand-tuned matrix and solver libraries, and
the LLVM compiler suite) and has contemporary high-level language features
(like optional type annotation, polymorphic function dispatch, package
management tools, and general systems tools (eg, JSON and HTTP support)) that
can make the language faster to develop in, and easier to read and
maintain. I'm personally excited about the pedigree of the language: the
birthplace of the language is the CSAIL building at MIT, and the spirit of
[Scheme][sicm] and the work of [Project MAC][] is sprinkled through the
project. One of the [big pitches][graydon2] of Julia is that scientists won't
need to learn both a productive high-level language (eg, Python) and a
low-level performant language (eg, C or Fortran) and interface between the two:
Julia has everything all in one place.

[graydon2]: http://graydon2.dreamwidth.org/3186.html
[sicm]: https://mitpress.mit.edu/sites/default/files/titles/content/sicm/book.html
[Project MAC]: http://groups.csail.mit.edu/mac/projects/mac/

All that being said, while I thought I would be working in Julia a lot during
my time at the Recurse Center, I've ended up being much more drawn to the
[Rust][] language instead. Rust is a general systems language (it's compiled,
has stronger typing, and no garbage collection), and not great for interactive
numerical exploration, but I've found it a joy to program in: for the most part
everything *just works* the way it says it will. My recent experience with
Julia, on the other hand, has been a lot of breakage between library and
interpreter versions, poor developer usability (eg, hard to figure out where
files should live in a package), and very frustrating import/load times. Though
I have to admit that while I pushed through some frustrations with Rust, I
haven't spent *that* much time with Julia, and may have just been impatient, so
take everything I say here with a grain of salt.

With these feels going in, what did I learn at JuliaCon and what do I think of
the future of the language now? In the below sections I'll go over the
interesting things I saw, then come back to a summary at [the end](#summary).

### Programming Language Design

An older research language for numerical computing that I have always been
curious about is Fortress, and the leader of that project (Guy Steele, who also
worked on the design of the Scheme and Java languages) gave one of the opening
keynote speeches at JuliaCon this year.  Awesome! I get really excited about
inter-generational learning and dialog.

Fortress was a very "mathy" language. The number tower was intended to be
"correct" (aka, have the same structure that mathematicians use), physical
units were built-in, and some operator precedence was non-transitive. Operators
on built-in types (like Integers) could be overloaded, unlike in Java, because
Fortress users could apparently be trusted to "preserve algebraic properties".
Steele is a proponent of using whitespace (or lack of whitespace) to clarify
expressions, sort of like extra parentheses, and enforcing this in the
compiler. For example, the following two statements would be equivalent in most
languages, but not in Fortress:

```
a + b*c + d     // Clear: Ok
a+b * c+d       // Misleading: Compiler Error
```

This was part of a general effort to allow "whiteboard" style syntax in the
language. Fortress code actually has two representations: a plain text
Scala-style source code, and a LaTeX-y symbolic math format. Steele also used
some font-coloring in his slides to differentiate different types of symbols,
which reminded me of the helpful style my undergraduate physics professors
would use on the blackboard. I think this effort to adapt the "look and feel"
of the language to how the intended audience already writes and communicates is
really cool. I wonder if a third syntax format could have been added in a
one-to-one manner: that of a general purpose language like Scala or Haskell
(both noted as influences to Fortress) to make collaboration with general
purpose programming experts easier. Steele mentioned that some efforts to make
the syntax more math-like resulted in "contortions", so there is probably more
work to be done here.
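
To make the whitespace rule from the example above concrete, here is a rough
sketch of how such a check could work. It is written in Python, and it is only
my guess at a heuristic, not Fortress's actual rule: the `PRECEDENCE` table and
the `misleading_spacing` function are names I made up for illustration, and
unary operators and parentheses are ignored entirely.

```python
import re

# Assumed precedence table for this sketch (higher number binds tighter).
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}


def misleading_spacing(expr: str) -> bool:
    """Return True if the spacing suggests the wrong grouping.

    An expression is flagged when some operator written with spaces on both
    sides binds tighter than some operator written with no spaces at all.
    """
    spaced = []  # precedences of operators like "a * b"
    tight = []   # precedences of operators like "a*b"
    for match in re.finditer(r"(\s*)([+\-*/])(\s*)", expr):
        before, op, after = match.groups()
        if before and after:
            spaced.append(PRECEDENCE[op])
        elif not before and not after:
            tight.append(PRECEDENCE[op])
    return bool(spaced and tight and max(spaced) > min(tight))
```

With this heuristic `a + b*c + d` passes, while `a+b * c+d` is flagged because
the multiplication is spaced out more loosely than the additions around it.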

In my limited experience, Julia has a pretty clean syntax, and allows some
math-y [unicode characters as operators][unicode_ops] (like ∈, ≠, etc), but
didn't prioritize math-y syntax as much as Fortress. Given the open challenges
with formalizing informal whiteboard syntax this may or may not have been a
missed opportunity.

[unicode_ops]: http://docs.julialang.org/en/release-0.4/manual/unicode-input/

The positive lessons learned from Fortress were summarized as being the type
system, automatic parallelism (via generators and reducers), the math-y syntax,
pretty printing (I assume meaning the LaTeX-y representation), physical units,
and forced syntax clarity (aka, forced use of parentheses and whitespace). One
issue that came up during implementation was that it was hard to bound the
latency and computational complexity of type constraint solving at run-time.

A few other talks touched on language design decisions and features. There was
a short "Functional HPC" talk by Erik Schnetter, in which it was pointed out
that for some workloads regular old garbage collection can be faster than
reference counting: I've become used to thinking of latency and GC pauses as a
huge performance problem in systems programming, but for number crunching that
isn't as much of an issue, while little reference overheads are (especially if
locks or atomic operations are necessary).

Keno Fischer gave an overview of the [Gallium][] debugger, which had some cool
features, but is still under development. There are both AST-based and
LLVM-based backends for the debugger, which allows stepping at function calls,
line-by-line, or expression-by-expression, which is something I hadn't seen
before. He demoed stepping through each step of the creation of a matplotlib
graph, with the output shown graphically after each step. Neat stuff!

One of my personal interests in Julia would be formalizing the syntax into a
machine-readable grammar (eg, [EBNF][] or [ABNF][]). I was lucky enough to run
into Stefan Karpinski during one of the coffee breaks, and he pointed me to
the Julia plugin for Eclipse, which already has a partial implementation of a
grammar.

A few talks touched on the issue of Nullable datatypes (also called "Maybe" or
"Option" types in other languages), particularly for data science and
DataFrame-type applications. I only recently encountered [Option][] (and the
related [Result][] type) datatypes, in Rust, and can see why people want these
so badly, but there doesn't seem to be a simple path forward yet. Rust really
leverages these types in function return signatures, a feature which Julia does
not have for now; I think I read rumors about them being added in the future,
but didn't hear any mention of them here or on the 1.0 feature roadmap.
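
For readers who haven't met these types, here is a minimal sketch of the idea
in a DataFrame-ish setting. It uses Python's `Optional` type hint as a stand-in
(not Julia's Nullable or Rust's Option), and `column_mean` is a hypothetical
helper of my own. The point is that "this value may be missing" shows up in
the function signature, for both the inputs and the return value.

```python
from typing import Optional, Sequence


def column_mean(values: Sequence[Optional[float]]) -> Optional[float]:
    """Mean of the non-missing entries, or None if every entry is missing."""
    present = [v for v in values if v is not None]
    if not present:
        # No data at all: report "missing" instead of raising or returning NaN.
        return None
    return sum(present) / len(present)
```

A caller has to deal with the `None` case explicitly, which is the property
people seem to want for data science code.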

[Option]: https://doc.rust-lang.org/std/option/index.html
[Result]: https://doc.rust-lang.org/core/result/index.html
[Gallium]: http://juliacon.org/abstracts.html#Gallium
[EBNF]: https://en.wikipedia.org/wiki/Extended_Backus%E2%80%93Naur_Form
[ABNF]: https://en.wikipedia.org/wiki/Augmented_Backus%E2%80%93Naur_Form

### Numeric Abstraction

One of the big trends I saw was taking advantage of Julia's abstractions around
generic operators and arrays to experiment with novel computation strategies.
Sometimes this means improving precision (with novel data types and
representations), sometimes it means increasing performance (by changing memory
layout or distribution, or targeting special hardware), and sometimes it just
makes code more elegant or semantic.

For example, Tim Holy gave a talk (titled "To the Curious Incident of the CPU
in the run-time") which covered a bunch of nitty-gritty details for
implementing wrapper classes that re-shape or re-size Arrays, including sparse
arrays.
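
As a toy illustration of what a re-shaping wrapper does (this is my own sketch
in Python, not code from the talk or from Julia's Base), the hypothetical
`ReshapedView` class below presents a flat buffer as a 2-D array without
copying it. I've used column-major order like Julia's `Array`, but with
0-based indices because it is Python.

```python
class ReshapedView:
    """Present a flat, mutable sequence as a 2-D column-major array, no copy."""

    def __init__(self, data, rows, cols):
        if rows * cols != len(data):
            raise ValueError("shape does not match length of data")
        self.data = data          # shared with the caller, not copied
        self.shape = (rows, cols)

    def _offset(self, index):
        """Translate a (row, col) pair into a position in the flat buffer."""
        row, col = index
        rows, cols = self.shape
        if not (0 <= row < rows and 0 <= col < cols):
            raise IndexError(index)
        return col * rows + row   # column-major: columns are contiguous

    def __getitem__(self, index):
        return self.data[self._offset(index)]

    def __setitem__(self, index, value):
        self.data[self._offset(index)] = value
```

Writes through the view land in the original buffer, so the wrapper costs one
index calculation per access rather than a copy of the data.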

Lindsey Kuper gave a nice overview of the [ParallelAccelerator.jl][pajl]
project, which entirely re-compiles Julia into C++ to get some extra performance
from the static full-program compiler. It seems to me that this only makes
sense because the Julia language has clean abstractions that the transpiler can
take advantage of.

[pajl]: http://juliacon.org/abstracts.html#ParallelAccelerator

One of my favorite talks from the whole conference was David Sanders' and Luis
Benet's talk on ValidatedNumerics ("Precise and rigorous calculations for
dynamical systems"). Instead of computing on approximate (rounded) scalars,
they compute on intervals of floating point numbers (or in higher dimensions,
boxes): at the end of computation the "correct" solution is known to be within
the final box, which also gives context as to how much numerical error has
accumulated. By defining new *types* to accomplish this (specifically,
DualNumbers), they can re-use any generic code in a relatively performant
manner. They also noted that when there is an analytic form to bound the error
for all following terms, Taylor expansion approximations can be truncated as
soon as the interval error exceeds the error in all following terms. Cool!
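
Here is a small sketch of the interval idea, written in Python rather than
Julia, and definitely not the ValidatedNumerics API: the `Interval` class and
the `horner` function are names I made up. It assumes Python 3.9+ for
`math.nextafter`, which I use to nudge each endpoint outward by one floating
point step after every operation, so that rounding can never push the true
answer outside the interval.

```python
import math


class Interval:
    """Closed interval [lo, hi] of floats, with endpoints rounded outward."""

    def __init__(self, lo, hi=None):
        self.lo = float(lo)
        self.hi = float(lo if hi is None else hi)

    @staticmethod
    def _coerce(x):
        # Treat a plain number as the degenerate interval [x, x].
        return x if isinstance(x, Interval) else Interval(x)

    @staticmethod
    def _outward(lo, hi):
        # Widen by one representable float on each side to absorb rounding.
        return Interval(math.nextafter(lo, -math.inf),
                        math.nextafter(hi, math.inf))

    def __add__(self, other):
        other = Interval._coerce(other)
        return Interval._outward(self.lo + other.lo, self.hi + other.hi)

    __radd__ = __add__

    def __mul__(self, other):
        other = Interval._coerce(other)
        products = [self.lo * other.lo, self.lo * other.hi,
                    self.hi * other.lo, self.hi * other.hi]
        return Interval._outward(min(products), max(products))

    __rmul__ = __mul__

    def width(self):
        return self.hi - self.lo

    def __contains__(self, x):
        return self.lo <= x <= self.hi


def horner(coeffs, x):
    """Evaluate a polynomial (highest-order coefficient first) at x.

    Only + and * are used, so x can be a float or an Interval.
    """
    result = coeffs[0]
    for c in coeffs[1:]:
        result = result * x + c
    return result
```

The same `horner` function works unchanged on floats and on intervals, which
is the "re-use any generic code" part. Calling
`horner([1.0, -2.0, 1.0], Interval(0.5, 0.75))` returns an interval that
encloses the polynomial's value at every point of the input interval, and its
`width()` tells you how loose that bound has become.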

### Other Fun Stuff

**[Using Julia as a Quick and Dirty Code Generator][10]:**
The speaker (Arch Robison) is clearly having way too much fun! He used Julia to
output assembly code to get fast (real-time) discrete Fourier transform (DFT)
performance for a little video game called "FrequonInvaders". Infectious
enthusiasm!

**[Autonomous driving for RC cars with ROS and Julia][11]:**
A fun little project doing "Model Predictive Control" on a small model car to
do stunts like drifting and slide parking into a tiny space. They achieved
about a 10Hz closed-loop control rate, which seems to me like barely enough
for this sort of thing, but clearly worked alright. Everything ran on the car
itself (no computation on a remote desktop with wireless control or anything
like that), with an Odroid ARM Linux system and an Arduino-compatible
microcontroller; Julia code using JuMP and other optimization stuff ran on the
ARM system. The code and raw data (for analysis) are available on the [BARC
project website](http://www.barc-project.com). Super cool: having this stuff
being experimented with already means there will be pressure to improve
soft-real-time performance in the language itself.

**[Astrodynamics.jl: Modern Spaceflight Dynamics in Julia][12]:**
Mostly a bunch of code for doing timebase conversions and interpreting (or
calculating) ephemeris data (which is information about where astro bodies like
the Moon and planets will be at a given time), but some simple demos of orbital
simulation and event detection (eg, perihelion time and position) as well. Would
be cool if the ValidatedNumerics stuff was integrated.

**[GLVisualize][13]:**
The demos in this talk were really impressive: live editing of mesh vertices,
relatively high performance, real-time feedback, etc. There were a bunch of
good graphics talks: the [GR Framework][14] stuff is really impressive in scope
(though maybe not as big a performance boost over Python as hoped), and
[Vulkan][15] is exciting.

[10]: http://juliacon.org/abstracts.html#FrequonInvaders
[11]: http://juliacon.org/abstracts.html#RaceCars
[12]: http://juliacon.org/abstracts.html#Astrodynamics
[13]: http://juliacon.org/abstracts.html#GLVisualize
[14]: http://juliacon.org/abstracts.html#GR
[15]: http://juliacon.org/abstracts.html#Vulkan

### Diversity

It's sad to say, but the gender diversity at the conference was really poor,
particularly in contrast to the Recurse Center (where I have spent the past
couple months). The women I did meet gave some of the best talks, are crucial
contributors to infrastructure, and are generally amazing: more please! Aside
from the principle of the thing, there is just something about a giant sea of
guys at a tech event that results in a tense group vibe. Everybody I spoke to
one-on-one was friendly and we had great conversations, but as a group there
was a lot of ice to be broken. In my experience even hitting 10-20% women in
attendance can thaw this out, but that's just my anecdotal experience.

I haven't attended, but I hear that PyCon has done a great job improving
diversity with careful planning and [systemic initiatives][pycon-diversity].

Overall, I thought the conference was a great group of people and admirably
well run. I appreciated the efforts to keep costs low, and everything generally
ran on time. Thanks to all the volunteer and MIT staff organizers for their
efforts!

[pycon-diversity]: https://us.pycon.org/2016/about/diversity/

### Julia 1.0

Stefan Karpinski gave an overview of features and roadmap for getting to Julia
1.0, which I think was a topic close to most attendees' hearts (including
mine). I ended up with a huge list of written notes, which I'll summarize
below; the punchline was aiming to have a 1.0 release around one year from now.
Apparently the one-year goal has been floated in previous years; I'm not sure
how wise it is in general to float initial release timelines for a project like
this, it seems like it will just "be done when it's done".

Some of the goals that were interesting to me:

- Arrays: might refactor Arrays to have a separate backing abstraction of "Buffers"
  with arrays on top (apparently Lua and Torch do this).
- Strings: move full Unicode support out of core language (Base) and into a
  package. The `@printf` macro will be refactored into a function. To my
  surprise, currently Strings are implemented as an Array! This has a
  relatively large overhead for each string (72 bytes).
- Modularity and Package infrastructure: currently a mess (I agree), `import`,
  `using` and `export` will be refactored.
- Compiler: add non-pthreads multithreading; better static compilation; ability
  to define a `main()` function and get a standalone script or binary; ability
  to redefine functions and have the changes propagate (cache invalidation
  problem); stabilize intermediate representations. Seems like a lot!
- Optimizations: faster garbage collection, more auto-vectorization (eg, for
  vector floating point units), improve globals performance. Might pull in part
  of ParallelAccelerator?

I'm a little nervous about how many of these goals are big open questions instead of
just implementation tasks. I wish there was a more healthy way to experiment
with new features and refactoring without breaking everything or committing to a
long-term stable API; I think other languages have settled into good patterns
for this kind of development, though maybe they needed to go through a
difficult 1.0 process first. It was mentioned that 0.6 would be the last of the
0.x series of releases and considered 1.0-alpha, and that from 1.x and on
things should generally be backwards compatible.

Separate from Stefan's talk, there was a short overview of progress on the next
iteration of the Julia package and dependency manager, called Pkg3. The goals
were described as "a mash-up of virtualenv and cargo": virtualenv is a tool for
isolating per-application dependencies and toolchains in Python, and Cargo is
the Rust dependency manager and build tool (which is also used in a
per-application fashion). Pkg3 sounds like it will have a concept of distinct
"global" (meaning system-wide?) installations and "local" (eg, per-project or
per-directory) installations and name-spacing. The naming could use some work,
as "global" and "local" are pretty overloaded, but I think they are chasing the
right goals. Reproducibility (both for binary generation and data/experiment
reproduction), lock files (which lock in known-good versions of dependencies a
la Cargo), and other concepts that I care about were also thrown around. I
didn't catch all the details (and I'm not sure how much has been worked out and
implemented yet), but after my experiences with [Elm and Rust][elm-broken], and
the current state of packaging for Julia, I'm excited for Pkg3!

[elm-broken]: /2016/elm-everything-broken/

<a name="summary"></a>

### Overall Julia Feels

There is an explosion of ideas and experiments going on. It feels sort
of like what the Ruby community maybe went through with web frameworks, or the
web community did with languages that compile to Javascript: ambitious ideas,
which may have been on the back-burner for some time, can finally be prototyped
quickly and tested in a mostly-real-world environment, and everybody is excited
to try it out and demo their creations.

One of the sponsors said:

> "there is something quite good about not feeling bad about programming"

and that seemed representative of the current state of Julia. It seems
undeniable that the language is less painful for developing performant
numerical code than the previous generation of languages and library wrappers.

Perhaps because of this enthusiasm and froth of ideas, I'm a little worried
that the foundations of Julia (the language and the ecosystem) have not yet had
time to fully bake. The more demos and experiments that get implemented, and
the more popular they become, the more delicate it becomes to make hard
decisions about language syntax and features. I think people want stability and
promised features *yesterday*, but these things take time and reflection. My
feeling right now is that it doesn't really matter. The enthusiasm for
*a language like Julia* is proven and growing. Julia itself might end up being
the first try that gets thrown away in a decade or two, but in the end we'll
end up with something which is both exciting and robust.

[PyX.jl]: https://github.com/bnewbold/PyX.jl
[rust]: https://www.rust-lang.org/