My good old blog has made the successful journey out of Wordpress and into Octopress! All of the old posts seem to have survived the trip, too. I was always a little displeased with the heavyweight nature of Wordpress, and the promise of writing my posts in my text editor was all too tempting. Seeing as though I’m a guy who fairly regularly writes about code, I’m also pleased with the vastly superior code snippet support. On top of that, the default theme is more pleasing to my sensibilities than anything I could find for Wordpress. Of course, not all the reviews are glowing. I’ll try to report back as I have more experience with the platform.
I’ve already gotten some questions about how I made the transition. Honestly, it was pretty easy. The first step was to set up Octopress, which simply involved following their directions. After that, I exported my Wordpress posts to XML and converted them to Markdown per Fritz Stelluto’s blog post. It’s all pretty straightforward, but basically, you get the XML export of your blog via the Wordpress admin view and run it through the Exitwp tool from Github. I only had one post that didn’t convert properly, and it took me about 5 minutes to convert it by hand. Once all the posts are in Markdown form, it’s a simple matter of moving the posts into the _posts directory of your new Octopress blog.
I personally set my blog up to deploy via rsync, backing it up to a Git repository sitting on my web server, again from documentation on the Octopress site. Again, just reading the directions thoroughly should clear up any questions.
My only issue was getting the right Ruby VM working on my Mac laptop to get set up with Octopress. Word to the wise: OS X Mountain Lion uses Ruby 1.8.7 by default, and Octopress requires 1.9.3. The whole Internet seems to say that you should use either rbenv or RVM to manage different Ruby installs rather than providing a solution to globally install 1.9.3 (which I suppose makes sense of XCode tools depend on 1.8.7). Since I prefer lighter-weight tools, I tried to use rbenv, but that led to a 1 1/2 hour headache and no luck. I found RVM much easier to get up and running. That said, I do prefer tcsh, and all the directions assume you’re using bash. Since I eventually found myself dropping into bash anyway, perhaps the world just hates tcsh.
A couple of weeks ago I gave a talk for the Triangle Clojure Users Group giving an overview of the state of constant space tail calls in Clojure and the techniques employed in the Clojure Tail Call Optimization compiler (hereafter known as Clojure TCO or CTCO) to improve that support. Rather than following my normal routine of writing up slides and then practicing the talk to death, I wrote a set of extensive notes to fully flesh out all the topics I wanted to cover and then gave the talk more off-the-cuff and used notes and drawings on a whiteboard. Unfortunately there’s no recording of that talk, but I figured I could share my original notes. I’ve cleaned them up a bit, removing the parts that make reference to specifics of the talk, but they’re still a bit unapologetically rough.
Introduction
Rather than discussing the nitty-gritty details of how I’ve put CTCO together, I think it’s better to get an understanding of the history of tail calls and then examine the techniques that CTCO uses, but not necessarily exactly how they’re implemented. The hope is that discussing the techniques individually will help one understand how they can be useful both and in and outside of the context of TCO. Particularly, I think there’s a lot to be learned from the CPS transformation, and I hope this can be a stepping stone to understanding continuations for those who weren’t introduced to them by PL giants like I (eternally gratefully) was. Roughly, the presentation goes like this:
The history and anatomy of a tail call
The state of tail calls in Clojure and what CTCO does to improve the situation
How to understand the CPS transformation and use continuations in your own everyday programming.
The (somewhat easier) transformations of thunkification and trampolining.
How these transformations fit together to provide constant space tail calls.
The (speed) drawbacks of using CTCO
A History Lesson
To get started, let’s hop into our time machines and go back to the year 1977. Fortran, Pascal, and PL/I rule the land, and functional programming languages (i.e. Lisp) are relinquished to the halls of academia. Common knowledge of the time states that structuring programs in terms of procedures makes them more readable and easy to maintain, but people in professional practice claim that they are too slow to use more than sparingly. Instead, programmers structure their code around the dreaded GOTO because it’s “inherently faster.”
Enter one fresh-faced young Guy Steele in the AI lab at MIT. His paper, “Debunking the ‘Expensive Procedure Call’ Myth or, Procedure Call Implementations Considered Harmful or, Lambda: the Ultimate GOTO” (truly a man after my own heart in terms of ridiculous names), helps to show the world that procedure calls are not inherently slow, just poorly implemented by compiler writers of the time. Steele covers a lot of ground in the paper; in it, he claims that:
Procedure calls are essentially just themselves glorified GOTOs
Lisp compilers (compiling a language generally held as “really slow”) could produce faster numeric computation code than venerable Fortran compilers when they used efficient implementations for procedure calls.
One can express complex state diagrams in terms of sets of simple procedure calls.
Of highest interest to the current discussion, Steele codifies the notion of a tail call, which he defines as the last procedure call made by an expression. Let’s take a look at the example that he uses in the paper:
12
(define (barxy)(f(gx)(hy)))
As he walks through the example, he notes that a naive compiler would generate code that would create a stack frame for the call to (g x), saving its value, then a stack frame for (h y), saving its value, and finally creating a stack frame for the call to f with the results from the previous two function calls. (Note that in languages like Scheme the order of evaluation for arguments to procedure calls is undefined, and so g and h may be called in the reverse order in those languages). Why is the code generated by our theoretical compiler “naive?” As Steele points out, on returning from the call to f, there’s nothing left for bar to do, therefore a sufficiently smart compiler will simply reuse the stack frame for bar for the call to f. That is, it would place the arguments in the appropriate registers and jump to f. Thus our the last procedure call in bar, the “tail call,” is like a GOTO with arguments.
After showing compiler writers the error of their ways, Steele shows the power of writing programs in terms of tail-recursive procedures and how much cleaner it is than using GOTO. This is particularly apparent to anyone who has implemented a state machine or a parser.
Steele framed the paper as an admonition to language implementors to shape up in efficiently handling procedure calls, and it would seem that they listened. Following his paper, Guy Steele released the Scheme programming language with Gerry Sussman, and as part of the language’s design, it required constant space tail calls. Sometime later, ML came on the scene as a functional programming language that also required constant space tail calls in its standard.
It’s probably worth taking the time at this point to note a slight distinction in terminology that’s often ignored when discussing this topic. When talking about languages like Scheme and Standard ML which require constant space tail calls in their standard, it is termed “proper handling of tail calls,” as any other handling would violate the language’s semantics. In the case of languages which do not require constant space tail calls, the term is “tail call optimization.” As we should all know by now, Clojure (being implemented on the JVM) doesn’t require constant space tail calls, and so we’ll primarily use the term “tail call optimization.”
Constant Space Tail Calls (Back in the Present)
Constant space tail calls seem like an awesome idea, so surely all language implementations guarantee them, right? As it turns out, no. Some languages, like Java, go ahead and create a stack frame for every procedure call. It has something to do with security policies and stack traces and the like. Sorry, I’m no JVM expert.
As many Clojurians (Clojurers?) probably know, the situation isn’t completely bleak for JVM languages like our favorite JVM-bound Lisp dialect. We have one construct, the recur form which leverages the underlying implementation for while loops to provide constant space tail calls for self recursion. That is, to the top of a function or the loop construct.
Clojure.core also provides a “trampoline” function which can be used, along with some manual transformations, to achieve constant space tail calls for an arbitrarily large set of mutually recursive functions. I’m not a big fan of the “manual transformations” part, but it’s certainly a workable solution. As to what those transformations are, we’ll actually be talking about them later.
Enter CTCO
So Clojure provides not one, but two ways to get constant space tail calls on the JVM. That’s pretty good, right?
I think not.
Well, if I’m gonna complain, than I should have a better idea, right? I like to think that I do. Enter Clojure TCO. It’s a little compiler I whipped up that preprocesses a Clojure expression, parsing it into Clojure 1.3’s records and performs a set of transformations (implemented in terms of Clojure 1.3’s protocols) which have been tried and true in garnering constant space tail calls from systems which don’t provide them. Don’t worry, we’ll get to the specifics in just a second.
So what’s the basis for this process? I certainly didn’t pull them out of thin air. Rather, they come from decades-old work by Dan Friedman, which he titled “How to Write Interesting Recursive Programs in a Spartan Host .” In that work, he starts from Scheme code written in its natural, recursive style, and goes step-by-step applying the requisite, straight-forward transformations until the Scheme code becomes C code. Dan makes the process seem so easy that He’s made the process an assignment for his Intro to Programming Languages course for a lot of years (it was one of my favorites, incidentally).
Enough with the teasing; how does one (not so) magically pull tail call optimization out of arbitrary code? In three simple steps:
Continuation passing style (CPS)
Thunkification
Trampolining
Uh-oh. I said “continuation.” You may think to yourself, “this has got to be complicated.” Not at all; not even if you don’t know how to do the CPS transformation. So let’s start there! (You’d think I set that segue up for myself)
Step 1: Continuation Passing Style
So what’s the key to learning how to do the CPS transformation? Well, first let’s start with a function definition. And since we’re all FP folks, here, it’s gotta be factorial:
12345
(defn fact[n](if (zero? n)1(* n(fact(dec n)))))
As we dive into applying the transformation, a key idea is that there are two types of expressions from the point of view of the CPS transform: trivial expressions and serious ones. For simplicity’s sake, trivial expressions include atomic values (e.g. numbers, symbols, strings, etc.), functions (i.e. fn expressions), and calls to simple procedures that we know return quickily (i.e. +, first, conj). Serious expressions include any procedure call which may recur and call off to other recursive procedures and are just in general complicated. Once we’ve gotten our heads around all that, the rest flows pretty simply from those rules.
The first thing we do to apply the CPS transform to our function definition is to add a new argument to the function. For absolutely no historical reason whatsoever, we’ll call it k (excuse my sarcasm):
1234567
;; 1: Add a k argument;; NOTE: this code won't work (yet)(defn fact[nk](if (zero? n)1(* n(fact(dec n)))))
This argument is our “continuation.” Seriously, ignore the five-dollar name; just think of it as our k. You can also think of it as a function that at any given point takes a value and performs the part of the computation that is waiting to be done. Don’t worry if that doesn’t make perfect sense yet; the example should make it a little clearer.
Now our procedure takes a new k argument, so the next thing is to deal with our if expression. Luckily for us, the test part of it is a call that we know will return quickly. If it didn’t, we’d have to do something a little more complex, but we’ll ignore that for now. Instead, we just have to worry about the true and false branches of the if, and we can consider them separately.
First up we have the true branch: in which we return 1. This is clearly a simple value, so what do we do with it? Whenever we have a simple value, we apply our k to it. After all, it’s a function that takes a value and performs the part of the computation that’s waiting to happen. Now our definition looks like this:
123456789
;; 1: Add a k argument;; (1.5: Ignore 'if' because the test is trivial);; 2: Apply k to value in the 'true' branch of the 'if';; NOTE: this code won't work (yet)(defn fact[nk](if (zero? n)(k1)(* n(fact(dec n)))))
Next we have to deal with the false branch of the if, and boy, is it a doozy. We’ve got a multiplication with an argument that’s a recursive call to fact.
This brings us to our next rule of applying the CPS transformation. When we have encounter non-tail calls (like the one to (fact (dec n))), we (recursively) pull them to the front and create a new k that encapsulates the part of the computation that’s waiting to be done. Ok, sorry, that sounds really complicated. It’ll be a lot easier if we just look to our example.
When we look at the expression (* n (fact (dec n))), we should probably notice that the part that makes this serious is the call to (fact (dec n)), since it’s otherwise just a multiplication, which we know is trivial (i.e. returns quickly). So what do we do?
The astute reader will note that since we added a k argument to our definition of fact, this recursive call doesn’t make sense anymore; it’s missing an argument. That’s the new k we have to create. Well, we know that we can think of a continuation as a function that takes a value, so it starts looking something like this:
1
(fn [v]---)
But what goes into the —? If you’ll remember, our k represents the part of the computation that’s waiting to happen. We also know that in the original definition of fact, there’s a (* n —) waiting to happen, so we can probably put that in there:
1
(fn [v](* n---))
So what goes into this —? It seems reasonable to put the value fed into the k in there, right? That kinda makes sense from where we call fact on 1 in our base case in the true branch of the if:
1
(fn [v](* nv))
Another way to think about it is that the value that we feed into the k is the value we have when the computation is done.
This isn’t quite our new k yet, though, since we didn’t use the old one. Remember, at any given point, our k represents the part of the computation that’s waiting to happen. Here we can simply apply our old k to the result of (* n v) inside our new k:
1
(fn [v](k(* nv)))
Thus we’ve created a new k which takes the value computed by the recursive call to fact, performs the part of the computation that was waiting to happen in our original, serious non-tail recursive call, and finally applies the rest of the computation that’s waiting to happen represented by the k that got passed in.
And so we end up with this new definition for factorial:
123456789101112
;; 1: Add a k argument;; (1.5: Ignore 'if' because the test is trivial);; 2: Apply k to value in the 'true' branch of the 'if';; 3.1: Pull the serious, non-tail call to fact to the front;; 3.2: Create a new k which performs the part of the non-tail call;; waiting to happen and applies our original k;; NOTE that this code *will* work!(defn fact[nk](if (zero? n)(k1)(fact(dec n)(fn [v](k(* nv))))))
An important feature to note about this definition, as well as any function in CPS, is that all calls are tail calls. So if we wanted to, the “false” branch of the if could swap fact with recur and not change the meaning of the definition (though it would get constant space tail calls). This is one of the important features of CPS for the purposes of CTCO which may or may not be apparent. We’ll get into that in a moment.
Should we want to call this like we did our original definition, we can just pass the number argument and the identity function, (fn [x] x), or “empty continuation.” This should make sense since our k is supposed to be the part of the computation that’s waiting to happen. At the beginning of a computation, there is no part that’s waiting to happen, and so kis just a function that takes a value and returns it, unchanged.
What Are Continuations?
With that, the requisite knowledge has been dropped to move on to my next point, which I think is the most important observation that I’m presenting. First, recall our original definition for fact:
(Note that this may not be exactly how the continuation is represented internally, but it is equivalent and makes the point I’m driving home much clearer)
Notice anything familiar about the continuation, er, that is, the ‘k’ argument in the call to the CPS fact and the stack in the original version? Looks pretty similar, huh?
This leads me to my biggest claim of the whole discussion:
CONTINUATIONS ≡ STACKS
Continuations are simply a version of the stack that we’ve represented using functions in place of the one that the JVM runtime usually takes care of for us automatically. So for the rest of our discussion, any time I use the word “continuation,” you should be saying to yourself “stack.”
To make this even more painfully apparent, let’s consider a slightly different definition for fact which takes an additional accumulator argument:
Those paying attention (or who have seen this before) will note that this definition is tail recursive, since there is no computation waiting for the recursive call to fact-acc to finish before completing. We can call this function much like the original definition as long we pass an initial accumulator value of 1. Thus we get:
1
(fact-acc51)== 120
If we go through the CPS algorithm we discussed before, this mostly falls out pretty easily. First we add our k argument:
We know that the if has a simple test, so we don’t have to do anything special with it. In the “true” branch, the a is a simple value, so we simply apply our k to it:
Which brings us to the “false” branch. In our original definition, there was a non-tail call which we pulled out and encapsulated the rest of the computation in the new k that we created. Here, however, there’s no non-tail call; both the (dec n) and the (* n a) are simple, fast-returning expressions, meaning that when the call to fact-acc returns, there’s no computation waiting for it. So what does that mean for k? Well, we simply pass it along:
So let’s take a look at our call stack for a call to this function, assuming initial values of 1 for the accumulator and the identity function for the initial, empty continuation:
Note that throughout the whole computation, the continuation never changed. This makes sense because there was never a non-tail computation that caused another one to have to wait. This also illustrates that in general, no stack frames need be built for tail calls, supporting the history lesson from the beginning of the discussion.
So with that very long discussion of CPS behind us, what does this mean for CTCO? Of particular importance to CTCO are two of the characteristics of programs in CPS that have already been mentioned:
All procedure calls are tail calls
The continuation provides an explicit representation of the stack at any point in the computation.
At this point, one might think that CTCO need do no more than apply CPS and replace self-recursive calls with recur, given the first property of CPS and our example using factorial. However, this won’t help for groups of mutually recursive procedures. To handle that, we must apply a couple more transformations. No worries, though, they’re a lot easier to get a handle on than CPS.
Steps 2 and 3: Thunkification and Trampolining
Our next transformation is simply called “thunkification,” which we apply to an expression that has already undergone the CPS transformation. The main idea of thunkification is to wrap any expression that would grow the stack in a function of no arguments, known as a thunk. This simply makes it so that any expression that would grow the stack instead returns a value, namely a function that when invoked performs the next step of the computation. And that’s all there is to it!
Of course, thunkification isn’t very useful unless it’s coupled with the final transformation, “trampolining.” You know how thunkification essentially broke the computation into its individual steps, each one returning a thunk that when invoked, performed the next step of the computation? Trampolining simply runs each step of that computation until there’s some indication that it has finished. Plus it does so using a constant space looping construct, in this case recur.
Remember how our CPS transformation ensured that we had a representation of the stack manifest as an argument? The cool part about that is that it allows us to clear out our JVM stack on each “bounce” on the trampoline, but retain the context it would have stored instead in our continuation.
Not too bad, huh? Of course you’ll note the cryptic <done?> predicate. It can be implemented in any one of a number of ways.
This is a common enough trick that clojure.core includes a procedure called trampoline that does this, assuming that the input function has been thunkified by hand. However, the built-in trampoline simply keeps invoking the argument value until it’s no longer a procedure (i.e. <done?> = (not (procedure? thunk)). This is kind of annoying should one want to return a procedure from the trampoline. As you may well know, the Clojure homepage advises that if you need do this that you should put the return procedure into a data structure that is destructured after the trampoline returns. Again, this is one of the thing that bugs me about constant space tail call support in standard Clojure.
And yet again, when I complain about the built-in support, I like to think I’ve got a better solution on hand. Instead of asking if the value passed to the trampoline is a procedure, CTCO uses a (boolean) flag local to the computation to determine whether the trampoline needs to continue.
When I was a little more green around the ears as a Clojure programmer, this was a ref to a boolean. As Rich Hickey advised after I made my first release of CTCO, “you should definitely be using atoms instead of refs.” As David Nolen put it, “an experienced Clojure programmer would probably spit his/her coffee if he/she knew that you were using a ref.” So I lived and learned and changed the ref into an atom (thanks, Rich). Being one of the Clojurians who spit his coffee when he read through the initial release’s source code, Alan Dipert kindly changed the flag into a piece of metadata attached to each thunk, which it remains to this day.
The Drawbacks
So CTCO sounds pretty great, right? You can write simple, tail-recursive procedures all day and never worry about stack overflows by running them through CTCO. But it’s not all roses.
There are drawbacks to CTCO, namely that it necessarily introduces computational overhead. First, we’re passing around our k argument everywhere, which is an extra cost. We’re also creating new procedures to represent continuations wherever there are non-tail calls, which takes time and requires additional heap space to allocate. We’re also introducing overhead by creating thunks to delimit the bounces on the trampoline. Finally, we have to execute our computations one step at a time on the trampoline. It’s intended to be a small, tight, tail-recursive loop, but it’s still slower than running the code directly, especially since it has to check whether the argument is a thunk on each bounce.
Great pains have been taken with CTCO to mitigate these costs. The CPS algorithm is intended to introduce as few continuations as possible. There’s been a little bit of work in trying to minimize the number of thunks, though that’s still an open problem that I’m hoping to address in the near future.
On the topic of allocating new continuations for non-tail calls, it’s worth making an aside to point out an important trade-off: while CPS converts non-tail calls to tail calls, allowing them to be performed with constant stack space, that stack space is essentially converted to heap space for the new continuation we have to allocate. In that way, applying CTCO to a primarily non-tail recursive program will prevent it from overflowing the stack, but it will likely trade that stack overflow for a heap overflow.
Conclusions
So what is the primary contribution of CTCO? My view is that it gives the programmer peace of mind. I think it’s unfortunate that Clojure puts as much onus as it does on the programmer to ensure that tail calls are constant space. Using a system like CTCO allows one to write programs in a tail recursive manner and know that constant space tail calls are guarunteed. This is really great if you want to write something like a parser or a state machine that makes tons of tail-recursive calls to other functions. If you’ve not written something like this yourself, just check out ctco.parse to see how simply a parser can be written in tail-recursive fashion.
And don’t let the drawbacks scare you off from CTCO. The point is not to use it on every expression of every program that you write. That’s just making your program slow just to use some cool transformations.
Rather, CTCO is intended for those tail-recursive programs that won’t finish because of stack overflow errors, where Clojure would require you to apply these tranformations anyway. The hope is that you wouldn’t be able to do a better job than CTCO if you transformed the code yourself.
As promised, I’m going to post some more information about the Clojure TCO project that I’ve been working on with Dan Friedman. For those who missed it, my good friend Eric Holk posted the video from a talk I gave two weeks about it. Just for a little context, I gave this at our informal weekly PL Wonks colloquium series. We keep it pretty light, so there were a number of stops to discuss certain points and quite a few inside jokes.
I apologize to any Clojurers bored by the basics discussed in the video; the IU PL group is primarily made up of Schemers and Haskellers. While they’re no strangers to Lispy languages, few had used or knew much about Clojure.
When the video went up, a number of people asked for slides, so I have made them available here. Also, here’s the video to go with it (via YouTube):
Some notes worth mentioning about the video:
At about 4:06, I say that there’s a “cross-compiler from Clojure to ClojureScript.” I of course misspoke myself; I meant to say that it compiled Clojure to JavaScript.
At around 11:30 there’s a fairly long discussion comparing Clojure’s use of “recur” to Scala’s automatic self-recursion tail-call optimization. I say in the video that I’m not sure why it’s not automatic in Clojure. Clojurers from IRC have said that it’s to avoid “magic” in the compiler in which one would write code they thought was tail-recursive, but then blows the stack. This actually leads into a possible loop optimization suggested by a fellow IU compiler hacker, but I’ll write more on that later.
I mention a couple of IU undergraduate courses by catalog number: C311 is the programming languages course taught by Dan Friedman and P423 is the compilers course originally taught by Kent Dybvig.
The video says that the trampoline’s flag is a ref, which was true at the time. I have been informed that this is a faux pas, and the current version uses an atom instead. It may come as no surprise that this was a substantial performance improvement.
The demo portion was cut off primarily because I had technical difficulties (as always happens with live code demos).
There is of course tons and tons to say about the work that Dan and I have done. I’ve been chatting with David Nolen for a few days, including about the possibilities of reusing the CPS transformer for asynchronous client-side scripting code with ClojureScript, as well as adding full call/cc and shift/reset to Clojure.
I’ve also got enough ideas to keep work going on CTCO itself for the foreseeable future (especially with finishing my M.S. next week and starting work at a real job in the next month). So if you’re an interested Clojurer, please take a look at the code on Github, fork it, play with it, add to it, and have fun! Also feel free to ask me questions; I try to be prompt about answering email and often appear on the Clojure IRC as cjfrisz.
I’m a bit busy with classes finishing up and other research deadlines, but I hope to keep writing some updates about the inner workings of CTCO and talk about where I would like to take the project from here. Until then, enjoy!
Well it has certainly been a long while since I wrote anything here. I’ve been a very busy boy, and I’ve got some work to show for it. Dan Friedman and I have been working on adding tail-call optimization to the Clojure language. A video went up a few days ago from an informal talk I gave to the IU PL group a couple of weeks ago about it, and I’ve posted some code to back it up.
Now that it’s starting to get out there, I’m going to try to do some writing about my work. Check back for the slides from the video and some write-ups on the state of the project.
Stuff keeps moving a mile a minute around here. I just got my graduation requirements squared away, so I’m all but set to be Master Frisz come May. Too bad that’s not a real title that people can use and also sounds really dumb. Even with that in place, it’s far from a smooth ride on my way to graduation. I’ve got two classes, one in which Larry Moss talks about how awesome recursion theory is, and the other in which Dan throws crazy PL problems at me twice a week.
I sound like I’m complaining (and I probably am), but the classes are actually pretty fun. The interesting part was writing out this month’s goals. It seems that both my main research projects are set to hit experimental testing this month. The JavaScript Ad Security project will have numbers on how well our machinery detects ads versus the current standard, AdBlock, hopefully by the end of next week. I’ve also thrown together a sweet piece of term-rewriting code for my project with Dan that I wish I could talk about (but it’s a sEcReT). It seems to be working as I want, and so now it’s time to test whether it’s performant.
The other goal I have this month is to line up some job interviews…which is also an experiment of sorts (I guess). I’ve got leads, and even some particularly exciting ones. Trust that I will give details as they come up, though it may behoove me to stay fairly impartial in talking about anything that happens until I nail down plans.
Like a ton of awesome bricks! That is, I have my first video on the Internet, and has a whole 10 views. The good folks from the Digital Curation Centre (note the fancy ‘r’ before ‘e’) have posted the videos from IDCC 2011. That includes my first-ever-anywhere invited talk for my first-ever-anywhere first-author paper “Assessing Migration Risks for Scientific Data Formats.” I enjoyed it pretty well myself, so I thought I would share with the class. Enjoy!
Life’s just chugging along, sometimes a bit too fast. Research is making good progress. Classes are going well. It’s terrifying to think that I’m graduating in May, possibly making my first exit from academia for longer than 3 months for the first time in 18 years. Speaking of which, I’ve managed to keep up with everything else well enough that I’m entering into dangerously-close-to-late territory for applying to jobs. D’oh! Well, that’s why my to-do list says to start LaTeX-ing up my CV today. I’ve got some good leads, especially thanks to POPL last month. It’s about that time that I need to act on them, though, or people will forget about me.
Research is good for now. We’re heading toward our inevitable paper for the JavaScript Security project. Er…that is “Empirically Identifying Ad-Related JavaScript Execution Through DynamicStatic Dynamic and Static Program Analysis.” I guess I probably need to come up with a snappier title than that. We’ve been saying we’ll have a paper for what seems like ages now, but we have some preliminary test results and the experimental setup is getting ready. Plus, there’s already something of a built-in time constraint on me getting this project in paper form before I graduate.
I’m also making progress on my sEcReT pRoJeCt with Dan Friedman. It’s my fun little toy of a research project where I get to chat with Dan about goofy program transformations. I need to dig in on that so I can actually talk about what we’ve done.
As for Spidermonkey optimization work…well that’s on the back burner in terms of official work. It seems that Ryan and I both got sidetracked in teaching the undergraduate compilers course. I’m a big fan of the job, so I’m not complaining. And it hasn’t stopped me from putting a Spidermonkey install on my machine to do some work in my spare time (when I find it).
I also decided to get some more work out in the open for people to see. I’ve got two new repositories on Github of goofy stuff I’m working on. One project has to do with that I’m in a computability theory class through the Math Dept. this semester. The course includes work with text register machine language called 1# developed by my professor, Larry Moss. Writing programs in the language is akin to programming a Turing machine, which is to say, not fun. Many of the vast population of computer scientists in the class have written interpreters from a higher-level language to the 1# source. That is, they’ve added some nice features like labels with goto.
That’s all well and good, but not ridiculous enough of a learning experience for me. Rather, I’m going to write a compiler from a pretty high-level language (something resembling one of the intermediate languages from the compilers class I’m teaching) to 1# code. Being a Scheme junkie, it’s again my language of choice. And even though I’m a Chez Scheme acolyte, I’m going to do my best and make it generally R6RS compliant. I’ve already written (and re-written) the runtime, and it seems to be pretty zippy. Well, as zippy as you can expect from a text register machine. I’m working on adding labels and goto today. We’ll have to see where the thing ends up, but I imagine it’ll have a half-dozen to a dozen passes to add some sweet language features and maybe even generate “efficient” code. Of course I know it all seems silly to write a compiler for such a thing. I’m trying to take the opportunity to integrate some Scheme features into my repertoire that I haven’t used (at least for a while), though. Plus, how else am I supposed to feel accomplished when I’m procrastinating from doing real work?
I’ve also got loose plans to write a little Nethack-style RPG in a Lisp-y language. Of course my first intention was Scheme, but I’ve been falling for the Clojure language recently, so I might see how easy it is to rig something up there. There’s no code as of yet, but I’ll try to add some between JS Security, Dan hacking, teaching compilers, recursion theory, Advanced Dan, 1# compiler hacking, and job hunting. Oy, I really shouldn’t have listed all that out…
Many of you (though I use the term “many” lightly) will remember that I wrote about Internet startup Zediva, the movie streaming company using a Rube Goldberg-ian system of DVD players in a California data center coupled with the first-sale doctrine to deliver streaming content while avoiding the licensing fees that the studios have been piling on the likes of Netflix. It seemed a bit ridiculous, but by my estimation it was still legit.
As you may recall from my first report, the MPAA got the California courts to serve the company with an injunction. This led to the hilarious though not entirely ridiculous claim that the judge had ruled infringement was defined by the length of the cable serving the content. At the time, Zediva claimed they would appeal the decision and fight for proliferation of digital content the way consumers wanted it.
I’m actually quite disappointed to report (and quite late about it, too) that Zediva signed their own death warrant at the end of last month. In late October, the company settled with the MPAA and officially laid off the DVD-changing monkeys. One could wish that this would end up as another victory for Internet freedom, but at least we still have the increasingly favorable fight against SOPA.
One of the things I claimed I would do toward the end of my stint of writing about digital right was to write about the bill that at that point (in its Senate version) was called the Protect IP Act. The thoroughly terrifying bill gave the Department of Justice the ability to take down any website “dedicated to infringing activities,” and they weren’t even required to give notice if the owner was sufficiently difficult to reach.
Since it’s inception, the bill has undergone a few revisions and a few names, from the appropriately ominous E-PARASITE to its current House iteration SOPA (for the Stop Online Piracy Bill). Big content owners like the MPAA and RIAA have come out in favor of the legislation for the obvious and aforementioned reason that they can legally use government resources to shut down websites they don’t like.
For all my talk doing deep research on this issue I must say that all I’ve just laid out and done before should be enough to convince anyone that this a bad idea. And on top of that, opponents to the bill have come out in droves, big and small. In particular, several notable Internet companies and blogs like Mozilla, the EFF, and Techdirt have been promoting American Censorship Day. Interestingly, there’s also some comments from Vice President Biden about just how bad the bill would be.
All the research I did before was for issues that often seemed like good ideas (“let’s stop child porn” or ”shut down copyright infringing companies”) that at their core were actually horrible ones. But this is out and out a bad idea. Every single American should be opposed to this because the very notion of censorship based on what the government doesn’t like is against every notion of free speech we stand for. As always, here’s a link to make your last stand against the bill and keep the Internet free. It wish it felt more ridiculous to write this, but speak up while you still can.
My adviser for one of the projects I had previously mentioned wants me to keep it somewhat secret, so I’ve removed that content from my previous post. I’m not so self-indulgent as to think that someone is looking to steal ideas from my blog, but I will nonetheless probably not be making public posts about it in the future. However, I’d like to keep writing about that project and I think there’s a fair number of people (particularly those at IU) who should be able to read about my progress and thoughts. I’m toying with the idea of password-protecting such posts and if anyone is for or opposed to the idea, please let me know.