New version of the DejaVu Sans Mono package

I just finished fixing the bugs Brian noted in the previous version.  You can grab it using darcs get from http://free-the-mallocs.com/repos/dejavu-sans-mono.  This version also includes support for the oblique members of the family.  These are accessed using \textsl{...} rather than \textit{...}.  If you try it out, let me know if you have any problems.

Initial version of DejaVu Sans Mono package

I just finished pulling out the various bits of LaTeX and OMake I wrote for using the DejaVu Sans Mono font in my dissertation, as suggested by Brian.  You can grab it using darcs get from http://free-the-mallocs.com/repos/dejavu-sans-mono.

You will need to copy the TrueType files for DejaVu Sans Mono and DejaVu Sans into the directory before running omake. DejaVu Sans is required because I needed to steal a few glyphs from it that are not yet in DejaVu Sans Mono. The other notable omission is that I never needed the oblique version of the typeface, but I can add that with just a little more hacking.

OCamlTeX 0.7 Released

M. Colette Jean-Claude sent me his modified version of OCamlTeX that eliminates the dependency on Caml Shell. I've merged his changes into the darcs repository and deemed it version 0.7. The last change entry was from over a year ago. It feels like such a long time ago.

I have been meaning to write a little bit about my future plans for OCamlTeX for quite some time. Firstly, I am looking to do a complete rewrite eventually. One goal of this rewrite would be to take advantage of LuaTeX to hopefully reduce the communication overhead. My second goal will be to make it much easier to use other languages. Therefore, I expect I will rename it to something like MultiTeX that is a little more language agnostic.

My vision for this new design is first write a package in LuaTeX that will allow TeX to perform interprocess communication. Last time I investigated such things, I concluded that DBus seemed to be the favored interprocess messaging protocol. The actual MultiTeX package would be built on top of this package. Then all that is needed to add support for a new language is to write a daemon that accepts DBus messages from MultiTeX and sends back the result of the requests.

I developed this plan quite some time ago, but I have not begun the implementation partly because I have been so busy finishing InforML and my dissertation and partly because I have been waiting for LuaTeX to mature. Of course, if anyone is interested in helping with the
project, let me know.

Programming contest postmortem

For those who haven't investigated this year's ICFP Programming contest, I'll provide a brief overview. We were given the task of saving Endo, an alien that crash landed on Earth and needed his »DNA« to be rewritten into a form that would allow him to survive the conditions on our planet. Endo, being a member of the Fuun species has a decidedly non-terrestrial biological basis.

Endo's »DNA« consists of a sequence of the bases I, C, F, and P. It has the most usual property that it acts both as an information storage mechanism and as catalyst for reactions. So perhaps the existence of the Fuun support the RNA world hypothesis, as we could say that their »DNA« does the same job as both our DNA and RNA. The catalytic process produces new sequences of »DNA« that provide the instructions for building a Fuun. We were given the information needed to express a Fuun's phenotype as 600 by 600 pixel images. Furthermore, we were provided with Endo's current »DNA« sequence, his current phenotype,

and a target phenotype

as suggested by his trusty and intelligent spacecraft Arrow. Our task was to synthesize a sequence that could be used as a prefix Endo's current genome, that would as closely »morph« Endo towards expressing this phenotype as possible.

Overall, I would say that this year's contest proved to be much more challenging than in previous years. However, I say this based primarily on the fact that (a) my team of two could not build a usable tool chain to start experimenting with Endo's genome within 48 hours and (b) the close of the closed the best team with public statistics had developed a sequence that would give Endo an 8% likelihood of survival. However, the overwhelming majority of the participants were well behind this. Not only was there the challenge of building a workable tool chain, but understanding Endo's genome well enough to reverse engineer it. I gather there were hints that became available once you had working tools, but I cannot really comment on their nature.

My team would have greatly benefited from additional minds to throw at the problem, but even so we did a few things sub-optimally. We initially divided the work at too high of a granularity. Brian start writing a DNA execution engine, while I built a tool for rendering the expressed phenotype. However, we should have concentrated more on writing smaller reusable components for the common structures to avoid duplication and make optimization easier.

The biggest problems we had were in debugging our DNA execution engine (or even figuring out whether it was buggy) and executing it efficiently enough that we could perform experiments. Brian's initial attempt was far too slow, though I can't remember exactly what structure he had chosen to use for representing sequences, other than it was partly based upon list. I then tried an approach that used an OCaml Bigarray to hold the maximal twenty-five megabyte working space for sequences during the simulation. However, this still proved to be too slow.

The next day, Brian tried some other approaches while I coded a new execution in OCaml from scratch, to have independent implementations for comparison and so that I could test some additional approaches to representing DNA sequences.Eventually I came to the conclusion that we taking a significant performance hit because of all the large copies we were performing. Knowing what sort of people the organizers were, I decided that perhaps the way they would have thought about the problem was to create a data type with constructors for each of the operations that the data needed to support and then begin working out algebraic laws for it. My initial version looked like

type seq = StringSeq of string
| ConcatSeq of seq * seq
| ConsSeq of base * seq
| SnocSeq of seq * base
| SliceSeq of seq * int * int

this proved promising, but eventually concluded that in practice ConsSeq was never really used. So I then switched to

type seq = StringSeq of string
| ConcatSeq of seq * seq
| SnocSeq of seq * base
| SliceSeq of seq * int * int

By using this data type and taking advantage of the various algebraic properties I was able to come up with a implementations of concatenation, »snoc«, indexing, slicing, and computing the length of sequences that were reasonably efficient and worked without performing any copying (though I did introduce some rewrites that would copy a coalesce small StringSeq chunks). Incidentally, this turns out partly to have simply been reinventing the »rope« data structure that many participants found acceptably efficient.

However, the data structure did not account for the quotation operation on sequences. Attempting to quote instances of the seq data type in practice would require flattening the structure first, which would cause my implementation to die thrashing. Furthermore, I was unable to come up with a satisfactory method of treating quotation symbolically, because it does not commute with slicing.

In retrospect, and with an additional thirty-six hours behind me, I think I could have probably moved to a representation like

type seq = StringSliceSeq of string * int * int
| ConcatSeq of seq * seq
| SnocSeq of seq * base
| QuoteSeq of seq

where QuoteSeq is lazily eliminated, and that would have probably resolved the problem. However, even then we would have had considerable challenges of ahead of us.

Still, I thought it was a really clever and interesting problem, and it was a refreshingly concrete problem to focus on after spending so much time on InforML and my dissertation.

Brian and I have concluded that we have run out of ideas (and steam) towards continuing on the ICFP Programming Contest. Plus, I have a whole lot of dissertating left to do. But I'll write a more detailed postmortem after the contest deadline. While the organizers have not explicitly said anything this year, I assume that they would prefer that we do not discuss essential details of the contest and our solutions until then.

Update: I forgot to mention the highly ironic fortune I received while Brian and I conferred over Chinese: »You should be able to undertake and complete anything«. Maybe they had my dissertation in mind instead.

Further update: As this page was the first one linked in a recent e-mail to the contest discussion list, I figured I would add a link to the actual post-contest writeup.

Endo in peril

I started from scratch on a new implementation of the DNA execution engine and am going to start on attempting to finish it now, but things aren't looking too promising for our team. We could have definitely used another person or two I think.  I guess we'll see.

Status report

Some unknown benefactor fixed the Peter Buneman Memorial Coffee Machine.

Brian being the Coq expert concluded that he couldn't see any way that dependent types would help us at the moment, so we've pretty much been writing pure OCaml code. At this point we have what appears to be a working tool for checking the score of a particular output image, a renderer for the RNA command language that seems to work, and the DNA execution engine is being debugged. It is hard to conclusively test the RNA renderer until we can begin running the output from Endo's starting DNA. I might write a small high level language to make generating test RNA easier, but not just yet.

However, the hardest part is going to be reverse engineering the DNA they've provided us with.

If none of this makes any sense, you should go check out the ICFP 2007 Programming Contest task description.

Life support failure

Oh what a morning for the Peter Buneman Memorial Coffee Machine to break on us!

Morph Endo!

And we're off. Well, really we've kind of been at it since about five after six, but we've been in the office scrutinizing printed copies of the task since seven thirty. It looks like it will be pretty interesting. I think probably the best way to understand some of it is going to be to just jump in and start implementing.

ICFP Programming Contest 2007 (and more)

This year's ICFP Programming Contest starts this coming Friday at noon, Central European Time. This year it is being run by a group, mostly at Utrecht University. There is actually an official blog where they have been developing some of the back-story for this year's contest. So far it looks like it will just be Brian and I representing Penn PLClub this year. Currently our plan is to write our entry in Coq (with a smidgen of OCaml), However, given that we cannot really plan for what they'll throw at us, it could prove to be very cumbersome. I would have also liked to have given it a try in Scala, but with the POPL deadline, Brian really wouldn't have had much time to learn it.

In some tangentially related news:

• Martin appeared (perhaps »was heard« would be more accurate) on Software Engineering Radio to talk about Scala. I haven't had a chance to listen to it yet, as it is nearly an hour long, but if I split it between the walk into the office and back tomorrow I should be able to get through it.
• The Cult of the Bound Variable finally released their implementation and tools from last year's ICFP Programming Contest. I spent a few minutes poking around the directories and code, but there is enough to keep someone busy for quite some time trying to grok it all.