Archive for July, 2007

Linkage dump

From Daring Fireball:

Typographica has some discussion on a new serifed typeface Gina.

I wish I had known about, OcaIDE, a very promising Eclipse plugin for OCaml before starting the ICFP Programming Contest. If nothing else, it would have saved a bunch of time looking up standard library functions on the web.

Comments (3)

Programming contest postmortem

For those who haven't investigated this year's ICFP Programming contest, I'll provide a brief overview. We were given the task of saving Endo, an alien that crash landed on Earth and needed his »DNA« to be rewritten into a form that would allow him to survive the conditions on our planet. Endo, being a member of the Fuun species has a decidedly non-terrestrial biological basis.

Endo's »DNA« consists of a sequence of the bases I, C, F, and P. It has the most usual property that it acts both as an information storage mechanism and as catalyst for reactions. So perhaps the existence of the Fuun support the RNA world hypothesis, as we could say that their »DNA« does the same job as both our DNA and RNA. The catalytic process produces new sequences of »DNA« that provide the instructions for building a Fuun. We were given the information needed to express a Fuun's phenotype as 600 by 600 pixel images. Furthermore, we were provided with Endo's current »DNA« sequence, his current phenotype,

Endo’s source DNA

and a target phenotype

Target DNA for Endo

as suggested by his trusty and intelligent spacecraft Arrow. Our task was to synthesize a sequence that could be used as a prefix Endo's current genome, that would as closely »morph« Endo towards expressing this phenotype as possible.

Overall, I would say that this year's contest proved to be much more challenging than in previous years. However, I say this based primarily on the fact that (a) my team of two could not build a usable tool chain to start experimenting with Endo's genome within 48 hours and (b) the close of the closed the best team with public statistics had developed a sequence that would give Endo an 8% likelihood of survival. However, the overwhelming majority of the participants were well behind this. Not only was there the challenge of building a workable tool chain, but understanding Endo's genome well enough to reverse engineer it. I gather there were hints that became available once you had working tools, but I cannot really comment on their nature.

My team would have greatly benefited from additional minds to throw at the problem, but even so we did a few things sub-optimally. We initially divided the work at too high of a granularity. Brian start writing a DNA execution engine, while I built a tool for rendering the expressed phenotype. However, we should have concentrated more on writing smaller reusable components for the common structures to avoid duplication and make optimization easier.

The biggest problems we had were in debugging our DNA execution engine (or even figuring out whether it was buggy) and executing it efficiently enough that we could perform experiments. Brian's initial attempt was far too slow, though I can't remember exactly what structure he had chosen to use for representing sequences, other than it was partly based upon list. I then tried an approach that used an OCaml Bigarray to hold the maximal twenty-five megabyte working space for sequences during the simulation. However, this still proved to be too slow.

The next day, Brian tried some other approaches while I coded a new execution in OCaml from scratch, to have independent implementations for comparison and so that I could test some additional approaches to representing DNA sequences.Eventually I came to the conclusion that we taking a significant performance hit because of all the large copies we were performing. Knowing what sort of people the organizers were, I decided that perhaps the way they would have thought about the problem was to create a data type with constructors for each of the operations that the data needed to support and then begin working out algebraic laws for it. My initial version looked like

type seq = StringSeq of string
         | ConcatSeq of seq * seq
         | ConsSeq of base * seq
         | SnocSeq of seq * base
         | SliceSeq of seq * int * int

this proved promising, but eventually concluded that in practice ConsSeq was never really used. So I then switched to

type seq = StringSeq of string
         | ConcatSeq of seq * seq
         | SnocSeq of seq * base
         | SliceSeq of seq * int * int

By using this data type and taking advantage of the various algebraic properties I was able to come up with a implementations of concatenation, »snoc«, indexing, slicing, and computing the length of sequences that were reasonably efficient and worked without performing any copying (though I did introduce some rewrites that would copy a coalesce small StringSeq chunks). Incidentally, this turns out partly to have simply been reinventing the »rope« data structure that many participants found acceptably efficient.

However, the data structure did not account for the quotation operation on sequences. Attempting to quote instances of the seq data type in practice would require flattening the structure first, which would cause my implementation to die thrashing. Furthermore, I was unable to come up with a satisfactory method of treating quotation symbolically, because it does not commute with slicing.

In retrospect, and with an additional thirty-six hours behind me, I think I could have probably moved to a representation like

type seq = StringSliceSeq of string * int * int
         | ConcatSeq of seq * seq
         | SnocSeq of seq * base
         | QuoteSeq of seq

where QuoteSeq is lazily eliminated, and that would have probably resolved the problem. However, even then we would have had considerable challenges of ahead of us.

Still, I thought it was a really clever and interesting problem, and it was a refreshingly concrete problem to focus on after spending so much time on InforML and my dissertation.

Comments (3)

Brief observation

Beautiful Code incorrectly alphabetized Simon Peyton Jones when listing the contributors and in the index.


Admitting defeat

Brian and I have concluded that we have run out of ideas (and steam) towards continuing on the ICFP Programming Contest. Plus, I have a whole lot of dissertating left to do. But I'll write a more detailed postmortem after the contest deadline. While the organizers have not explicitly said anything this year, I assume that they would prefer that we do not discuss essential details of the contest and our solutions until then.

Update: I forgot to mention the highly ironic fortune I received while Brian and I conferred over Chinese: »You should be able to undertake and complete anything«. Maybe they had my dissertation in mind instead.

Further update: As this page was the first one linked in a recent e-mail to the contest discussion list, I figured I would add a link to the actual post-contest writeup.

Comments (2)

Endo in peril

I started from scratch on a new implementation of the DNA execution engine and am going to start on attempting to finish it now, but things aren't looking too promising for our team. We could have definitely used another person or two I think.  I guess we'll see.

Comments (1)

Status report

Some unknown benefactor fixed the Peter Buneman Memorial Coffee Machine.

Brian being the Coq expert concluded that he couldn't see any way that dependent types would help us at the moment, so we've pretty much been writing pure OCaml code. At this point we have what appears to be a working tool for checking the score of a particular output image, a renderer for the RNA command language that seems to work, and the DNA execution engine is being debugged. It is hard to conclusively test the RNA renderer until we can begin running the output from Endo's starting DNA. I might write a small high level language to make generating test RNA easier, but not just yet.

However, the hardest part is going to be reverse engineering the DNA they've provided us with.

If none of this makes any sense, you should go check out the ICFP 2007 Programming Contest task description.


Life support failure

Oh what a morning for the Peter Buneman Memorial Coffee Machine to break on us!


Morph Endo!

And we're off. Well, really we've kind of been at it since about five after six, but we've been in the office scrutinizing printed copies of the task since seven thirty. It looks like it will be pretty interesting. I think probably the best way to understand some of it is going to be to just jump in and start implementing.


ICFP Programming Contest 2007 (and more)

This year's ICFP Programming Contest starts this coming Friday at noon, Central European Time. This year it is being run by a group, mostly at Utrecht University. There is actually an official blog where they have been developing some of the back-story for this year's contest. So far it looks like it will just be Brian and I representing Penn PLClub this year. Currently our plan is to write our entry in Coq (with a smidgen of OCaml), However, given that we cannot really plan for what they'll throw at us, it could prove to be very cumbersome. I would have also liked to have given it a try in Scala, but with the POPL deadline, Brian really wouldn't have had much time to learn it.

In some tangentially related news:

  • Martin appeared (perhaps »was heard« would be more accurate) on Software Engineering Radio to talk about Scala. I haven't had a chance to listen to it yet, as it is nearly an hour long, but if I split it between the walk into the office and back tomorrow I should be able to get through it.
  • The Cult of the Bound Variable finally released their implementation and tools from last year's ICFP Programming Contest. I spent a few minutes poking around the directories and code, but there is enough to keep someone busy for quite some time trying to grok it all.


Those crazy sans serifs

I received the copy of Beautiful Code that I ordered today, and the colophon tells me that the cover is set using Akzidenz Grotesk.

Chung-chieh Shan noticed that in the photograph I took of the Helvetica and Akzidenz Grotesk comparison, that the legend text for Helvetica is set using Akzidenz Grotesk and the text for Akzidenz Grotesk in Helvetica. In a different comment he pointed out the rather useful website Identifont that will attempt to identify a typeface for you by essentially playing »twenty questions« with different attributes about the glyphs. It took seventeen questions to tell me that my current theme uses Lucida Sans (or two very closely related typefaces). Which I initially thought was wrong, because I could have sworn I had it configured for something like Optima, but I doubled checked the style sheets and it tells me:

font-family: "Lucida Grande", "Lucida Sans Unicode", Verdana, Arial, sans-serif;

So it seems it is doing a reasonable job, though I wasn't able to determine whether it is possible for Firefox to tell me what it is actually using to render the page. MacOS X Fontbook tells me that I do have Lucida Grande installed, so that is probably what it chose.


« Previous entries Next Page » Next Page »