Archive for languages

Lazy environment variables

I was just thinking that it would be really useful if command-line shells supported lazy environment variables. Lately, because of my work on Scala I will often find myself entering a line something like

 
export PATH=/home/linuxsoft/apps/java-ibm-1.5/bin:$PATH
...

This is, despite the write-once promises of Jave (well, JVM bytecode), Scala will fail to build or a test will fail on specific implementations of the Java runtime and VM. I have been doing this so frequently, I finally decided to write some ZSH shell scripts to make it a little less work.

Just having a short macro that does the above for all the various Java runtimes is not ideal, because then my PATH keeps getting longer and longer. ZSH might be smart about this when caching lookups, but it is inelegant. Another solution is to write something that does a search and replace on my PATH as a string. However, the most elegant solution would simply be to not perform expansion on the contents of PATH until it must be passed as part of an exec

ZSH can do a lot, so maybe it already has some feature that approximates this, but it would be nice if I could just write something like

 
lazy export PATH=$JAVA_BIN:$PATH
export JAVA_BIN=/home/linuxsoft/apps/java-ibm-1.5/bin
...

And then my scripts can just operate on JAVA_BIN rather than having to modify PATH directly.

Update: I just noticed that setting the variable JAVACMD is enough for most purposes, but the above concept still seems reasonable.

Comments (2)

Focusing and higher-order abstract syntax

I figured I would take a few minutes to point out that Noam Zeilberger had an excellent paper in POPL this year on Focusing and higher-order abstract syntax. I've spent some time in the past trying to better understand Andreoli's material on focusing and Girard's* Ludics from the judgmental perspective that I became accustomed to in my formative years, but I never really came up with anything. I found Noam's computational interpertation of focusing quite pleasing.

One low tech solution that has really been helping my wrists lately is that I obtained adjustable height desk for my office. The desks that came with my office were definitely far too tall for me (even when my chair at its maximum height).

* Girard seems to have redecorated his website since I last visited.

Comments

More notes on Featherweight Scala

As I mentioned the other day, Featherweight Scala has a typing rule that allows assigning a path (x.l1...ln) a singleton type:

Γ ⊦ p : T
-------------
Γ ⊦ p : p.type

(I should note that for some stupid reason WordPress seems to prevent MathML from working, making the rather pleasant <mfrac> tag useless). This is essentially the rule that everyone else uses, but in all the other literature I've looked at (other than νObj) the singleton is labeled with T (something like ST(p)) or T must have a specific type (int, base, etc.).

As I pointed out, Featherweight Scala's version of the rule makes it difficult to write meaningful inversion lemmas. Secondly, the naïve subject reduction theorem that could be obtained would be something like:

If Γ ⊦ t : T and t → t' then exists some T' such that Γ ⊦ t' : T'.

Which feels rather unsatisfactory. Given the subsumption rule for singletons

Γ ⊦ p : T
---------------
Γ ⊦ p.type <: T

It is plausible that a variant of the theorem where T is a subtype of T' would hold, but that would run counter to the expectations that reduction will only allow us to more precisely characterize the type of result. Thinking about it I suppose it might be reasonable to state the theorem as

If Γ ⊦ t : T and t → t' then there exists some T' such that Γ ⊦ t' : T' and
if there exists some p such that T = p.type then Γ ⊦ p : T', otherwise Γ ⊦ T' <: T.

Which I am starting to believe is the best that could be proven. This theorem nicely captures the discontinuities in subject reduction introduced by singleton types. You keep making steps getting a more and more precise type, and then suddenly you "jump" from one type to another, followed by some period where the type becomes more refined, then another "jump", and so forth.

These discontinuities are quite a problem, because it means that after taking a step it may not be possible in some cases to put the resulting term back into an evaluation context and remain well-typed.

νObj [1], the predecessor to Featherweight Scala, has nearly the same problem, but the language has been constructed just right so that typing always becomes more precise. The same changes cannot be made to Featherweight Scala and retain the property that the language is a subset of Scala. Aspinall's original work [2] on on singleton types makes use of an erasure predicate, but his singletons are labeled with a type making its definition trivial. Stone and Harper's work on singleton types [3] solves the problem by using a subject reduction theorem that looks roughly like

If Γ ⊦ t : T and t → t' then that Γ ⊦ t = t' : T.

but I am not sure there is a pleasant notion of term equality in Featherweight Scala, because all reductions either consult the heap or extend it. Singletons are also slightly more tractable in their setting because they require any term being used as a singleton type to have the base type "b".

[1] Martin Odersky, Vincent Cremet, Christine Röckl, and Matthias Zenger. A Nominal Theory of Objects with Dependent Types. Tech report 2002.
[2] David Aspinall. Subtyping with singleton types. CSL 1994
[3] Christopher Stone and Robert Harper. Extensional equivalence and singleton types. ACM TOCL 2006.

Comments (1)

On Arc

From Paul Graham's Arc release annoucement:

Over the years my appreciation for lists has increased. In exploratory programming, the fact that it's unclear what a list represents is an advantage, because you yourself are unclear about what type of program you're trying to write. The most important thing is not to constrain the evolution of your ideas. So the less you commit yourself in writing to what your data structures represent, the better.

Clearly Paul has been living in a cave for the past thirty-five years and did not hear about those quaint ideas, abstract data types and representation independence.

He seems to miss the obvious that using a list to represent a point is quite constraining because you are forced to require one coordinate be the first element of the list and the other coordinate be the second element of the list, and if you ever want to change it, good luck. I suppose his rebuttal would be »oh, it just takes too long or too many tokens to define an ADT for points«.

I really wonder how long it is going to take people to truly learn that telling the compiler about the structure of the data you work with is just as important as telling it about the control flow.

Comments (7)

Prognostication

There is a current thread on Lambda the Ultimate at the moment on predictions for 2008. I will have to remember to look at it again later this year.

Most of the predictions concerning Scala are positive.

I would have to agree with Sean that there have been many years I've heard that concurrency will be the big thing, but I do not feel there has been too many big developments in the area recently. Robin Milner has recently released a draft book on his theory of bigraphs, his newer generic model for concurrency. There is actually an even newer draft of the book, but that is not presently linked from his website, so I am not sure whether I should publish the link.

Comments

Research

I gave a mild promise of more programming languages related material after depositing my dissertation, but so far I have not really come through on that.  I am not sure how much I will cover in this particular posting, but we'll see.

Firstly, I am have been merging material on generalized parametricity that I revised and expanded, as part of my dissertation, back into a journal version of my LICS 2005 paper.  My plan current plan is to submit it to LMCS.

I have just barely started on creating a Coq formalization of Featherweight Scala. It is something that could probably done in Twelf, but I am trying to believe that Coq is more modular at the moment. However, I am currently feeling somewhat disillusioned because I am told that defining part of an AST like the following

Inductive exp : Set :=
...
| exp_tuple : list exp -> exp
...

where list is the list data type that is part of Coq basis library, will not allow one to prove the sorts of theorems one would like. Instead, I am told I should write a mutually inductive AST, something like the following

Inductive exp : Set :=
...
| exp_tuple : exp_list -> exp
...
with exp_list : Set :=
| exp_list_nil : exp_list
| exp_list_cons : exp -> exp_list -> exp_list.

Doing things this way almost negates the benefits one would assume you get from having polymorphism in Coq. Now I am told for the latter definition it is possible to make clever use of Coq's coercions to convert exp_lists back and forth between lists, but I'll have to see how well this works for Featherweight Scala. Maybe, maybe Delphin will be usable someday.

My main project so far other than getting settled in, dealing with Swiss bureaucracy, and preparing a talk on generalized parametricity for the Tresor seminar, has been to start thinking about and to soon start implementing virtual classes for Scala. Virtual classes are kind of strange things, I did not really understand them until I started working on the problem here at EPFL. The idea is basically that you define some »module« class

class A {
  class Inner <: { val x : Int = 42 }
}

with an inner class, and then you may inherit from the module class and override the inner class. The idea is for that to look something like the following in Scala

class B extends {
  class Inner <: { val y : Int = -42 }
}

Here B.Inner has both a member x and a member y. Things become more interesting when you include inheritance from the inner class

class A {
  class Inner <: { val x : Int = 42 }
  class Innie extends Inner { val z : Int = 0 }
}
class B extends {
  class Inner <: { val y : Int = -42 }
}

Now B.Innie also has a member y because we have overridden its parent class. The idea is that for Scala, this kind of behavior can just be treated as complex syntactic sugar for abstract type members and trait composition. However, it is getting late here, so I'll talk more about that some other time.

Comments (6)

In Helvetica now

I have many things I would like to write about, but I figured I would take a few seconds to express my annoyance that because my hostname now ends in ».ch«, Google automatically assumes that I want my interface and searches to be in German by default.  Which I  probably also really annoys the third of the country where German is not the primary language.

Comments (10)

I come bearing links

While I have been in hiding, toiling on my dissertation, a rather large number of links have been accumulating in my to-do list. I'll hopefully write more on the status of my dissertation soon.

Software

A few weeks ago I became frustrated with the capabilities of FontBook on Mac OS X and Gnome Specimen and gucharmap under X11. I spent some time searching and came across Linotype's FontExplorer X. My best description of it would be iTunes for fonts. Not only does it let you manage, examine, and compare the fonts you do have, it lets your preview and purchase fonts from Linotype.

I also came across the UnicodeChecker application (again MacOS X only) which perhaps provides a more detailed examination and exploration of the properties of Unicode glyphs.

Reference

I'm not sure where I read about it, but I recently came across Symbols.com, "The Online Encyclopedia of Western Signs and Ideograms". I haven't needed to consult it yet, but it is fairly interesting just to see what it gives you when asking for a random sign.

I've known about this for a while, but didn't think to mention it until now: the entirety of the Fifteenth edition of the Chicago Manual of Style can be found online. Useful when you don't want to carry around the hardcopy, and searchable. Unfortunately, looking at it again now, the fact that it says in the corner »University of Pennsylvania« makes me believe that is this probably not a free service. And indeed, I just verified this with my desktop, which does not have a cached cookie from inside the Penn network.

Theory

From Lambda the Ultimate, I learned about a paper giving a functional description of the »galleys« layout algorithm used by the lout document processing system.

This also reminded me of the earlier post, also by Neel, on a functional description of TeX's formula layout algorithm.

I also went looking and Kingston has a nice paper on the design of lout, and website on the eventual successor to lout, Nonpareil. lout is an interesting system, though for some reason when I first encountered it many years ago I didn't seem to think it produced as nice of output as TeX. I haven't tried to reevaluate this at the moment, and see whether it has changed or I was simply wrong in the first place.

Also, something else on Lambda the Ultimate led to me to discover the Juno-2 structured drawing editor. I've only had a chance to read about it briefly, but it sounds quite a bit like what I have envisioned for the font editor I've been thinking about for a while, but haven't found the time to seriously start yet.

Finally, while writing this section, I came across another Lambda the Ultimate post on Skribe, a Scheme based document formatting system. It however it seems to be more of a macro layer on top on HTML and LaTeX, rather than a fully integrated system.

Comments (2)

Linkage dump

From Daring Fireball:

Typographica has some discussion on a new serifed typeface Gina.

I wish I had known about, OcaIDE, a very promising Eclipse plugin for OCaml before starting the ICFP Programming Contest. If nothing else, it would have saved a bunch of time looking up standard library functions on the web.

Comments (3)

Programming contest postmortem

For those who haven't investigated this year's ICFP Programming contest, I'll provide a brief overview. We were given the task of saving Endo, an alien that crash landed on Earth and needed his »DNA« to be rewritten into a form that would allow him to survive the conditions on our planet. Endo, being a member of the Fuun species has a decidedly non-terrestrial biological basis.

Endo's »DNA« consists of a sequence of the bases I, C, F, and P. It has the most usual property that it acts both as an information storage mechanism and as catalyst for reactions. So perhaps the existence of the Fuun support the RNA world hypothesis, as we could say that their »DNA« does the same job as both our DNA and RNA. The catalytic process produces new sequences of »DNA« that provide the instructions for building a Fuun. We were given the information needed to express a Fuun's phenotype as 600 by 600 pixel images. Furthermore, we were provided with Endo's current »DNA« sequence, his current phenotype,

Endo’s source DNA

and a target phenotype

Target DNA for Endo

as suggested by his trusty and intelligent spacecraft Arrow. Our task was to synthesize a sequence that could be used as a prefix Endo's current genome, that would as closely »morph« Endo towards expressing this phenotype as possible.


Overall, I would say that this year's contest proved to be much more challenging than in previous years. However, I say this based primarily on the fact that (a) my team of two could not build a usable tool chain to start experimenting with Endo's genome within 48 hours and (b) the close of the closed the best team with public statistics had developed a sequence that would give Endo an 8% likelihood of survival. However, the overwhelming majority of the participants were well behind this. Not only was there the challenge of building a workable tool chain, but understanding Endo's genome well enough to reverse engineer it. I gather there were hints that became available once you had working tools, but I cannot really comment on their nature.

My team would have greatly benefited from additional minds to throw at the problem, but even so we did a few things sub-optimally. We initially divided the work at too high of a granularity. Brian start writing a DNA execution engine, while I built a tool for rendering the expressed phenotype. However, we should have concentrated more on writing smaller reusable components for the common structures to avoid duplication and make optimization easier.

The biggest problems we had were in debugging our DNA execution engine (or even figuring out whether it was buggy) and executing it efficiently enough that we could perform experiments. Brian's initial attempt was far too slow, though I can't remember exactly what structure he had chosen to use for representing sequences, other than it was partly based upon list. I then tried an approach that used an OCaml Bigarray to hold the maximal twenty-five megabyte working space for sequences during the simulation. However, this still proved to be too slow.

The next day, Brian tried some other approaches while I coded a new execution in OCaml from scratch, to have independent implementations for comparison and so that I could test some additional approaches to representing DNA sequences.Eventually I came to the conclusion that we taking a significant performance hit because of all the large copies we were performing. Knowing what sort of people the organizers were, I decided that perhaps the way they would have thought about the problem was to create a data type with constructors for each of the operations that the data needed to support and then begin working out algebraic laws for it. My initial version looked like

type seq = StringSeq of string
         | ConcatSeq of seq * seq
         | ConsSeq of base * seq
         | SnocSeq of seq * base
         | SliceSeq of seq * int * int

this proved promising, but eventually concluded that in practice ConsSeq was never really used. So I then switched to

type seq = StringSeq of string
         | ConcatSeq of seq * seq
         | SnocSeq of seq * base
         | SliceSeq of seq * int * int

By using this data type and taking advantage of the various algebraic properties I was able to come up with a implementations of concatenation, »snoc«, indexing, slicing, and computing the length of sequences that were reasonably efficient and worked without performing any copying (though I did introduce some rewrites that would copy a coalesce small StringSeq chunks). Incidentally, this turns out partly to have simply been reinventing the »rope« data structure that many participants found acceptably efficient.

However, the data structure did not account for the quotation operation on sequences. Attempting to quote instances of the seq data type in practice would require flattening the structure first, which would cause my implementation to die thrashing. Furthermore, I was unable to come up with a satisfactory method of treating quotation symbolically, because it does not commute with slicing.

In retrospect, and with an additional thirty-six hours behind me, I think I could have probably moved to a representation like

type seq = StringSliceSeq of string * int * int
         | ConcatSeq of seq * seq
         | SnocSeq of seq * base
         | QuoteSeq of seq

where QuoteSeq is lazily eliminated, and that would have probably resolved the problem. However, even then we would have had considerable challenges of ahead of us.

Still, I thought it was a really clever and interesting problem, and it was a refreshingly concrete problem to focus on after spending so much time on InforML and my dissertation.

Comments (3)

« Previous Page« Previous entries « Previous Page · Next Page » Next entries »Next Page »