Archive for June, 2007

Clippy knows Scala

Someone suggested on the scala list today that a tool be written for analyzing your code and suggesting advanced language features you could be using to improve your code. My first thought was

Clippy does Scala

Seriously though, doing global type inference for a language as expressive as Scala would be a challenge. Recognizing patterns and idioms in the code on top of that, while I won't say it is impossible, would be a Herculean task.


Ask Geoff: space used by fonts embedded in PDFs

I get e-mails now and then about typography-related things, so I figured I would try putting my responses (or revised versions of them) here for general reference.

Is there a way to determine how much each particular font embedded in a PDF file contributes to the file's size?

There are a few tools that will tell you information about the fonts in a PDF. pdffonts, which is part of xpdf, seems to be the most informative, but it does not tell you how much space an embedded font uses. Acrobat Reader will also tell you a little bit about the fonts embedded in a PDF, but again I couldn't find anything about the space used. Even the professional version did not seem to offer this information, which seems kind of strange given the various optimization tools it provides.

pdffonts does tell you the object number of each font in the PDF, so if you could extract a specific object from a PDF it would be possible to write a wrapper that reports how much space the embedded fonts are using. However, I spent a few minutes investigating and did not find any really simple way to extract PDF objects; by simple I mean a command-line tool. I have to imagine there is probably something on CPAN with an API for extracting PDF objects. There is pdftosrc, part of most modern TeX distributions, which will extract PDF »stream objects«, but these seem to be different from just plain »objects«.
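As a rough sketch of the wrapper idea: given the object numbers that pdffonts reports, you could scan the raw PDF for each `N 0 obj ... endobj` span and measure its length. This is a naive approximation of my own devising, not an existing tool; it assumes generation number 0 and will miss fonts packed inside compressed object streams.

```python
import re

def object_sizes(pdf_bytes, object_numbers):
    """Rough size (in bytes) of each requested top-level PDF object.

    Naive sketch: scans the raw file for 'N 0 obj ... endobj' spans,
    so it assumes generation number 0 and ignores objects stored in
    compressed object streams.
    """
    sizes = {}
    for num in object_numbers:
        pattern = re.compile(
            rb"^%d\s+0\s+obj\b.*?\bendobj\b" % num,
            re.DOTALL | re.MULTILINE)
        match = pattern.search(pdf_bytes)
        if match:
            sizes[num] = match.end() - match.start()
    return sizes
```

One would feed this the object numbers from the last column of pdffonts output; the sizes of font objects with streams should dominate those of mere font dictionaries.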


A sign?

I've kept meaning to write about this.

Last Monday (Memorial Day in the United States) I was in the office to spend yet another day hacking on InforML. When I opened the »sleeve« that I keep my ThinkPad in, to protect it within my backpack, a grayish/white moth flew out of it.


Totally explicit

Some of you may have looked closely enough at the dump from InforML I posted a few weeks back to notice that not only was I using de Bruijn indices, as I said I was switching to in April, but that I was also using explicit substitutions. After some amount of debugging, and complications with how I had converted my named mixed-prefix unification algorithm over to de Bruijn indices, I concluded that it might be best to go with something closer to the literature.

So I started revising my code to work something like Dowek, Hardin, Kirchner, and Pfenning's Unification via explicit substitutions: The case of higher-order patterns. However, I soon found that de Bruijn indices become vastly more complicated to work with in a polymorphic setting. Using explicit substitutions did make it easier to track down a number of bugs, because substitution could be traced as part of type normalization. Still, it was a long, hard process of moving forward.
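For readers unfamiliar with the representation, the textbook shifting and substitution operations on de Bruijn terms already take some care to get right in the simply-typed case; what follows is a minimal, illustrative sketch (the tuple constructors are made up for this example) of where the bookkeeping lives, before polymorphism makes it worse:

```python
# De Bruijn indices name bound variables by their numeric distance
# to the enclosing binder. The classic shift/substitute pair for
# the untyped lambda calculus; constructors are illustrative only.

def shift(term, by, cutoff=0):
    """Add `by` to every variable index at or above `cutoff`."""
    tag = term[0]
    if tag == "var":                           # ("var", index)
        i = term[1]
        return ("var", i + by if i >= cutoff else i)
    if tag == "lam":                           # ("lam", body)
        return ("lam", shift(term[1], by, cutoff + 1))
    if tag == "app":                           # ("app", fn, arg)
        return ("app", shift(term[1], by, cutoff),
                       shift(term[2], by, cutoff))

def subst(term, j, image):
    """Replace variable `j` in `term` with `image`, shifting the
    image as it is pushed under binders to avoid capture."""
    tag = term[0]
    if tag == "var":
        return image if term[1] == j else term
    if tag == "lam":
        return ("lam", subst(term[1], j + 1, shift(image, 1)))
    if tag == "app":
        return ("app", subst(term[1], j, image),
                       subst(term[2], j, image))
```

Every binder crossed means another shift of the image; explicit substitutions essentially reify these suspended shifts into the term language so they can be traced and debugged.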

So about two weeks ago I decided that global inference is really only a bonus in the overall scheme of what my dissertation is trying to show. Not entirely coincidentally, this happened just as I took on a much stronger deadline commitment. So I scrapped inference and began working on just getting a bidirectional type-checker working for InforML, and now I nearly have the entire language type-checking. So one lesson here is to always write a type-checker first. I am sort of aware of this rule, but given that I was attempting to start from AspectML, which doesn't have a type-checker for the source language, I wound up starting directly on inference. However, at this point I've probably rewritten the code such that barely anything from AspectML remains anyway.
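To illustrate why a bidirectional checker is so much easier than full inference: the checker alternates between two modes, synthesizing a type from a term when it can and checking a term against an expected type otherwise, with annotations bridging the two. A minimal sketch for a toy simply-typed lambda calculus (all constructor names here are illustrative; this shows the general discipline, not InforML's actual checker):

```python
# Minimal bidirectional type-checker for a toy lambda calculus.
# Base types are plain strings; arrows are a tiny class.

class Arrow:
    def __init__(self, dom, cod):
        self.dom, self.cod = dom, cod
    def __eq__(self, other):
        return (isinstance(other, Arrow)
                and self.dom == other.dom and self.cod == other.cod)

def synth(ctx, term):
    """Synthesis mode: compute a type from the term."""
    tag = term[0]
    if tag == "var":                      # ("var", name)
        return ctx[term[1]]
    if tag == "app":                      # ("app", fn, arg)
        fn_ty = synth(ctx, term[1])
        assert isinstance(fn_ty, Arrow), "applying a non-function"
        check(ctx, term[2], fn_ty.dom)
        return fn_ty.cod
    if tag == "ann":                      # ("ann", term, type)
        check(ctx, term[1], term[2])
        return term[2]
    raise TypeError("cannot synthesize a type for %r" % (tag,))

def check(ctx, term, ty):
    """Checking mode: verify the term against an expected type."""
    if term[0] == "lam":                  # ("lam", name, body)
        assert isinstance(ty, Arrow), "lambda needs an arrow type"
        check(dict(ctx, **{term[1]: ty.dom}), term[2], ty.cod)
    else:                                 # fall back to synthesis
        assert synth(ctx, term) == ty, "type mismatch"
```

No unification variables, no mixed-prefix contexts: lambdas are only ever checked against a known arrow type, which is exactly what makes the annotations in the next post's sense necessary.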

In addition to tackling the easier problem of type-checking versus global type-inference, I was able to expedite the implementation by taking advantage of the Twelf Logical Framework. Specifically, a number of decision problems required by type-checking, most importantly polymorphic subsumption (which requires unification), are converted into Twelf logic programs/queries and then evaluated using the tabled logic programming engine. This simplifies many things considerably. However, this has the drawback that ill-typed programs may very well cause Twelf to diverge in search of a proof that doesn't exist, not to mention error messages that have no bearing on the input program if it does terminate without finding a solution. I think I may be able to improve the termination behavior by making the specification of some parts of the InforML language more algorithmic, but that will probably have to wait.

The next big step is to write a simple compiler for InforML that will spit out Scheme code. This means that I can avoid writing my own evaluator, and implementing some of the primitive operations will be easier than they would be if I implemented them as part of an interpreter. I'm hoping that getting this compilation step mostly working won't take more than a few days, which means that I can then focus on writing examples and dissertating.
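The appeal of compiling to Scheme is that emission can be little more than printing s-expressions, leaving evaluation to an existing Scheme system. A sketch of that idea on a made-up miniature AST (none of this is InforML's actual compiler):

```python
# Sketch of emitting Scheme text from a tiny expression AST, in
# the spirit of "compile to Scheme rather than write an evaluator".
# The AST constructors are invented for illustration.

def to_scheme(term):
    tag = term[0]
    if tag == "num":                       # ("num", 3)
        return str(term[1])
    if tag == "var":                       # ("var", "x")
        return term[1]
    if tag == "lam":                       # ("lam", "x", body)
        return "(lambda (%s) %s)" % (term[1], to_scheme(term[2]))
    if tag == "app":                       # ("app", fn, arg)
        return "(%s %s)" % (to_scheme(term[1]), to_scheme(term[2]))
    if tag == "let":                       # ("let", "x", rhs, body)
        return "(let ((%s %s)) %s)" % (
            term[1], to_scheme(term[2]), to_scheme(term[3]))
    raise ValueError("unknown form: %r" % (tag,))
```

Primitives then come for free as Scheme procedures bound in the emitted program's environment, rather than as interpreter cases.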

I may take a little time to try implementing some local type inference just so that I can avoid writing a number of terribly obvious type annotations, but then again I might just live with them if it doesn't get too out of hand while writing examples.


The latest in »cat macro expressibility«

It seems like »cat macro expressibility« is really taking off. Three days after I originally circulated my abstract internally, Diggins, in his own independent research, released version 0.14 of the Cat language with a macro facility. Three days after that, also independently, Lindsay unveiled a project to develop a language based upon the principles of cat macro expressibility. This may very well be the hot topic for POPL 2008!