## Fonts in LaTeX, an intermission

Part one of my tutorial attracted a considerable number of visitors, far more than any single entry in the past, partly because it was posted to reddit.

Looking at the comments on reddit, I figured that I would say that luatex does resolve pdfTeX's internal limitation of 256 glyphs that I mentioned in part two, and it should directly support OpenType fonts with PostScript outlines.

However, my understanding is that the authors of luatex do not intend to make using TrueType and OpenType fonts as simple as XeTeX directly.  Instead, luatex merely makes the machinery available for someone else to build upon.  So someone will need to write a LaTeX package for luatex to put it all together, and as far as I know, no one has done this yet (let me know if I'm wrong!).  Also, while the plan is for luatex to eventually be merged back into pdfTeX, I think it is an overstatement to say that it will happen "soon".  The current luatex roadmap says that a "production" ready version will be available in August 2009.  I doubt that the merge back to pdfTeX will happen any sooner than 2010 given that.  But yes, in the long term I think luatex will be a great thing.

It also sounds like I should probably write a fourth part to my tutorial on using fontinst.  I've never personally used it myself, and when I first started working with OpenType fonts and LaTeX I wasn't aware of its existence.  Therefore, I wrote otftofd.  So it might take a bit longer to write as I will have to learn it at the same time.

## Even better than the real thing

Several weeks ago I decided that instead of calling the core calculus that I have been working on Featherweight Scala, which could be confusing given that there is already a different calculus with that name, that I would call it Scala Classic.

I had hoped to submit a paper on Scala Classic to POPL 2009, but it has been taking too long to work out the metatheory. Partly because I am developing a mechanized proof of type soundness in Twelf, and on my first attempt at representing heaps I think tried to be too clever.  However, a more traditional treatment of heaps leads to the need to explicitly represent typing contexts, rather than the implicit treatment of contexts more commonly used in Twelf.

This afternoon I finally finished the proof of the theorem that is traditionally called "substitution".  However, the proof of type preservation will also require another theorem formalizing a substitution-like property for singleton types.  I am hoping that I can prove that theorem more quickly now that I've built up all the explicit context theory.  I have not thought much about whether there are any unusual theorems needed to prove progress.

In any event, I would ideally have the proofs finished by the beginning of August.  My current plan is to try submitting the paper to PLDI (which is even in Europe this year), unless some obviously better venue comes to my attention.

One of the reasons I would really like to have the proofs done by August is that I will be teaching a PhD level course here at EPFL this fall on mechanized sepcifications and proofs, with a strong emphasis on deductive systems. It also looks like I may fill in some for a masters level course based on TAPL.  So between those two things, I am not going to have nearly as much time for research as would be ideal.

## Fonts in LaTeX, Part Two: pdfTeX and OpenType

In part one of the tutorial, I commented that sometimes you would want to use pdfTeX and pdfLaTeX instead of XeTeX and XeLaTeX. One reason to consider using pdfTeX over XeTeX is that the latter does not yet support the same microtypographic features. When you are preparing slides, pdfTeX's microtypographic features probably will not have much of an impact on your output, but I've definitely found that while preparing articles and my dissertation, using pdfTeX's microtypographic features produces much nicer looking output with fewer bad breaks or hyphenations.

However, pdfTeX's architecture for handling fonts is much more like standard TeX and is far more complicated that XeTeX's. One option is to use a tool to do all the work for you.  For example, you could use the fontinst utility or my own tool, otftofd.  The other option is to do it all by hand, which is what I will explain in this tutorial.

One of the first complications you'll encounter with pdfTeX is that the font that is active at a given time can only refer to 256 glyphs at a time. Therefore if you need to use more than 256 different glyphs in a document, you will need to switch between multiple "fonts".

The first step in using a font in pdfTeX is picking an encoding. Since most OpenType fonts contain more than 256 glyphs, an encoding provides a mapping from those glyphs to the 256 that you can reference at a given time in pdfTeX.

For the most part I generally just use what is called the T1 or "Cork" encoding. However, if we want to replicate the example from the first part of the tutorial, we will need to make a custom encoding to access the Greek glyphs. So, first use kpsewhich to find where your system keeps cork.enc, and make a copy:
 % kpsewhich cork.enc /local/texlive/2007/texmf-dist/fonts/enc/dvips/base/cork.enc % cp /local/texlive/2007/texmf-dist/fonts/enc/dvips/base/cork.enc ./custom.enc 
Open custom.enc in you favorite editor, and go to the end. Assuming you are using the same version of TeX Live as me, the last few lines will look something like:
 /oslash /ugrave /uacute /ucircumflex /udieresis /yacute /thorn /germandbls ] def 
You will want to edit it to look like:
 /tau /epsilon /chi /ucircumflex /udieresis /yacute /thorn /germandbls ] def 
What we have done is changed the encoding so that glyphs 0xf8, 0xf9, 0xfa (in hexadecimal) now point to τ, ε, and χ. The general format of entries in the encoding file is / followed by a name. In the case that the software doesn't understand a name that you think it should, you can always specify the gylph using its Unicode hexadecimal name prefixed with /uni. For example, we could have changed the encoding as followings:
 /uni03c4 /uni03b5 /uni03c7 /ucircumflex /udieresis /yacute /thorn /germandbls ] def 

A complete list of glyph names can be obtained from Adobe's website.  You can learn more about the encoding file format from the dvips documentation, though the eagle-eye may have noticed that actually a subset of PostScript itself.

Next, we need to create a file to tell LaTeX about our new encoding, which we will call U for "user-defined". Create a file in the current directory called uenc.def and put the following in it:

 \ProvidesFile{uenc.def}\DeclareFontEncoding{U}{}{}

As it says, it is defining a new font encoding called "U".

Now that we have an encoding, we need to generate font metrics that pdfTeX can understand, and a mapping file to tell it how to map font names to encodings and actual font files. Additionally, pdfTeX (at least last I checked) cannot handle OpenType fonts that contain PostScript rather than TrueType font outlines. So we also need to convert our OpenType font, Pagella, to Type1 format. Fortunately, Eddie Kohler's excellent tool otftotfm will do most that for us. Again, it is included with TeX Live. We invoke it on the font we wish to use, with the encoding we have created, and redirect the output to a file called custom.map:

% otftotfm -e custom.enc texgyrepagella-regular.otf > custom.map
otftotfm: ./custom.enc:19: warning: 'space' has no encoding, ignoring ligature
otftotfm: ./custom.enc:19: warning: 'space' has no encoding, ignoring ligature
otftotfm: ./custom.enc:30: warning: 'space' has no encoding, ignoring '{}'
otftotfm: ./custom.enc:30: warning: 'space' has no encoding, ignoring '{}'
I had to round some heights by 13.0000000 units.
I had to round some depths by 3.0000000 units.
I had to round some heights by 13.0000000 units.
I had to round some depths by 3.0000000 units.


Don't be concerned about the warnings. The first few are just complaints because there is no "space" gylph, which is not used by TeX. The rounding warnings occur, I assume, because PostScript metrics differ very slightly from TeX's internal representation of size metrics. An otftotfm unit is about one thousandth of an em.

We now have have several new files in the current directory:

a_qnnnfc.enc
custom.map
TeXGyrePagella-Regular--custom--base.tfm
TeXGyrePagella-Regular--custom.tfm
TeXGyrePagella-Regular--custom.vf
TeXGyrePagella-Regular.pfb

The pfb file is the PostScript Type 1 version of our original OpenType font, the file custom.map is used to tell pdfTeX how to map a font name to files, the two tfm provide the font metric information TeX needs to format text, the vf file is a "virtual font" file that depending on the options you gave to otftotfm may perform some operations on the basic glyphs, and the file a_qnnnfc.enc is an encoding otftotfm generated based upon the encoding we supplied it. Depending on the options, otftotfm may try to include some additional glyphs to deal with ligatures or in the case that a glyph in the encoding we specified doesn't exist in the font, it will replace its entry with /.notdef, etc.

Next we want to take a peek inside of custom.map. It's contents will look something like the following:
 TeXGyrePagella-Regular--custom--base TeXGyrePagella-Regular "AutoEnc_qnnnfca3qut7llkesqq3eddyzc ReEncodeFont" <[a_qnnnfc.enc 
You can get away without understanding the structure of the map file, but we need know the name LaTeX should use to refer to the font. In this case it is the somewhat lengthy TeXGyrePagella-Regular--custom--base. We could edit custom.map to give it a different name, but then we would need to make sure to rename the tfm files appropriately. So we'll just leave it alone.

At this point we are ready to describe the font to LaTeX. To to this we'll create a file called UPagella.fd where fd stands for "font definition". Assuming you are using TeX Live, you can learn more about the format of font definition files by running: texdoc fntguide, which will bring up the LATEX 2ε font selection document. Put the following into UPagella.fd:

 \ProvidesFile{UPagella.fd}\DeclareFontFamily{U}{Pagella}{}\DeclareFontShape{U}{Pagella}{m}{n}{ <-> TeXGyrePagella-Regular--custom--base }{} \DeclareUnicodeCharacter{03C4}{\char"F8}\DeclareUnicodeCharacter{03B5}{\char"F9}\DeclareUnicodeCharacter{03C7}{\char"FA}

The second line declares for the font encoding U, a font family named Pagella. The third line defines an available shape for the Pagella family. It has a medium weight (m) and normal/upright (n), and for all sizes (<->) the font named TeXGyrePagella-Regular--custom--base should be used. The three \DeclareUnicodeCharacter lines map the Unicode glyphs for τ, ε, and χ to their locations in the encoding we defined. Note that the hexadecimal numbers must all be in uppercase for LaTeX to parse them correctly.

Now we are all set to revisit our original example. In test.tex enter:

 \documentclass{article}\usepackage[utf8]{inputenc}\usepackage[U]{fontenc}\pdfmapfile{+custom.map}\renewcommand{\rmdefault}{Pagella}  \begin{document}Testing pdfLaTeX! Greek: τεχ.\end{document}

The second line here tells LaTeX to load the inputenc package and pass it the option utf8 to tell it to parse the remainder of the input as UTF8 encoded text. The third line tells LaTeX to load the fontenc package and pass it the option U telling it to set the default encoding to be U. The fourth line is specific to pdfTeX and tells it to add to its internal mapping the definitions in custom.map. Finally, \renewcommand is used to change the default serif (Roman, rm) font to be Pagella.

We can now go ahead and run pdflatex:

% pdflatex test.tex
This is pdfTeXk, Version 3.141592-1.40.3 (Web2C 7.5.6)
%&-line parsing enabled.
entering extended mode
(./test.tex
...
...
(./test.aux) (./upagella.fd) [1] (./test.aux) ){a_qnnnfc.enc}<./TeXGyrePagella-
Regular.pfb>
Output written on test.pdf (1 page, 22850 bytes).
Transcript written on test.log.


Again, we now get a PDF with the desired output:

That concludes the second part of the tutorial. The third, and probably final, part of the tutorial will cover what needs to change in the above process if you would like to use a TrueType font rather than an OpenType font containing PostScript outline data.

## Fonts in LaTeX, Part One: XeLaTeX

Now and then I get asked about how to use some TrueType or OpenType font with LaTeX, so I figured I would take the time to write up some simple tutorials on how to do so. The first part will focus on the easiest route to making use of TrueType and OpenType fonts in LaTeX: XeTeX and XeLaTeX.

XeLaTeX also has the advantage of not only giving easy access to modern fonts, but also accepting Unicode input files.

The first thing you need to do is find out if you have XeLaTeX installed, and if it is a sufficiently up to date version. This is easiest to do from the command-line:
 % xelatex This is XeTeXk, Version 3.141592-2.2-0.996 (Web2C 7.5.6) %&-line parsing enabled. **^C 
This is the version that I am using for the tutorial, and is what comes with TeX Live 2007. I highly recommend just installing and using the entire TeX Live CD/DVD, even if you're using a Linux system that offers TeX Live packages, because, in particular for Debian/Ubuntu, I've found that the default installation often doesn't install some important packages, and it can be a pain sort through all the available packages using Synaptics or whatnot to find what it didn't install.

I am also assuming that you are using a (modern) Unix or MacOS X system. I assume that most of this material should also apply when using Windows, but if someone can comment, let me know.

Now, as an example, say you want to use the Pagella font from the TeX Gyre project. First download them and install the fonts (the otf files) as you normally would on your computer. Under MacOS X, this means using Font Book. If you double-click on an otf file it will load Font Book for you and there will be dialog with a button to install the font. If you load Font Book yourself, you can use the "Add Fonts..." menu item under the File menu to select the files. Under a modern Unix, I would recommend just placing the otf files in your ~/.fonts folder, though I think file managers like Nautilus also understand how to install fonts.

And that was all the installation work necessary; as I said, XeLaTeX is the easiest solution unless you have specialized needs. Now just create a small LaTeX document:

 \documentclass{article}\usepackage{fontspec}\setromanfont{TeX Gyre Pagella}\begin{document}Testing XeLaTeX! Greek: τεχ.\end{document}

The fontspec package isn't necessary, but it makes dealing with fonts in XeLaTeX much easier, for example it defines the convenient \setromanfont command. You can learn more about all of its great features from its beautifully formatted manual.

The other thing you might need to know is what XeLaTeX thinks your font is called. If you're using TeX Live, like I suggest, you will have the program otfinfo at your disposal that can do that for you:
 % otfinfo --family texgyrepagella-regular.otf TeX Gyre Pagella 
Note that despite its name, otfinfo will also work on ttf files, assuming that they include OpenType data in them. The other option is to use Font Book on MacOS X or fc-list from the command-line in Unix.

Now you just run xelatex:
 % xelatex test.tex This is XeTeXk, Version 3.141592-2.2-0.996 (Web2C 7.5.6) %&-line parsing enabled. entering extended mode (./test.tex ... ... Output written on test.pdf (1 page). Transcript written on test.log. 

I think that is about everything you need to know, but if you try this tutorial out and find that something doesn't work, let me know.  If you have more specialized or demanding typographical needs, you may want to use pdfTeX and pdfLaTeX, and part two of the tutorial will explain how to do the necessary configuration to use TrueType and OpenType fonts with them.

## Font identification

I've mentioned Identifont in the past, but I was just thinking that I should point out the existence of
WhatTheFont, a tool that will identify a font for you from an image. If failed to identify anything useful for the EPFL logo, but I doubt that corresponds to a real font. From a screen capture, it did correctly identify that the ∃xistential Type header at the top of the page is made from a member of the Warnock Pro family.

WhatTheFont was first brought to my attention on Daring Fireball.

## The other Scala

A post by James Iry on the Scala mailing list today reminded me of the existence of FF Scala (along with its companion FF Scala Sans). I haven't received confirmation, but I suspect that Programming in Scala does not, unfortunately,  use any members of the FF Scala family. I was sorely tempted to purchase the fonts, until I saw the prices. Maybe someday.

And there are plenty of other Scalas about. In San Francisco:

Alas, we did not succeed in getting Martin to eat there while he was at JavaOne. Photo courtesy of Robby Findler.

## Even more modules in Scala

In response to Rossberg's challenge to encode the following:

 signature S =
sig
structure A : sig type t; val x : t end
val f : A.t -> int
end
structure M =
struct
structure A = struct type t = int; val x = 666 end
fun f x = x
end
structure N = M :> S


In Scala, the above becomes:

 type S = {
val A: { type T; val x: T }
def f(arg: A.T): Int
}
val M = new {
val A = new { type T = Int; val x = 666 }
def f(arg: Int) = arg
}
val N = M : S


And using

N.f
on
N.A.x
is even well-typed, but still encounters the same problems with evaluation I mentioned earlier. And yes, the type
N.A.T
is hidden:

 scala> val x : Int = M.A.x
x: Int = 666
scala> val x : Int = N.A.x
<console>:7: error: type mismatch;
found   : N.A.T
required: Int
val x : Int = N.A.x
^
scala> val x : N.A.T = 42
<console>:7: error: type mismatch;
found   : Int(42)
required: N.A.T
val x : N.A.T = 42
^


As for how Scala, solves the double vision problem, I would look at the work on νObj, and the forthcoming paper on Featherweight Scala.

Update: Looking more closely at the definition of "double vision" as defined by Dreyer, the Scala implementation cannot currently handle his motivating example. I will need to think about how the example works in Featherweight Scala, which to my knowledge shouldn't have a problem with at least defining the type signatures. I think the short answer is that νObj and Featherweight Scala (the non-algorithmic version) solve the problem in a vacuous fashion, as the type checking problem for them is undecidable. Scala proper possibly could resolve the problem by using a less restrictive cycle detection algorithm.

## Functors in Scala

Following on my earlier entry on modules in Scala, I'll give an encoding of Standard ML style functors here. You can get a pretty close approximation by using class constructor arguments. However, I am going to cheat a little to get the closest encoding I think is possible by using the experimental support for dependent method types. You can get this by running scala or scalac with the option -Xexperimental. It works okay at least some of the time, but no one has the time at the moment to commit to getting it in shape for general consumption.

So here is my example of how the encoding works. First, the SML version:

 signature Eq = sig
type t
val eq: t -> t -> bool
end
signature RicherEq = sig
include Eq
val neq: t -> t -> bool
end
functor mkRicherEq(arg : Eq) :> RicherEq where type t = arg.t = struct
type t = arg.t
val eq = arg.eq
fun neq x y = not (eq x y)
end


We can transliterate this example into Scala as:

 type Eq = {
type T
def eq(x: T, y: T): Boolean
}
type RicherEq = {
type T
def eq(x: T, y: T): Boolean
def neq(x: T, y: T): Boolean
}
def mkRicherEq(arg: Eq) : RicherEq { type T = arg.T } = new {
type T = arg.T
def eq(x: T, y: T) = arg.eq(x, y)
def neq(x: T, y:T) = !eq(x, y)
}


The only problem I discovered is that it is not possible to define RicherEq in terms of Eq as we could in SML:

 scala> type RicherEq = Eq { def neq(x: T, y: T): Boolean }
<console>:5: error: Parameter type in structural refinement may
not refer to abstract type defined outside that same refinement
type RicherEq = Eq { def neq(x: T, y: T): Boolean }
^
<console>:5: error: Parameter type in structural refinement may
not refer to abstract type defined outside that same refinement
type RicherEq = Eq { def neq(x: T, y: T): Boolean }
^


Why this restriction exists I don't know. In fact, this sort of refinement should work in the current version of Featherweight Scala, so perhaps it can be lifted eventually.

I still need to think about higher-order functors, and probably spend a few minutes researching existing proposals. I think this is probably something that cannot be easily supported in Scala if it will require allowing method invocations to appear in paths. However, off hand that only seems like it should be necessary for applicative higher-order functors, but again I definitely need to think it through.

## Modules in Scala

I just saw a thread on Lambda the Ultimate where I think the expressive power of Scala in comparison to Standard ML's module system was misrepresented. I don't want to go into all of the issues at the moment, but I figured out would point out that you can get the same structural typing, opaque sealing, and even the equivalent of SML's where type clause.

For example, consider the following SML signature:

 signature Nat = sig
type t
val z: t
val s: t -> t
end


This signature can be translated in to Scala as:

 type Nat = {
type T
val z: T
def s(arg: T): T
}


It is then possible to create an implementation of this type, and opaquely seal it (hiding the definition of T). In SML:

 structure nat :> Nat = struct
type t = int
val z = 0
fun s n = n + 1
end


In Scala:

 val nat : Nat = new {
type T = Int
val z = 0
def s(arg: Int) = arg + 1
}


In many cases when programming with SML modules it is necessary or convenient to give a module that reveals the definition of an abstract type. In the above example, this can be done by adding a where type clause to the first line:

 structure nat :> Nat where type t = int = struct
...


We can do the same thing in Scala using refinements:

 val nat : Nat { type T = Int } = new {
...


Great, right? Well, almost. The problem is that structural types are still a bit buggy in Scala compiler at present. So, while the above typechecks, you can't quite use it yet:

 scala> nat.s(nat.z)
java.lang.NoSuchMethodException: $anon$1.s(java.lang.Object)
at java.lang.Class.getMethod(Class.java:1581)
at .reflMethod$Method1(<console>:7) at .<init>(<console>:7) at .<clinit>(<console>) at RequestResult$.<init>(<console>:3)
at RequestResult$.<clinit>(<console>) at RequestResult$result(<console>)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflec...


There were some issues raised about how faithful an encoding of SML functors, and well-known extensions for higher-order functors, one can get in Scala. Indeed, off the top of my head it is not entirely clear. So I need to think more about that before I write some examples.

## Featherweight Scala lives

With everything else I've been having to deal with the past couple months, I have not made nearly as much progress on Featherweight Scala as I would have liked. Fortunately, in the past two weeks I have finally started to get back on track.

Probably what makes working out the design of Featherweight Scala so difficult is that I am trying my best to commit to it being a valid subset of Scala. It seems like whenever I try to simplify one part of the language, it just introduces a complication elsewhere. Still, I think so far that I have been able to ensure that it is a subset with two exceptions.

The first exception is that some programs in Featherweight Scala will diverge in cases where the Scala compiler will generate a terminating program. This is because in programs that attempt to access an uninitialized field, Scala will return null, while Featherweight Scala diverges. Since the program many never attempt to use the null value, it may simple terminate without raising a NullPointerException.

The second exception arises because type checking in Featherweight Scala's type system is most likely undecidable. Given that the design is in flux, I haven't precisely verified this, but it seems quite plausible given its expressive power is similar to languages where type checking is known to be undecidable. Because type checking is undecidable, and the Scala language implementation attempts to have its type checking phase be terminating, there should be some programs that can be shown to be well-typed in Featherweight Scala that the Scala compiler will reject.

I think the other thing I've determined is that I do not understand F-bounded quantification as well as I thought I did. At least, I find myself worried about the cases where the typing rules presently require  templates, that have not yet been verified to be well-formed, to be introduced as hypotheses while verifying their own correctness. So I need to do some additional reading to see how some other languages deal with this issue. Certainly I can put my mind at ease by making the type system more restrictive, but I do not want to be so restrictive that it is no longer reasonable to call the language Featherweight Scala.

Update: Okay, so after looking at Featherweight Generic Java and νObj again more closely, they both do essentially the same kind of thing. In FGJ, the class table is well-formed if all its classes are well-formed, but checking a class's well-formedness may require presupposing that that class tale is well-formed. In νObj, the well-formedness of recursive record types is checked under the assumption that the record's self-name has the type of the record. Certainly the same sort of things come up when verifying the well-formedness of recursive functions and recursive types, but at least there what is being assumed is at a higher level of abstraction: when checking a recursive function you assume that the function has some well-formed type, or when checking a recursive type you assume that the type has some well-formed kind.