<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	>
<channel>
	<title>Comments on: Parser generators considered harmful?</title>
	<atom:link href="http://existentialtype.net/2006/11/05/parser-generators-considered-harmful/feed/" rel="self" type="application/rss+xml" />
	<link>http://existentialtype.net/2006/11/05/parser-generators-considered-harmful/</link>
	<description>For People Who Like Type and Types</description>
	<pubDate>Tue, 06 Jan 2009 08:35:17 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.7</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>By: kitby</title>
		<link>http://existentialtype.net/2006/11/05/parser-generators-considered-harmful/comment-page-1/#comment-548</link>
		<dc:creator>kitby</dc:creator>
		<pubDate>Sun, 05 Nov 2006 23:11:35 +0000</pubDate>
		<guid isPermaLink="false">http://existentialtype.net/?p=73#comment-548</guid>
		<description>There might also be the question of how difficult your language is to parse for actual people.</description>
		<content:encoded><![CDATA[<p>There might also be the question of how difficult your language is to parse for actual people.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: aaron</title>
		<link>http://existentialtype.net/2006/11/05/parser-generators-considered-harmful/comment-page-1/#comment-542</link>
		<dc:creator>aaron</dc:creator>
		<pubDate>Sun, 05 Nov 2006 20:00:22 +0000</pubDate>
		<guid isPermaLink="false">http://existentialtype.net/?p=73#comment-542</guid>
		<description>Just a couple of remarks:

I think most k&#62;1 parsers use a more restricted notion of lookahead.  For ml-antlr, lookahead is not a k-tuple of tokens, but rather a single token that may appear up to k tokens ahead.  Well, that's roughly the story, anyway -- in reality things are a bit more complicated.  But, at any rate, this technique gives you something like size O(n*k) for your lookahead decision.  In practice, this works quite well, and you usually need only k = 3 or so.  We provide selective backtracking when you need more power.

More broadly, depending on your error handling technique, I don't think the lookahead strategy has much of a bearing.  For ml-antlr, we use Burke-Fisher error repair, which is the same strategy used in ml-yacc.  Basically, when an error is detected, the parser "backs up" about, say, 20 tokens, and tries EVERY single token change possible in those 20 tokens: insertions, deletions, and substitutions.  For each attempt, it sees how much farther the parser gets, and chooses the best correction according to some heuristic.  Although this sounds very expensive, it really isn't very bad, and the penalty is only incurred when there is an error and you're willing to spend time analyzing it anyway.  This technique is essentially parser-agnostic: all it's doing is using the parser to try various permutations of the input.

A yet broader point.  One thing that I think is tremendously hard to do with a handwritten scheme is "global", or at least nonlocal, error recovery.  The technique I just described backs up ~20 tokens because it is often the case that the "real" error is some distance behind the point at which it was detected.

Finally, ml-antlr is in some respects a meta-language for writing recursive-descent parsers, which is what you'd be writing by hand anyway.  The code it generates is, as a result, actually fairly readable.  Furthermore, this approach makes it possible to get the best of both worlds: you can add support in the ml-antlr meta-language for custom error handling, and very easily include that in the generated code.  This is, in fact, exactly what Parr's antlr tool does.  We don't yet support this in ml-antlr, but it is definitely planned.

So, I think parser generators should be considered helpful -- but, obviously, I'm rather biased in that respect.  :-)</description>
		<content:encoded><![CDATA[<p>Just a couple of remarks:</p>
<p>I think most k&gt;1 parsers use a more restricted notion of lookahead.  For ml-antlr, lookahead is not a k-tuple of tokens, but rather a single token that may appear up to k tokens ahead.  Well, that&#8217;s roughly the story, anyway &#8212; in reality things are a bit more complicated.  But, at any rate, this technique gives you something like size O(n*k) for your lookahead decision.  In practice, this works quite well, and you usually need only k = 3 or so.  We provide selective backtracking when you need more power.</p>
<p>More broadly, depending on your error handling technique, I don&#8217;t think the lookahead strategy has much of a bearing.  For ml-antlr, we use Burke-Fisher error repair, which is the same strategy used in ml-yacc.  Basically, when an error is detected, the parser &#8220;backs up&#8221; about, say, 20 tokens, and tries EVERY single token change possible in those 20 tokens: insertions, deletions, and substitutions.  For each attempt, it sees how much farther the parser gets, and chooses the best correction according to some heuristic.  Although this sounds very expensive, it really isn&#8217;t very bad, and the penalty is only incurred when there is an error and you&#8217;re willing to spend time analyzing it anyway.  This technique is essentially parser-agnostic: all it&#8217;s doing is using the parser to try various permutations of the input.</p>
<p>A yet broader point.  One thing that I think is tremendously hard to do with a handwritten scheme is &#8220;global&#8221;, or at least nonlocal, error recovery.  The technique I just described backs up ~20 tokens because it is often the case that the &#8220;real&#8221; error is some distance behind the point at which it was detected.</p>
<p>Finally, ml-antlr is in some respects a meta-language for writing recursive-descent parsers, which is what you&#8217;d be writing by hand anyway.  The code it generates is, as a result, actually fairly readable.  Furthermore, this approach makes it possible to get the best of both worlds: you can add support in the ml-antlr meta-language for custom error handling, and very easily include that in the generated code.  This is, in fact, exactly what Parr&#8217;s antlr tool does.  We don&#8217;t yet support this in ml-antlr, but it is definitely planned.</p>
<p>So, I think parser generators should be considered helpful &#8212; but, obviously, I&#8217;m rather biased in that respect.  <img src='http://existentialtype.net/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' /> </p>
]]></content:encoded>
	</item>
</channel>
</rss>
