Packrat parser dependency analysis memoization optimization
Bryan Ford has written some nice papers on PEG parsers and the packrat algorithm. (Wikipedia on parsers.) In a nutshell, the packrat algorithm for implementing parsers for PEG grammars achieves unlimited lookahead in an input stream with reasonable resources by memoizing the results of specific grammar rules (storing the result of a rule at individual points in the input stream so that the rule is calculated at most once per input position, and usually much less). In his thesis, in section 4.4.2, he lists (among others) an optimization that helps alleviate the large memory requirements of packrat parsers. By analyzing a PEG grammar before using it to parse an input (or before generating a parser) it can be determined that some rules will not call others recursively, and thus can be de-memoized with the parser remaining O(n) (though a bit slower).
This is a good start. But another optimization I don’t see mentioned anywhere (possibly because it doesn’t apply to a parser implemented in Haskell) is to perform an analysis on rules to automatically determine backtracking scopes. Once certain rules complete sucessfully (usually medium level rules, like “statement” in grammar for a C-like language) it should be possible to guarantee that a whole class of lower-level ones (lexing rules, often) will not be called again on a position inside of the text included in the high-level rule. By grouping rules’ memoization into related structures, it should be possible to easily and quickly free large block of intermediate results as these higher-level statements are matched.
This analysis would be pretty obvious in certain places for a human to give hints about. Whether it’s possible for the parser (or parser compiler) to perform this analysis usefully, I’m not sure.
November 2nd, 2006 at 13:45
I hope you’re still alive, and just busy.