The last main thing I have to do before I can actually use my (super-dandy, hyper-extensible, solve-world-hunger and kick-Saddam-in-the-nuts) StructuredText parser (version 2) is get the list syntax to work. I've been having trouble telling the parser exactly how lists should be parsed, and I realized today that one weird case has been causing my problems. Here's the rundown:
Lists in the previous version of my StructuredText parser looked like this:
* List
** Nested list
*** Another nested list
Not too bad, but it's clunky. It feels more natural to me to write the following:
* List
  * Nested list
    * Another nested list
That's two spaces before each nested list starts. I haven't worried about other kinds of lists (like ordered lists) yet.
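To make the two-space rule concrete, here is a minimal sketch of how nesting depth could be read off a list-item line. It is not the code from my parser: the names `ITEM_RE` and `item_depth` are made up for illustration, and rounding odd indentation down to the nearest level is an assumption.

```python
import re

# Hypothetical pattern: optional leading spaces, a "*" marker, one space, then the item text.
ITEM_RE = re.compile(r"^( *)\* (.*)$")

def item_depth(line, indent_width=2):
    """Return (depth, text) for a list-item line, or None if the line is not an item."""
    m = ITEM_RE.match(line)
    if m is None:
        return None
    # Assumption: odd indentation is rounded down to the nearest nesting level.
    depth = len(m.group(1)) // indent_width
    return depth, m.group(2)

example = ["* List", "  * Nested list", "    * Another nested list"]
parsed = [item_depth(line) for line in example]
```

Here `parsed` holds one `(depth, text)` pair per line, with depths 0, 1, and 2.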
This is easy to parse in my ST parser. Next, I wanted list items to be able to span multiple lines, like this:
* List
  * This is a
    nested list
No problem. Here are a few more cases:
* List with
  continuing text
No problem. If you happen to mess up and put one space instead of two:
* List with
 continuing text
it would figure out what you meant. If you messed up and didn't put any spaces, the list would end and the parser would treat that line as the start of a paragraph. If you had more than two spaces before the text began:
* List with
     continuing text
the parser would strip the first two spaces and the line would continue with no problem. So far so good. Even this would work with no problem:
* List with
  Continuing text
  * Sub list
    with continuing text
and so on. The problem arises in two specific cases:
* List
  * Sublist
oops, not indented right
and
* List
  * sublist
still not indented right, but where normally
it would end the list and become the start of
a paragraph, this time it's in the middle of
a nested list, so even if I end the sublist,
I still have to figure out what to do with
all this text.
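Here is a rough sketch of the continuation rules described so far, with the two trouble cases flagged rather than resolved. The function name `classify` and the "ambiguous" label are invented for this example, and I'm assuming a continuation line under an item at depth d is expected at 2 * (d + 1) spaces.

```python
def classify(line, depth, indent_width=2):
    """Classify a line without a list marker that follows a list item at `depth`."""
    stripped = line.lstrip(" ")
    spaces = len(line) - len(stripped)
    expected = indent_width * (depth + 1)  # assumed indentation for continuation text
    if spaces >= expected:
        # Strip only the expected indentation; any extra spaces stay in the text.
        return ("continuation", line[expected:])
    if depth == 0:
        if spaces > 0:
            # One space instead of two: figure out what was meant.
            return ("continuation", stripped)
        # No indentation at all: the list ends and a paragraph starts.
        return ("paragraph", stripped)
    # Under-indented inside a nested list: does it continue the parent item,
    # continue the sublist sloppily, or start a paragraph? One line can't tell.
    return ("ambiguous", stripped)
```

Everything at the top level falls into a clear bucket. It is only the last branch, reached from inside a sublist, where a single line doesn't carry enough information to decide.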
This problem really arises from the combination of the new syntax and allowing multiple lines in a list item (which I couldn't do in my old parser). The problem didn't exist in the old parser because every line in a list began with a "list marker" (like *, -, or +), so indentation wasn't important, and multi-line items weren't allowed. I could solve the problem in my current parser, but it could require a lot of lookahead, which I'm not happy about. It's not elegant, and it's slower.
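For comparison, a sketch of why the old syntax never hit this: depth comes straight from counting markers, so each line can be handled on its own with no lookahead. `OLD_ITEM_RE` and `old_item_depth` are illustrative names, not my old parser's code, and this simplified pattern would also accept mixed markers like `*-`.

```python
import re

# Hypothetical pattern for the old syntax: one or more markers, a space, then the text.
OLD_ITEM_RE = re.compile(r"^([*+-]+) (.*)$")

def old_item_depth(line):
    """Return (depth, text) for an old-style list line, or None if it is not one."""
    m = OLD_ITEM_RE.match(line)
    if m is None:
        return None
    # One marker is depth 0, two markers depth 1, and so on.
    return len(m.group(1)) - 1, m.group(2)

old_example = ["* List", "** Nested list", "*** Another nested list"]
old_parsed = [old_item_depth(line) for line in old_example]
```

Any line without a marker simply isn't part of the list, which is exactly the information the indentation-based syntax gives up.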
I haven't decided what to do yet, but for now I'm going to see what the reStructuredText people did. I'll bet you they throw parser exceptions (which I don't really want to do).