Parsing
Parsing steps
The goal of the Lawtext parser is to extract the law content that the original Lawtext expresses and obtain it as StdEL. The parser processes the Lawtext in three steps to achieve that goal:
- Parse the Lawtext into a CST (Concrete Syntax Tree). The CST is a sequence of Lines.
- Analyze the indentation of Lines and insert indent (one increase of indent level) / dedent (one decrease of indent level) markers between the lines. The output of this step is a sequence of VirtualLines.
- Parse the VirtualLines into a StdEL tree.
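The three steps above can be sketched as a toy pipeline. This is only an illustration under simplifying assumptions (indentation is counted in spaces, every non-blank line becomes one tree node); the names parseLines, toVirtualLines, and buildTree are hypothetical and are not part of the Lawtext API.

```javascript
// Toy three-step pipeline. All names are hypothetical, not the Lawtext API.

// Step 1: string -> Lines (a CST that keeps physical details of each line).
function parseLines(text) {
    return text.split("\n").map((raw, index) => {
        const body = raw.trimStart();
        return {
            lineNumber: index + 1,
            indentWidth: raw.length - body.length, // physical indent in characters
            text: body,
            blank: body === "",
        };
    });
}

// Step 2: Lines -> VirtualLines with explicit indent / dedent markers.
function toVirtualLines(lines, indentUnit = 2) {
    const out = [];
    let depth = 0;
    for (const line of lines) {
        if (line.blank) continue; // blank lines carry no structure in this toy
        const target = Math.floor(line.indentWidth / indentUnit);
        for (; depth < target; depth++) out.push({ type: "indent" });
        for (; depth > target; depth--) out.push({ type: "dedent" });
        out.push({ type: "line", line });
    }
    for (; depth > 0; depth--) out.push({ type: "dedent" }); // close open blocks
    return out;
}

// Step 3: VirtualLines -> tree (a stand-in for the StdEL tree).
function buildTree(virtualLines) {
    const root = { tag: "Root", children: [] };
    const stack = [root];
    for (const v of virtualLines) {
        const top = stack[stack.length - 1];
        if (v.type === "line") {
            top.children.push({ tag: "Item", text: v.line.text, children: [] });
        } else if (v.type === "indent") {
            // Nest the following lines under the most recent sibling, if any.
            stack.push(top.children[top.children.length - 1] ?? top);
        } else {
            stack.pop();
        }
    }
    return root;
}

const exampleTree = buildTree(toVirtualLines(parseLines("a\n  b\n  c\nd")));
console.log(JSON.stringify(exampleTree, undefined, 2));
```

Because the markers are explicit, the last step never has to look at whitespace: it only reacts to three kinds of VirtualLine.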
You can find the entry point of the Lawtext parser at (core/src/parser/)lawtext.parse().
Syntax trees: Lines and VirtualLines
The Lawtext CST (Concrete Syntax Tree) is a sequence of Lines (library documentation here). A Line represents a physical line in the Lawtext string and retains every detail of the original string that the resulting StdEL tree omits, such as redundant blank lines, insignificant whitespace, and physical position (line and character index).
Instead of directly parsing the Lawtext into StdEL, we adopted the intermediate CST representation (Lines) for the following reasons.
- Because the Lawtext syntax has a line-based structure, the CST simplifies the subsequent step of parsing Lines into a StdEL tree.
- We can use the same CST to render the law into Lawtext and to format Lawtext.
- We need detailed position information to provide a language server in source code editors such as Visual Studio Code.
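The second and third reasons can be illustrated with a minimal sketch. The Line shape below (indent, body, trailing, offset) is an assumption made for this example and is not the actual Lawtext Line class; the point is only that a CST keeping every character can reproduce the source exactly, normalize it, and map a character offset back to a line.

```javascript
// Hypothetical Line shape: every character of the source is kept somewhere.
function toLines(text) {
    let offset = 0;
    return text.split("\n").map((raw, index) => {
        // Split into leading whitespace, body, and trailing whitespace
        // (ASCII space, full-width space U+3000, and tab).
        const [, indent, body, trailing] =
            /^([ \u3000\t]*)(.*?)([ \u3000\t]*)$/s.exec(raw);
        const line = { lineNumber: index + 1, offset, indent, body, trailing };
        offset += raw.length + 1; // +1 for the "\n" separator
        return line;
    });
}

// Rendering: concatenating all parts gives back the original string.
const render = (lines) =>
    lines.map((l) => l.indent + l.body + l.trailing).join("\n");

// Formatting: the same CST, minus the details we choose to normalize.
const format = (lines) =>
    lines.map((l) => l.indent + l.body).join("\n");

// Position lookup, as an editor integration might need it.
const lineAtOffset = (lines, charOffset) =>
    lines.findLast((l) => l.offset <= charOffset);

const source = "a  \n  b\n";
const cst = toLines(source);
console.log(render(cst) === source);      // true
console.log(JSON.stringify(format(cst))); // "a\n  b\n"
console.log(lineAtOffset(cst, 5).lineNumber);
```

Note that Array.prototype.findLast requires a recent JavaScript runtime (Node.js 18 or later, or a current browser).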
When parsing the CST into a StdEL tree, the parser first analyzes the structure of indented blocks (indent levels) and inserts indent (one increase of indent level) / dedent (one decrease of indent level) markers between the lines. The resulting structure is a sequence of VirtualLines (library documentation here), and the main part of parsing is conducted on the sequence of VirtualLines.
We separated the indentation analysis from building the CST because the virtual (logical) indent levels in Lawtext do not always match the physical (visual) indent levels. For example, the article caption line in the Lawtext sample above (␣␣(目的等)) is indented by one full-width (two ASCII-width) whitespace. This whitespace follows the layout rule of traditional Japanese law texts for visual compatibility. However, the article caption line does not start a new indentation level, and we have to consider the context of the surrounding lines to recognize that the line is an article caption, not part of an indented block. Therefore, the Lawtext parser first parses the physical indent levels (and outputs Lines) and then detects the virtual indent levels (and outputs VirtualLines).
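To make the physical/virtual distinction concrete, here is a small sketch of a context-dependent rule. The rule itself is an assumption invented for this example (a parenthesized line directly followed by a shallower line starting with "第" is treated as an article caption); the actual Lawtext parser may decide this differently. The function name virtualDepths and the input shape are likewise hypothetical.

```javascript
// Toy rule (an assumption for illustration, NOT the actual Lawtext rule):
// a parenthesized line immediately followed by a shallower line starting
// with "第" is an article caption and takes the depth of that next line.
function virtualDepths(lines) {
    return lines.map((line, i) => {
        const next = lines[i + 1];
        const looksLikeCaption = /^[(（].*[)）]$/.test(line.text);
        const nextIsArticle = next !== undefined
            && next.text.startsWith("第")
            && next.depth < line.depth;
        return looksLikeCaption && nextIsArticle ? next.depth : line.depth;
    });
}

// `depth` is the physical indent level taken from the CST.
const physicalLines = [
    { depth: 1, text: "(目的等)" },           // caption: visually indented
    { depth: 0, text: "第一条 ..." },          // article line
    { depth: 1, text: "indented child line" }, // a real indented block
];

console.log(physicalLines.map((l) => l.depth)); // [1, 0, 1]
console.log(virtualDepths(physicalLines));      // [0, 0, 1]
```

The first and third lines have the same physical depth, but only the third opens a new virtual level. Telling them apart requires looking at neighboring lines, which is why this analysis runs as its own pass over the finished CST.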
Parsing algorithm
For steps 1 and 3, the main process of parsing is recursive descent parsing of a string (a sequence of characters) and VirtualLines, respectively. We utilize a generic recursive descent parser, generic-parser, for both steps.
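The idea of one recursive descent engine serving both a string and a sequence of VirtualLines can be sketched with a few hand-written combinators. This sketch does not use or reproduce the generic-parser API; item, seq, many, and the grammar below are hypothetical and exist only to show that the same parser functions work on any indexable sequence.

```javascript
// Each parser is a function (input, pos) => { value, next } | null.
// `input` may be a string or an array: both support indexing and `.length`.

// Match one element that satisfies a predicate.
const item = (pred) => (input, pos) =>
    pos < input.length && pred(input[pos])
        ? { value: input[pos], next: pos + 1 }
        : null;

// Run parsers one after another; fail if any of them fails.
const seq = (...parsers) => (input, pos) => {
    const values = [];
    let next = pos;
    for (const p of parsers) {
        const r = p(input, next);
        if (r === null) return null;
        values.push(r.value);
        next = r.next;
    }
    return { value: values, next };
};

// Repeat a parser zero or more times (stops if nothing is consumed).
const many = (parser) => (input, pos) => {
    const values = [];
    let next = pos;
    for (;;) {
        const r = parser(input, next);
        if (r === null || r.next === next) break;
        values.push(r.value);
        next = r.next;
    }
    return { value: values, next };
};

// On a string: the elements are characters.
const digits = many(item((c) => c >= "0" && c <= "9"));
console.log(digits("42abc", 0)); // { value: [ '4', '2' ], next: 2 }

// On VirtualLine-like objects: the elements are lines and markers.
const ofType = (type) => item((v) => v.type === type);

// node := line [ indent node* dedent ]   (recursive through `nodes`)
const node = (input, pos) => {
    const head = ofType("line")(input, pos);
    if (head === null) return null;
    const nested = seq(ofType("indent"), nodes, ofType("dedent"))(input, head.next);
    return nested === null
        ? { value: { text: head.value.text, children: [] }, next: head.next }
        : { value: { text: head.value.text, children: nested.value[1] }, next: nested.next };
};
const nodes = many(node);

const virtualLines = [
    { type: "line", text: "a" },
    { type: "indent" },
    { type: "line", text: "b" },
    { type: "dedent" },
    { type: "line", text: "c" },
];
console.log(JSON.stringify(nodes(virtualLines, 0).value));
```

The grammar for node calls itself through nodes, which is the "recursive descent" part: each nesting level in the input corresponds to one level of function calls.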
How to try
Try it out here
- On this page, open the browser console (for Chrome, press Ctrl+Shift+J (Windows/Linux) or Cmd+Opt+J (Mac)).
- Run the following command in the browser console:
    lawtext.run({
        input: { elaws: "405AC0000000088" },
        outtypes: ["lawtext"],
    })
        .then(r => {
            console.log("Lawtext:");
            console.log(r.lawtext);
            return lawtext.run({
                input: { lawtext: r.lawtext },
                outtypes: ["json"],
                controlel: true,
            });
        })
        .then(r => {
            console.log("\u{2705} Parsed VirtualLines:");
            console.log(r.virtuallines);
            console.table(r.virtuallines.map(v => ({
                "VirtualLine.type": v.type,
                ...Object.fromEntries(v.line?.rangeTexts().map(([, t, d], i) => [`rangeTexts()[${i}]`, t]) ?? []),
            })));
            console.log("\u{2705} Parsed StdEL (JsonEL):");
            console.log(r.json);
            console.log(JSON.stringify(r.json, undefined, 2));
        });
Hint: run console.log(lawtext.run.help) to show the help.
Try using the CLI
See CLI usage, then run the CLI with a lawtext input, a json output, and the controlel option (plus the virtuallinesout option when running in Node.js) to get the parsed VirtualLines and the parsed StdEL (JsonEL) representation.
Try using the Visual Studio Code extension
- You can try the extension at github.dev with a few clicks. (A GitHub account is required. You can create one on the linked page.)
- Otherwise, you can visit vscode.dev, install the Lawtext extension, and open the sample Lawtext.