Define Grammar AST types explicitly, refine EBNF-based terminals to avoid synthetic capturing groups #1966

Open: 2 commits to be merged into base: main

sailingKieler
Contributor

This PR introduces langium-types.langium defining all the AST types of Langium's grammar language.
As an initial customization it contributes interface TerminalElement extends AbstractElement, and AbstractElement#lookahead is moved to TerminalElement.
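As a rough illustration of that type change, the declarations could be pictured in TypeScript terms as follows. This is only a sketch: the `cardinality` property is taken from the diff excerpt quoted in the review, while typing `lookahead` as a plain `string` and the helper `renderElement` are assumptions made for this example, not the declarations from `langium-types.langium`.

```typescript
// Sketch only: mirrors the shape described in the PR, not the generated ast.ts.
interface AbstractElement {
    cardinality?: "*" | "+" | "?";
}

// `lookahead` now lives on the terminal-specific subtype.
interface TerminalElement extends AbstractElement {
    // Assumption: typed as a plain string here; the real declaration may be narrower.
    lookahead?: string;
}

// Hypothetical helper that shows which properties a TerminalElement carries.
function renderElement(element: TerminalElement): string {
    const lookahead = element.lookahead || "";
    const cardinality = element.cardinality || "";
    return `${lookahead}<element>${cardinality}`;
}

const rendered = renderElement({ lookahead: "?=", cardinality: "+" });
```

With this shape, a non-terminal `AbstractElement` no longer exposes `lookahead`, so code that reads it has to narrow to `TerminalElement` first.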

The second commit refines the synthesis of the RegExps representing the terminals such that synthetic pairs of parentheses added to the RegExp are marked as non-capturing groups (?:...), while parentheses present in the terminal definition are carried over as capturing groups (...). To enable this, an additional flag TerminalElement#parenthesized is introduced that records whether a sub-terminal is enclosed in parentheses.
This gives adopters more control over the capturing groups within the RegExp, which is relevant when re-using the generated RegExps, e.g. for value conversion, like

const match = LangiumGrammarTerminals.RegexLiteral.exec(input);
const result = match && convert(match[1], match[2], match[3], match[4]);
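To make the effect on group indices concrete, here is a minimal sketch using two hand-written regular expressions. They stand in for what a generator might emit for a hypothetical terminal along the lines of `'ver'? (DIGITS) '.' (DIGITS)`; they are not Langium's actual output, and the names `allCapturing`, `syntheticNonCapturing`, and `captures` are made up for this example.

```typescript
// Before: the group added only to apply `?` to the keyword is capturing,
// so it occupies index 1 and shifts the author-written groups.
const allCapturing = /(ver)?(\d+)\.(\d+)/;

// After: the synthetic group is non-capturing; only the parentheses
// written by the grammar author produce capture groups.
const syntheticNonCapturing = /(?:ver)?(\d+)\.(\d+)/;

// Returns the capture groups (without the full match), or [] on no match.
function captures(re: RegExp, input: string): Array<string | undefined> {
    const match = re.exec(input);
    return match ? match.slice(1) : [];
}

const before = captures(allCapturing, "ver3.14");          // ["ver", "3", "14"]
const after = captures(syntheticNonCapturing, "ver3.14");  // ["3", "14"]
```

In the second form, `match[1]` and `match[2]` line up with the parentheses in the terminal definition, which is what a value converter indexing into the match would rely on.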

…icated AST type `TerminalElement`

* updated grammar
… to RegExp now avoiding unnecessary capturing groups '(...)'

* added 'parenthesized' flag to type TerminalElement
* marked required synthetic groups as non-capturing '(?:...)'
* updated tests
* updated example languages
sailingKieler added this to the v4.0.0 milestone Jun 24, 2025
sailingKieler requested a review from msujew June 24, 2025 13:25
sailingKieler added the "grammar" (Grammar language related issue) label Jun 24, 2025
sailingKieler changed the title from "Define AST types explicitly, refines EBNF-based terminals to avoid synthetic capturing groups" to "Define Grammar AST types explicitly, refines EBNF-based terminals to avoid synthetic capturing groups" Jun 24, 2025
sailingKieler changed the title from "Define Grammar AST types explicitly, refines EBNF-based terminals to avoid synthetic capturing groups" to "Define Grammar AST types explicitly, refine EBNF-based terminals to avoid synthetic capturing groups" Jun 24, 2025
Contributor

spoenemann left a comment

Looks good, thanks! A few details below.

@@ -0,0 +1,242 @@
type AbstractRule = InfixRule | ParserRule | TerminalRule;
Contributor

spoenemann, Jun 25, 2025

Please add the copyright header with Copyright 2025


interface AbstractElement {
cardinality?: "*" | "+" | "?";
// parenthesized: boolean;
Contributor

Let's remove it instead of leaving it commented out.

return reflection.isInstance(item, TerminalElement.$type);
}

export interface CharacterRange extends TerminalElement {
Contributor

It looks like we're not sorting the types properly prior to generating them – the order shouldn't change after switching to declared types.

2 participants