Objective: Add an additional WHERE clause to any given ClickHouse statement.

I'm using the following ANTLR grammars to generate Java classes for a lexer and parser.

Lexer grammar

https://github.com/ClickHouse/ClickHouse/blob/master/utils/antlr/ClickHouseLexer.g4

Parser grammar

https://github.com/ClickHouse/ClickHouse/blob/master/utils/antlr/ClickHouseParser.g4

Problem: I can't figure out how to interact with the classes that ANTLR generates, or how to create the appropriate POJOs to use with them.

Example statement:

String query = "INSERT INTO t VALUES (1, 'Hello, world'), (2, 'abc'), (3, 'def')";

Goal (enrichment code):

String enrichedQuery = SqlParser.enrich(query);
System.out.println(enrichedQuery);
//Output
>> INSERT INTO t VALUES (1, 'Hello, world'), (2, 'abc'), (3, 'def') (WHERE X IN USERS)

I have the following Java main:

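import org.antlr.v4.runtime.CharStreams;
import org.antlr.v4.runtime.CommonTokenStream;
import org.antlr.v4.runtime.tree.ParseTreeWalker;

public class Hello {
    public static void main(String[] args) throws Exception {
        String query = "INSERT INTO t VALUES (1, 'Hello, world'), (2, 'abc'), (3, 'def')";
        // CharStreams.fromString replaces ANTLRInputStream, which is deprecated since ANTLR 4.7
        ClickHouseLexer lexer = new ClickHouseLexer(CharStreams.fromString(query));
        CommonTokenStream tokens = new CommonTokenStream(lexer);
        ClickHouseParser parser = new ClickHouseParser(tokens);
        ParseTreeWalker walker = new ParseTreeWalker(); // not used yet
    }
}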
First, write a driver that just parses the input string. Replace "HelloParser" with "ClickHouseParser" and "HelloLexer" with "ClickHouseLexer" in the main() above. Test that; then you can worry about modifying the parse tree for your goal.
– kaby76
Aug 6, 2021 at 20:58
@kaby76 Appreciate the message. Please see the revised code. I mistakenly hand-jammed a quick synopsis of what I was trying to achieve; it was updated before your message.
– stackoverflow
Aug 6, 2021 at 21:01
Try var str = CharStreams.fromString(input); var lexer = new ClickHouseLexer(str); var tokens = new CommonTokenStream(lexer); var parser = new ClickHouseParser(tokens); var tree = parser.queryStmt(); But the parser grammar targets C++, not Java, so you have to change it: add @header { import java.util.Set; } after options {...} at the top; change std::set<std::string> attrs to Set<String> attrs, attrs.count( to attrs.contains(, and attrs.insert( to attrs.add(. Do the same for clauses.
– kaby76
Aug 6, 2021 at 23:15
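A plain-Java sketch of what the translated predicate/action pair does, assuming each alternative is guarded by a predicate such as {!$clauses.contains("orderByClause")}? and followed by an action such as {$clauses.add("orderByClause");}. The ClauseGuard class and the clause sequences are hypothetical; the loop only replays the check outside of ANTLR:

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper that replays the clause guard outside of ANTLR.
// In the grammar, the Set lives in the rule's locals and is checked and
// updated by the semantic predicate and action on each alternative.
class ClauseGuard {
    // Returns true if every clause name occurs at most once, i.e. the
    // predicate never rejects an alternative while matching this sequence.
    static boolean acceptsEachClauseOnce(List<String> clauseSequence) {
        Set<String> clauses = new HashSet<>();   // was std::set<std::string>
        for (String clause : clauseSequence) {
            if (clauses.contains(clause)) {      // was clauses.count(...)
                return false;                    // predicate is false: alternative not viable
            }
            clauses.add(clause);                 // was clauses.insert(...)
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(acceptsEachClauseOnce(List.of("orderByClause", "ttlClause")));     // true
        System.out.println(acceptsEachClauseOnce(List.of("orderByClause", "orderByClause"))); // false
    }
}
```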
There are several ways you could implement the WHERE clauses, but an easy solution is to write a tree walker (you can likely use the generated ANTLR listener for the grammar) that outputs the original text at each leaf and, when it reaches the node where the WHERE belongs, outputs that as well.
– kaby76
Aug 7, 2021 at 1:58
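A minimal sketch of that walker idea, assuming a hypothetical hand-built tree in place of the ANTLR parse tree and assuming each leaf already carries its leading whitespace. It echoes every leaf in order and appends the extra text after finishing a node for the chosen rule, roughly what an exit method in a generated listener would let you do:

```java
import java.util.List;

// Hypothetical stand-in for a parse tree node: leaves carry text, interior nodes a rule name.
final class Node {
    final String rule;           // rule name for interior nodes, null for leaves
    final String text;           // token text (with leading whitespace) for leaves, null otherwise
    final List<Node> children;

    private Node(String rule, String text, List<Node> children) {
        this.rule = rule;
        this.text = text;
        this.children = children;
    }

    static Node leaf(String text) { return new Node(null, text, List.of()); }
    static Node of(String rule, Node... children) { return new Node(rule, null, List.of(children)); }
}

class LeafEchoWalker {
    // Echo every leaf verbatim; after leaving a node whose rule is targetRule, emit extra.
    static String render(Node root, String targetRule, String extra) {
        StringBuilder out = new StringBuilder();
        walk(root, targetRule, extra, out);
        return out.toString();
    }

    private static void walk(Node node, String targetRule, String extra, StringBuilder out) {
        if (node.text != null) {            // leaf: copy the original text
            out.append(node.text);
            return;
        }
        for (Node child : node.children) {
            walk(child, targetRule, extra, out);
        }
        if (targetRule.equals(node.rule)) { // comparable to an exitXxx listener method
            out.append(extra);
        }
    }

    public static void main(String[] args) {
        Node tree = Node.of("insertStmt",
                Node.leaf("INSERT"), Node.leaf(" INTO"), Node.leaf(" t"),
                Node.of("dataClause", Node.leaf(" VALUES")));
        // prints: INSERT INTO t VALUES (WHERE X IN USERS)
        System.out.println(render(tree, "insertStmt", " (WHERE X IN USERS)"));
    }
}
```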
Because of subqueries, it's not clear which WHERE clause you want to add. Is it for the top-level SELECT only? And is it only for SELECT statements, or also for the other statements that can have a WHERE clause?
– Mike Lischke
Aug 7, 2021 at 16:39

I'd suggest taking a look at TokenStreamRewriter.

First, let's get the grammars ready.

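1 - With TokenStreamRewriter we want to preserve whitespace, so change the -> skip directives to -> channel(HIDDEN). Tokens on the hidden channel are ignored by the parser but stay in the token stream, so the rewriter can reproduce them.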

At the end of the Lexer grammar:

// Comments and whitespace
MULTI_LINE_COMMENT: '/*' .*? '*/' -> channel(HIDDEN);
SINGLE_LINE_COMMENT: '--' ~('\n'|'\r')* ('\n' | '\r' | EOF) -> channel(HIDDEN);
WHITESPACE: [ \u000B\u000C\t\r\n] -> channel(HIDDEN);  // '\n' can be part of multiline single query
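To see why the channel matters, here is a stdlib-only simulation (not the ANTLR runtime). The Tok class and the hand-lexed token list are hypothetical, and channel 1 stands in for HIDDEN (Token.HIDDEN_CHANNEL is 1 in the real runtime). The parser sees the same tokens either way, but only the channel(HIDDEN) version still has the whitespace when the text is rebuilt from the token stream:

```java
import java.util.List;
import java.util.stream.Collectors;

class HiddenChannelDemo {
    static final int DEFAULT_CHANNEL = 0;
    static final int HIDDEN_CHANNEL = 1;  // same value as Token.HIDDEN_CHANNEL in ANTLR

    // Hypothetical, minimal token: just its text and channel.
    static final class Tok {
        final String text;
        final int channel;
        Tok(String text, int channel) { this.text = text; this.channel = channel; }
    }

    // Tokens for "INSERT INTO t" when whitespace is sent to the hidden channel.
    static List<Tok> hiddenStream() {
        return List.of(new Tok("INSERT", DEFAULT_CHANNEL), new Tok(" ", HIDDEN_CHANNEL),
                new Tok("INTO", DEFAULT_CHANNEL), new Tok(" ", HIDDEN_CHANNEL),
                new Tok("t", DEFAULT_CHANNEL));
    }

    // With -> skip, the whitespace tokens never reach the stream at all.
    static List<Tok> skippedStream() {
        return hiddenStream().stream()
                .filter(t -> t.channel == DEFAULT_CHANNEL)
                .collect(Collectors.toList());
    }

    // What the parser consumes: default-channel tokens only.
    static List<String> parserView(List<Tok> stream) {
        return stream.stream()
                .filter(t -> t.channel == DEFAULT_CHANNEL)
                .map(t -> t.text)
                .collect(Collectors.toList());
    }

    // What a rewriter renders: every token in the stream, in order.
    static String render(List<Tok> stream) {
        return stream.stream().map(t -> t.text).collect(Collectors.joining());
    }

    public static void main(String[] args) {
        System.out.println(parserView(hiddenStream()).equals(parserView(skippedStream()))); // true
        System.out.println(render(hiddenStream()));  // INSERT INTO t
        System.out.println(render(skippedStream())); // INSERTINTOt
    }
}
```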

2 - The C++-specific predicates and actions just guard against the same clause or keyword being used more than once. You don't really need that check for your purposes (and it could be done in a post-parse listener if you DID need it), so let's just drop the language-specific code:

engineClause: engineExpr (
    orderByClause
    | partitionByClause
    | primaryKeyClause
    | sampleByClause
    | ttlClause
    | settingsClause
)*;

dictionaryAttrDfnt
    : identifier columnTypeExpr (
        DEFAULT literal
        | EXPRESSION columnExpr
        | HIERARCHICAL
        | INJECTIVE
        | IS_OBJECT_ID
    )*
    ;

dictionaryEngineClause
    : dictionaryPrimaryKeyClause? (
        sourceClause
        | lifetimeClause
        | layoutClause
        | rangeClause
        | dictionarySettingsClause
    )*
    ;

NOTE: The grammar doesn't seem to accept the actual values in an INSERT statement:

insertStmt
    : INSERT INTO TABLE? (
        tableIdentifier
        | FUNCTION tableFunctionExpr
    ) columnsClause? dataClause
    ;
columnsClause
    : LPAREN nestedIdentifier (COMMA nestedIdentifier)* RPAREN
    ;
dataClause
    : FORMAT identifier              # DataClauseFormat
    | VALUES                         # DataClauseValues // <- problem on this line
    | selectUnionStmt SEMICOLON? EOF # DataClauseSelect
    ;

(I'm not going to try to fix that part, so I've commented out the values in your input to work around it.)

(It would also help if the top-level rule ended with an EOF token; without one, ANTLR just stops parsing after VALUES and silently ignores the rest of the input. Ending a root rule with EOF is considered a best practice for exactly this reason.)
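A toy matcher (plain Java, not ANTLR) for a simplified rule insertStmt : INSERT INTO ID VALUES ; illustrates the point. EofDemo and its methods are hypothetical. Without EOF the rule is satisfied as soon as VALUES matches and the trailing tokens are never examined; requiring EOF turns the leftovers into a failure (in ANTLR, a reported syntax error):

```java
import java.util.List;

class EofDemo {
    // Pattern for the simplified rule; null matches any single token (the table name).
    private static final String[] INSERT_STMT = {"INSERT", "INTO", null, "VALUES"};

    // Returns how many tokens the rule consumed, or -1 if it does not match.
    static int insertStmt(List<String> tokens) {
        if (tokens.size() < INSERT_STMT.length) return -1;
        for (int i = 0; i < INSERT_STMT.length; i++) {
            if (INSERT_STMT[i] != null && !INSERT_STMT[i].equals(tokens.get(i))) return -1;
        }
        return INSERT_STMT.length;
    }

    // Root rule without EOF: success once insertStmt matches; leftovers are ignored.
    static boolean parseWithoutEof(List<String> tokens) {
        return insertStmt(tokens) >= 0;
    }

    // Root rule "insertStmt EOF": the whole input must be consumed.
    static boolean parseWithEof(List<String> tokens) {
        return insertStmt(tokens) == tokens.size();
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("INSERT", "INTO", "t", "VALUES", "(", "1", ")");
        System.out.println(parseWithoutEof(tokens)); // true: "( 1 )" is silently ignored
        System.out.println(parseWithEof(tokens));    // false: trailing tokens are an error
    }
}
```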

The Main program:

import org.antlr.v4.runtime.CharStream;
import org.antlr.v4.runtime.CharStreams;
import org.antlr.v4.runtime.CommonTokenStream;
import org.antlr.v4.runtime.TokenStreamRewriter;
import org.antlr.v4.runtime.tree.ParseTreeWalker;

public class TSWDemo {
    public static void main(String... args) {
        // The VALUES tuples are commented out because of the grammar issue noted above
        new TSWDemo().run(CharStreams.fromString("INSERT INTO t VALUES /* (1, 'Hello, world'), (2, 'abc'), (3, 'def') */"));
    }

    public void run(CharStream charStream) {
        var lexer = new ClickHouseLexer(charStream);
        var tokenStream = new CommonTokenStream(lexer);
        var parser = new ClickHouseParser(tokenStream);
        // The rewriter records edits against the token stream; the stream itself is not modified
        var tsw = new TokenStreamRewriter(tokenStream);
        var listener = new TSWDemoListener(tsw);
        var queryStmt = parser.queryStmt();
        ParseTreeWalker.DEFAULT.walk(listener, queryStmt);
        // Renders the original tokens, including hidden-channel whitespace and comments, plus the edits
        System.out.println(tsw.getText());
    }
}

The Listener:

import org.antlr.v4.runtime.TokenStreamRewriter;

public class TSWDemoListener extends ClickHouseParserBaseListener {
    private final TokenStreamRewriter tsw;

    public TSWDemoListener(TokenStreamRewriter tsw) {
        this.tsw = tsw;
    }

    @Override
    public void exitInsertStmt(ClickHouseParser.InsertStmtContext ctx) {
        // ctx.getStop() is the last token matched by insertStmt (here, VALUES)
        tsw.insertAfter(ctx.getStop(), " (WHERE X IN USERS)");
    }
}

Output:

INSERT INTO t VALUES (WHERE X IN USERS) /* (1, 'Hello, world'), (2, 'abc'), (3, 'def') */
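For intuition about why the text lands right after VALUES, here is a deliberately tiny, hypothetical stand-in for TokenStreamRewriter (MiniRewriter is not the ANTLR class and ignores most of its features). Edits are queued against token indexes and only applied when the text is rendered, so the token stream itself never changes. The stop token of insertStmt is VALUES because DataClauseValues matches nothing after it, and the hidden-channel comment is simply copied through:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, much-simplified analogue of TokenStreamRewriter.
class MiniRewriter {
    private final List<String> tokens;                       // original token texts, by index
    private final Map<Integer, StringBuilder> after = new HashMap<>();

    MiniRewriter(List<String> tokens) { this.tokens = tokens; }

    // Analogue of insertAfter(token, text), keyed by the token's index.
    // This sketch simply appends repeated inserts at one index in call order.
    void insertAfter(int tokenIndex, String text) {
        after.computeIfAbsent(tokenIndex, k -> new StringBuilder()).append(text);
    }

    // Analogue of getText(): every original token plus any queued insertions.
    String getText() {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < tokens.size(); i++) {
            out.append(tokens.get(i));
            StringBuilder extra = after.get(i);
            if (extra != null) out.append(extra);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Hand-lexed tokens; the single spaces and the comment sit on the hidden channel.
        List<String> tokens = List.of("INSERT", " ", "INTO", " ", "t", " ", "VALUES", " ",
                "/* (1, 'Hello, world'), (2, 'abc'), (3, 'def') */");
        MiniRewriter rw = new MiniRewriter(tokens);
        rw.insertAfter(6, " (WHERE X IN USERS)");            // index 6 is VALUES, the stop token
        System.out.println(rw.getText());
    }
}
```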
        
