Collectives™ on Stack Overflow
Find centralized, trusted content and collaborate around the technologies you use most.
Learn more about Collectives
Teams
Q&A for work
Connect and share knowledge within a single location that is structured and easy to search.
Learn more about Teams
Objective :
Add an additional
WHERE
clause to any given Clickhouse statement.
I'm using the following Antlr grammars to generate Java classes for a lexer & parser.
Lexer grammar
https://github.com/ClickHouse/ClickHouse/blob/master/utils/antlr/ClickHouseLexer.g4
Parser grammar
https://github.com/ClickHouse/ClickHouse/blob/master/utils/antlr/ClickHouseParser.g4
Problem :
I cannot figure out/understand how to interact or create the appropriate POJOs for use with the generated classes that Antlr produces.
Example of statement
String query = "INSERT INTO t VALUES (1, 'Hello, world'), (2, 'abc'), (3, 'def')"
Goal of SQL (enrichment code)
String enrichedQuery = SqlParser.enrich(query);
System.out.println(enrichedQuery);
//Output
>> INSERT INTO t VALUES (1, 'Hello, world'), (2, 'abc'), (3, 'def') (WHERE X IN USERS)
I have the follow Java main
public class Hello {
public static void main( String[] args) throws Exception{
String query = "INSERT INTO t VALUES (1, 'Hello, world'), (2, 'abc'), (3, 'def')"
ClickhouseLexer = new ClickhouseLexer(new ANTLRInputStream(query));
CommonTokenStream tokens = new CommonTokenStream(lexer);
ClickHouseParser = new ClickHouseParser (tokens);
ParseTreeWalker walker = new ParseTreeWalker();
–
–
–
–
–
I'd suggest taking a look at TokenStreamRewriter.
First, let's get the grammars ready.
1 - with TokenStreamRewriter
we'll want to preserve whitespace, so let's change the -> skip
directives to ->channel(HIDDEN)
At the end of the Lexer grammar:
// Comments and whitespace
MULTI_LINE_COMMENT: '/*' .*? '*/' -> channel(HIDDEN);
SINGLE_LINE_COMMENT: '--' ~('\n'|'\r')* ('\n' | '\r' | EOF) -> channel(HIDDEN);
WHITESPACE: [ \u000B\u000C\t\r\n] -> channel(HIDDEN); // '\n' can be part of multiline single query
2 - The C++ specific stuff just guards against using keywords more than once. You don't really need that check for your purposes (and it could be done in a post-parse Listener if you DID need it). So let's just lose the language specific stuff:
engineClause: engineExpr (
orderByClause
| partitionByClause
| primaryKeyClause
| sampleByClause
| ttlClause
| settingsClause
dictionaryAttrDfnt
: identifier columnTypeExpr (
DEFAULT literal
| EXPRESSION columnExpr
| HIERARCHICAL
| INJECTIVE
| IS_OBJECT_ID
dictionaryEngineClause
: dictionaryPrimaryKeyClause? (
sourceClause
| lifetimeClause
| layoutClause
| rangeClause
| dictionarySettingsClause
NOTE: There seems to be an issue with the grammar not accepting the actual values for an insert statement:
insertStmt
: INSERT INTO TABLE? (
tableIdentifier
| FUNCTION tableFunctionExpr
) columnsClause? dataClause
columnsClause
: LPAREN nestedIdentifier (COMMA nestedIdentifier)* RPAREN
dataClause
: FORMAT identifier # DataClauseFormat
| VALUES # DataClauseValues // <- problem on this line
| selectUnionStmt SEMICOLON? EOF # DataClauseSelect
(I'm not going to try to fix that part, so I've commented your input to accommodate)
(It would also help if the top level rule needed with an EOF
token; without that ANTLR just stops parsing after VALUE
. An EOF at the end of a root rule is considered a best practice for exactly this reason.)
The Main program:
import org.antlr.v4.runtime.CharStream;
import org.antlr.v4.runtime.CharStreams;
import org.antlr.v4.runtime.CommonTokenStream;
import org.antlr.v4.runtime.TokenStreamRewriter;
import org.antlr.v4.runtime.tree.ParseTreeWalker;
public class TSWDemo {
public static void main(String... args) {
new TSWDemo().run(CharStreams.fromString("INSERT INTO t VALUES /* (1, 'Hello, world'), (2, 'abc'), (3, 'def') */"));
public void run(CharStream charStream) {
var lexer = new ClickHouseLexer(charStream);
var tokenStream = new CommonTokenStream(lexer);
var parser = new ClickHouseParser(tokenStream);
var tsw = new TokenStreamRewriter(tokenStream);
var listener = new TSWDemoListener(tsw);
var queryStmt = parser.queryStmt();
ParseTreeWalker.DEFAULT.walk(listener, queryStmt);
System.out.println(tsw.getText());
The Listener:
import org.antlr.v4.runtime.TokenStreamRewriter;
public class TSWDemoListener extends ClickHouseParserBaseListener {
private TokenStreamRewriter tsw;
public TSWDemoListener(TokenStreamRewriter tsw) {
this.tsw = tsw;
@Override
public void exitInsertStmt(ClickHouseParser.InsertStmtContext ctx) {
tsw.insertAfter(ctx.getStop(), " (WHERE X IN USERS)");
Output:
INSERT INTO t VALUES (WHERE X IN USERS) /* (1, 'Hello, world'), (2, 'abc'), (3, 'def') */
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.