
I am trying to port an ANTLR Java project to C++. In Java I was able to get the original text with help from "How do I get the original text that an antlr4 rule matched?". It was awesome!!

My attempt in C++,

    CharStream *input = ctx->start->getInputStream();
    int a = ctx->start->getStartIndex();
    int b = ctx->start->getStopIndex();
    IntervalSet interval = IntervalSet(a,b);
    string text = input->getText(interval.getIntervals()[2]);

This "getIntervals()" is helping me just like it worked in Java.

Not working properly in C++:

    CharStream *input = ctx->start->getInputStream();
    int a = ctx->start->getStartIndex();
    int b = ctx->start->getStopIndex();
    IntervalSet interval = IntervalSet(a,b);
    string text = input->getText(interval.getIntervals()[2]);

When I tried it like this, I am not getting spaces:

    string text = ctx->getText();

I get the text without spaces:

    intmain(){cout<<"Hello, World!";strncpy(pStr,pStart,len);for(i=0;i<10;i++){j=i*i;i=j/5;}return0;}
inbarajan

3 Answers


Everything worked as expected :) Thanks to Mike: https://stackoverflow.com/users/1137174/mike-lischke

Changes:

I used Interval from misc/Interval.h, with ctx->getStart()->getStartIndex() as the start and ctx->getStop()->getStopIndex() as the end.

enterExpressionstatement:

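    // Character stream holding the original source (same lookup as in the question).
    CharStream *input = ctx->getStart()->getInputStream();
    Interval intvl = Interval();
    intvl.a = ctx->getStart()->getStartIndex();  // first character of the first token
    intvl.b = ctx->getStop()->getStopIndex();    // last character of the last token (inclusive)
    string text2 = input->getText(intvl);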
    cout <<"enterExpressionstatement "<<text2<<endl;
    .....
    .....
    cout <<"enterIterationstatement "<<text2<<endl;

Output:

    $ ./parser cpp_forloop
    enterExpressionstatement cout << "Hello, World!";
    enterExpressionstatement strncpy(pStr, pStart, len);
    enterIterationstatement for(i = 0; i < 10; i++ ) {
     j = i*i; i = j/5;
     }
    enterExpressionstatement i = 0;
    enterExpressionstatement j = i*i;
    enterExpressionstatement i = j/5;
    End of program cpp_forloop
inbarajan

The behavior of RuleContext::getText is the same in all targets: it retrieves the text of the given context by recursively concatenating the text of each subcontext into a single string. If your grammar skips whitespace or puts it on a different channel, these characters are not included in the result, because there is no (visible) match for them and hence they don't appear in the parse tree.

But the getText() function on the input stream (taking an interval) is what gives you back the full original text (including all line breaks, comments etc.). Your given code however is confusing:

  1. You included the same code twice. Once you write that it works, and the other time that it does not.
  2. You are accessing an interval in your set which doesn't exist (there's only one interval, at index 0). Why create an interval set in the first place? Just pass in Interval(a, b) instead.
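
To make the difference concrete, here is a small self-contained sketch that does not use the ANTLR runtime at all. ToyToken, tokenize, joinedTokenText and originalText are hypothetical names invented for this illustration. They only mimic the behavior described above, assuming a lexer that skips whitespace and token stop indices that are inclusive.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a token: inclusive character offsets into the input.
struct ToyToken {
    std::size_t start;  // index of the first character
    std::size_t stop;   // index of the last character (inclusive)
};

// Splits on whitespace and drops it, roughly what a lexer rule ending in
// "-> skip" does: the whitespace never becomes a token.
std::vector<ToyToken> tokenize(const std::string &input) {
    std::vector<ToyToken> tokens;
    std::size_t i = 0;
    while (i < input.size()) {
        if (std::isspace(static_cast<unsigned char>(input[i]))) {
            ++i;
            continue;
        }
        std::size_t begin = i;
        while (i < input.size() &&
               !std::isspace(static_cast<unsigned char>(input[i]))) {
            ++i;
        }
        tokens.push_back({begin, i - 1});
    }
    return tokens;
}

// Analogue of getText() on a rule context: only token texts are concatenated,
// so everything between the tokens is lost.
std::string joinedTokenText(const std::string &input,
                            const std::vector<ToyToken> &tokens) {
    std::string out;
    for (const ToyToken &t : tokens) {
        out += input.substr(t.start, t.stop - t.start + 1);
    }
    return out;
}

// Analogue of getText(Interval(a, b)) on the character stream: slice the
// original input from the first token's start to the last token's stop.
std::string originalText(const std::string &input,
                         const ToyToken &first, const ToyToken &last) {
    assert(first.start <= last.stop);
    return input.substr(first.start, last.stop - first.start + 1);
}
```

For the input "  for (i = 0; i < 10; i++)  ", joinedTokenText returns for(i=0;i<10;i++), while originalText called with the first and last token returns for (i = 0; i < 10; i++). Passing the first token for both ends returns only for, which mirrors what happens when both indices are taken from ctx->start.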
Mike Lischke
  • Thanks Mike. I set the index to 0 as you suggested. Here is the code to parse the for loop; I still can't figure out this issue. You mentioned Interval(a,b), but antlr4 supports only IntervalSet. IntervalSet interval = IntervalSet(a,b); string text = input->getText(interval.getIntervals()[0]); cout <<"a " << a <<" b "<< b << text; Result: Antlrusr$ ./parser cpp_forloop a 68 b 70rEnd of program cpp_forloop – inbarajan Apr 25 '19 at 11:51
  • Thanks a lot Mike. Finally it worked :) I'll post the solution as an answer. – inbarajan Apr 25 '19 at 15:00

If anybody ends up here looking for Python 3 and antlr4 like I did, here is the way of getting the original text that worked for me.

  • In the grammar (.g4), ensure that whitespace and anything else that is needed (e.g. hints, comments) is sent to the hidden channel:

        WS  : (' '|'\r'|'\t'|'\n') -> channel(HIDDEN)
        ;
    
  • In the listener, for any rule whose text is needed, do the following:

        def enterCreateTableStatement(self, ctx: HiveParser.CreateTableStatementContext):
          original_text = ctx.parser.getInputStream().getText(ctx.start, ctx.stop)
          print(original_text)
    
vhora