
I need to find all occurrences of chained C#-like method calls in a text string. For example, I would like to break out each method and its parenthesized arguments from a string such as this:

object.method(1, "2", abc).method2().method3("test(), 1, 2, 3").method4("\"Hi\"")

Here is the regex pattern I almost had working:

(?<objectName>[^\}]*?)\.(?<methodName>[^\}]*?)\(((?:[^;"']|"[^"]*"|'[^']*')+)*?\)

This extracts the objectName and the first methodName correctly, but lumps

1, "2", abc).method2().method3("test(), 1, 2, 3").method4("\"Hi\""

all into the third capture group, "$1".
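A minimal reproduction, sketched in Python rather than .NET (an assumption for illustration: the named groups are rewritten from .NET's `(?<name>...)` to Python's `(?P<name>...)`, and `ORIGINAL` and `SOURCE` are hypothetical names). It suggests why the arguments overflow: the class `[^;"']` also matches `)` and `.`, so the greedy `+` runs to the end of the string and then backs off only as far as the last `)`.

```python
import re

# The question's first pattern; only the named-group syntax differs from .NET.
ORIGINAL = re.compile(r'''(?P<objectName>[^\}]*?)\.(?P<methodName>[^\}]*?)\(((?:[^;"']|"[^"]*"|'[^']*')+)*?\)''')

SOURCE = r'object.method(1, "2", abc).method2().method3("test(), 1, 2, 3").method4("\"Hi\"")'

m = ORIGINAL.search(SOURCE)
print(m.group("objectName"))  # object
print(m.group("methodName"))  # method
# Python numbers groups left to right, so the unnamed group is group 3 here;
# .NET numbers unnamed groups before named ones, so there it is $1.
# Because [^;"'] also matches ")" and ".", this group swallows the rest of
# the chain and ends just before the final ")".
print(m.group(3))
```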

My latest approach was to divide and conquer by removing the objectName part, since that is easy to parse out. This led me to:

\.(?<methodName>[^(]*?)\(((?:[^;"']|"[^"]*"|'[^']*')+)*?\)

This yields the same results as before, just without the objectName. I did this to see whether I could get a global result (every match), but I could not get the regex syntax right.
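Staying close to that second pattern, one hypothetical fix (again sketched in Python, with `(?P<name>...)` in place of .NET's `(?<name>...)`; `METHOD` is an illustrative name) is to add `(` and `)` to the excluded characters so the argument group cannot run past its own closing parenthesis, replace the nested `((...)+)*?` with a plain `(...)*`, and collect every match with `finditer`, which plays the role of .NET's `Regex.Matches`. Note that `"[^"]*"` does not understand escaped quotes; it copes with `"\"Hi\""` only because the quotes happen to pair up.

```python
import re

# The question's second pattern with two changes (a sketch):
#   1. ( and ) are excluded from the "ordinary character" class;
#   2. the argument group is a plain (...)* instead of ((...)+)*?.
METHOD = re.compile(r'''\.(?P<methodName>[^(]*?)\((?P<parameters>(?:[^;"'()]|"[^"]*"|'[^']*')*)\)''')

chain = r'.method(1, "2", abc).method2().method3("test(), 1, 2, 3").method4("\"Hi\"")'

# finditer yields every non-overlapping match, left to right.
for m in METHOD.finditer(chain):
    print(m.group("methodName"), "|", m.group("parameters"))
```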

In summary, I need to parse multiple chained .method(parameters) occurrences into their constituent parts, named "methodName" and "parameters". I have deduced a few things, but my regex skills are rusty at best and I have not been able to get past this. I appreciate any help you can offer.

I have been using this site for testing: http://regexstorm.net/tester

UPDATE: To clarify, the requirements do not include supporting C# lambda expressions, only the dotted method syntax. This is not intended to be a full C# parser; the only need is dotted method chaining. I apologize for any confusion. The pattern I am looking to break out is:

object.method(arguments).method(arguments).method(arguments)...

My approach is to first extract the object name, which is a simple operation that does not require Regex. That leaves the following for Regex to parse into two constituent parts:
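Assuming the object name is a single identifier with no dots of its own (so something like `ns.object.method()` would be cut in the wrong place), that first step can be as simple as splitting at the first dot. `split_object` is a hypothetical helper name:

```python
from __future__ import annotations


def split_object(expression: str) -> tuple[str, str]:
    """Split 'object.method(...)...' into ('object', '.method(...)...').

    Assumes the object name itself contains no '.'.
    """
    dot = expression.find(".")
    if dot == -1:
        return expression, ""  # no member access at all
    return expression[:dot], expression[dot:]


name, rest = split_object('object.method(1, "2", abc).method2()')
print(name)  # object
print(rest)  # .method(1, "2", abc).method2()
```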

.method(arguments).method(arguments).method(arguments)...

Which would yield:

method   arguments
method   arguments
method   arguments
...

arguments may be empty, as in .method(), or the member may actually be a property (no parentheses and no arguments), as in:

.method.method().method(arguments)

Which would yield:

method   (null)
method   (string.Empty)
method   arguments
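Under that restricted grammar, a string-aware pattern can be applied repeatedly, with each match required to start exactly where the previous one ended, so stray text is reported instead of silently skipped. This is a Python sketch (`(?P<name>...)` instead of .NET's `(?<name>...)`); `CALL` and `parse_chain` are hypothetical names, and it assumes member names are `\w+` identifiers and that string literals use backslash escapes. Making the parenthesized part optional gives `None` for a property and `""` for an empty argument list, as in the table above.

```python
from __future__ import annotations

import re

# Restricted grammar from the question (a sketch under stated assumptions):
#   .name        -> property,    arguments = None
#   .name(...)   -> method call, arguments = raw text between the parens
# Inside the parens: ordinary characters (anything but parens and quotes),
# or whole quoted literals, which may contain parens and backslash escapes.
CALL = re.compile(r'''
    \.(?P<methodName>\w+)
    (?:\(
        (?P<arguments>
            (?:[^()"']                 # ordinary character
              |"(?:[^"\\]|\\.)*"       # double-quoted string
              |'(?:[^'\\]|\\.)*'       # single-quoted literal
            )*
        )
    \))?                               # parenthesized part is optional
''', re.VERBOSE)


def parse_chain(chain: str) -> list[tuple[str, str | None]]:
    """Parse '.a(x).b.c()' into [(name, arguments), ...]."""
    result: list[tuple[str, str | None]] = []
    pos = 0
    while pos < len(chain):
        m = CALL.match(chain, pos)  # must match exactly at pos: no skipping
        if m is None:
            raise ValueError(f"unexpected text at index {pos}: {chain[pos:]!r}")
        result.append((m.group("methodName"), m.group("arguments")))
        pos = m.end()
    return result


for name, args in parse_chain(".method.method().method(arguments)"):
    print(name, repr(args))  # method None / method '' / method 'arguments'
```

Passing `pos` to `Pattern.match` plays roughly the role of .NET's `\G` anchor. Because quoted text is consumed as a unit, the same sketch also accepts the chain from the comments, `.method1(").ToString()").method2(1).method3(").Fail()")`; unquoted nested parentheses such as `.method(a(b))` are rejected.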

arguments should contain everything between the opening and closing parentheses. They do not need to be split up at this point; that will be done in a subsequent Regex operation.
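For that later step, one possible sketch in Python (`ARGUMENT` and `split_arguments` are hypothetical names) splits the raw argument text at commas that lie outside quotes. It assumes arguments contain no unquoted nested parentheses, in line with the restricted grammar, and entries that are completely empty, as in `a,,b`, are skipped.

```python
from __future__ import annotations

import re

# One argument = a run of non-comma, non-quote characters and/or whole
# quoted literals (with backslash escapes). Commas inside quotes are kept.
ARGUMENT = re.compile(r'''(?:[^,"']|"(?:[^"\\]|\\.)*"|'(?:[^'\\]|\\.)*')+''')


def split_arguments(arguments: str | None) -> list[str]:
    """Split '1, "2", abc' into ['1', '"2"', 'abc']; None or "" gives []."""
    if not arguments:
        return []
    return [part.strip() for part in ARGUMENT.findall(arguments)]


print(split_arguments('1, "2", abc'))
print(split_arguments('"test(), 1, 2, 3"'))
```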

It seems to me that detecting this simple pattern (dot, method name, opening parenthesis, argument string, closing parenthesis, then the same again) should be within the capability of Regex.

This is the full extent of the grammar: no comments, no lambdas, just object.method(arguments).method()...

I hope this helps.

jmullis
  • Every () creates a group, in the order of the "(". Have you tried using higher $'s like "$2" or "$3" to see if they contain the desired output? – Manuel Hoffmann Sep 19 '16 at 06:57
  • Your hope will crumble when I introduce you to this chain: 'method1(").ToString()").method2(1).method3(").Fail()")' – eocron Sep 19 '16 at 06:59
  • I answered a very similar [question](http://stackoverflow.com/a/37314847/2729609) some time ago. The regex is going to be very large. – Sebastian Schumann Sep 19 '16 at 07:05

1 Answer


This can't be done properly with regex, because your arguments are too unpredictable, and a regex grammar is not comparable in power to the C# parser grammar. For example, an argument can contain a string with any content:

method1("x.hiThere().lol()").method2()

It can nest:

method1(x=>method2().method3())

Or it can do this:

a("b().c()",d=> d(").hi()"))
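A quick check in Python shows where the line falls among these three examples. A flat but string-aware pattern (hypothetical `CALL` and `flat_parse`; each example is prefixed with a `.` as if an object name preceded it) accepts the first one, because the quoted text is consumed as a whole, but rejects the two nested ones: with no unquoted parentheses allowed inside the arguments, it has no way to pair an inner `(` with its `)`, and no single regular pattern can do that for arbitrary depth.

```python
import re

# Flat, string-aware pattern (a sketch): arguments may contain quoted
# strings with any content, but no unquoted ( or ).
CALL = re.compile(r'''\.(\w+)\(((?:[^()"']|"(?:[^"\\]|\\.)*"|'(?:[^'\\]|\\.)*')*)\)''')


def flat_parse(chain):
    """Return [(name, args), ...], or None if the chain is not flat."""
    calls, pos = [], 0
    while pos < len(chain):
        m = CALL.match(chain, pos)  # each call must start where the last ended
        if m is None:
            return None
        calls.append(m.groups())
        pos = m.end()
    return calls


print(flat_parse('.method1("x.hiThere().lol()").method2()'))
print(flat_parse('.method1(x=>method2().method3())'))
print(flat_parse('.a("b().c()",d=> d(").hi()"))'))
```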

To solve your problem, you need to learn about grammars and write a C# grammar for this particular task. As for frameworks, you can start with the ANTLR project.

Explanation

The reason you can't do this is the difference in grammar types. Regular expressions describe regular languages, which are Type-3 in the Chomsky hierarchy. C# syntax is described by a context-free grammar, which is Type-2 in the Chomsky hierarchy.

Represented visually, the context-free class that C# belongs to is much larger than the regular class that regex can describe:

[Image: diagram of the Chomsky hierarchy]

For example, your case falls into parser territory simply because of lambdas in C#:

method1(x=>
{
    ....
    /* some code here */
    ....
}).method2()
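To illustrate the point about nesting, here is a small hand-written scanner in Python (a sketch; `split_calls` and `PAIRS` are hypothetical names, not part of any library). It keeps a stack of expected closing brackets, which is exactly the kind of unbounded memory a regular grammar lacks, so it copes with the lambda body above. It is not a C# parser: it ignores comments, verbatim `@"..."` and interpolated strings, and anything beyond `.name` and `.name(...)` segments.

```python
from __future__ import annotations

PAIRS = {"(": ")", "{": "}", "[": "]"}


def split_calls(chain: str) -> list[tuple[str, str | None]]:
    """Split '.a(x).b.c(...)' into (name, raw_arguments) pairs.

    raw_arguments is None for a property and "" for an empty call.
    """
    calls: list[tuple[str, str | None]] = []
    i, n = 0, len(chain)
    while i < n:
        if chain[i] != ".":
            raise ValueError(f"expected '.' at index {i}")
        i += 1
        start = i
        while i < n and (chain[i].isalnum() or chain[i] == "_"):
            i += 1
        name = chain[start:i]
        if not name:
            raise ValueError(f"expected a member name at index {start}")
        if i == n or chain[i] != "(":
            calls.append((name, None))  # property: no parentheses
            continue
        i += 1  # step past "("
        arg_start = i
        closers = [")"]  # stack of closing brackets still owed
        while closers:
            if i >= n:
                raise ValueError("unbalanced brackets")
            ch = chain[i]
            if ch in "\"'":
                # Skip a quoted literal, honouring backslash escapes.
                i += 1
                while i < n and chain[i] != ch:
                    i += 2 if chain[i] == "\\" else 1
            elif ch in PAIRS:
                closers.append(PAIRS[ch])
            elif ch in ")}]":
                if ch != closers.pop():
                    raise ValueError(f"mismatched {ch!r} at index {i}")
            i += 1
        calls.append((name, chain[arg_start:i - 1]))
    return calls


for name, args in split_calls(".method1(x=>\n{\n    /* some code here */\n}).method2()"):
    print(name, repr(args))
```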
eocron
  • I'm with you that this should not be solved using regex, but the posted problems can be [handled](http://stackoverflow.com/a/37314847/2729609). – Sebastian Schumann Sep 19 '16 at 07:10
  • The case I built this answer on is fundamental. Regex grammar is a Type-3 (regular) grammar; C# grammar, on the other hand, is a Type-2 (context-free) grammar. That means there will always be cases where a regex grammar can't represent the C# grammar. – eocron Sep 19 '16 at 07:23
  • Thanks @ecron06. Excellent info on parsing science. I have updated the original post to limit the scope of the problem. The C# reference was simply used for illustrative purposes to frame the problem. I do appreciate your answer. – jmullis Sep 20 '16 at 07:30
  • Does anyone else have additional thoughts/ideas given the clarified limited scope? – jmullis Sep 21 '16 at 05:57