0

Excel returns a reference of the form

=Sheet1!R14C1R22C71junk

("junk" won't normally be there, but I want to be sure that there's no extraneous text.)

I would like to 'split' this into a VB array, where

a(0)="Sheet1"
a(1)="14"
a(2)="1"
a(3)="22"
a(4)="71"
a(5)="junk"

I'm sure it can be done easily with a regular expression, but I just can't get the hang of it. Is there a kind soul who could help me?

Thanks

smirkingman
  • 6,167
  • 4
  • 34
  • 47
  • 1
    Be careful with the whole regex path unless you know the text you'll be working with is very tightly constrained. All kinds of outlandish Excel expressions can evaluate to references! You might find the comment thread on this question to be of interest: http://stackoverflow.com/questions/1897906/is-this-the-regex-for-matching-any-cell-reference-in-an-excel-formula. – jtolle Oct 14 '10 at 23:30

3 Answers3

2
=([^!]+)!R(\d+)C(\d+)R(\d+)C(\d+)(.*)

should work.

[^!]+ matches a sequence of non-exclamation-point characters.

\d+ matches a sequence of digits.

.* matches anything.

So, in VB.NET:

Dim a As Match
a = Regex.Match(SubjectString, "=([^!]+)!R(\d+)C(\d+)R(\d+)C(\d+)(.*)")
If a.Success Then
    ' matched text: a.Value
    ' backreference n text: a.Groups(n).Value
Else
    ' Match attempt failed
End If
Tim Pietzcker
  • 328,213
  • 58
  • 503
  • 561
  • Thanks for the quick reply. Hm, tricky this regex stuff. But that would also accept "=Sheet1!xx14yyyy1zz22w71" would it not? I want to be sure it's Rs and Cs. Also "junk" could be rubbish of any length, starting with a non-digit, not just a word. Possible? – smirkingman Oct 14 '10 at 19:43
  • Yes, that should work. The dot will match anything, and the previous `\d+` has already used up all the digits. I've also changed the regex to accomodate the literal R and C. – Tim Pietzcker Oct 14 '10 at 19:45
  • A(4) finishes at the first non-digit – smirkingman Oct 14 '10 at 19:46
  • (off-topic) I'm a beginner here. How does get notified as quickly as you do? – smirkingman Oct 14 '10 at 19:46
  • Sorry, I edited my comment as you were answering to it - I had misread your previous comment. You get a notification about new comments when you refresh the page or visit a different page on Stack Overflow (little mailbox icon lights up, and the top left corner "Stack Exchange" symbol shows a number with your new comments. – Tim Pietzcker Oct 14 '10 at 19:48
  • Wow, I'm gob-struck, that's a gem. Thank you very much! – smirkingman Oct 14 '10 at 19:50
  • You're very welcome. Notifications work very well on SO; if you want to trigger a notification to a third person (say, a commenter on your answer to someone else's question), just add an `@` in front of her name (@smirkingman), and she will get notified that you directed a comment to her. But you only need to do that for someone who isn't the author of the question or the answer that you're commenting on. – Tim Pietzcker Oct 14 '10 at 19:55
2

A straightforward String.Split would work, provided the "junk" text wasn't there:

Dim input As String = "=Sheet1!R14C1R22C71"
Dim result = input.Split(New Char() { "="c, "!"c, "R"c, "C"c }, StringSplitOptions.RemoveEmptyEntries)
For Each item As String In result
    Console.WriteLine(item)
Next

The regex gets a little tricky since you will need to go through the Groups and Captures of the nested portions to get the proper order.

EDIT: here's my regex solution. It accepts multiple occurrences of R's and C's.

Dim input As String = "=Sheet1!R14C1R22C71junk"
Dim pattern As String = "=(?<Sheet>Sheet\d+)!(?:R(?<R>\d+)C(?<C>\d+))+"

Dim m As Match = Regex.Match(input, pattern)
If m.Success Then
    Console.WriteLine(m.Groups("Sheet").Value)
    For i = 0 To m.Groups("R").Captures.Count - 1
        Console.WriteLine(m.Groups("R").Captures(i).Value)
        Console.WriteLine(m.Groups("C").Captures(i).Value)
    Next
End If

Pattern explanation:

  • "=(?Sheet\d+)" : matches an = sign followed by "Sheet" and digits. Uses named group of "Sheet"
  • "!(?:R(?\d+)C(?\d+))+" : matches the exclamation mark followed by at least one occurrence of the *R*xx*C*xx portion of the text. Named groups of "R" and "C" are used.
  • "(?:...)+" : this portion from the above portion matches but does not capture the inner pattern (i.e., the R/C part). This is to avoid unnecessarily capturing them while we are actually capturing them with the named groups.
Ahmad Mageed
  • 94,561
  • 19
  • 163
  • 174
0

More general regexes for R1C1 style:

^=(?:(?<Sheet>[^!]+)!)?(?:R((?<RAbs>\d+)|(?<RRel>\[-?\d+\]))C((?<CAbs>\d+)|(?<CRel>\[-?\d+\]))){1,2}$

And A1 style:

^=(?:(?<Sheet>[^!]+)!)?(?:(?<Col1>\$?[a-z]+)(?<Row1>\$?\d+))(?:\:(?<Col2>\$?[a-z]+)(?<Row2>\$?\d+))?$

It doesn't match external references like =[Book1]Sheet1!A1 though.

Sergei B.
  • 138
  • 7