1

I have written a utility which opens a text-based file, loads it as a string and performs a find/replace using Regex.Replace.

It does this on many files: the user points it at a folder and enters a find string and a replace string, and every file in the folder that contains the find string has it replaced.

This works great until I try it with a backslash where it falls down.

Quite simply:

newFileContent = Regex.Replace(fileContent, @findString, @replaceString, RegexOptions.IgnoreCase);

fileContent = the contents of a text-based file. It will contain carriage returns.

findString = user entered string to find

replaceString = user entered string to replace the found string with

I've tried adding some logic to counteract the backslash, as below, but this fails with "Illegal \ at end of pattern".

 if (culture.CompareInfo.IndexOf(findString, @"\") >= 0)
     {
      Regex.Replace(findString, @"\", @"\\");
     }

What do I need to do to successfully handle backslashes so they can be part of the find / replace logic?

Entire code block below.

//open reader
                using (var reader = new StreamReader(f,Encoding.Default)) 
                {
                    //read file
                    var fileContent = reader.ReadToEnd();

                    Globals.AppendTextToLine(string.Format(" replacing string"));

                    //culture find replace
                    var culture = new CultureInfo("en-gb", false);
                    //ensure nothing has changed
                    if (culture.CompareInfo.IndexOf(fileContent, findString, CompareOptions.IgnoreCase) >= 0)
                    {

                        //if find or replace string contains backslashes
                        if (culture.CompareInfo.IndexOf(findString, @"\") >= 0)
                        {
                            Regex.Replace(findString, @"\", @"\\");
                        }

                        //perform replace in new string
                        if (MainWindow.Main.chkIgnoreCase.IsChecked != null && (bool) MainWindow.Main.chkIgnoreCase.IsChecked)                        
                            newFileContent = Regex.Replace(fileContent, @findString, @replaceString, RegexOptions.IgnoreCase);
                        else
                            newFileContent = Regex.Replace(fileContent, @findString, @replaceString);

                        result[i].Result = true;
                        Globals.AppendTextToLine(string.Format(" success!"));
                    }
                    else
                    {
                        Globals.AppendTextToLine(string.Format(" failure!!"));
                        break;
                    }
                }
Damo
  • Some food for thought: What if I put a file in a folder with a size that exceeds the amount of memory you have? Currently your solution will fail. Perhaps you should do the replaces line-by-line instead of on the entire file at once. – Cᴏʀʏ Nov 26 '13 at 23:15
  • Allowing the user to put in a regex string is usually a bad idea as they can easily shoot themselves in the foot. An exception might be a dev tool - on which the user should both know it's a regex beforehand AND be knowledgeable enough to escape their backslashes. A layman doing a regex is not something that will work and you should probably rethink your approach. – McAden Nov 26 '13 at 23:15
  • Appreciate the feedback, however it is a support tool for a technical help desk. – Damo Nov 26 '13 at 23:20

2 Answers

2

You should be using Regex.Escape when you pass the user-input into the Replace method.

Escapes a minimal set of characters (\, *, +, ?, |, {, [, (,), ^, $, ., #, and white space) by replacing them with their escape codes. This instructs the regular expression engine to interpret these characters literally rather than as metacharacters.

For example:

newFileContent = Regex.Replace(fileContent,
                               Regex.Escape(findString),
                               replaceString,
                               RegexOptions.IgnoreCase);
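
The replace string deserves a look too. In a .NET replacement pattern the backslash is literal, but `$` introduces substitutions such as `$1`, so a user-entered `$` may need doubling. A minimal sketch, assuming both strings are meant literally; `ReplaceLiteral` and the sample values are hypothetical names for illustration, not part of the original utility:

```csharp
using System;
using System.Text.RegularExpressions;

public static class LiteralReplaceSketch
{
    // Hypothetical helper: treats both the find text and the replacement text as literals.
    public static string ReplaceLiteral(string input, string find, string replacement, bool ignoreCase)
    {
        // Escape regex metacharacters (including \) so the find text is matched literally.
        string pattern = Regex.Escape(find);

        // In a replacement pattern only '$' is special; "$$" produces one literal '$'.
        string safeReplacement = replacement.Replace("$", "$$");

        RegexOptions options = ignoreCase ? RegexOptions.IgnoreCase : RegexOptions.None;
        return Regex.Replace(input, pattern, safeReplacement, options);
    }

    public static void Main()
    {
        string content = @"path=C:\Temp\logs";
        // Case-insensitive literal replace of a string containing backslashes.
        Console.WriteLine(ReplaceLiteral(content, @"c:\temp", @"D:\Data", true));
    }
}
```

If the help desk users are expected to use `$1`-style substitutions on purpose, the `$` doubling should be left out.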
LukeH
  • Out of interest - if you ARE escaping the special chars what does Regex.Replace 'offer' over String.Replace? – tolanj Nov 26 '13 at 23:32
  • @tolanj: The main benefit that springs immediately to mind is that it allows you to perform case-insensitive searches by passing the `RegexOptions.IgnoreCase` flag. – LukeH Nov 26 '13 at 23:37
  • thanks, I would have sworn there was a built in String.Replace that took a StringComparison at least, but indeed there is not.. Seems a common fallacy http://stackoverflow.com/questions/5549426/is-there-a-case-insensitive-string-replace-in-net-without-using-regex – tolanj Nov 26 '13 at 23:45
  • Indeed it is the ignoring of the case that forces me to use RegEx – Damo Nov 27 '13 at 22:18
1

Your fundamental issue is that you're letting your user enter an arbitrary regexp and thus, well, it's interpreted as a regexp...

Either your goal is just to replace literal strings, in which case use String.Replace, or you want to allow the user to enter a regexp, in which case just accept that the user will need to \ escape their special characters.

Since \ is a regexp escape character (as well as a C# one, but you seem to be dealing with that with @), "\" on its own is an illegal regexp: there is nothing after it to escape.

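If you really want a regexp to replace all \ with \\ then it's:

findString = Regex.Replace(findString, @"\\", @"\\"); // the pattern is an escaped backslash, so it matches one \; the replacement is taken literally, so it is two \. The result must be assigned, because strings are immutable.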

But you've still got [].?* etc to worry about.
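
Both points can be checked with a small sketch: a lone backslash is rejected as a pattern, and an unescaped `.` still acts as a wildcard. `IsValidPattern` and the sample strings are hypothetical, chosen only for illustration:

```csharp
using System;
using System.Text.RegularExpressions;

public static class MetacharDemo
{
    // Hypothetical helper: reports whether the regex engine accepts the pattern.
    public static bool IsValidPattern(string pattern)
    {
        try
        {
            new Regex(pattern);
            return true;
        }
        catch (ArgumentException)
        {
            // Thrown for malformed patterns such as a trailing backslash.
            return false;
        }
    }

    public static void Main()
    {
        Console.WriteLine(IsValidPattern(@"\"));   // False: nothing follows the escape character
        Console.WriteLine(IsValidPattern(@"\\"));  // True: an escaped backslash

        // An unescaped '.' matches any character, so "v1.0" also matches "v1x0".
        Console.WriteLine(Regex.IsMatch("v1x0", "v1.0"));               // True
        Console.WriteLine(Regex.IsMatch("v1x0", Regex.Escape("v1.0"))); // False
    }
}
```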

My strong advice is a checkbox: the user selects whether they are entering a regexp or a string literal, and you then call Regex.Replace or String.Replace accordingly.
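
A rough sketch of that branching, assuming the checkbox state arrives as a bool. `Apply`, `findIsRegex` and `ignoreCase` are hypothetical names, and the case-insensitive literal branch falls back to an escaped regex because this sketch only relies on the two-argument String.Replace:

```csharp
using System;
using System.Text.RegularExpressions;

public static class FindReplaceModes
{
    // Hypothetical entry point: findIsRegex would come from the suggested checkbox.
    public static string Apply(string input, string find, string replacement,
                               bool findIsRegex, bool ignoreCase)
    {
        RegexOptions options = ignoreCase ? RegexOptions.IgnoreCase : RegexOptions.None;

        if (findIsRegex)
        {
            // Regex mode: the user is responsible for escaping their own metacharacters.
            return Regex.Replace(input, find, replacement, options);
        }

        if (!ignoreCase)
        {
            // Literal, case-sensitive: no regex needed (find must not be empty).
            return input.Replace(find, replacement);
        }

        // Literal, case-insensitive: escape the pattern and any '$' in the replacement.
        return Regex.Replace(input, Regex.Escape(find), replacement.Replace("$", "$$"), options);
    }
}
```

With this shape, a help desk user who ticks the box gets full regex behaviour, while everyone else can type backslashes and dots without thinking about escaping.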

tolanj
  • A good answer because you cover findString both as a regex and as a literal, but if it's a literal, only the metachars need to be escaped (which includes `\` itself). +1 –  Nov 27 '13 at 00:09