I have quite a large amount of text which includes control characters like \n, \t and \r. I need to replace them with a simple space (" "). What is the fastest way to do this? Thanks
- Obviously, as the Zen of Python suggests, there is only one way to do that ;-) – gruszczy Feb 10 '11 at 09:51
- When the string has multiple adjacent such characters, e.g. `foo\r\nbar`, do you want to replace `\r\n` by two spaces or only one? – John Machin Feb 10 '11 at 10:53
- I want to replace it with only one. – Hossein Feb 10 '11 at 11:30
- Consider also stripping leading and trailing whitespace. Then please edit your question so that it specifies exactly what you want. – John Machin Feb 10 '11 at 11:57
- If you want to strip leading and trailing whitespace as well, have a look at [this answer](http://stackoverflow.com/questions/1898656/remove-whitespace-in-python-using-string-whitespace/1898835#1898835). – Sven Marnach Feb 10 '11 at 17:20
6 Answers
I think the fastest way is to use `str.translate()`:
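import string
s = "a\nb\rc\td"
# Python 2: both maketrans arguments must have the same length,
# so each of the three control characters maps to its own space
print s.translate(string.maketrans("\n\t\r", "   "))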
prints
a b c d
EDIT: As this once again turned into a discussion about performance, here are some numbers. For long strings, `translate()` is way faster than using regular expressions:
# IPython session under Python 2 (%timeit is an IPython magic)
import re
import string
s = "a\nb\rc\td " * 1250000
regex = re.compile(r'[\n\r\t]')
%timeit t = regex.sub(" ", s)
# 1 loops, best of 3: 1.19 s per loop
table = string.maketrans("\n\t\r", "   ")
%timeit s.translate(table)
# 10 loops, best of 3: 29.3 ms per loop
That's about a factor of 40.

Sven Marnach
- It is important to note that `string.translate` and `string.maketrans` are not available in Python 3. The `re`-based solution seems better. – Senthil Kumaran Feb 10 '11 at 10:05
- @Ignacio: after `import string`, `hasattr(string, 'translate')` and `hasattr(string, 'maketrans')` are False, whereas `hasattr(str, 'translate')` and `hasattr(str, 'maketrans')` are True. The `string` module is just a collection of string constants. Moreover, as per the definition, the proper way to use `maketrans` would be `bytes.maketrans`. Thanks! – Senthil Kumaran Feb 10 '11 at 10:21
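The `translate` snippets in this answer are Python 2. In Python 3 the same approach still works, because `maketrans` is available as a static method of `str` (not of the `string` module) and `str.translate` accepts the table it returns. A minimal Python 3 sketch; the helper name `controls_to_spaces` is hypothetical:

```python
# Python 3 sketch: maketrans is a static method of str, not a function in the string module.
# Characters in the first argument map to the character at the same index in the second,
# so both arguments must have the same length (three control characters, three spaces).
TABLE = str.maketrans("\n\t\r", "   ")


def controls_to_spaces(text: str) -> str:
    """Replace each newline, tab and carriage return with one space (hypothetical helper)."""
    return text.translate(TABLE)


print(controls_to_spaces("a\nb\rc\td"))        # a b c d
print(repr(controls_to_spaces("foo\r\nbar")))  # 'foo  bar' (one space per character)
```

Because each character is mapped independently, `\r\n` turns into two spaces. If each run should become a single space, as the asker requested in the comments, a regex with `+` or `' '.join(text.split())` fits better.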
You may also try regular expressions:
import re
my_str = "a\nb\rc\td"  # stands in for the input text
regex = re.compile(r'[\n\r\t]')
result = regex.sub(' ', my_str)
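The character class `[\n\r\t]` matches one character at a time, so `foo\r\nbar` becomes `foo  bar` with two spaces, while the asker said in a comment that `\r\n` should become a single space. Adding `+` makes each run of these characters one match. A minimal sketch, assuming Python 3; the names `CONTROL_RUN` and `collapse_controls` are hypothetical:

```python
import re

# One or more consecutive newline, carriage-return or tab characters.
# The "+" makes a sequence such as "\r\n" a single match, so it yields one space.
CONTROL_RUN = re.compile(r'[\n\r\t]+')


def collapse_controls(text: str) -> str:
    """Replace each run of newline/carriage-return/tab characters with one space (hypothetical helper)."""
    return CONTROL_RUN.sub(' ', text)


print(collapse_controls("foo\r\nbar"))  # foo bar
print(collapse_controls("a\nb\rc\td"))  # a b c d
```

Ordinary spaces are left alone, so `"a \n b"` becomes `"a   b"`; if those should be collapsed too, the `\s+` pattern or the `split()`/`join()` idiom from the other answers is the closer match.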

Michal Chruszcz
- I've compared the actual performance and it looks like using regular expressions is as fast as using the `string` module. – Michal Chruszcz Feb 10 '11 at 10:14
- `python2.6 timeit.py -s "import string" -s "s = 'a\nb\rc\td'" -s "s.translate(string.maketrans('\n\t\r', '   '))"` 10000000 loops, best of 3: 0.0235 usec per loop – Michal Chruszcz Feb 10 '11 at 10:15
- `python2.6 timeit.py -s "import re" -s "regex = re.compile(r'[\n\r\t]')" -s "regex.sub(' ', 'a\nb\rc\td')"` 10000000 loops, best of 3: 0.0232 usec per loop – Michal Chruszcz Feb 10 '11 at 10:15
- @Michal - are you comparing `regex.sub(...)` to `s.translate(string.maketrans(...))` or to `s.translate(preparedTrans)` only? – eumiro Feb 10 '11 at 10:20
- @eumiro, the former, my bad; I focused on the above solution. The latter is comparable, though. – Michal Chruszcz Feb 10 '11 at 10:27
- `python2.6 timeit.py -s "import string" -s "s = 'a\nb\rc\td'" -s "trans = string.maketrans('\n\t\r', '   ')" -s "s.translate(trans)"` 10000000 loops, best of 3: 0.0256 usec per loop – Michal Chruszcz Feb 10 '11 at 10:27
- @Michal: It's completely meaningless to try this on a string with 7 characters. See the edit in my answer. – Sven Marnach Feb 10 '11 at 10:39
>>> import re
>>> re.sub(r'[\t\n\r]', ' ', '1\n2\r3\t4')
'1 2 3 4'

Ignacio Vazquez-Abrams
If you want to normalise whitespace (replace runs of one or more whitespace characters by a single space, and strip leading and trailing whitespace), this can be accomplished by using string methods alone:
>>> text = ' foo\tbar\r\nFred Nurke\t Joe Smith\n\n'
>>> ' '.join(text.split())
'foo bar Fred Nurke Joe Smith'

John Machin
Using a regex:
>>> import re
>>> re.sub(r'\s+', ' ', '1\n2\r3\t4')
'1 2 3 4'
Without a regex:
>>> ' '.join('1\n\n2\r3\t4'.split())
'1 2 3 4'
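The two forms are not quite identical: `str.split()` with no arguments also drops leading and trailing whitespace, while `re.sub(r'\s+', ' ', ...)` leaves a single space at each end if the input starts or ends with whitespace (the `'1\n2\r3\t4'` example happens to have none). A minimal Python 3 sketch showing the difference, assuming you want both to give the same result; the function names are hypothetical:

```python
import re


def squeeze_split(text: str) -> str:
    # str.split() with no arguments splits on runs of whitespace and drops empty strings,
    # so leading and trailing whitespace disappears.
    return ' '.join(text.split())


def squeeze_regex(text: str) -> str:
    # \s+ collapses each run to one space, but a run at either end would survive as a
    # single space; strip() removes it so the result matches squeeze_split.
    return re.sub(r'\s+', ' ', text).strip()


sample = ' foo\tbar\r\nFred Nurke\t Joe Smith\n\n'
print(repr(squeeze_split(sample)))        # 'foo bar Fred Nurke Joe Smith'
print(repr(squeeze_regex(sample)))        # 'foo bar Fred Nurke Joe Smith'
print(repr(re.sub(r'\s+', ' ', sample)))  # ' foo bar Fred Nurke Joe Smith '
```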

kurumi