2

In my compilers class, I decided to write my compiler in Python since I enjoy programming in it, but I'm encountering an interesting issue with how characters are printed. The lexer I'm writing requires that strings containing the formfeed and backspace characters be printed to stdout in a very particular way: enclosed in double quotes, and printed as \f and \b, respectively. The closest I've gotten:

print("{0!r}".format("\b\f"))

which yields

'\x08\x0c'

Note the single quotes and the hex escapes (\x08 and \x0c) in place of \b and \f. The same command with two other characters I'm concerned with almost works:

print("{0!r}".format("\n\t"))

gives:

'\n\t'
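The difference between the two outputs seems to come from `repr` itself: it has short escapes for a few characters such as newline and tab, and falls back to `\xNN` hex escapes for other control characters. A quick check, assuming CPython 3 (`samples` and `reprs` are just illustrative names):

```python
# repr() has short escapes for newline and tab, but backspace and
# formfeed come back as \xNN hex escapes.
samples = {"newline": "\n", "tab": "\t", "backspace": "\b", "formfeed": "\f"}

reprs = {name: repr(ch) for name, ch in samples.items()}

for name, text in reprs.items():
    print(name, text)
```

So `!r` alone cannot produce \b and \f, whatever quoting is used around it.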

To be clear, the result (including quotes) that I need to conform to the spec is

"\b\f"

Simple approaches like finding \b and \f and replacing them with "\\b" and "\\f" don't seem to work... the "\\" is just the way Python prints a backslash, so I can never seem to get just "\b\f" as one might expect.

Playing with various string encodings doesn't seem to help. I've concluded that I need to write a custom string.Formatter, but I was wondering if there is another approach that I'd missed.

EDIT: Thanks for all the answers. I don't think I did that good of a job asking the question though. The underlying issue is that I'm formatting the strings as raw because I want literal newlines to appear as "\n" and literal tabs to appear as "\t". However, when I move to print the string using raw formatting, I lose the ability to print out "\b" and "\f" as all the answers below suggest.

I'll confirm this tonight, but based on these answers, I think the approach I should be using is to format the output normally, and trap all the literal "\n", "\t", "\b", and "\f" characters with escape sequences that will print them as needed. I'm still hoping to avoid using string.Formatter.

EDIT2: The final approach I'm going to use is to use non-raw string formatting. The non-abstracted version looks something like:

print('"{0!s}"'.format(a.replace("\b", "\\b").replace("\t", "\\t").replace("\f", "\\f").replace("\n","\\n")))
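The chained `replace` calls can also be written with a translation table. This is only a sketch, assuming Python 3: `ESCAPES` and `format_string_token` are hypothetical names, and it covers just the four characters discussed here.

```python
# Map each control character to its two-character escape sequence.
ESCAPES = str.maketrans({
    "\b": "\\b",
    "\t": "\\t",
    "\f": "\\f",
    "\n": "\\n",
})


def format_string_token(s):
    """Return s wrapped in double quotes with the four characters escaped."""
    return '"{0}"'.format(s.translate(ESCAPES))


print(format_string_token("\b\f"))    # prints "\b\f"
print(format_string_token("a\n\tb"))  # prints "a\n\tb"
```

A table keeps the mapping in one place, so adding another character later is a one-line change.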
R. P. Dillon
  • You can always define a "verbatim" string using `r` just before the string as follows: `print r'\b\f'`. – Zenon May 09 '12 at 16:13
  • It's a classroom language called "Cool", designed by the professor Alex Aiken. The whole problem is a bit contrived, since the different parts of the compiler communicate over unix pipes, so the exact formatting of the output matters a whole lot. Obviously, that is an artifact of the classroom setting. – R. P. Dillon May 09 '12 at 17:24

3 Answers

5

Use raw string:

>>> print(r'\b')
\b
bluish
Ashwini Chaudhary
  • 1
    Right. But given a string that contains "\b\f", how do I print it as a raw string? I've found that print(r'"\b\f"') gives the right result, but given a string that is not raw, how would I do it? – R. P. Dillon May 09 '12 at 16:14
  • @R.P.Dillon: You should control this at the time the string is created. – Steven Rumbalski May 09 '12 at 16:24
  • @R.P.Dillon check this out http://code.activestate.com/recipes/65211-convert-a-string-into-a-raw-string/ – Ashwini Chaudhary May 09 '12 at 16:34
  • @StevenRumbalski, it seems to me that the goal is to change the way the string prints out, not the contents of the string. I assume it needs to be the literal characters for other reasons. – Mark Ransom May 09 '12 at 16:36
  • @MarkRansom That's right, though based on these answers, I think I'm going to have to change the contents of the string, even if only for printing. – R. P. Dillon May 09 '12 at 17:30
  • String *literals* can be raw; there's no difference between a string created with a raw literal and a string created with a non-raw literal. `r'\b' == '\\b'`. – Russell Borogove May 09 '12 at 17:36
  • I understand the types are the same, but there don't seem to be good semantics for talking about what the string "\b" would have been if it had been declared r"\b". In the general case, I suppose I could say "escaped", but that's perhaps more general than is useful in this discussion. – R. P. Dillon May 09 '12 at 18:50
3
print("{0!r}".format("\b\f".replace("\b", "\\b").replace("\f", "\\f")))

Or, more cleanly:

def escape_bs_and_ff(s):
    return s.replace("\b", "\\b").replace("\f", "\\f")

print("{0!r}".format(escape_bs_and_ff("\b\f")))
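One caveat worth checking: `!r` applies `repr` after the replacement, so the backslashes that were just inserted get escaped again and the output keeps its single quotes. A small comparison, assuming Python 3 (`escaped`, `with_repr` and `with_quotes` are illustrative names), between `!r` and adding the double quotes by hand:

```python
escaped = "\b\f".replace("\b", "\\b").replace("\f", "\\f")

with_repr = "{0!r}".format(escaped)    # repr doubles each backslash
with_quotes = '"{0}"'.format(escaped)  # plain str() plus explicit quotes

print(with_repr)    # '\\b\\f'
print(with_quotes)  # "\b\f"
```

If the spec requires exactly "\b\f", the second form appears to be the one that matches.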
Mark Ransom
  • When I execute the first like you mentioned, I still get the same result: '\x08\x0c'. But this answer looks the best because it can accept an existing, non-raw string, which is what I have to deal with. – R. P. Dillon May 09 '12 at 16:28
  • @R.P.Dillon, sorry I had the parentheses in the wrong place. I'm on Python 2.7 so I don't have the same syntax for `print`. – Mark Ransom May 09 '12 at 16:30
0
>>> print(r'"\b\f"')
"\b\f"

The r indicates a raw or verbatim string, which means that instead of trying to parse things like \n into a newline, it literally makes the string \n.
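To illustrate that the `r` prefix only changes how the literal is parsed, and not the kind of string that results, here is a short check (`raw` and `cooked` are just illustrative names):

```python
raw = r"\b"    # two characters: a backslash and the letter b
cooked = "\b"  # one character: the backspace control character

print(len(raw), len(cooked))      # 2 1
print(raw == "\\b")               # True
print(type(raw) is type(cooked))  # True
```

This is why an existing string that already contains a backspace cannot be "made raw" after the fact; its characters have to be replaced instead.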

Tim S.