I have a text file containing the following 5 lines of code:

#
# 1991    60060GBNYPAN
# 1991    60061GBGTSJT
# 1991    60062GBGTSJT
#

If I open the default R GUI on a Windows machine and paste in those five lines I obtain the following:

> #
> # 
> 1991    60060GBNYPAN
Error: unexpected numeric constant in "1991    60060"
> # 1991    60061GBGTSJT
> # 1991    60062GBGTSJT
> #
> 

When I paste the same five lines of code instead into the Stack Overflow question window I obtain:

#
# 
1991    60060GBNYPAN
# 1991    60061GBGTSJT
# 1991    60062GBGTSJT
#

If I open the text file containing that code into gVim 7.4 I see:

#^M
# 
1991    60060GBNYPAN^M
# 1991    60061GBGTSJT^M
# 1991    60062GBGTSJT^M
#^M

All characters are blue in gVim 7.4 except that in the third row the (first) 1991 is in pink font and 60060GBNYPAN is in black font.

I can remove the ^M by typing:

:%s/<ctrl>Q<ctrl>M//g<return>

I found this command in the question "Read csv file with hidden or invisible character ^M".

However, if I then save the file by clicking File - Save in gVim 7.4 and then open the file, the contents now look as follows:

## 1991    60060GBNYPAN# 1991    60061GBGTSJT# 1991    60062GBGTSJT#

If I paste those contents into R I get:

> #
> # 
> 1991    60060GBNYPAN
Error: unexpected numeric constant in "1991    60060"
> # 1991    60061GBGTSJT
> # 1991    60062GBGTSJT
> #
> 

the same as before I opened the file in gVim 7.4.

If I open the file in gVim 7.4 a second time (after having removed the ^M and saved the file) I see:

#
# 
1991    60060GBNYPAN
# 1991    60061GBGTSJT
# 1991    60062GBGTSJT
#

The color of the fonts has not changed and at the bottom of the gVim 7.4 window is a message that reads:

<comment character does not work2.r" [unix] 6L, 74C        6,1        All

The number of spaces before and after the 6,1 is just an estimate.

What is going on?

I guess in addition to the ^M there is another hidden character in the file that, unlike the ^M, is not displayed by default when I open the file in gVim 7.4.

Thank you for any suggestions. I might have to load the original file onto GitHub. I will attempt to do that after posting this message here.

EDIT

Although I have a GitHub account and have uploaded files to it, it has been so long that I cannot remember how to upload this latest file. Hopefully I will get the file uploaded shortly.

If I type:

:set list<return> in gVim 7.4 after removing the ^M I see:

$
#$
# $
1991    60060GBNYPAN$
# 1991    60061GBGTSJT$
# 1991    60062GBGTSJT$
#$

I thought that :set list was supposed to reveal all hidden characters, but all it is showing is a blank space between # and $ in the second row (or third row if I count the first $ as a row).

Mark Miller
  • There are several types of newline characters, and one of them must be between the "#" and the "1991" on the first "line". – Joshua Ulrich Sep 02 '14 at 16:10
  • @JoshuaUlrich Thank you. I suspected as much. Hopefully I can figure out how to see them in gVim 7.4 and remove them. – Mark Miller Sep 02 '14 at 16:16
  • Replace them with `ctrl-N` if you want them handled in R. And note that there is a comment.char for read.table that may be needed. – IRTFM Sep 02 '14 at 16:18
  • @BondedDust Thank you. I tried selecting all text in the file and then pressing N, but that did not eliminate the error in R. – Mark Miller Sep 02 '14 at 16:24
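
The comment about several types of newline characters can be checked by counting each kind of line ending in the raw bytes. The following is a minimal Python sketch, not part of the original thread. The function name `count_line_endings` and the `sample` bytes are hypothetical; the sample assumes that the second line of the file ends in a lone LF, which would be consistent with gVim showing ^M on every line except that one.

```python
import re


def count_line_endings(data: bytes) -> dict:
    """Count CRLF pairs, lone LF bytes, and lone CR bytes in raw data."""
    crlf = len(re.findall(rb"\r\n", data))
    lone_lf = len(re.findall(rb"(?<!\r)\n", data))  # LF not preceded by CR
    lone_cr = len(re.findall(rb"\r(?!\n)", data))   # CR not followed by LF
    return {"CRLF": crlf, "LF": lone_lf, "CR": lone_cr}


# Hypothetical bytes modelled on the question: every line ends in CRLF
# except the second, which is assumed to end in a bare LF.
sample = b"#\r\n# \n1991    60060GBNYPAN\r\n# 1991    60061GBGTSJT\r\n"
print(count_line_endings(sample))
```

A file with mixed endings shows non-zero counts in more than one category. To run this on a real file, read it in binary mode (`open(path, "rb").read()`) so that Python does not translate the line endings before they are counted.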

1 Answer

To find the troublesome character, it's best to view a hexdump of the file. On Unix, there are many such tools: hexdump / hd / od, etc.

Since you're on Windows, you can use the xxd command-line tool that ships with Vim. :help 23.4 (Binary files topic of the Vim user manual) tells you how to open the file, and (under using XXD) how to view the file as a hex dump.
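
If xxd is not at hand, a similar view can be produced with a few lines of Python. This is only a rough sketch of a hex dump, not a replacement for xxd; the function name `hexdump` and the `sample` bytes are hypothetical and are not taken from the actual file.

```python
def hexdump(data: bytes, width: int = 16) -> str:
    """Return offset, hex bytes, and printable ASCII for each row of data."""
    rows = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        # Non-printable bytes (CR, LF, tabs, ...) are shown as "."
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        rows.append(f"{offset:08x}  {hex_part.ljust(width * 3 - 1)}  {text_part}")
    return "\n".join(rows)


# Hypothetical bytes: a CRLF after the first "#", a bare LF after "# ".
sample = b"#\r\n# \n1991"
print(hexdump(sample))
```

In the hex column, 0d is a carriage return (shown as ^M in Vim) and 0a is a line feed, so a stray 0a or 0d without its partner stands out. For a real file, pass in `open(path, "rb").read()`; binary mode matters because text mode would normalize the line endings.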

Ingo Karkat