Error with assigning column names to data frame in R

Question

I am running the following code to open a set of CSV files that contain temperature vs. time data:

temp = list.files(pattern="*.csv")
for (i in 1:length(temp)) 
{
  assign(temp[i], read.csv(temp[i], header=FALSE, skip =20))
  colnames(as.data.frame(temp[i])) <- c("Date","Unit","Temp")
}

The data in the data frames looks like this:

                   V1 V2   V3
1 6/30/13 10:00:01 AM  C 32.5
2 6/30/13 10:20:01 AM  C 32.5
3 6/30/13 10:40:01 AM  C 33.5
4 6/30/13 11:00:01 AM  C 34.5
5 6/30/13 11:20:01 AM  C 37.0
6 6/30/13 11:40:01 AM  C 35.5

I am just trying to assign column names but am getting the following error message:

Error in `colnames<-`(`*tmp*`, value = c("Date", "Unit", "Temp")) : 
  'names' attribute [3] must be the same length as the vector [1]

I think it may have something to do with how my loop is reading the CSV files. They are all stored in the same directory.

Thanks for your help!

When you call `temp[i]` in the `colnames` command, it looks like `temp[i]` is referring to a character object and not the variable you created! If I recall correctly, you would then have to use `get(temp[i])` to call a variable with the name that `temp[i]` is referring to. — ialm, Jul 09 '13 at 21:13
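To illustrate the comment above, here is a minimal sketch using a hypothetical file name (`station_a.csv` is made up, and nothing is read from disk). It shows why the error mentions a vector of length 1, and how `get()` could be used to reach the object that `assign()` created:

```r
# Hypothetical file name; nothing is read from disk here.
fname <- "station_a.csv"

# A character string converted to a data frame has 1 row and 1 column,
# so three column names cannot be assigned to it.
ncol(as.data.frame(fname))

# Simulate what the loop's assign() call does: store a data frame
# under the file name (the values are copied from the sample data).
assign(fname, data.frame(V1 = "6/30/13 10:00:01 AM", V2 = "C", V3 = 32.5))

# get() retrieves the object by name; rename the columns on a copy,
# then store the copy back under the same name.
d <- get(fname)
colnames(d) <- c("Date", "Unit", "Temp")
assign(fname, d)

colnames(get(fname))
```

This works, but it is clumsy, which is one reason the answers below suggest keeping the data frames in a list instead.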

Bryan Hanson · Accepted Answer · 2013-07-10T20:22:31.970

I'd take a slightly different approach which might be more understandable:

temp = list.files(pattern="*.csv")
for (i in 1:length(temp)) 
{
  tmp <- read.csv(temp[i], header=FALSE, skip =20)
  colnames(tmp) <- c("Date","Unit","Temp")
  # Now what do you want to do?
  # For instance, use the file name as the name of a list element containing the data?
}

Update:

temp = list.files(pattern="*.csv")
stations <- vector("list", length(temp))
for (i in 1:length(temp)) {
  tmp <- read.csv(temp[i], header=FALSE, skip =20)
  colnames(tmp) <- c("Date","Unit","Temp")
  stations[[i]] <- tmp
}
names(stations) <- temp # optional; could process file names too like using basename

station1 <- stations[[1]] # etc.; station1 would be a data.frame

This 2nd part could be improved as well, depending upon how you plan to use the data, and how much of it there is. A good command to know is str(some object). It will really help you understand R's data structures.
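As a rough illustration of `str()` and of list access, here is a small sketch. The list below is a toy stand-in for `stations` with made-up element names and values, not the real data:

```r
# Toy stand-in for the `stations` list (names and values are made up).
stations <- list(
  a.csv = data.frame(Date = "6/30/13 10:00:01 AM", Unit = "C", Temp = 32.5),
  b.csv = data.frame(Date = "6/30/13 10:20:01 AM", Unit = "C", Temp = 33.5)
)

str(stations)             # overview: a list of 2 data frames
names(stations)           # the element names (here, file names)
stations[[2]]$Temp        # second data frame, accessed by position
stations[["b.csv"]]$Temp  # the same data frame, accessed by name
```

Because every data frame can be reached by position or by name, there is no need to write one line per CSV file.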

Update #2:

Getting individual data frames into your workspace will be quite hard - someone more clever than I may know some tricks. Since you want to plot these, I'd first make names more like you want with:

names(stations) <- paste(basename(temp), 1:length(stations), sep = "_")

Then I would iterate over the list created above as follows, creating your plots as you go:

for (i in 1:length(stations)) {
    tmp <- stations[[i]]
    # tmp is a data frame with columns Date, Unit, Temp
    # plot your data using the plot commands you like to use, for example
    p <- qplot(x = Date, y = Temp, data = tmp, geom = "smooth", main = names(stations)[i])
    print(p)
    # this is approx code, you'll have to play with it, and watch out for Dates
    # I recommend the package lubridate if you have any troubles parsing the dates
    # qplot is in package ggplot2
}

And if you want to save them in a file, use this:

pdf("filename.pdf")
# then the plotting loop just above
dev.off()

A multipage pdf will be created. Good Luck!
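On the date issue mentioned in the loop comments: before plotting, the Date column has to be converted from text to a date-time class. Here is a minimal base R sketch (an alternative to lubridate), assuming the format seen in the sample data. The time zone is an assumption, and `%p` depends on the locale recognising AM/PM:

```r
# Sample strings copied from the question's data.
raw <- c("6/30/13 10:00:01 AM", "6/30/13 11:20:01 AM")

# %m/%d/%y = month/day/two-digit year, %I = hour on a 12-hour clock,
# %p = AM/PM indicator.
# tz = "UTC" is an assumption; use the loggers' actual time zone.
parsed <- as.POSIXct(raw, format = "%m/%d/%y %I:%M:%S %p", tz = "UTC")

parsed
class(parsed)
```

Inside the plotting loop, the same call could be applied as `tmp$Date <- as.POSIXct(tmp$Date, ...)` before calling `qplot`.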

edited Jul 10 '13 at 20:22

answered Jul 09 '13 at 21:16

Bryan Hanson

Isn't this going to rewrite over tmp for every csv they have? I think they want to create an object for each csv. – cylondude Jul 09 '13 at 21:18
Definitely. I assume that the OP might want to `cbind` them as they go, but I don't want to try to read their mind. – Bryan Hanson Jul 09 '13 at 21:19
This is great, thank you! But yes, I would need to create a new object each time. How is cbind used? I am very very new to R and unfamiliar with basic functions still – user2498712 Jul 09 '13 at 21:23
I think the key is how do you want the final data structured? Since there is no station info in the file, I'd guess that you'd want to use the file name as some sort of identifier, and collect all the data into one large piece. The choice might be driven by how you plan to plot this info. – Bryan Hanson Jul 09 '13 at 21:24
Overall, what I am trying to do is upload a set of CSV files that have the data formatted as shown above (the skip is necessary because the first 20 lines of each file is just a header), and then overlay the graphs for Time vs. Temp and find an underlying trend. So I imagine that it's best to keep each CSV as its own data frame. Does this answer your question? – user2498712 Jul 09 '13 at 21:27
I am confused by the last line: station1 <- station[[1]]. Does this mean that stations[[2]] is the second csv file in the list? And if so, wouldn't I have to write a line like this for every csv file? Unless there is a way to make stations_ (fill underscore with number) a variable name? Is that possible? – user2498712 Jul 09 '13 at 22:05
Just to be clear, you want the individual data frames in your work space with names like `station_n`? – Bryan Hanson Jul 09 '13 at 22:43
@BryanHanson, yes I would. Sorry for any confusion! – user2498712 Jul 10 '13 at 19:41

score 1 · Answer 2 · answered Jul 09 '13 at 21:16

It is usually not recommended practice to use the 'assign' statement in R. (I should really find some resources on why this is so.)

You can do what you are trying using a function like this:

read.a.file <- function (f, cnames, ...) {
  my.df <- read.csv(f, ...)
  colnames(my.df) <- cnames
  ## Here you can add more preprocessing of your files.
  my.df  # return the data frame (an assignment alone would return only the names)
}

And loop over the list of files using this:

lapply(X=temp, FUN=read.a.file, cnames=c("Date", "Unit", "Temp"), skip=20, header=FALSE)
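A self-contained sketch of this approach follows, so it can be tried without the real data. The throwaway files, their names, and the single header line (instead of 20) are all made up for illustration. Note that the function ends with `my.df`: a function returns its last evaluated expression, and an assignment such as `colnames(my.df) <- cnames` evaluates to the names rather than to the data frame.

```r
# Reader function that explicitly returns the data frame.
read.a.file <- function(f, cnames, ...) {
  my.df <- read.csv(f, ...)
  colnames(my.df) <- cnames
  my.df
}

# Create two small throwaway files so the sketch runs anywhere
# (file names and contents are made up; one header line instead of 20).
dir <- tempfile("csvdemo")
dir.create(dir)
for (nm in c("a.csv", "b.csv")) {
  writeLines(c("logger header", "6/30/13 10:00:01 AM,C,32.5"),
             file.path(dir, nm))
}

# Read every file and keep the results in a list named by file.
files <- list.files(dir, pattern = "\\.csv$", full.names = TRUE)
dfs <- lapply(files, read.a.file, cnames = c("Date", "Unit", "Temp"),
              skip = 1, header = FALSE)
names(dfs) <- basename(files)

str(dfs)
```

Assigning the result of `lapply` to a variable matters: without it, the list of data frames is printed and then discarded.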

answered Jul 09 '13 at 21:16

asb

For completeness, let's link back to the question you just asked, which the OP here might find useful. http://stackoverflow.com/questions/17559390/why-is-assign-bad – Bryan Hanson Jul 09 '13 at 23:38

score 1 · Answer 3 · answered Jul 09 '13 at 21:19

"read.csv" returns a data.frame so you don't need "as.data.frame" call;
You can use "col.names" argument to "read.csv" to assign column names;
I don't know what version of R you are using, but "colnames(as.data.frame(...)) <-" is just an incorrect call since it calls for "as.data.frame<-" function that does not exist, at least in version 2.14.
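A minimal sketch of the `col.names` suggestion: inline text stands in for a file here, and a single made-up header line replaces the 20 lines skipped in the question.

```r
# Inline text stands in for a file; the first line mimics a header to skip.
csv_text <- "logger header line
6/30/13 10:00:01 AM,C,32.5
6/30/13 10:20:01 AM,C,32.5"

# col.names sets the column names while reading, so no separate
# colnames<- call is needed afterwards.
d <- read.csv(text = csv_text, header = FALSE, skip = 1,
              col.names = c("Date", "Unit", "Temp"))
d
```

With real files, the same arguments would go into `read.csv(f, header = FALSE, skip = 20, col.names = ...)`.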

eddi · Answer 4 · 2013-07-09T21:32:29.000

A short-term fix to your woes is the following, but you really need to read up more on using R as from what you did above I expect you'll get into another mess very quickly. Maybe start by never using assign.

lapply(list.files(pattern = "*.csv"), function (f) {
  df = read.csv(f, header = FALSE, skip = 20)
  names(df) = c('Date', 'Unit', 'Temp')
  df
}) -> your_list_of_data.frames

Although more likely you want this (edited to preserve file name info):

df = do.call(rbind,
             lapply(list.files(pattern = "*.csv"), function(f)
                    cbind(f, read.csv(f, header = F, skip = 20))))
names(df) = c('Filename', 'Date', 'Unit', 'Temp')

score 0 · Answer 5 · answered Jul 09 '13 at 21:16

At a glance it appears that you are missing a set of subset brackets, [], around the elements of your temp list. Your attribute list has three elements, but because you have temp[i] instead of temp[[i]], the for loop isn't actually accessing the elements of the list and is instead treating it as an element of length one, as the error says.

answered Jul 09 '13 at 21:16

bap

list.files should create a character vector instead of a list. – cylondude Jul 09 '13 at 21:21
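A quick check of the point made in this comment, using hypothetical file names in place of a real `list.files()` result:

```r
# Hypothetical file names, shaped like the return value of list.files().
temp <- c("a.csv", "b.csv")

is.character(temp)             # a character vector...
is.list(temp)                  # ...not a list
identical(temp[1], temp[[1]])  # so both forms give the same single string
```

For an unnamed character vector, single and double brackets return the same element, so switching to `temp[[i]]` would not change the error in the question.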
