R: Combining vectors or data frames of unequal length into one data frame
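Today I will treat a problem I encounter every once in a while. Let’s suppose we have several data frames or vectors of unequal length but with partly matching column names, for instance a one-row data frame df1 with the columns Intercept, x1, x2 and x3, and a second one, df2, with only Intercept and x2.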
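A minimal reconstruction of the two example data frames could look like the sketch below. The values are inferred from the outputs shown further down in this post, so treat them as an assumption rather than the original code.

```r
# Reconstructed example data (values inferred from the outputs later in the post)
df1 <- data.frame(Intercept = 0.4, x1 = 0.4, x2 = 0.2, x3 = 0.7)
df2 <- data.frame(Intercept = 0.5, x2 = 0.8)

# The column sets overlap only partly
intersect(names(df1), names(df2))  # columns present in both
setdiff(names(df1), names(df2))    # columns only in df1
```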
This may occur, for example, when fitting several multiple regression models, each time using a different combination of regressors. Now I would like to combine the results into one data frame. Neither merge() nor rbind() helps here: rbind() requires all data frames to have the same columns, and merge() joins on shared columns rather than stacking rows.
I posted this matter on r-help, as my own first solution was somewhat awkward and could not be generalized to arbitrary data frames or lists of data frames. The first reply came from Charles C. Berry. Here, myList is a list containing the data frames as elements:
myList <- list(df1, df2)
What he does is use a nested loop. The outer loop runs over the data frames in the list, the inner loop over the column names of each data frame. For each column name j, the corresponding column of the i-th data frame (myList[[i]][j]) is written into row i of an initially empty data frame (dat). If dat has no column of that name yet, a new one is created. Cells that are never assigned are automatically set to NA.
dat <- data.frame()
for (i in seq_along(myList))
  for (j in names(myList[[i]]))
    dat[i, j] <- myList[[i]][j]
dat
Note that the order of the output columns depends on the order of the input. The list below contains the same elements in a different order and therefore yields a different column order.
myList <- list(df2, df1)

  Intercept  x2  x1  x3
1       0.5 0.8  NA  NA
2       0.4 0.2 0.4 0.7
Another solution was posted by Henrique Dallazuanna. This one has the advantage that it does not use loops.
l <- myList
do.call(rbind, lapply(lapply(l, unlist), "[", unique(unlist(c(sapply(l, names))))))
It looks a bit scary at first, so let's examine it starting from the inside.
# a list of names from each list element
c(sapply(l, names))
# unlist them and find unique names
unique(unlist(c(sapply(l, names))))
# gives unlisted vectors with column names for each list element
lapply(l, unlist)
As a next step, each of these named vectors is indexed by the full set of unique column names. Names that are not present in a vector yield NA.
listOfVectors <- lapply(lapply(l, unlist), "[", unique(unlist(c(sapply(l,names)))))
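The indexing step relies on how R handles character subscripts on a named vector. Here is a small sketch of that behaviour, using an illustrative vector that stands in for unlist(df2) (the values are assumed from the outputs in this post):

```r
# A named vector standing in for unlist(df2); values are illustrative
v <- c(Intercept = 0.5, x2 = 0.8)
all_names <- c("Intercept", "x1", "x2", "x3")

# Indexing by name returns NA for names that are not present,
# and the names of those elements are NA as well
res <- v[all_names]
res
is.na(names(res))
```

The missing names are why the combined result can end up with <NA> as a column name.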
As a last step, the vectors, which now all have the same length, are combined row by row.
do.call(rbind, listOfVectors)
# or in full
DF <- do.call(rbind, lapply(lapply(l, unlist), "[", unique(unlist(c(sapply(l, names))))))
The only small flaw in this approach is that the names of the first vector are used as column names of the resulting matrix. Using the second list from above gives the following:
l <- list(df2, df1)

     Intercept  x2 <NA> <NA>
[1,]       0.5 0.8   NA   NA
[2,]       0.4 0.2  0.4  0.7
Thus, in a last step we need to convert the result to a data frame and set its column names.
DF <- as.data.frame(DF)
names(DF) <- unique(unlist(c(sapply(l, names))))
DF
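The steps above can be wrapped in a small helper. This is only a sketch: the function name rbind_fill_base is made up for illustration, the example data are reconstructed from the outputs in this post, and the helper assumes every list element is a one-row data frame with numeric columns.

```r
# Hypothetical helper (the name is illustrative, not from the r-help thread).
# Assumes each list element is a one-row data frame with numeric columns.
rbind_fill_base <- function(l) {
  all_names <- unique(unlist(lapply(l, names)))
  rows <- lapply(l, function(d) {
    v <- unlist(d)[all_names]  # names not present yield NA
    names(v) <- all_names      # restore the names lost for the NA entries
    v
  })
  as.data.frame(do.call(rbind, rows))
}

# Example data reconstructed from the outputs shown in this post
df1 <- data.frame(Intercept = 0.4, x1 = 0.4, x2 = 0.2, x3 = 0.7)
df2 <- data.frame(Intercept = 0.5, x2 = 0.8)

res <- rbind_fill_base(list(df2, df1))
res
```

Setting the names inside the loop means the first vector already carries the full set of column names, so no renaming is needed afterwards.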
This works, but it would be much more convenient to have a single ready-made function for the job, and since October 2008 there is one: rbind.fill() in the plyr package written by Hadley Wickham. So the solution is as easy as:
library(plyr)
l <- myList
do.call(rbind.fill, l)
# another example
l <- list(data.frame(a=1, b=2), data.frame(a=2, c=3, d=5))
do.call(rbind.fill, l)
With myList <- list(df1, df2), the first call returns:
  Intercept  x1  x2  x3
1       0.4 0.4 0.2 0.7
2       0.5  NA 0.8  NA
Now, this is nice! It is really worthwhile having a look at Hadley Wickham's plyr package, as it provides a lot of functions that make life easier when it comes to splitting lists or data frames, applying a calculation to each piece and merging the results again afterwards. More on that another day.