R: Zip fastener for two data frames / combining rows or columns of two dataframes in an alternating manner

Sometimes I find it useful to merge two data frames like the following ones
   X1 X2 X3 X4     Y1 Y2 Y3 Y4
1   o  o  o  o      X  X  X  X
2   o  o  o  o      X  X  X  X
3   o  o  o  o      X  X  X  X
by using zip feeding either along the columns
   X1 Y1 X2 Y2 X3 Y3 X4 Y4
1   o  X  o  X  o  X  o  X
2   o  X  o  X  o  X  o  X
3   o  X  o  X  o  X  o  X
or along the rows of the data frames.
   V1 V2 V3 V4
1   o  o  o  o
4   X  X  X  X
2   o  o  o  o
5   X  X  X  X
3   o  o  o  o
6   X  X  X  X
The following function acts like a “zip fastener” for combining two data frames. It takes the first column (or row) of the first data frame and places it next to the first column (or row) of the second data frame, and so on. Only the other dimension has to match: to zip the columns together, the number of rows must be equal, and vice versa.
So here comes the code for the zipFastener() function. Actually, it's only the last few lines (from "# zip fastener operations" on) that do the job, but as I did not want to restrict the function to equal dimensions, there is a little prelude.
###############################################################
# zipFastener for TWO dataframes of unequal length
zipFastener <- function(df1, df2, along=2)
{
    # parameter checking
    if(!is.element(along, c(1,2))){
        stop("along must be 1 or 2 for rows and columns respectively")
    }
    # if merged by using zip feeding along the columns, the
    # same no. of rows is required and vice versa
    if(along==1 && (ncol(df1) != ncol(df2))) {
        stop("the no. of columns has to be equal to merge them by zip feeding")
    }
    if(along==2 && (nrow(df1) != nrow(df2))) {
        stop("the no. of rows has to be equal to merge them by zip feeding")
    }

    # zip fastener preparations
    d1 <- dim(df1)[along]
    d2 <- dim(df2)[along]
    i1 <- 1:d1           # index vector 1
    i2 <- 1:d2 + d1      # index vector 2

    # set biggest dimension dMax
    if(d1==d2) {
        dMax <- d1
    } else if (d1 > d2) {
        length(i2) <- length(i1)    # make vectors same length,
        dMax <- d1                  # fill blanks with NAs
    } else if(d1 < d2){
        length(i1) <- length(i2)    # make vectors same length,
        dMax <- d2                  # fill blanks with NAs
    }

    # zip fastener operations
    index <- as.vector(matrix(c(i1, i2), ncol=dMax, byrow=TRUE))
    index <- index[!is.na(index)]         # remove NAs

    if(along==1){
        colnames(df2) <- colnames(df1)    # keep 1st colnames
        res <- rbind(df1,df2)[ index, ]   # reorder data frame
    }
    if(along==2) res <- cbind(df1,df2)[ , index]

    return(res)
}
###############################################################
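To see the reordering trick in isolation, here is a minimal sketch of just the index construction, without any data frames. The helper name zipIndex() is made up for this illustration and is not part of the function above; it pads both index vectors to the same length, which gives the same result as the if/else prelude.

```r
# Hypothetical helper: builds the interleaved index for two blocks of
# length d1 and d2 that have been stacked one after the other.
zipIndex <- function(d1, d2) {
  i1 <- seq_len(d1)          # positions of the first block
  i2 <- seq_len(d2) + d1     # positions of the second block
  dMax <- max(d1, d2)
  length(i1) <- dMax         # pad the shorter vector with NAs
  length(i2) <- dMax
  # row 1 holds i1, row 2 holds i2; reading the matrix column by
  # column alternates between the two blocks
  index <- as.vector(matrix(c(i1, i2), ncol = dMax, byrow = TRUE))
  index[!is.na(index)]       # drop the padding
}

zipIndex(3, 3)   # 1 4 2 5 3 6
zipIndex(4, 2)   # 1 5 2 6 3 4
```

The first call reproduces the row order 1, 4, 2, 5, 3, 6 from the row-wise example at the top of the post.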
Here come some examples.
###############################################################
### examples ###
require(plyr)

# data frames equal dimensions
df1 <- rdply(3, rep("o",4))[ ,-1]     # from plyr package
df2 <- rdply(3, rep("X",4))[ ,-1]

zipFastener(df1, df2)
zipFastener(df1, df2, 2)
zipFastener(df1, df2, 1)

# data frames unequal in no. of rows
df1 <- rdply(10, rep("o",4))[ ,-1]
zipFastener(df1, df2, 1)
zipFastener(df2, df1, 1)

# data frames unequal in no. of columns
df2 <- rdply(10, rep("X",3))[ ,-1]
zipFastener(df1, df2)
zipFastener(df2, df1, 2)
###############################################################
I hope you find that useful.
Ciao, Mark

Footnote:
I just discovered the interleave() function from the gdata package, which basically does the same (except that it works on rows only). That's what learning is like: spending time producing things that already exist ;)
Seeing what you have done with the zip, I feel you may be able to help me.
I have two data frames, one without NAs (e.g. df1) and another with NAs (e.g. df2). I want to put the values in df1 into df2. I have tried the match(..) function but it is very slow. My real data frame is huge: 300,000 x 4. I thought about using rbind, but this would not work because I want each value to go to the corresponding row in df2 (replacing the NAs), and I also need the remaining NAs in the other rows, because the result is used in mapping software where each value (even NA) describes a location. How do I do this?
df1
      var1 var2
["1"]    3    5
["3"]    5   10
df2
      var1 var2
["1"]   NA   NA
["2"]   NA   NA
["3"]   NA   NA
["4"]   NA   NA
["5"]   NA   NA
Figured it out. Will use merge(…, by="row.names") and then delete the columns from df2 after the merge. Still very slow!
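For what it's worth, one possible alternative to merge() is to assign into df2 by row name directly. The following is only a sketch on toy data shaped like the example in the comment; it assumes that every row name of df1 also occurs in df2, and it has not been timed on 300,000 rows, so it may or may not be faster in practice.

```r
# Toy data mirroring the comment: df1 holds values, df2 is the full NA grid
df1 <- data.frame(var1 = c(3, 5), var2 = c(5, 10),
                  row.names = c("1", "3"))
df2 <- data.frame(var1 = rep(NA_real_, 5), var2 = rep(NA_real_, 5),
                  row.names = as.character(1:5))

# Assumption: every row name of df1 also occurs in df2
stopifnot(all(rownames(df1) %in% rownames(df2)))

# Overwrite only the matching rows; all other rows keep their NAs
filled <- df2
filled[rownames(df1), names(df1)] <- df1
filled
```

Rows "2", "4" and "5" stay NA, so the number and order of rows in df2 are preserved for the mapping step.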
Thanks, Mark Heckmann! Even though there is another function that can do this, this exercise is very helpful for me. I'll try the interleave() function as well.
Hello, how do you use the function for three data frames, to alternate column1[,1], column2[,1], column3[,1], column1[,2], column2[,2], column3[,2] etc…
Thank you!
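One way to approach the three data frame case is to apply the same index trick to any number of data frames. The sketch below is not part of zipFastener(); the name zipColumns() is made up here, and it assumes that all data frames have the same number of rows and the same number of columns (the unequal case handled in the post is left out to keep it short).

```r
# Hypothetical generalisation: interleave the columns of k data frames
# that all share the same dimensions.
zipColumns <- function(...) {
  dfs <- list(...)
  k  <- length(dfs)
  nc <- ncol(dfs[[1]])
  stopifnot(all(vapply(dfs, ncol, integer(1)) == nc),
            all(vapply(dfs, nrow, integer(1)) == nrow(dfs[[1]])))
  combined <- do.call(cbind, dfs)
  # column j of data frame i sits at position (i - 1) * nc + j;
  # reading the k x nc matrix column by column alternates the frames
  index <- as.vector(matrix(seq_len(k * nc), nrow = k, byrow = TRUE))
  combined[, index, drop = FALSE]
}

a <- data.frame(a1 = 1:2, a2 = 3:4)
b <- data.frame(b1 = 5:6, b2 = 7:8)
c3 <- data.frame(c1 = 9:10, c2 = 11:12)
res <- zipColumns(a, b, c3)
names(res)   # "a1" "b1" "c1" "a2" "b2" "c2"
```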