R: Zip fastener for two data frames / combining rows or columns of two dataframes in an alternating manner

27Mar09

Sometimes I find it useful to merge two data frames like the following ones

  X1 X2 X3 X4      Y1 Y2 Y3 Y4   
1  o  o  o  o       X  X  X  X
2  o  o  o  o       X  X  X  X
3  o  o  o  o       X  X  X  X

by using zip feeding either along the columns

   X1 Y1 X2 Y2 X3 Y3 X4 Y4
1  o  X  o  X  o  X  o  X
2  o  X  o  X  o  X  o  X
3  o  X  o  X  o  X  o  X

or along the rows of the data frames.

  V1 V2 V3 V4
1  o  o  o  o
4  X  X  X  X
2  o  o  o  o
5  X  X  X  X
3  o  o  o  o
6  X  X  X  X

The following function acts like a “zip fastener” for combining two data frames. It takes the first column (or row) of the first data frame and places it next to the first column (or row) of the second data frame, and so on. Only one dimension of the two data frames has to match: to zip along the columns the number of rows must be equal, and to zip along the rows the number of columns must be equal.
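The core of the idea can be sketched in a few lines of base R for the simplest case, where both data frames have the same dimensions. The names a, b, n, idx and zipped are made up for illustration and are not part of zipFastener(); the sketch binds the columns side by side and then reorders them with an alternating index.

```r
# Two small data frames with the same dimensions (illustrative data)
a <- data.frame(X1 = 1:3, X2 = 4:6)
b <- data.frame(Y1 = 7:9, Y2 = 10:12)

n <- ncol(a)                      # number of columns in each data frame

# rbind() stacks the two position vectors as rows of a 2 x n matrix;
# as.vector() reads a matrix column by column, which yields the
# alternating order 1, n+1, 2, n+2, ...
idx <- as.vector(rbind(seq_len(n), seq_len(n) + n))

zipped <- cbind(a, b)[, idx]      # reorder the combined columns
colnames(zipped)                  # "X1" "Y1" "X2" "Y2"
```

The same index works along the rows if rbind(a, b)[idx, ] is used instead, provided the column names of both data frames agree.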

So here comes the code for the zipFastener() function. Actually, it's only the last few lines (from the comment "# zip fastener operations" on) that do the job, but as I did not want to restrict the function to equal dimensions there is a little prelude.

###############################################################

# zipFastener for TWO dataframes of unequal length
zipFastener <- function(df1, df2, along=2)
{
    # parameter checking
    if(!is.element(along, c(1,2))){
        stop("along must be 1 or 2 for rows and columns respectively")
    }
    # if merged by using zip feeding along the columns, the
    # same no. of rows is required and vice versa
    if(along==1 && (ncol(df1)!= ncol(df2))) {
        stop("the no. of columns has to be equal to merge them by zip feeding")
    }
    if(along==2 && (nrow(df1)!= nrow(df2))) {
        stop("the no. of rows has to be equal to merge them by zip feeding")
    }

    # zip fastener preparations
    d1 <- dim(df1)[along]
    d2 <- dim(df2)[along]
    i1 <- seq_len(d1)         # index vector 1
    i2 <- seq_len(d2) + d1    # index vector 2

    # set biggest dimension dMax
    if(d1==d2) {
        dMax <- d1
    } else if (d1 > d2) {
        length(i2) <- length(i1)    # make vectors same length, 
        dMax <- d1                  # fill blanks with NAs   
    } else  if(d1 < d2){
        length(i1) <- length(i2)    # make vectors same length,
        dMax <- d2                  # fill blanks with NAs   
    }
    
    # zip fastener operations
    index <- as.vector(matrix(c(i1, i2), ncol=dMax, byrow=TRUE))
    index <- index[!is.na(index)]         # remove NAs
    
    if(along==1){
        colnames(df2) <- colnames(df1)   # keep 1st colnames                  
        res <- rbind(df1,df2)[ index, ]  # reorder data frame
    }
    if(along==2) res <- cbind(df1,df2)[ , index]           

    return(res)
}

###############################################################
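To see how the prelude handles unequal sizes, the index construction can be traced on its own. The sketch below assumes a first data frame with 5 rows and a second one with 2 (the values of d1 and d2 are made up for illustration) and repeats the steps of the function outside of it.

```r
d1 <- 5                           # rows in the first data frame (assumed)
d2 <- 2                           # rows in the second data frame (assumed)

i1 <- seq_len(d1)                 # 1 2 3 4 5
i2 <- seq_len(d2) + d1            # 6 7

length(i2) <- length(i1)          # pad the shorter vector: 6 7 NA NA NA

# a 2-row matrix read column by column alternates between i1 and i2
index <- as.vector(matrix(c(i1, i2), ncol = d1, byrow = TRUE))
index <- index[!is.na(index)]     # drop the padding again
index                             # 1 6 2 7 3 4 5
```

Once the shorter data frame is used up, the remaining rows of the longer one simply follow in their original order.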

Here come some examples.

###############################################################
### examples ###
library(plyr)                           # provides rdply()

# data frames equal dimensions
df1 <- rdply(3, rep("o",4))[ ,-1]       # from plyr package
df2 <- rdply(3, rep("X",4))[ ,-1]       

zipFastener(df1, df2)
zipFastener(df1, df2, 2)
zipFastener(df1, df2, 1)

# data frames unequal in no. of rows
df1 <- rdply(10, rep("o",4))[ ,-1]
zipFastener(df1, df2, 1)
zipFastener(df2, df1, 1)

# data frames unequal in no. of columns
df2 <- rdply(10, rep("X",3))[ ,-1]
zipFastener(df1, df2)
zipFastener(df2, df1, 2)

###############################################################

I hope you find that useful.

Ciao, Mark

5 Responses to “R: Zip fastener for two data frames / combining rows or columns of two dataframes in an alternating manner”

  1. Footnote:
    I just discovered the interleave() function from the gdata package, which basically does the same (except that it works on rows only). That's what learning is like: spending time producing things that already exist ;)

  2. Emmanuel

    Seeing what you have done with the zip, I feel you may be able to help me.

    I have two data frames, one without NAs (e.g. df1) and another with NAs (e.g. df2). I want to put the values in df1 into df2. I have tried the match(..) function but it is very slow. My real data frame is huge: 300,000 x 4. I thought about using rbind, but this would not work because I want each row of data to go to the corresponding row in df2 (replacing the NAs), and I also need the remaining NAs in the other rows, because the result is used in mapping software and each value (even NA) describes a location. How do I do this?
    df1
    var1 var2
    ["1"] 3 5
    ["3"] 5 10

    df2
    var1 var2
    ["1"] NA NA
    ["2"] NA NA
    ["3"] NA NA
    ["4"] NA NA
    ["5"] NA NA

  3. Emmanuel

    Figured it out. I will use merge(…, by="row.names") and then delete the columns from df2 after the merge. Still very slow!

  4. Thanks Mark Heckmann! Even though there is another function that can do this, this exercise is very helpful for me. I'll try the interleave() function, too.

  5. KP

    Hello, how do you use the function for three dataframes – to alternate column1[,1], column2[,1], column3[,1],column1[,2], column2[,2], column3[,2]

    etc…

    Thank you!


