Progress bars in R (part II) – a wrapper for apply functions

11Jan10

Update (2014/08/24): Since October 2013 the pbapply package is available that has picked up the idea of integrating progress bars into the apply family of functions outlined in this post.

In a previous post I gave some examples of how to make a progress bar in R. In the examples the bars were created within loops. Very often though I have situations where I would like have a progress bar when using apply(). The plyr package provides several apply-like functions also including progress bars, so one could have a look here and use a plyr function instead of apply if possible. Anyway, here comes a wrapper for apply, lapply and sapply that has a progressbar. It seems to work although one known issue is the use of vectors (like c(1,2)with the MARGIN argument in apply_pb(). Also you can see in the performance comparison below that the wrapper causes overhead to a considerable extent, which is the main drawback of this approach.

###############################################################

# STATUS: WORKING, but only tested once or twice,
# tested with most ?apply examples
# ISSUES/TODO: MARGIN argument cannot take a
# vector like 1:2 that is more than one numeric

apply_pb <- function(X, MARGIN, FUN, ...)
{
  env <- environment()
  pb_Total <- sum(dim(X)[MARGIN])
  counter <- 0
  pb <- txtProgressBar(min = 0, max = pb_Total,
                       style = 3)

  wrapper <- function(...)
  {
    curVal <- get("counter", envir = env)
    assign("counter", curVal +1 ,envir= env)
    setTxtProgressBar(get("pb", envir= env),
                           curVal +1)
    FUN(...)
  }
  res <- apply(X, MARGIN, wrapper, ...)
  close(pb)
  res
}

## NOT RUN:
# apply_pb(anscombe, 2, sd, na.rm=TRUE)

## large dataset
# df <- data.frame(rnorm(30000), rnorm(30000))
# apply_pb(df, 1, sd)

###############################################################

lapply_pb <- function(X, FUN, ...)
{
 env <- environment()
 pb_Total <- length(X)
 counter <- 0
 pb <- txtProgressBar(min = 0, max = pb_Total, style = 3)   

 # wrapper around FUN
 wrapper <- function(...){
   curVal <- get("counter", envir = env)
   assign("counter", curVal +1 ,envir=env)
   setTxtProgressBar(get("pb", envir=env), curVal +1)
   FUN(...)
 }
 res <- lapply(X, wrapper, ...)
 close(pb)
 res
}

## NOT RUN:
# l <- sapply(1:20000, function(x) list(rnorm(1000)))
# lapply_pb(l, mean)

###############################################################

sapply_pb <- function(X, FUN, ...)
{
  env <- environment()
  pb_Total <- length(X)
  counter <- 0
  pb <- txtProgressBar(min = 0, max = pb_Total, style = 3)

  wrapper <- function(...){
    curVal <- get("counter", envir = env)
    assign("counter", curVal +1 ,envir=env)
    setTxtProgressBar(get("pb", envir=env), curVal +1)
    FUN(...)
  }
  res <- sapply(X, wrapper, ...)
  close(pb)
  res
}

## NOT RUN:
# l <- sapply(1:20000, function(x) list(rnorm(1000))
# sapply_pb(l, mean)

###############################################################

Nice up to now, but now let’s see what the difference in performance due to the wrapper overhead looks like.

###############################################################

> l <- sapply(1:20000, function(x) list(rnorm(1000)))
> system.time(sapply(l, mean))
User      System    verstrichen
0.474       0.003       0.475
> system.time(sapply_pb(l, mean))
|======================================================| 100%
User      System    verstrichen
1.863       0.025       1.885

> df <- data.frame(rnorm(90000), rnorm(90000))
> system.time(apply(df, 1, sd))
User      System verstrichen
7.152       0.062       7.260
> system.time(apply_pb(df, 1, sd))
|======================================================| 100%
User      System     verstrichen
13.112       0.099      13.192

###############################################################

So, what we see is that performance radically goes down. This is extremely problematic in our context as one will tend to use progress bars in situations where processing times are already quite long. So if someone has an improvement for that I would be glad to hear about it.

Latest version with more comments on github.

About these ads


12 Responses to “Progress bars in R (part II) – a wrapper for apply functions”

  1. Funnily enough I just started playing around with progress bars myself. These wrappers could be handy. I don’t think your performance decrease will be as extreme as you think with any code that’s actually doing some processing within the apply. For example:

    > system.time(sapply(1:100,function(x){for(i in 1:100)rnorm(10000);return(1)}))

    user system elapsed
    14.650 0.000 14.727
    > system.time(sapply_pb(1:100,function(x){for(i in 1:100)rnorm(10000);return(1)}))

    user system elapsed
    14.61 0.01 14.62

    Minor suggestions, I wonder if you should close(pb) at the end of your functions? And in your post you have a “(” instead of a “{” when you open the body of the sapply_pb function. Thanks again.

  2. 2 markheckmann

    Hi Scott,
    thanks for the suggestions. You are absolutely right, I forgot to close the progress bar.
    I changed the code above. And your performance results look much better. That’s nice! :)
    Mark

  3. Hi,
    I wanted to have a lightweight package (it is now called pbapply and can be downloaded from CRAN) without dependencies, that allows easy modification of the progress bar type, can be used within for/while loops, and natural to write *apply functions. (My original intent was to have a progress bar for bootstrap-like calculations.) Below is a comparison of Hadley Wickham’s plyr, Mark’s wrapper and my pbapply approach.

    library(plyr)
    library(pbapply)
    lapply_pb <- function(X, FUN, …)
    {
    env <- environment()
    pb_Total <- length(X)
    counter <- 0
    pb <- txtProgressBar(min = 0, max = pb_Total, style = 3)

    # wrapper around FUN
    wrapper <- function(…){
    curVal <- get("counter", envir = env)
    assign("counter", curVal +1 ,envir=env)
    setTxtProgressBar(get("pb", envir=env), curVal +1)
    FUN(…)
    }
    lapply(X, wrapper, …)
    close(pb)
    }
    system.time(x1 <- lapply(1:10, function(i) Sys.sleep(0.2)))
    system.time(x1 <- lapply_pb(1:10, function(i) Sys.sleep(0.2)))
    system.time(x1 <- l_ply(1:10, function(i) Sys.sleep(0.2), .progress=create_progress_bar(name = "text")))
    system.time(x1 <- pblapply(1:10, function(i) Sys.sleep(0.2)))

  4. 4 markheckmann

    Hi Peter,
    thanks for the hint :)
    Cheers,
    Mark

  5. 5 skan

    Hello

    I’m using Revolution R for students.
    None of this methods show any bar on this GUI.
    What option should I change?
    What other solutions exist?

  6. 6 markheckmann

    Hi,
    I am afraid I cannot help you with this one as I have never used Revolution R.
    Mark

  7. 7 Anton

    First, I would like to thank the authors for the code examples. But beware of a small issue which is easily fixed.

    Beware that the functions above do not return any values. I, for example, use lapply for reading mulitple files.

    #############
    files <- list.files(datadir, recursive=T, full.names=T)

    datalist <- lapply(files, read.table, sep=",", dec=".", header=T, as.is=T)

    datatable <- do.call(rbind, datalist)
    #############

    This example doesn't work using lapply_pb. However, the solution is easy. Instead of finishing the function lapply_pb with:

    lapply(X, wrapper, …)
    close(pb)

    use:

    result <- lapply(X, wrapper, …)
    close(pb)
    return(result)

    This corresponds more closely with the real lapply. Of course the same can be done for apply and sapply.

  8. 8 markheckmann

    Hi Anton,
    thanks for the hint. I updated it.
    Mark

  9. 9 Julio

    I introduce some changes to the markhackman code to display a progress bar including a title. I thint this is only working on windonws, however it could be useful

    # STATUS: WORKING, but only tested once or twice,
    # tested with most ?apply examples
    # ISSUES/TODO: MARGIN argument cannot take a
    # vector like 1:2 that is more than one numeric

    apply_pb <- function(TheTitle=NULL,X, MARGIN, FUN, …)
    {
    env <- environment()
    pb_Total <- sum(dim(X)[MARGIN])
    counter <- 0
    # pb <- txtProgressBar(min = 0, max = pb_Total,style = 3)
    pb <- winProgressBar(title=TheTitle, min = 0, max = pb_Total, width = 300)

    wrapper <- function(…)
    {
    curVal <- get("counter", envir = env)
    assign("counter", curVal +1 ,envir= env)
    # setTxtProgressBar(get("pb", envir= env),curVal +1)
    setWinProgressBar(get("pb", envir= env),curVal +1)
    FUN(…)
    }
    res <- apply(X, MARGIN, wrapper, …)
    close(pb)
    res
    }

    ## NOT RUN:
    # apply_pb(anscombe, 2, sd, na.rm=TRUE)

    ## large dataset
    # df <- data.frame(rnorm(30000), rnorm(30000))
    # apply_pb(df, 1, sd)

    ###############################################################

    lapply_pb <- function(TheTitle=NULL,X, FUN, …)
    {
    env <- environment()
    pb_Total <- length(X)
    counter <- 0
    # pb <- txtProgressBar(min = 0, max = pb_Total, style = 3)
    pb <- winProgressBar(title=TheTitle, min = 0, max = pb_Total, width = 300)

    # wrapper around FUN
    wrapper <- function(…){
    curVal <- get("counter", envir = env)
    assign("counter", curVal +1 ,envir=env)
    # setTxtProgressBar(get("pb", envir=env), curVal +1)
    setWinProgressBar(get("pb", envir= env),curVal +1)
    FUN(…)
    }
    res <- lapply(X, wrapper, …)
    close(pb)
    res
    }

    ## NOT RUN:
    # l <- sapply(1:20000, function(x) list(rnorm(1000)))
    # lapply_pb(l, mean)

    ###############################################################

    sapply_pb <- function(TheTitle=NULL,X, FUN, …)
    {
    env <- environment()
    pb_Total <- length(X)
    counter <- 0
    # pb <- txtProgressBar(min = 0, max = pb_Total, style = 3)
    pb <- winProgressBar(title=TheTitle, min = 0, max = pb_Total, width = 300)

    wrapper <- function(…){
    curVal <- get("counter", envir = env)
    assign("counter", curVal +1 ,envir=env)
    # setTxtProgressBar(get("pb", envir=env), curVal +1)
    setWinProgressBar(get("pb", envir= env),curVal +1)
    FUN(…)
    }
    res <- sapply(X, wrapper, …)
    close(pb)
    res
    }

  10. ilove

  11. 11 Julio

    I include a by_pb to complete the set ….

    ###########################################################

    by_pb <- function(TheTitle=NULL,Data,INDICES,FUN, …)
    {
    env <- environment()
    pb_Total <- length(unique(INDICES))
    counter <- 0
    # pb <- txtProgressBar(min = 0, max = pb_Total, style = 3)
    pb <- winProgressBar(title=TheTitle, min = 0, max = pb_Total, width = 300)

    wrapper <- function(…){
    curVal <- get("counter", envir = env)
    assign("counter", curVal +1 ,envir=env)
    # setTxtProgressBar(get("pb", envir=env), curVal +1)
    setWinProgressBar(get("pb", envir= env),curVal +1)
    FUN(…)
    }
    res <- by(Data,INDICES,wrapper,…)
    close(pb)
    res
    }
    ###########################################################


  1. 1 Handy Apply-based R Progress Bars :: Dammit Jim!

Follow

Get every new post delivered to your Inbox.

Join 59 other followers

%d bloggers like this: