Progress bars in R (part II) – a wrapper for apply functions
Update (2014/08/24): Since October 2013 the pbapply package is available that has picked up the idea of integrating progress bars into the apply family of functions outlined in this post.
In a previous post I gave some examples of how to make a progress bar in R. In the examples the bars were created within loops. Very often though I have situations where I would like have a progress bar when using apply(). The plyr package provides several apply-like functions also including progress bars, so one could have a look here and use a plyr function instead of apply if possible. Anyway, here comes a wrapper for apply, lapply and sapply that has a progressbar. It seems to work although one known issue is the use of vectors (like c(1,2)with the MARGIN argument in apply_pb(). Also you can see in the performance comparison below that the wrapper causes overhead to a considerable extent, which is the main drawback of this approach.
############################################################### # STATUS: WORKING, but only tested once or twice, # tested with most ?apply examples # ISSUES/TODO: MARGIN argument cannot take a # vector like 1:2 that is more than one numeric apply_pb <- function(X, MARGIN, FUN, ...) { env <- environment() pb_Total <- sum(dim(X)[MARGIN]) counter <- 0 pb <- txtProgressBar(min = 0, max = pb_Total, style = 3) wrapper <- function(...) { curVal <- get("counter", envir = env) assign("counter", curVal +1 ,envir= env) setTxtProgressBar(get("pb", envir= env), curVal +1) FUN(...) } res <- apply(X, MARGIN, wrapper, ...) close(pb) res } ## NOT RUN: # apply_pb(anscombe, 2, sd, na.rm=TRUE) ## large dataset # df <- data.frame(rnorm(30000), rnorm(30000)) # apply_pb(df, 1, sd) ############################################################### lapply_pb <- function(X, FUN, ...) { env <- environment() pb_Total <- length(X) counter <- 0 pb <- txtProgressBar(min = 0, max = pb_Total, style = 3) # wrapper around FUN wrapper <- function(...){ curVal <- get("counter", envir = env) assign("counter", curVal +1 ,envir=env) setTxtProgressBar(get("pb", envir=env), curVal +1) FUN(...) } res <- lapply(X, wrapper, ...) close(pb) res } ## NOT RUN: # l <- sapply(1:20000, function(x) list(rnorm(1000))) # lapply_pb(l, mean) ############################################################### sapply_pb <- function(X, FUN, ...) { env <- environment() pb_Total <- length(X) counter <- 0 pb <- txtProgressBar(min = 0, max = pb_Total, style = 3) wrapper <- function(...){ curVal <- get("counter", envir = env) assign("counter", curVal +1 ,envir=env) setTxtProgressBar(get("pb", envir=env), curVal +1) FUN(...) } res <- sapply(X, wrapper, ...) close(pb) res } ## NOT RUN: # l <- sapply(1:20000, function(x) list(rnorm(1000)) # sapply_pb(l, mean) ###############################################################
Nice up to now, but now let’s see what the difference in performance due to the wrapper overhead looks like.
############################################################### > l <- sapply(1:20000, function(x) list(rnorm(1000))) > system.time(sapply(l, mean)) User System verstrichen 0.474 0.003 0.475 > system.time(sapply_pb(l, mean)) |======================================================| 100% User System verstrichen 1.863 0.025 1.885 > df <- data.frame(rnorm(90000), rnorm(90000)) > system.time(apply(df, 1, sd)) User System verstrichen 7.152 0.062 7.260 > system.time(apply_pb(df, 1, sd)) |======================================================| 100% User System verstrichen 13.112 0.099 13.192 ###############################################################
So, what we see is that performance radically goes down. This is extremely problematic in our context as one will tend to use progress bars in situations where processing times are already quite long. So if someone has an improvement for that I would be glad to hear about it.
Latest version with more comments on github.
Filed under: R / R-Code | 12 Comments
Tags: apply, progress bar
Feeds
Authors
Ecosia – eco-friendly web search
Archives
Blogroll
Links
- apply building functions combine data frame dot-dot-dot fields footnote graphics histogram interactive plot intro jackknife LaTex Lyx maps Matlab merge News NGD normalized google distance permutation playwith plotrix plyr progress bar R R.basic regression rggobi strings sweave tables trellis visualization zip fastener
Categories
Mark Heckmann’s blog
- Productivity: Markdown and R code in emails with Markdown Here
- Introduction to OpenRepGrid at the University of Barcelona
- Biennial EPCA Conference 2014 in Brno
- Werbung mit „harten“ Fakten – und warum man am Ende doch nicht schlauer ist als vorher
- 20th International Congress of Personal Construct Psychology in Sydney
- Erhalt des Berninghausenpreises für hervorragende Lehre
- 11th EPCA Biennial Conference in Dublin
- Karriere vs. Wissenschaft – Zur Verbesserung der Bedingungen für den wissenschaftlichen Nachwuchs
- Psychologie und LaTex – Literaturverzeichnisse im DGPs-Stil
- Zotero to LaTex Workflow
R-help latest postings
- An error has occurred; the feed is probably down. Try again later.
Recent Comments
- shiny: passing reactiveValues to conditionalPanel - Tutorial Guruji on Sending data from client to server and back using shiny
- Multiple Linear Regression (MLR) – Bytes of computer wisdom on R: Calculating all possible linear regression models for a given set of predictors
- Anonymous on Sending data from client to server and back using shiny
Mark on Twitter
Tweets by markheckmannTerms of use
The contents of the "R" you ready? blog are offered under a creative commons license.Meta
Funnily enough I just started playing around with progress bars myself. These wrappers could be handy. I don’t think your performance decrease will be as extreme as you think with any code that’s actually doing some processing within the apply. For example:
> system.time(sapply(1:100,function(x){for(i in 1:100)rnorm(10000);return(1)}))
user system elapsed
14.650 0.000 14.727
> system.time(sapply_pb(1:100,function(x){for(i in 1:100)rnorm(10000);return(1)}))
user system elapsed
14.61 0.01 14.62
Minor suggestions, I wonder if you should
close(pb)
at the end of your functions? And in your post you have a “(” instead of a “{” when you open the body of the sapply_pb function. Thanks again.Hi Scott,
thanks for the suggestions. You are absolutely right, I forgot to close the progress bar.
I changed the code above. And your performance results look much better. That’s nice! :)
Mark
Hi,
I wanted to have a lightweight package (it is now called pbapply and can be downloaded from CRAN) without dependencies, that allows easy modification of the progress bar type, can be used within for/while loops, and natural to write *apply functions. (My original intent was to have a progress bar for bootstrap-like calculations.) Below is a comparison of Hadley Wickham’s plyr, Mark’s wrapper and my pbapply approach.
library(plyr)
library(pbapply)
lapply_pb <- function(X, FUN, …)
{
env <- environment()
pb_Total <- length(X)
counter <- 0
pb <- txtProgressBar(min = 0, max = pb_Total, style = 3)
# wrapper around FUN
wrapper <- function(…){
curVal <- get("counter", envir = env)
assign("counter", curVal +1 ,envir=env)
setTxtProgressBar(get("pb", envir=env), curVal +1)
FUN(…)
}
lapply(X, wrapper, …)
close(pb)
}
system.time(x1 <- lapply(1:10, function(i) Sys.sleep(0.2)))
system.time(x1 <- lapply_pb(1:10, function(i) Sys.sleep(0.2)))
system.time(x1 <- l_ply(1:10, function(i) Sys.sleep(0.2), .progress=create_progress_bar(name = "text")))
system.time(x1 <- pblapply(1:10, function(i) Sys.sleep(0.2)))
Hi Peter,
thanks for the hint :)
Cheers,
Mark
Hello
I’m using Revolution R for students.
None of this methods show any bar on this GUI.
What option should I change?
What other solutions exist?
Hi,
I am afraid I cannot help you with this one as I have never used Revolution R.
Mark
First, I would like to thank the authors for the code examples. But beware of a small issue which is easily fixed.
Beware that the functions above do not return any values. I, for example, use lapply for reading mulitple files.
#############
files <- list.files(datadir, recursive=T, full.names=T)
datalist <- lapply(files, read.table, sep=",", dec=".", header=T, as.is=T)
datatable <- do.call(rbind, datalist)
#############
This example doesn't work using lapply_pb. However, the solution is easy. Instead of finishing the function lapply_pb with:
lapply(X, wrapper, …)
close(pb)
use:
result <- lapply(X, wrapper, …)
close(pb)
return(result)
This corresponds more closely with the real lapply. Of course the same can be done for apply and sapply.
Hi Anton,
thanks for the hint. I updated it.
Mark
I introduce some changes to the markhackman code to display a progress bar including a title. I thint this is only working on windonws, however it could be useful
# STATUS: WORKING, but only tested once or twice,
# tested with most ?apply examples
# ISSUES/TODO: MARGIN argument cannot take a
# vector like 1:2 that is more than one numeric
apply_pb <- function(TheTitle=NULL,X, MARGIN, FUN, …)
{
env <- environment()
pb_Total <- sum(dim(X)[MARGIN])
counter <- 0
# pb <- txtProgressBar(min = 0, max = pb_Total,style = 3)
pb <- winProgressBar(title=TheTitle, min = 0, max = pb_Total, width = 300)
wrapper <- function(…)
{
curVal <- get("counter", envir = env)
assign("counter", curVal +1 ,envir= env)
# setTxtProgressBar(get("pb", envir= env),curVal +1)
setWinProgressBar(get("pb", envir= env),curVal +1)
FUN(…)
}
res <- apply(X, MARGIN, wrapper, …)
close(pb)
res
}
## NOT RUN:
# apply_pb(anscombe, 2, sd, na.rm=TRUE)
## large dataset
# df <- data.frame(rnorm(30000), rnorm(30000))
# apply_pb(df, 1, sd)
###############################################################
lapply_pb <- function(TheTitle=NULL,X, FUN, …)
{
env <- environment()
pb_Total <- length(X)
counter <- 0
# pb <- txtProgressBar(min = 0, max = pb_Total, style = 3)
pb <- winProgressBar(title=TheTitle, min = 0, max = pb_Total, width = 300)
# wrapper around FUN
wrapper <- function(…){
curVal <- get("counter", envir = env)
assign("counter", curVal +1 ,envir=env)
# setTxtProgressBar(get("pb", envir=env), curVal +1)
setWinProgressBar(get("pb", envir= env),curVal +1)
FUN(…)
}
res <- lapply(X, wrapper, …)
close(pb)
res
}
## NOT RUN:
# l <- sapply(1:20000, function(x) list(rnorm(1000)))
# lapply_pb(l, mean)
###############################################################
sapply_pb <- function(TheTitle=NULL,X, FUN, …)
{
env <- environment()
pb_Total <- length(X)
counter <- 0
# pb <- txtProgressBar(min = 0, max = pb_Total, style = 3)
pb <- winProgressBar(title=TheTitle, min = 0, max = pb_Total, width = 300)
wrapper <- function(…){
curVal <- get("counter", envir = env)
assign("counter", curVal +1 ,envir=env)
# setTxtProgressBar(get("pb", envir=env), curVal +1)
setWinProgressBar(get("pb", envir= env),curVal +1)
FUN(…)
}
res <- sapply(X, wrapper, …)
close(pb)
res
}
ilove
I include a by_pb to complete the set ….
###########################################################
by_pb <- function(TheTitle=NULL,Data,INDICES,FUN, …)
{
env <- environment()
pb_Total <- length(unique(INDICES))
counter <- 0
# pb <- txtProgressBar(min = 0, max = pb_Total, style = 3)
pb <- winProgressBar(title=TheTitle, min = 0, max = pb_Total, width = 300)
wrapper <- function(…){
curVal <- get("counter", envir = env)
assign("counter", curVal +1 ,envir=env)
# setTxtProgressBar(get("pb", envir=env), curVal +1)
setWinProgressBar(get("pb", envir= env),curVal +1)
FUN(…)
}
res <- by(Data,INDICES,wrapper,…)
close(pb)
res
}
###########################################################