Thursday, July 23, 2015

Tips

[1] Benchmark the performance

proc.time reports how much real and CPU time (in seconds) the currently running R process has already used.
system.time times the evaluation of a single R expression.
gc.time reports how much time has been spent in garbage collection so far in the session.
(https://stat.ethz.ch/R-manual/R-devel/library/base/html/system.time.html)


ptm <- proc.time()
for (i in 1:50) mad(stats::runif(500))
proc.time() - ptm
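
The same measurement can also be written with system.time(), which does the proc.time() bookkeeping around a single expression. A minimal sketch; the variable name timing is only for illustration:

```r
## time the same loop with system.time() instead of proc.time()
timing <- system.time(
  for (i in 1:50) mad(stats::runif(500))
)
print(timing)               # user, system and elapsed time in seconds
print(timing[["elapsed"]])  # wall-clock time only

## cumulative garbage collection time for this session
print(gc.time())
```

For a loop this short the numbers will be close to zero, so in practice it helps to time a larger workload or repeat the measurement a few times.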


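Rprof()                                     # start profiling (writes to Rprof.out by default)
for (i in 1:5000) mad(stats::runif(5000))   # code to profile; long enough to be sampled
Rprof(NULL)                                 # stop profiling
summaryRprof()                              # summarize the samples in Rprof.out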



[2] Parallel Processing

I am currently using MALDIquant to analyze mass spectrometry data. The package has a few methods with multi-core support, but that support is only available on Unix-like systems.
(https://cran.r-project.org/web/packages/MALDIquant/MALDIquant.pdf)

MALDIquant offers multi-core support using mclapply and mcmapply from the parallel package.

## load package
library("MALDIquant")
## load example data
data("fiedler2009subset", package="MALDIquant")
## run single-core baseline correction
print(system.time(
  b1 <- removeBaseline(fiedler2009subset, method="SNIP")
))

if (.Platform$OS.type == "unix") {
  ## run multi-core baseline correction
  print(system.time(
    b2 <- removeBaseline(fiedler2009subset, method="SNIP", mc.cores=2)
  ))
  print(all.equal(b1, b2))
}
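
For code outside MALDIquant, mclapply can also be called directly from the parallel package. The sketch below is only an illustration of that idea, not MALDIquant's internal code; slow_square is a made-up placeholder for an expensive per-item computation.

```r
library("parallel")

## hypothetical stand-in for an expensive per-item computation
slow_square <- function(x) {
  Sys.sleep(0.01)
  x^2
}

## forking is not available on Windows, so fall back to a single core there
cores <- if (.Platform$OS.type == "unix") 2L else 1L

## run single-core and multi-core versions and compare the results
r1 <- lapply(1:20, slow_square)
r2 <- mclapply(1:20, slow_square, mc.cores = cores)
print(all.equal(r1, r2))
```

Wrapping each call in system.time(), as in the MALDIquant example above, shows whether the extra cores actually pay off for a given workload.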

I use the detectPeaks method as an example in the MSDA toolbox.

[3] Some observations

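Note that using the apply family of functions, such as lapply() and mapply(), does not by itself make your code run faster. They still call your function once per element, so the real gains usually come from vectorized operations or from running the work on several cores.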
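
A small sketch of this point, comparing sapply() with the equivalent vectorized expression. The data and its size are made up for illustration, and the timings depend on the machine, so none are quoted here.

```r
x <- stats::runif(1e5)

## element-by-element call through sapply()
t_apply <- system.time(y1 <- sapply(x, function(v) v^2))

## the same computation as a single vectorized operation
t_vec <- system.time(y2 <- x^2)

print(all.equal(y1, y2))  # check that both approaches give the same values
print(t_apply)
print(t_vec)
```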