regression_models_in_latexMost people using LaTex feel that creating tables is no fun. Some days ago I stumbled across a neat function written by Paul Johnson that produces LaTex code as well as LaTex code that can be used within Lyx. The output can be used for regression models and looks like output from the Stata outreg command. His R function that produces the LaTex code has the same name:  outreg(). The outreg code can be found on his website or in the PDF copy of the code from his website.

I took the code, put it into a .rnw file and sweaved it. It worked like a charm and produced beautiful results (see the picture on the left and the PDF). Below you can find the code for the noweb file (.rnw). Latex code is colored grey, R-code is colored blue. Just have a look at all the results as a PDF file. Besides, Paul Johnson has also created a nice list of R-Tips that can be found on his website as well.

Continue reading ‘R: Function to create tables in LaTex or Lyx to display regression model results’


Just a little note for german speaking R beginners: There is an introductory course in R (german) available online on the website of the department of methodology and evaluation research at the University of Jena. Dr. Ivailo Partchev holds a seven sessions course on that topic (duration 11.5 hours).


Before you read this post, please have a look at Enrique’s comment below. He pointed out that the built-in R function modifyList() already does what I wanted to describe in this post. Well, I live to learn :)

I was wondering how I could write a function that uses default settings but accepts a list to overwrite the default settings via the dot-dot-dot / three-point argument. Here comes my solution.

# building a function with a list of default settings
# that can be modified by an optional list passed
# via the dot-dot-dot / three point argument

Continue reading ‘R: Building functions – using default settings that can be modified via the dot-dot-dot / three point argument’


On the REvolutions Blog there is a nice posting treating the often raised concern on “How good or reliable R is”. At my university R is hardly used. Sometimes I was asked by lecturers wether the calculations done by R and its packages are accurate. The linked posting treats this matter and tries to clarify this point.


zipperzippersSometimes I find it useful to merge two data frames like the following ones

  X1 X2 X3 X4      Y1 Y2 Y3 Y4   
1  o  o  o  o       X  X  X  X
2  o  o  o  o       X  X  X  X
3  o  o  o  o       X  X  X  X

by using zip feeding either along the columns

   X1 Y1 X2 Y2 X3 Y3 X4 Y4
1  o  X  o  X  o  X  o  X
2  o  X  o  X  o  X  o  X
3  o  X  o  X  o  X  o  X

or along the rows of the data frames.

  V1 V2 V3 V4
1  o  o  o  o
4  X  X  X  X
2  o  o  o  o
5  X  X  X  X
3  o  o  o  o
6  X  X  X  X

Continue reading ‘R: Zip fastener for two data frames / combining rows or columns of two dataframes in an alternating manner’


progress_barEvery once in while I have to write a function that contains a loop doing thousands or millions of calculations. To make sure that the function does not get stuck in an endless loop or just to fulfill the human need of control it is useful to monitor the progress. So  first I tried the following:



###############################################################

total <- 10
for(i in 1:total){
   print(i)
   Sys.sleep(0.1)
}

###############################################################

Unfortunately this does not work as the console output to the basic R GUI is buffered. This means that it is printed to the console at once after the loop is finished. The R FAQs (7.1) explains a solution: Either to change the R GUI buffering settings in the Misc menu which can be toggled via <Ctrl-W> or to tell R explicitly to empty the buffer by flush.console(). So like this it works:

###############################################################

total <- 20
for(i in 1:total){
   Sys.sleep(0.1)
   print(i)
   # update GUI console
   flush.console()                          
}

###############################################################

Of course it would be even nicer to have a real progress bar. For different progress bars we can use the built-in R.utils package. First a text based progress bar:

Continue reading ‘R: Monitoring the function progress with a progress bar’


footnoteIn some statistical programs there is the option available to attach a footnote to the graphical output that is created. This footnote may contain the name of the script or the file that produced the graphic, the author’s name and the date of creation. In SAS for example there is a footnote command to achieve this. Ever since I realized that this makes life a lot easier, I wrote a simple three-lines function in R which I use at the end of the construction of any graphic. I suppose, that this is what my professors meant with “good practice”. The nice thing about implementing this in the grid graphics system is that you can produce multiple graphics [e.g. by par(mfrow=c(2, 2))] and still the footnote will be positioned correctly.

Continue reading ‘R: Good practice – adding footnotes to graphics’


variationsAlthough the graphic at the left might not seem a 100% appropriate, it gives a hint to what I am about to do. I want to calculate all possible linear regression models with one dependent and several independent variables. I do not want to address bias and fitting issues or the question if this makes sense from a statistical point of view in this posting. Here I want to emphasize the technical issues only.

To solve the task, several approaches are possible. The first one is a step-by-step approach using a lot of code. Another one would be to make use of a specialized package. The packages leaps and meifly would be appropriate for the task but have some slight drawbacks in terms of flexibility. I will not address solutions using these packages here, but I would like to point out that in contrast to the below only a few lines of code would do the job.

The step-by-step approach

Let’s suppose we have the following set of four possible regressors.

regressors <- c("y1", "y2", "y3", "y4")

Now we want to construct a formula that contains the first and third regressor.

vec <- c(T, F, T, F)
paste(regressors[vec])
> [1] "y2" "y3"

So the paste commmand works vectorwise which helps a lot in this case. Now we add a plus sign between the regressors…

Continue reading ‘R: Calculating all possible linear regression models for a given set of predictors’


Today I will treat a problem I encounter every once in a while. Let’s suppose we have several dataframes or vectors of unequel length but with partly matching column names,  just like the following ones:

df1 <- data.frame(Intercept = .4, x1=.4, x2=.2, x3=.7)
df2 <- data.frame(Interceptlego = .5,        x2=.8       )

This for example may occur when fitting several multiple regression models each time using different combination of regressors. Now I would like to combine the results into one data frame.  The merge() as well as the rbind() function do not help here as they require equal lengths.

I posted this matter on r-help as my first solution was somewhat awkward and could not be generalized to any data frames or list of data frames. The first solution was posted by Charles C. Berry. myList is a list containing the data frames as elements

myList <- list(df1, df2)

What he does is to use a nested loop. The inner loop runs for each data frame over each column name. It basically takes each column name and the correponding element [i, j] from the data frame ( myList[[i]] ) and writes it into an empty data frame (dat). Thereby a new column that is named just like the column from the list element data frame is created. The cells that are left out are automatically set NA.

dat <- data.frame()
for(i in seq(along=myList)) for(j in names(myList[[i]]))
                                 dat[i,j] <- myList[[i]][j]
dat

Continue reading ‘R: Combining vectors or data frames of unequal length into one data frame’


brainAfter my last posting on how to extract the google number count I was searching the web and found a nice website allowing you to calculate many semantic relatedness measures. On request it seems to be possible to get free access to their API. The API allows you to post a request via the GET or POST method which can be implemented in R using the RCurl package.

Anyway, I will post the code to do the normalized google distance (NGD) calculation using R only. As last time the code for the google count extration implemented in R was posted as a first step, here comes the second step, the calculation, using the function described last time.

The calculation formula might look a bit scary at a first glance:

google_distance_formula1

Looking at its step-by-step development in the article Automatic Meaning Discovery Using Google it gets quite easy to understand the rationale behind it. What we need to know here is that M is the total number of web pages searched by Google.  f(x) and f(y) are the counts for search terms x and y, respectively. f(xy) is the number of web pages found on which both x and y occur (also see Wikipedia). So the ingredients are clear. Here comes the function.

Continue reading ‘R: Normalized Google distance (NGD) in R part II’