Recently, David Smith from REvolution Computing challenged the R community to reproduce a beautiful choropleth map (i.e. a thematic map in which regions are shaded according to a variable) of US unemployment rates that he had seen on the Flowing Data blog. Here you can find the impressive results. Being a fan of beautiful visualizations, I tried to produce a similar map for Germany.
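The core of any choropleth map is assigning each region a colour class based on its value. Below is a minimal sketch of that step, assuming hand-picked class boundaries and a four-step red palette. The district labels and rates are invented for illustration only and are not real German data.

```r
# Invented unemployment rates (in %) for five hypothetical districts
rates <- c(A = 4.2, B = 7.9, C = 11.5, D = NA, E = 15.3)

# Class boundaries; right = FALSE gives [0,5), [5,10), [10,15), [15,Inf)
breaks <- c(0, 5, 10, 15, Inf)
classes <- cut(rates, breaks = breaks, right = FALSE)

# One colour per class, from light to dark red
pal <- c("#FEE5D9", "#FCAE91", "#FB6A4A", "#CB181D")
fill <- pal[as.integer(classes)]   # NA rates give NA colours here
fill[is.na(fill)] <- "grey80"      # districts without data are drawn in grey
names(fill) <- names(rates)
fill
```

The `fill` vector would then be handed, in the same district order as the map polygons, to whatever draws the map (for example the `col` argument of a polygon plotting function).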
1. Getting the spatial country data
The first step was to get data for drawing a map of the German administrative districts. Unfortunately, maps of Germany are not included in the maps package; otherwise I could easily have adopted the code from the challenge. The GADM database of Global Administrative Areas aims to provide data on administrative districts for the whole world at different levels (country, state and county level). The data can be downloaded as a shapefile, an ESRI geodatabase file, a Google Earth .kmz file and, very conveniently for R users, as an Rdata file.
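The post does not say what the object stored in the GADM Rdata file is called. Rather than guessing, here is a minimal sketch: `load()` returns the names of the objects it restores, so you can load the file into a fresh environment and pick the object from there. Because the real download is not bundled here, the sketch first saves a stand-in object (hypothetical `districts`) to a temporary file; with the real data you would pass the path of the downloaded file instead.

```r
# Stand-in for the downloaded GADM file, so the sketch runs offline
f <- tempfile(fileext = ".RData")
districts <- data.frame(NAME = c("A", "B"))   # hypothetical stand-in object
save(districts, file = f)
rm(districts)

# Load into a separate environment; load() returns the restored object names
env <- new.env()
obj_names <- load(f, envir = env)
obj_names

# Fetch the (first) object by name and inspect what kind of object it is
map_data <- get(obj_names[1], envir = env)
class(map_data)
```

For the real file, `class(map_data)` tells you which kind of spatial object you received and therefore which plotting functions apply.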
2. Getting socio-demographic data (e.g. unemployment rates by administrative district)
A lot of data is available online at www.statistikportal.de, which links to several databases. To get the unemployment statistics by county, I clicked my way through: Regionaldatenbank Deutschland -> Arbeitsmarkt -> Arbeitsmarktstatistik der Bundesagentur für Arbeit -> Arbeitslose nach ausgewählten Personengruppen sowie Arbeitslosenquoten – Jahresdurchschnitt – (ab 2008) regionale Tiefe: Kreise und krfr. Städte -> Werteabruf -> save as CSV format (that is, the annual averages of unemployed persons and unemployment rates from 2008 onwards, broken down by districts and district-free cities). This table contains all the information I need, although for some reason no data is listed for a few districts. So I also looked for another source: Regionalatlas offers a nice online visualization tool. In the menu I selected the 2008 unemployment rate as the indicator. Besides the visualization itself, there is a “tables” menu button that retrieves an HTML table of the data. I simply copied and pasted it into a .txt file, which gives me a tab-separated format I can read into R. But still, some districts are not listed. Here is a PDF file containing the data.
Continue reading ‘Infomaps using R – Visualizing German unemployment rates by district on a map’
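A tab-separated file like the one pasted from Regionalatlas can be read with `read.delim()`. German tables usually write decimals with a comma, so this minimal sketch assumes that and sets `dec = ","`. The pasted text, the column names `district`/`rate` and the values are hypothetical stand-ins for the real file. The sketch also lists the districts without a value, which is the gap mentioned above. For the semicolon-separated CSV exports common on German statistics portals, `read.csv2()` (semicolon separator, decimal comma) is usually the better starting point, though such files often carry header or footer notes that you need to skip.

```r
# Hypothetical stand-in for the pasted table (tab-separated, decimal commas);
# with the real data: read.delim("your_file.txt", dec = ",")
pasted <- "district\trate\nDistrict A\t9,5\nDistrict B\t10,8\nDistrict C\t"

unemp <- read.delim(text = pasted, dec = ",")
str(unemp)

# Districts for which no rate was listed (empty cells become NA)
unemp$district[is.na(unemp$rate)]
```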
Filed under: R / R-Code | 6 Comments
Tags: maps, R, visualization
Most people using LaTeX feel that creating tables is no fun. A few days ago I stumbled across a neat function written by Paul Johnson that produces LaTeX code, both for plain LaTeX documents and for use within LyX. It is meant for displaying regression model results, and the output looks like that of the Stata outreg command. His R function has the same name: outreg(). The outreg code can be found on his website or in the PDF copy of the code from his website.
I took the code, put it into an .rnw file and Sweaved it. It worked like a charm and produced beautiful results (see the picture on the left and the PDF). Below you can find the code for the noweb file (.rnw); LaTeX code is colored grey, R code blue. You can look at all the results in the PDF file. Paul Johnson has also compiled a nice list of R tips, which can be found on his website as well.
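To illustrate the general idea of turning a fitted model into LaTeX table code, here is a much simpler hand-rolled sketch. It is **not** Paul Johnson's outreg() and does not reproduce its arguments or LyX support. `coef_table_latex()` is a hypothetical helper that prints one model's estimates with standard errors in parentheses beneath them, outreg-style. It does not escape LaTeX special characters, so term names containing `_` or `^` (e.g. `I(x^2)`) would need extra handling.

```r
# Hypothetical helper (not outreg()): one lm() model -> LaTeX tabular lines
coef_table_latex <- function(model, digits = 3) {
  est <- coef(summary(model))   # columns: Estimate, Std. Error, t value, Pr(>|t|)
  fmt <- function(x) formatC(x, format = "f", digits = digits)
  rows <- character(0)
  for (term in rownames(est)) {
    rows <- c(rows,
              sprintf("%s & %s \\\\", term, fmt(est[term, "Estimate"])),
              sprintf(" & (%s) \\\\", fmt(est[term, "Std. Error"])))  # SE below
  }
  c("\\begin{tabular}{lr}",
    "\\hline",
    " & Estimate \\\\",
    "\\hline",
    rows,
    "\\hline",
    sprintf("$N$ & %d \\\\", as.integer(nobs(model))),
    sprintf("$R^2$ & %s \\\\", fmt(summary(model)$r.squared)),
    "\\hline",
    "\\end{tabular}")
}

fit <- lm(mpg ~ wt + hp, data = mtcars)
cat(coef_table_latex(fit), sep = "\n")
```

Inside a Sweave chunk with `results=tex`, the `cat()` output would be inserted into the document as LaTeX.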
Continue reading ‘R: Function to create tables in LaTex or Lyx to display regression model results’
Filed under: R / R-Code | 9 Comments
Tags: LaTex, Lyx, regression, sweave, tables
Just a little note for German-speaking R beginners: an introductory R course (in German) is available online on the website of the Department of Methodology and Evaluation Research at the University of Jena. Dr. Ivailo Partchev teaches the course in seven sessions (11.5 hours in total).
Filed under: News | 1 Comment
Before you read this post, please have a look at Enrique’s comment below. He pointed out that the built-in R function modifyList() already does what I describe in this post. Well, you live and learn :)
I was wondering how to write a function that uses default settings but accepts a list, passed via the dot-dot-dot / three-point argument, that overrides those defaults. Here is my solution.
# building a function with a list of default settings
# that can be modified by an optional list passed
# via the dot-dot-dot / three point argument
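As Enrique pointed out, modifyList() covers this case. Below is a minimal sketch, assuming the settings can be held in a named list. `my_settings()` and its default values are hypothetical examples, not the code from this post.

```r
# Hypothetical example function: defaults that ... can override
my_settings <- function(...) {
  defaults <- list(col = "blue", lwd = 1, main = "Untitled")
  # collect whatever was passed through ... into a named list
  user <- list(...)
  # modifyList() replaces components of `defaults` whose names appear in
  # `user` and appends components that `defaults` does not have yet
  modifyList(defaults, user)
}

s <- my_settings(lwd = 2, main = "My plot")
str(s)

# If the overrides already come as a list, do.call() passes them through ...
s2 <- do.call(my_settings, list(col = "red"))
```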
Filed under: R / R-Code | 4 Comments
Tags: building functions, dot-dot-dot
The REvolutions blog has a nice post addressing the often-raised question of how good or reliable R is. At my university R is hardly used, and lecturers have sometimes asked me whether the calculations done by R and its packages are accurate. The linked post addresses exactly this question.
Filed under: R / R-Code | Leave a Comment
Sometimes I find it useful to merge two data frames like the following ones

  X1 X2 X3 X4   Y1 Y2 Y3 Y4
1  o  o  o  o    X  X  X  X
2  o  o  o  o    X  X  X  X
3  o  o  o  o    X  X  X  X

by zipping them together like a zip fastener, either along the columns

  X1 Y1 X2 Y2 X3 Y3 X4 Y4
1  o  X  o  X  o  X  o  X
2  o  X  o  X  o  X  o  X
3  o  X  o  X  o  X  o  X

or along the rows of the data frames:

  V1 V2 V3 V4
1  o  o  o  o
4  X  X  X  X
2  o  o  o  o
5  X  X  X  X
3  o  o  o  o
6  X  X  X  X
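One way to get both layouts is sketched below. First bind the two data frames, then reorder with `order()` applied to two concatenated index sequences. `order()` keeps ties in their original order, so `c(1:4, 1:4)` yields 1, 5, 2, 6, 3, 7, 4, 8. `zip_cols()` and `zip_rows()` are hypothetical helper names, and the sketch assumes both data frames have the same dimensions.

```r
# Interleave columns: column 1 of x, column 1 of y, column 2 of x, ...
zip_cols <- function(x, y) {
  stopifnot(identical(dim(x), dim(y)))
  combined <- cbind(x, y)
  combined[, order(c(seq_len(ncol(x)), seq_len(ncol(y)))), drop = FALSE]
}

# Interleave rows: row 1 of x, row 1 of y, row 2 of x, ...
zip_rows <- function(x, y) {
  stopifnot(identical(dim(x), dim(y)))
  # rbind() matches data frame columns by name, so use common names V1, V2, ...
  names(x) <- names(y) <- paste0("V", seq_len(ncol(x)))
  combined <- rbind(x, y)   # automatic row names 1..2n
  combined[order(c(seq_len(nrow(x)), seq_len(nrow(y)))), , drop = FALSE]
}

X <- as.data.frame(matrix("o", 3, 4, dimnames = list(NULL, paste0("X", 1:4))))
Y <- as.data.frame(matrix("X", 3, 4, dimnames = list(NULL, paste0("Y", 1:4))))
zip_cols(X, Y)
zip_rows(X, Y)
```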
Filed under: R / R-Code | 1 Comment
Tags: combine, data frame, merge, zip fastener
Some statistical programs offer the option of attaching a footnote to the graphical output. This footnote may contain the name of the script or file that produced the graphic, the author’s name and the date of creation. In SAS, for example, there is a footnote statement for this. Ever since I realized how much easier this makes life, I have used a simple three-line R function at the end of constructing any graphic. I suppose this is what my professors meant by “good practice”. The nice thing about implementing this in the grid graphics system is that you can put multiple plots on one page [e.g. with par(mfrow = c(2, 2))] and the footnote will still be positioned correctly.
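Here is a minimal sketch of such a function using grid. It is not necessarily the author's own version. `add_footnote()`, its default text and the demo file name are hypothetical. Pushing a fresh `viewport()` places the text relative to the whole device rather than to the last base-graphics panel, which is why it still lands in the corner after `par(mfrow = c(2, 2))`.

```r
library(grid)

# Hypothetical helper: small grey footnote in the bottom-right device corner
add_footnote <- function(text = paste("my_script.R |", format(Sys.Date(), "%Y-%m-%d")),
                         size = 0.7, color = grey(0.5)) {
  pushViewport(viewport())                      # viewport covering the device
  grid.text(label = text,
            x = unit(1, "npc") - unit(2, "mm"), # 2 mm from the right edge
            y = unit(2, "mm"),                  # 2 mm from the bottom edge
            just = c("right", "bottom"),
            gp = gpar(cex = size, col = color))
  popViewport()
}

# Usage: four base-graphics panels, one footnote for the whole page
demo_file <- file.path(tempdir(), "footnote_demo.pdf")
pdf(demo_file)
par(mfrow = c(2, 2))
for (k in 1:4) plot(rnorm(20), main = paste("Panel", k))
add_footnote("footnote_demo.R")
invisible(dev.off())
```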
Continue reading ‘R: Good practice – adding footnotes to graphics’
Filed under: R / R-Code | 1 Comment
Tags: graphics, footnote
Although the graphic on the left may not seem 100% appropriate, it hints at what I am about to do: I want to calculate all possible linear regression models with one dependent and several independent variables. In this post I do not want to address bias and model-fit issues, or whether this makes sense from a statistical point of view; I want to focus on the technical issues only.
Several approaches are possible. The first is a step-by-step approach that takes quite a lot of code. Another is to use a specialized package: the packages leaps and meifly are suited to the task but are somewhat less flexible. I will not address solutions using these packages here, but I would like to point out that, in contrast to the approach below, they need only a few lines of code.
The step-by-step approach
Let’s suppose we have the following set of four possible regressors.
regressors <- c("y1", "y2", "y3", "y4")
Now we want to construct a formula that contains the first and third regressors.
vec <- c(TRUE, FALSE, TRUE, FALSE)
paste(regressors[vec])
# [1] "y1" "y3"
Logical indexing picks out exactly the regressors we want, and paste() works vectorwise, which helps a lot in this case. Now we add a plus sign between the regressors…
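The remaining steps are putting the plus signs in with `collapse`, turning the string into a formula and looping over every subset of regressors. The sketch below shows one way to do this, not necessarily the continuation of this post. It assumes a hypothetical dependent variable `dv` and simulated data. `expand.grid()` generates all TRUE/FALSE patterns, and the all-FALSE row (the empty model) is dropped.

```r
regressors <- c("y1", "y2", "y3", "y4")

# All 2^4 TRUE/FALSE patterns, one per row; the first column varies fastest,
# so row 1 is all FALSE (no regressors) and is removed
subsets <- as.matrix(expand.grid(rep(list(c(FALSE, TRUE)), length(regressors))))
subsets <- subsets[-1, , drop = FALSE]

# One formula per subset; collapse = " + " inserts the plus signs
formulas <- lapply(seq_len(nrow(subsets)), function(k) {
  vec <- subsets[k, ]
  as.formula(paste("dv ~", paste(regressors[vec], collapse = " + ")))
})

# Simulated data, for illustration only ("dv" is a hypothetical name)
set.seed(1)
dat <- as.data.frame(matrix(rnorm(50 * 5), ncol = 5,
                            dimnames = list(NULL, c("dv", regressors))))

# Fit every model and collect, e.g., the R-squared values
models <- lapply(formulas, lm, data = dat)
r_squared <- sapply(models, function(m) summary(m)$r.squared)
```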
Filed under: R / R-Code | 3 Comments
Tags: permutation, plyr, regression
Today I will address a problem I run into every once in a while. Suppose we have several data frames or vectors of unequal length but with partly matching column names, like the following:
df1 <- data.frame(Intercept = .4, x1 = .4, x2 = .2, x3 = .7)
df2 <- data.frame(Intercept = .5, x2 = .8)
This may occur, for example, when fitting several multiple regression models, each with a different combination of regressors. Now I would like to combine the results into one data frame. rbind() does not help here because it requires all data frames to have the same columns, and merge() is designed for joining on key columns rather than for stacking results.
I posted the question to r-help because my first solution was somewhat awkward and did not generalize to arbitrary data frames or lists of data frames. The first answer came from Charles C. Berry. Here, myList is a list containing the data frames as elements:
myList <- list(df1, df2)
He uses a nested loop: the outer loop runs over the data frames in the list, and the inner loop runs over the column names of each data frame. For each column name, the corresponding element [i, j] is taken from the data frame (myList[[i]]) and written into an initially empty data frame (dat). If dat does not yet have a column of that name, a new one is created. Cells that are never filled are automatically set to NA.
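dat <- data.frame()
for (i in seq_along(myList)) {        # outer loop: one row per data frame
  for (j in names(myList[[i]])) {     # inner loop: that data frame's columns
    dat[i, j] <- myList[[i]][[j]]     # creates column j if needed; unfilled cells stay NA
  }
}
dat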
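For comparison, here is a loop-free base R sketch of the same result. It is an alternative, not one of the r-help answers. Each data frame is padded with NA columns for the names it lacks, all of them are put in a common column order, and then they are stacked with `rbind()`. `rbind_fill_base()` is a hypothetical name. The plyr package's `rbind.fill()` provides this functionality ready-made.

```r
# Hypothetical helper: stack data frames with partly matching columns
rbind_fill_base <- function(dfs) {
  all_names <- unique(unlist(lapply(dfs, names)))
  padded <- lapply(dfs, function(d) {
    missing_cols <- setdiff(all_names, names(d))
    if (length(missing_cols)) d[missing_cols] <- NA   # add absent columns as NA
    d[all_names]                                      # common column order
  })
  do.call(rbind, padded)
}

df1 <- data.frame(Intercept = .4, x1 = .4, x2 = .2, x3 = .7)
df2 <- data.frame(Intercept = .5, x2 = .8)
res <- rbind_fill_base(list(df1, df2))
res
```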
Continue reading ‘R: Combining vectors or data frames of unequal length into one data frame’
Filed under: R / R-Code | 2 Comments
Tags: data frame, plyr
Recent Entries
- Infomaps using R – Visualizing German unemployment rates by district on a map
- R: Function to create tables in LaTex or Lyx to display regression model results
- Getting started with R (for german speakers)
- R: Building functions – using default settings that can be modified via the dot-dot-dot / three point argument
- How accurate or reliable are R calculations?
- R: Zip fastener for two data frames / combining rows or columns of two dataframes in an alternating manner
- R: Monitoring the function progress with a progress bar
- R: Good practice – adding footnotes to graphics
- R: Calculating all possible linear regression models for a given set of predictors
- R: Combining vectors or data frames of unequal length into one data frame
- R: Normalized Google distance (NGD) in R part II
Categories
- News (3)
- R / R-Code (14)


All postings