Posted by: nealc | July 21, 2010

Learning to Plot in R

R has been something I’ve been trying to learn for at least a year now. It’s not easy. So here I’m going to keep some notes on what I’ve learnt so far, and how.

I’m not even familiar with the lingo. So I’m going to maintain a little glossary:

  • Graphs
    panel

    A panel of graphs is just a horizontal strip of graphs, side by side. On the bottom of the strip is some variable, and each graph depicts something according to that variable. It represents the same dimensionality as a 3D graph where one of the dimensions (the one corresponding to the bottom of the strip) is discrete.

    trellis

    A trellis of graphs is a 2D table of graphs. This way you can have one variable moving across the columns, another vertically through the rows, and then at each cell you put a graph. In fact, it lets you have 4 dimensions: 2 continuous in each graph, and then 2 more discrete ones contained in the table.
  • Packages. I read somewhere that people don’t trust R packages anymore, since everyone writes them and they have very sparse documentation.
    RGL

    3D real-time visualization device using OpenGL. This package is natively 3D, which is nice, but it’s focused on interactive use rather than preparing graphs for publication.

    ggplot2

    A library that claims to be very simple to use. But most of the documentation, as far as I can tell, is only available in a book – which I don’t have. But the graphs and the website look very nice. The guy who is writing it is also working on another interesting project which is a literate programming system for R. I did a little literate programming using CWEB and I liked it and I want to pursue it.
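Going back to the two Graphs entries in the glossary, here is a rough sketch of the difference between a panel and a trellis. It uses the lattice package (which ships with standard R installations) and the built-in mtcars data purely as a stand-in, so the variable choices are my assumptions, not anything from my real data.

```r
library(lattice)

# Stand-in data: mtcars, with two discrete variables turned into factors.
cars <- mtcars
cars$cyl  <- factor(cars$cyl)
cars$gear <- factor(cars$gear)

# Panel: one discrete variable (cyl), laid out as a single row of graphs.
# layout = c(columns, rows)
panel_strip <- xyplot(mpg ~ wt | cyl, data = cars, layout = c(3, 1))
print(panel_strip)

# Trellis: two discrete variables (cyl along the columns, gear along the
# rows), with one mpg-versus-wt graph in each cell.
trellis_grid <- xyplot(mpg ~ wt | cyl * gear, data = cars)
print(trellis_grid)
```

The part of the formula before the "|" is what gets drawn inside each graph; the part after it decides which graph a row of data lands in.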

Right now I need to get all this data into a CSV file and then read that into R.

One consideration is whether to put all the run-code into the C++ file where the main algorithm is, or whether to make a gengetopt function so that I can make up bash scripts to do the runs.

  • C++
    • advantages:
      • little coding overhead,
      • no need to reinstantiate variables
    • disadv:
      • organization, it will get messy
  • ggo
    • adv: can write scripts easily to change things for runs
    • disadv:

So the main advantage of C++ is that there will be little overhead, but the disadvantage will be that it’s messier. Maybe I should separate the code into another file or something. It’s already in a namespace by itself. Maybe another namespace?

The main command to read in a CSV file was

read.csv("filename")
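A slightly fuller sketch of what that looks like in practice. The file contents and the column names (n, time) are made up for illustration; the file is written to a temporary location so the example runs on its own.

```r
# Write a tiny CSV to a temporary file so the example is self-contained.
# The column names (n, time) are hypothetical.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("n,time",
             "10,0.12",
             "100,1.05",
             "1000,11.3"), csv_path)

# read.csv() assumes a header row and comma separators by default
# and returns a data frame.
runs <- read.csv(csv_path)

str(runs)      # shows the column types that were inferred
summary(runs)  # quick sanity check of the values

# Base-graphics scatter plot of the two columns, on log-log axes.
plot(time ~ n, data = runs, log = "xy")
```

The result is an ordinary data frame, so columns can be pulled out with runs$n or runs$time.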

Then I’m going to try to make a trellis of graphs from a collection of CSV files of data.
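One way this might go, as a sketch only: read every CSV file in a directory, stack them into one data frame, and hand that to xyplot() from the lattice package. The directory layout and the columns (method, size, x, y) are hypothetical stand-ins for my real runs, and the data here is just random noise.

```r
library(lattice)  # ships with standard R installations

# Stand-in for the real data: one CSV file per run, written to a temp dir.
# The file names and columns (method, size, x, y) are hypothetical.
out_dir <- tempfile("runs")
dir.create(out_dir)
set.seed(1)
for (method in c("A", "B")) {
  for (size in c("small", "large")) {
    run <- data.frame(method = method, size = size,
                      x = 1:20, y = cumsum(rnorm(20)))
    write.csv(run, file.path(out_dir, paste0(method, "_", size, ".csv")),
              row.names = FALSE)
  }
}

# Read every CSV in the directory and stack them into one data frame.
files <- list.files(out_dir, pattern = "\\.csv$", full.names = TRUE)
all_runs <- do.call(rbind, lapply(files, read.csv))

# Make the two discrete variables explicit factors for conditioning.
all_runs$method <- factor(all_runs$method)
all_runs$size   <- factor(all_runs$size)

# y ~ x is drawn inside each cell; the variables after "|" pick the cell.
trellis <- xyplot(y ~ x | method * size, data = all_runs, type = "l")
print(trellis)  # lattice plots must be printed to appear when run from a script
```

Keeping the discrete variables as columns in each file, rather than only in the file name, means the stacked data frame needs no extra bookkeeping.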

Back from the SIAM Annual Conference in Pittsburgh. Looking over the R Journal, it seems so many people make such nice plots using R. I want to learn more R. I read through much of Introductory Statistics with R (ISwR), which is very well written. I wish the author would write more.

I’ve since come across the Blue Book (The New S Language, by Becker, Chambers, and Wilks). I can’t believe nobody has written down some of the basic things a person would want to do. For example, to access the basic datasets that the Blue Book references, you have to give the command

library(datasets)
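For anyone else starting from nothing, here is a small sketch of how to poke around in that package once you know it exists. I am assuming the state data is the matrix called state.x77, which is what the package's help page for the state data sets describes.

```r
# In current R sessions the datasets package is attached by default,
# so this call is harmless but usually not required.
library(datasets)

# List the data sets the package provides. data() returns an object
# whose $results matrix has one row per data set.
available <- data(package = "datasets")$results[, "Item"]
head(available)

# state.x77 is a matrix of statistics for the US states.
dim(state.x77)
colnames(state.x77)

# ?state.x77   # uncomment in an interactive session to open the help page
```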

But how could you ever know that if, like me, you know nothing about R? Now I’m giving up on finding the usa() function since I need to get my real data plotted soon and it’s got nothing to do with a plot of the USA.

(By the way, I found my load average was above 2 for most of the afternoon and the machine was sluggish. Poked around a bit and found xulrunner being a monster. Figures. Now my load is happily falling. It’s always much nicer and faster when the load is, say, below 0.5 or so. And I just found out from the uptime manpage that the “load” is the average number of processes that are in a runnable or uninterruptible state.)

Finally I’m getting some traction on R by reading the Blue Book! I’d tried working through it before, but without such a pressing need to get something working; this time it’s paying off. I’ve written a few trivial functions, and I have a plot of illiteracy versus murder rate for the 50 states, with squares sized by income level or something like that. I’d never have been able to do that before.
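A rough reconstruction of that plot, not necessarily the exact commands I typed: it assumes the data is the built-in state.x77 matrix and uses the base-graphics symbols() function to draw the squares.

```r
# Assumption: the 50-state data is the built-in state.x77 matrix.
states <- as.data.frame(state.x77)

# symbols() draws one square per state, centered at (Illiteracy, Murder),
# with side length proportional to Income. inches = 0.3 scales the
# largest square to 0.3 inches.
symbols(states$Illiteracy, states$Murder,
        squares = states$Income,
        inches = 0.3,
        xlab = "Illiteracy (% of population, 1970)",
        ylab = "Murder rate (per 100,000, 1976)",
        main = "US states: square size shows per-capita income")
```

Swapping squares for circles in the call gives the same plot with circles instead.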
