Posted by: nealc | July 7, 2011

function objects

Trying to write some C++ code and reading Code Complete, which is excellent and comprehensive. The example ADTs are all concrete, and I'm doing numerical simulation; in a simulation I guess the highest level of abstraction is simulated paths, so that's an ADT. He also says to use ADTs for everything. Maybe, but I think what I learnt from scripting languages is not to overdo it.

Oliveira also says that an important object in numerics is the function object, or functor, and gives the trapezoidal integration rule as an example. Interesting. In math, I think a function actually is an ADT, and there's a base function ADT with other functions in an is-a relationship to it.
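
Oliveira's examples are C++, but the function-object idea translates directly to Python; here's a sketch with a made-up Polynomial integrand and step count (my own illustration, not the book's code):

```python
# A function object ("functor"): an object with call syntax, so it can carry
# state (here, coefficients) yet be passed anywhere a plain function fits.

class Polynomial:
    """Callable object representing p(x) = sum(c_k * x^k)."""
    def __init__(self, coeffs):
        self.coeffs = coeffs
    def __call__(self, x):
        return sum(c * x**k for k, c in enumerate(self.coeffs))

def trapezoid(f, a, b, n):
    """Composite trapezoidal rule on [a, b] with n subintervals."""
    h = (b - a) / float(n)
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return h * total

p = Polynomial([0.0, 0.0, 3.0])          # p(x) = 3x^2; its integral on [0,1] is 1
print(trapezoid(p, 0.0, 1.0, 1000))      # close to 1.0
```

The integrator never knows it was handed an object rather than a function, which is the point: the integrand can carry its own parameters.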

Posted by: nealc | July 22, 2010

More progress on the Blue Book

Doing a little more graphing in Chapter 4 of the Blue Book. Very nice, but again the problem with missing data. I found a page which gives some of the examples but not all.

Doing Chapter 4. Very nice stuff. I put in the theoretical.plot() function and made a quantile plot against a t-distribution. This is the book I’ve needed for a long time.
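
I can't reproduce the S code from the book here, but the same kind of quantile plot can be sketched in Python; the sample, the t distribution's 5 degrees of freedom, and the output filename are all made-up choices of mine, not the Blue Book's:

```python
# Q-Q plot of a sample against theoretical t-distribution quantiles.
import numpy as np
from scipy import stats
import matplotlib
matplotlib.use("Agg")            # render to a file; no display needed
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
sample = rng.standard_t(df=5, size=500)   # illustrative sample

fig, ax = plt.subplots()
stats.probplot(sample, dist=stats.t, sparams=(5,), plot=ax)
ax.set_title("Sample quantiles vs t(5) quantiles")
fig.savefig("qqplot.png")
```

If the points hug the diagonal, the sample is consistent with the reference distribution; heavy tails show up as the ends bending away.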

Finished Chapter 4. I’m happy about that; this is definitely the book I’ve needed to learn R. Now I just skip entering code that references missing datasets. What else can I do? I do wish I could follow along, since whenever an example works, R is so nice in the way you can tweak it and learn more. They mention the book on the R webpage, but somehow I didn’t “get” that basically only this book will teach you what you need to know.

Chapter 5 is looking perhaps a little too detailed, and I had to get to Chapter 10 two weeks ago. But Chapter 5 looks pretty fundamental and useful. I’ve learned so much lately it’s hard to write everything down. One thing that baffled me for a long time was what data frames versus lists versus matrices versus time series are. Since they just print out to the screen, I didn’t realize they are objects with some kind of default print method.
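
The penny that dropped, in Python terms (a toy analogy of mine, not R code): what you see when a value “just prints” is a method on the object, not the raw data.

```python
# A toy data-frame-like object: the tabular display comes entirely from
# its __repr__ method, i.e. a "default print method".

class Frame:
    def __init__(self, **cols):
        self.cols = cols                       # named columns of equal length
    def __repr__(self):
        names = list(self.cols)
        rows = zip(*(self.cols[n] for n in names))
        lines = ["\t".join(names)]
        lines += ["\t".join(str(v) for v in row) for row in rows]
        return "\n".join(lines)

f = Frame(x=[1, 2, 3], y=[4.0, 5.0, 6.0])
print(f)   # prints a little table, because __repr__ formats it that way
```

Two objects holding identical data can print completely differently; the printed form is behavior, not storage.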

Posted by: nealc | July 21, 2010

Learning to Plot in R

R is something I’ve been trying to learn for at least a year now. It’s not easy. So here I’m going to keep some notes on what I’ve learnt so far, and how.

I’m not even familiar with the lingo. So I’m going to maintain a little glossary:

  • Graphs

    A panel of graphs is just a horizontal strip of graphs, side by side. On the bottom of the strip is some variable, and each graph depicts something according to that variable. It represents the same dimensionality as a 3D graph where one of the dimensions (the one corresponding to the bottom of the strip) is discrete.


    A trellis of graphs is a 2D table of graphs. This way you can have one variable moving across the columns, another vertically through the rows, and then at each cell you put a graph. In fact, it lets you have 4 dimensions: 2 continuous in each graph, and then 2 more discrete ones contained in the table.
  • Packages. I read somewhere that people don’t trust R packages anymore, since everyone writes them and they have very sparse documentation.

A 3D real-time visualization device using OpenGL. This package is natively 3D, which is nice, but it’s focused on interactive use rather than preparing graphs for publication.


A library that claims to be very simple to use. But most of the documentation, as far as I can tell, is only available in a book, which I don’t have. The graphs and the website look very nice, though. The guy writing it is also working on another interesting project, a literate programming system for R. I did a little literate programming using CWEB; I liked it and want to pursue it.

Right now I need to get all this data into a CSV file and then read that into R.

One consideration is whether to put all the run-code into the C++ file where the main algorithm is, or whether to make a gengetopt interface so that I can write bash scripts to do the runs.

  • C++
    • advantages:
      • little coding overhead,
      • no need to reinstantiate variables
    • disadv:
      • organization, it will get messy
  • ggo
    • adv: can write scripts easily to change things for runs
    • disadv:

So the main advantage of C++ is that there will be little overhead, but the disadvantage will be that it’s messier. Maybe I should separate the code into another file or something. It’s already in a namespace by itself. Maybe another namespace?

The main command to read in a CSV file was read.csv().

Then I’m going to try to make a trellis of graphs from a collection of CSV files of data.
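
A sketch of that plan, though in Python with pandas/matplotlib rather than R; the runs/*.csv file pattern and the t/value column names are hypothetical stand-ins for my actual run outputs:

```python
# Build a trellis (grid) of graphs, one cell per CSV file of run output.
import glob
import pandas as pd
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

files = sorted(glob.glob("runs/*.csv"))          # hypothetical run outputs
ncols = 3
nrows = max(1, (len(files) + ncols - 1) // ncols)

# squeeze=False keeps axes a 2D array even for a single row
fig, axes = plt.subplots(nrows, ncols, sharex=True, sharey=True, squeeze=False)
for ax, path in zip(axes.flat, files):
    df = pd.read_csv(path)
    ax.plot(df["t"], df["value"])                # assumed column names
    ax.set_title(path, fontsize=8)
fig.savefig("trellis.png")
```

Sharing the axes across cells is what makes a trellis readable: each graph differs only in the discrete variable that picks the file.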

Back from the SIAM Annual Conference in Pittsburgh. Looking over the R Journal, it seems so many people make such nice plots using R. I want to learn more R. Read through much of Introduction to Statistics with R (ISwR), which is very well written. I wish he would write more.

I’ve since come across the Blue Book (The New S Language, by Becker, Chambers, and Wilks). I can’t believe nobody has written up some of the basic things a person would want to do. For example, to access the basic datasets that the Blue Book references, you have to give the command


But how could you ever know that if, like me, you know nothing about R? I’m now giving up on finding the usa() function, since I need to get my real data plotted soon and that has nothing to do with a plot of the USA.

(By the way, I found the load was above 2 for most of the afternoon and the machine was sluggish. Poked around a bit and found xulrunner being a monster. Figures. Now my load is happily falling. Everything is much nicer and faster when the load is, say, below 0.5 or so. And from the uptime manpage I just found out that the “load” is the number of processes in a runnable or uninterruptible state.)

Finally I’m getting some traction on R by reading the Blue Book! After all that, the Blue Book (which I’ve tried to work through before, but without needing so badly to get things working) is paying off. I’ve written a few trivial functions, and I have a plot of illiteracy versus murder rate for the 50 states, with squares representing the income level or something like that. I’d never have been able to do that before.

Posted by: nealc | November 22, 2009

Madan’s hidden put NYQFS

Attended Dilip Madan’s BlackRock NYQFS talk. Again a good, thought-provoking talk. The audience was smaller than what I saw for Heston’s talk, but this talk was much more interesting to me. I think it was the words ‘capital requirements’ that perhaps scared off the self-styled cowboys.

There were several ideas in this talk which were new to me. First, he said there is a hidden put option in the limited-liability structure. This is a new way of looking at corporation/management agency problems, which have been studied since Adam Smith and then Berle and Means, but I’ve never before seen the further links to Black’s equity-as-call-option and to public bailouts. The ability of a limited-liability company to fail and put all its debt and risk onto the government has always been a grey area. Presumably it’s a pillar of capitalism; even Adam Smith mentions it when he discusses projects which only the government could profitably take on, like bridges. But it’s still quite poorly understood, very much a blind spot. What is noteworthy here is that Madan was able to infer several interesting dollar figures for real financial firms from his new framework using VG processes.

The basic idea is that a limited-liability firm owns a hidden put option. If things go bad, the management can declare bankruptcy. The bondholders and stockholders can wrangle over whatever’s left of the company, but after they’ve taken it, the rest, all negative, goes to the taxpayer. The management don’t suffer the loss; they carefully manage their put option. In particular, they’ve been doing this ever since the shell companies and holding companies of the Great Crash and before, each company a limited-liability firebreak to the next: a free put option lying on the sidewalk, as Madan might say.
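
The equity-as-call and hidden-put link can be made concrete with the plain Black-Scholes/Merton version. Madan’s actual framework uses VG processes, which I’m not attempting here, and all the numbers below are made up:

```python
# Merton-style view of a levered firm: equity is a call on firm assets A with
# strike equal to the debt face value D, and the option to walk away from the
# debt (limited liability) is the matching put on the same assets.
from math import log, sqrt, exp, erf

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def bs_call_put(A, D, r, sigma, T):
    """Black-Scholes call and put on assets A, strike D, asset vol sigma."""
    d1 = (log(A / D) + (r + 0.5 * sigma**2) * T) / (sigma * sqrt(T))
    d2 = d1 - sigma * sqrt(T)
    call = A * norm_cdf(d1) - D * exp(-r * T) * norm_cdf(d2)
    put = D * exp(-r * T) * norm_cdf(-d2) - A * norm_cdf(-d1)
    return call, put

# Illustrative numbers only: assets 100, debt face value 80, 5y, 25% asset vol.
equity, limited_liability_put = bs_call_put(100.0, 80.0, 0.03, 0.25, 5.0)
print(equity, limited_liability_put)
```

Put-call parity ties the two together: equity minus the limited-liability put equals assets minus the discounted debt, so the put’s value is exactly the piece of the downside that someone other than the shareholders is carrying.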

Exactly what goes to the taxpayer, though? This raised several questions from the audience. For instance, a company could be set up with limited liability. It could then short a stock. For a while the CEO could collect a healthy bonus while the stock does poorly. The stock could then quickly skyrocket, and the CEO could declare the company bankrupt. The bank which loaned the stock to the company would then not get it back and would suffer a loss, which might cause the bank to collapse if there are correlated defaults, which could then require a bailout, and hence a put.

It’s basically a bailout put. Madan was able to tease some very interesting figures out of the option volatility surface with his VG processes, but I’m not sure the volatility surface carries enough special information to determine the value of a bank’s bailout put.

Madan was able to derive capital requirements from the volatility surface, but again, if a bank then met the revised capital requirements, the volatility surface would change; in particular, might a bank be able to manipulate its volatility surface in the options market and circumvent the capital requirements? (If we accept the possibility of bear raids and manipulation in stocks, why rule out manipulation in options?) Someone else pointed out that the volatility surface during a crash is very unstable; Madan replied that one could use the surface from stable periods.

This is in itself an interesting position: that the stable market contains within it a preconception of the crash. But doesn’t it, in the surface?

Truly one of the most interesting talks I’ve attended in a long while.

Posted by: nealc | November 8, 2009

Jobs found and lost

This article has a graph at the end which I’ve always wanted to see. It shows, over the course of several years, the number of jobs created, the number of firings, and the net (in black, between the two). It’s actually somewhat reassuring that there are so many jobs, and that the black line is so small relative to the number of new jobs. But if you integrate the black area, it does ramp up quickly.

Another interesting aspect of this graph is that it seems not to be subject to what I consider a flaw in the regular BLS statistics. The BLS only counts people who’ve been jobless for up to 6 months, so the true number of jobless is much higher than the official figure, especially in a recession. This article, however, only considers the number of hires versus the number of fires, so whether someone’s been jobless for 6 months doesn’t enter the picture. The total black area, as far as I can tell, should equal the total net number of jobs lost over the course of the recession.

Posted by: nealc | November 6, 2009

Running standard deviation

Found this once and then forgot it, so I just found it again. I’m going to try it out; it would be nice to have some sense of the accuracy of option prices coming out of Monte Carlo. It’s an article on a technique which it says ultimately comes from Knuth. I don’t get it right away, and it admits it’s not obvious that it gives the right answer, but it’s supposed to be accurate and tolerant of roundoff. It seems to add numbers of different magnitudes just as any other technique would, which can cause roundoff, but it does seem to only subtract numbers of similar magnitudes.

Maybe a good use for a class/object, since it keeps some running variables; classes are something I have yet to learn in Python.

OK, it’s done. Python classes are pretty neat. The __str__(self) method is nice. But it’s awfully clunky to have to write self.n etc. all the time, and I think it’s inconsistent considering how much automatic scoping Python does already. I’m always worried about aliasing problems in Python, since there are automatic references in places where I don’t expect them, and this inconsistency isn’t reassuring. But for this particular application the class paradigm is remarkably convenient. And now that I have the standard deviation, I also put in a 95% CI, so I have some new useful data.

And another place where I got burned: if you look in my code below, there’s a spot where I have self.n+0. in the denominator. Without the +0. it was treated as an int and the division truncated. I hope one day we get a language which automatically promotes to float when this happens.
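
The trap in miniature (Python 2 semantics; in Python 3, / is true division, which is roughly the automatic promotion I’m wishing for):

```python
# Integer / integer truncates in Python 2; a float operand forces float division.
# (In Python 3, / is always true division and // is the truncating form.)
n = 7
print(n / 2.0)    # 3.5 under both Python 2 and Python 3
print(n // 2)     # 3: explicit truncating division in both
```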

What is a Python module? It’s just a file with Python code in it. But whenever I change the file that contains the running stdev code and re-run the calling module, IPython doesn’t reload it. Investigating that led me to IPython’s %pdb magic function, which is very neat: it automatically launches pdb, which I’ve never used before but have always wanted to.

from math import *

class sigmarunner:
    def __init__(self):
        self.n = 0
        self.mean = 0.
        self.s = 0.
        self.mold = 0.
        self.sold = 0.

    def push(self, x):
        """Welford-style running update with a new sample x."""
        self.n += 1

        if self.n == 1:
            self.mold = x
            self.mean = x
        else:
            # note the +0. to force float division (self.n is an int)
            self.mean = self.mold + (x-self.mold)/(self.n+0.)
            self.s = self.sold + (x-self.mold)*(x-self.mean)

            self.mold = self.mean
            self.sold = self.s

    def variance(self):
        return self.n > 1 and (self.s/(self.n-1.)) or 0.

    def stdev(self):
        return sqrt(self.variance())

    def marginerror95(self):
        return 1.96*self.stdev()/sqrt(self.n)
    def conflo95(self):
        return self.mean-self.marginerror95()
    def confhi95(self):
        return self.mean+self.marginerror95()

    def __str__(self):
        return "n=%d mean=%g stdev=%g E=%g" % (self.n, self.mean, self.stdev(), self.marginerror95())

    #def __repr__(self):
        #return "n=%d mean=%g stdev=%g" % (self.n, self.mean, self.stdev())

if __name__=='__main__':
    s = sigmarunner()

    # push a few sample values and watch the running statistics evolve
    for x in [2., 4., 4., 4., 5., 5., 7., 9.]:
        s.push(x)
        print s

Posted by: nealc | November 2, 2009

Heston at BlackRock

Really enjoyed attending Steve Heston’s NYQFS talk.  I had never been to BlackRock’s facilities, so that was pretty neat also.  I have a BlackRock guest badge now!

So I’m not quite sure what to make of his talk at the moment.  He kept driving home the idea that you could possibly pick up 2-3 bps by trading a stock in the right half-hour of the day as opposed to the wrong half-hour, and that algos, which are responsible for half of trading, haven’t figured this out yet.  He seems to suggest it has to do with traders’ daily schedules: they trade when they come in, then they go to meetings, then they trade again before the close.  He did a lot of autocorrelation work which I’d have thought would’ve been studied long ago.  A professor type at the end asked him about the effect of the portfolio changing throughout the autocorrelations, which he hadn’t mentioned, and which would seem to be problematic.  He did a lot of decile-spread analysis, which seemed to involve a changing portfolio.  Taking an autocorrelation on something that’s changing would seem to introduce something that could confound the results.  Why not do the autocorrelation study on the S&P 500?  Maybe the effect wouldn’t show up then.

Heston’s a rock-star nonetheless.

The crowd was also pretty interesting.  Seemed like there were lots of medium to well-known professors, then their Ph.D. students, and then some young banker types.

Posted by: nealc | October 20, 2009

Blink on job interviews

Gladwell’s article is one of the most interesting articles I’ve seen about interviews.  They filmed the first 10 seconds of a bunch of interviews and were able to predict whether the person got the job… They also talk about the ‘structured interview’, which I thought was interesting.  Whenever I had puzzle-type job interviews (why are manhole covers round, etc.) I always thought they were silly, but I’m not sure whether the structured interview is an artifact of the industrial psychologists’ quasi-scientific approach, and I’m not sure I believe it would give you much more information about a person’s job performance than the first 10 seconds anyway.  But I do believe in the first-10-seconds theory, and that people reinforce their opinions of a candidate through the course of an interview.

I was reminded of Gladwell’s article when I was looking in Technorati for a category for freefund and came across this blogger’s experience at Google, and then chased some further links around.