Thursday, August 25, 2011

Testing for seasonal unit roots in R

EDIT: This article has been rewritten and updated on my analytics blog: Testing for seasonal unit roots in R

Suppose that, for our new life insurance product, we want to model and forecast accidental deaths in the US. Suppose also that our dataset is seasonal and that we intend to use a seasonal ARIMA model. We first need to test our time series to see whether it is seasonally integrated, that is, whether it needs seasonal differencing. This is the topic of this insurance quant blog post.

We will be using R, and I will assume that the reader knows R and how it can be applied in insurance. Briefly, R is a free environment for statistical computing that fills a similar role to MATLAB or SAS. The website is http://www.r-project.org
I know that I have not yet written a "formal introduction" to R or to how it can be used to model insurance, but that will have to wait, because I think it is more important to document new packages, and new features of those packages, as they come out.

Version 3 of the "forecast" R package was published yesterday. It includes a new function for testing for seasonal unit roots: nsdiffs().

R also comes with a US Accidental Deaths dataset, which we will use in this post for our example life insurance problem. Right, so we are starting a life insurance business and we want to forecast accidental deaths.

So to follow along, open up R and type the following:

USAccDeaths

You will then see the US Accidental Deaths dataset. You can see that it is monthly.

Now install the "forecast" R package from CRAN, then load it. By the time you read this, forecast version 3.01 (or later) should be available. Version 3.00 is sufficient to work through this post, but I strongly recommend 3.01.
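For readers new to R, installing and loading the package looks roughly like this. The guard around install.packages() is just a convenience I have added so that a package you already have is not reinstalled, and the version check at the end is likewise my own addition.

```r
# Install forecast from CRAN only if it is not already installed.
if (!requireNamespace("forecast", quietly = TRUE)) {
  install.packages("forecast")
}

# Load the package for this R session.
library(forecast)

# Check the installed version; this post needs at least 3.00 (3.01 recommended).
packageVersion("forecast")
```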

To view the help file for nsdiffs(), type:

?nsdiffs

This brings up a help page that covers both nsdiffs() and ndiffs().

There are two tests implemented in nsdiffs(): the OCSB test of Osborn, Chui, Smith and Birchenhall (1988), which is the default, and the Canova-Hansen test (Canova and Hansen, 1995). You can also specify the seasonal period of your dataset. USAccDeaths is a ts object, and its seasonal period, or "frequency", is stored in the object itself, so nsdiffs() picks it up automatically.
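To check the seasonal period yourself, base R can query the ts object directly; nothing in this snippet depends on the forecast package. The comments show what I would expect for this monthly dataset, which runs from January 1973 to December 1978.

```r
# USAccDeaths ships with R (package "datasets"), so no extra packages are needed.
class(USAccDeaths)      # "ts"
frequency(USAccDeaths)  # seasonal period: 12 observations per year
start(USAccDeaths)      # first observation: 1973, month 1
end(USAccDeaths)        # last observation: 1978, month 12
length(USAccDeaths)     # 72 monthly observations
```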

To perform the OCSB test:
nsdiffs(USAccDeaths)

To perform the Canova-Hansen test:
nsdiffs(USAccDeaths, test="ch")

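The output is the number of seasonal differences suggested: "1" means that there is a seasonal unit root (so the series should be seasonally differenced once) and "0" means that there is no seasonal unit root.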
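A minimal sketch that runs both tests and stores the results side by side. I pass test= explicitly in both calls because, as far as I know, more recent releases of forecast use a different default test, so relying on the default may not give you OCSB.

```r
library(forecast)

# OCSB test: the null hypothesis is a seasonal unit root.
d_ocsb <- nsdiffs(USAccDeaths, test = "ocsb")

# Canova-Hansen test: the null hypothesis is stable seasonality (no seasonal unit root).
d_ch <- nsdiffs(USAccDeaths, test = "ch")

# Each result is the suggested number of seasonal differences (0 or 1 here).
cat("OCSB:", d_ocsb, " Canova-Hansen:", d_ch, "\n")
```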

You will notice that the two tests give different answers. This is because the Canova-Hansen test is less likely than the OCSB test to decide in favour of a seasonal unit root. The OCSB test takes a seasonal unit root as its null hypothesis, whereas the Canova-Hansen test takes stable seasonality as its null, and each test sticks with its null unless the data provide strong evidence against it. Further, Osborn (1990) writes that, when in doubt, it is better to seasonally difference.
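One way to act on Osborn's advice is sketched below. The decision rule (seasonally difference if either test finds a seasonal unit root) is my own illustration of "when in doubt, seasonally difference", not something nsdiffs() does for you; the differencing itself uses base R's diff() with the lag set to the seasonal period.

```r
library(forecast)

d_ocsb <- nsdiffs(USAccDeaths, test = "ocsb")
d_ch   <- nsdiffs(USAccDeaths, test = "ch")

# Hypothetical rule: difference if either test suggests a seasonal unit root.
D <- max(d_ocsb, d_ch)

# Seasonal differencing at lag = seasonal period (12 for monthly data).
y <- if (D > 0) {
  diff(USAccDeaths, lag = frequency(USAccDeaths), differences = D)
} else {
  USAccDeaths
}

length(y)  # 60 if differenced once, 72 if left as is
```

Seasonal differencing at lag 12 loses the first year of observations, which is why the differenced series is 12 values shorter.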

Enjoy this life insurance related post! :-)

Bibliography:
Osborn DR (1990) "A survey of seasonality in UK macroeconomic variables", International Journal of Forecasting 6(3):327-336.

Osborn DR, Chui APL, Smith J, and Birchenhall CR (1988) "Seasonality and the order of integration for consumption", Oxford Bulletin of Economics and Statistics 50(4):361-377.

Canova F and Hansen BE (1995) "Are Seasonal Patterns Constant over Time? A Test for Seasonal Stability", Journal of Business and Economic Statistics 13(3):237-252.

Monday, August 15, 2011

A model for Insurance Losses

Our first insurance model: a mathematical model for insurance losses.

Firstly, some notation:
E(X) = The expected value of X
Var(X) = The variance of X
StDev(X) = The standard deviation of X
SQRT(X) = The square root of X

Individual Insurance Losses

Let's use car insurance as an example, because a lot of people have had first-hand experience with it. Most Australians are forced to drive and hence, reluctantly, experience car insurance.

Suppose that there is a 70% chance that I will not make an insurance claim, presumably because I will not have a car accident. Then say there is a 20% chance that I will make an insurance claim for something small: $700, something mostly cosmetic and perhaps only involving my car.

Suppose that there is a 5% chance that I will make a $6,000 insurance claim.
A 3% chance that I will cause some appreciable damage to the tune of $18,000.
A 1.5% chance that I will write-off two moderately priced cars for a total of $60,000.
Finally, a 0.5% chance of causing a catastrophic accident with a $350,000 damage bill.

Suppose that the amount the insurance company must pay (the insurance loss) is a random variable, X. From the above, assuming that the insurance excess is zero, we can deduce a discrete distribution for X, f(x).
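
The discrete distribution f(x) of the insurance payout is below, with the probability of each payout on the left and the amount of the payout on the right:

Probability    Payout x
0.700          $0
0.200          $700
0.050          $6,000
0.030          $18,000
0.015          $60,000
0.005          $350,000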

So E(X) = .7*0+.2*700+.05*6000+.03*18000+.015*60000+.005*350000=$3,630
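The same calculation in R, as a quick check of the arithmetic. The names payout and prob are just labels I have chosen for the two columns of the distribution above.

```r
# Payout amounts and their probabilities for a single customer.
payout <- c(0, 700, 6000, 18000, 60000, 350000)
prob   <- c(0.70, 0.20, 0.05, 0.03, 0.015, 0.005)

# Sanity check: the probabilities of a distribution must sum to 1.
stopifnot(isTRUE(all.equal(sum(prob), 1)))

# E(X) is the sum of each payout times its probability.
EX <- sum(prob * payout)
EX  # 3630
```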

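If there is an insurance excess, you subtract the excess from each claim, and the insurance company pays nothing when the claim is smaller than the excess. In other words, the payout becomes max(X - excess, 0).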
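A minimal sketch of the excess adjustment, assuming the usual convention that the insurer pays max(X - d, 0) for an excess d. The function name mean_payout_with_excess is hypothetical, and the $1,000 excess is only an illustration, so the $500 case in the homework is still yours to work out.

```r
payout <- c(0, 700, 6000, 18000, 60000, 350000)
prob   <- c(0.70, 0.20, 0.05, 0.03, 0.015, 0.005)

# Hypothetical helper: mean insurer payout when each claim is reduced by an
# excess d, with claims below the excess paying nothing (pmax floors at zero).
mean_payout_with_excess <- function(d) sum(prob * pmax(payout - d, 0))

mean_payout_with_excess(0)     # 3630, the same as E(X)
mean_payout_with_excess(1000)  # 3390
```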

Homework exercises:
1. Find Var(X) and the standard deviation of X.
2. Suppose that there is an insurance excess of $500. What would the mean insurance payout and its standard deviation be then?


The distribution of insurance losses (insurance payouts) can also be continuous. However, that will be covered in a later post.

Collective insurance payouts
The random variable X is the insurance payout for one individual. The expected total insurance payout for all individuals (customers) in a given time period is the sum of their individual means. In this post, we will assume that everyone has the same insurance payout distribution. We will denote the total (collective) insurance payout by Y.

Suppose that there are n insurance customers. Because expectations always add, E(Y) = n*E(X), whether or not the X's are independent.

If we also explicitly assume that all the X's are independent, the variances add as well: Var(Y) = n*Var(X). (Remember that we are assuming that every insurance customer has the same distribution of X.)

Then StDev(Y) = StDev(X)*SQRT(n)

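The standard deviation is a measure of risk. For large n, StDev(Y) = StDev(X)*SQRT(n) is much less than the sum of the individual standard deviations, n*StDev(X). Put another way, the risk per customer, StDev(Y)/n = StDev(X)/SQRT(n), shrinks as the number of customers grows, which is why pooling many independent risks makes the total payout more predictable.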
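The collective formulas can be checked with a quick Monte Carlo simulation. This is a sketch under assumptions of my own: the portfolio size n = 1000, the 5,000 simulated periods and the seed are arbitrary, and each customer's payout is drawn independently from the distribution in this post. StDev(X) is estimated from simulated data rather than computed exactly.

```r
set.seed(2011)  # arbitrary seed, only for reproducibility

# Single-customer payout distribution from this post.
payout <- c(0, 700, 6000, 18000, 60000, 350000)
prob   <- c(0.70, 0.20, 0.05, 0.03, 0.015, 0.005)

n      <- 1000  # hypothetical number of customers
n_sims <- 5000  # hypothetical number of simulated periods

# Each simulated period: total payout Y over n independent customers.
Y <- replicate(n_sims, sum(sample(payout, n, replace = TRUE, prob = prob)))

# A large sample of single-customer payouts, used to estimate StDev(X).
X <- sample(payout, 1e6, replace = TRUE, prob = prob)

c(mean_Y = mean(Y), n_times_EX = n * sum(prob * payout))  # E(Y) vs n*E(X)
c(sd_Y = sd(Y), sqrt_n_sd_X = sqrt(n) * sd(X))            # StDev(Y) vs StDev(X)*SQRT(n)
sd(Y) / n  # risk per customer, which shrinks like 1/SQRT(n)
```

The two numbers in each pair should agree up to simulation noise; rerunning with a larger n shows sd(Y) / n getting smaller.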

Bibliography:
Klebaner F (2009) Course notes for MTH3251/ETC3510/ETC5351, Monash University, Semester 1, 2009.

Thursday, July 21, 2011

First insurance blog post!

HELLO! This is the first post and also a description of what this site will be about. This will be a site about insurance, probably focusing more on the quant side of insurance than other areas such as where to buy insurance; the internet should have many other places where you can search for "buy auto insurance" :-P

However, I do hope to also write about ways to get cheaper insurance and the mathematical reasoning behind them. For example, I have heard anecdotally that insurance for a Mercedes Benz is cheaper than insurance for a Holden Commodore. Why? The anecdote says that it is "because there are a lot of kids who drive Commodores and do stupid stuff, whereas a Mercedes Benz is driven mostly by older people who are more careful". Would the insurance company really think like that? How about Pr(Crash | Commodore) > Pr(Crash | Mercedes Benz)? To make this fit a little better with the insurance model that will be published in the next post: (mean payout for a Commodore) > (mean payout for a Mercedes Benz). However, whether this insurance pricing anecdote is true can only be verified with actual data.

Have fun,

Insurance Blog