
I argued that `CVaR` (expected shortfall) of personal income is a better indicator of a society’s success than is GDP.

$\mathtt{CVaR} \overset{\mathrm{def}}{=} \int_{\mathtt{lo}}^{\mathtt{hi}}\mathtt{value} \cdot \mathtt{probability}$

`CVaR` combines the basic statistical operations of

• subsetting and
• averaging.

In statistical analysis of the middle it’s useful to trim: cut off the upper and lower `X%` and look at those separately. (Winsorising, strictly speaking, clips the extremes to a percentile value rather than removing them.) With `CVaR` it’s almost the opposite: look at the upper or lower edge only. (Although you could also look at only the bottom 50%, which is not really an edge.)
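As a sketch of that subset-then-average recipe (synthetic incomes; every number here is invented for illustration), a lower-tail `CVaR` might be computed like this:

```python
import numpy as np

def cvar(values, lo=0.0, hi=0.05):
    """Subset the values falling between the `lo` and `hi` quantiles,
    then average them -- the two operations CVaR combines."""
    values = np.sort(np.asarray(values, dtype=float))
    lo_cut, hi_cut = np.quantile(values, [lo, hi])
    tail = values[(values >= lo_cut) & (values <= hi_cut)]
    return tail.mean()

# Hypothetical incomes: the lower 5% CVaR asks
# "how do the poorest 5% fare, on average?"
rng = np.random.default_rng(0)
incomes = rng.lognormal(mean=10.0, sigma=1.0, size=100_000)
poorest_5pct_avg = cvar(incomes, lo=0.0, hi=0.05)
```

Subsetting then averaging is the whole trick; everything else is choosing the cutoffs.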

You could also use the same technique to look at the “top” rather than the “bottom”. Think about, for example, the apparent puzzle of

• rising life expectancy, with
• stagnant longevity.

Average lifespans rise as early causes of death (dysentery, childbirth, violence) decline.

But death by “natural causes” (old age, where all your body systems start to fail; telomere cutoffs; whatever “natural causes” means, it’s sort of vague) doesn’t get postponed by as much.

I can think of three ways to go about defining what arithmetic we’re going to perform on the data to answer “Is longevity higher or lower?”.

1. Perform a lot of subset operations on cause-of-death. Remove the violent ones, the childbirth ones, the cancer ones that also coincide with old age but not middle-age but maybe middle-old-age should count…, the narcotic ones (but not narcotics that are used for euthanasia in old people), the driving accidents, the young suicides, the wild animal attacks, the malaria, the starvation, the tuberculosis, the ebola, ….
2. Perform just one subset operation on age. Pick some age like 70 over which you will consider all deaths to be “of old age”, even if they got hit by a car. Average all those ages together and the number you’re using now cuts out—roughly, not surgically—a lot of the deaths you aren’t interested in.

Just as subsetting with `WHERE age > 5` would cut out childhood-illness deaths.
3. Consider the upper 10% or 20% or 50% of ages at death. Average that together and now you’re comparing reasonable numbers across countries.

This last one is the `CVaR` approach. Clearly all three have flaws. But the third one needs the least data and the least data janitorship (imagine different languages, different fields/columns, different coding choices across countries’ datasets).

Just like using lower `CVaR` to compare only poor people’s incomes, if we used upper `CVaR` to compare only old people’s death ages, we’d get better numbers and talk more sense with only a bit more effort.
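A toy illustration of the third approach, with synthetic ages at death (all numbers invented): two populations share the same old-age mortality, but one has mostly eliminated early deaths. Life expectancy (the mean) differs by years; the upper-`CVaR` longevity number barely moves.

```python
import numpy as np

def upper_cvar(ages, tail=0.10):
    """Mean of the top `tail` fraction of ages at death."""
    ages = np.sort(np.asarray(ages, dtype=float))
    cutoff = np.quantile(ages, 1.0 - tail)
    return ages[ages >= cutoff].mean()

rng = np.random.default_rng(1)
old_age = rng.normal(85, 5, size=9_000)      # the "natural causes" cluster
early_a = rng.uniform(0, 60, size=3_000)     # many early deaths
early_b = rng.uniform(0, 60, size=300)       # early deaths mostly eliminated

pop_a = np.concatenate([old_age, early_a])
pop_b = np.concatenate([old_age, early_b])

print(pop_a.mean(), pop_b.mean())            # life expectancy: years apart
print(upper_cvar(pop_a), upper_cvar(pop_b))  # longevity: nearly identical
```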

The original paper defining Conditional Value at Risk = CVaR = Expected Tail Loss.

### Pessimism & Probability Distributions

What’s “the best” statistic? If you couldn’t see an entire probability distribution, but you could ask one question of it and get one number out, what question would you ask?

In an interview given to EDGE magazine, Bart Kosko explains how great the `median` is. He used to think the `mean` was the statistic to look at (cf. Francis Galton’s story of the crowd average guessing the weight of a prize ox almost exactly), but the `median` is more robust and so on.

My opinion is that the most important statistic for many practical purposes is something like the `25% CVaR` or `50% CVaR`. I think that’s the essence of “What do you stand to lose?” as people mean it in normal English.

In other words, I think the way people think about risk in everyday, non-finance terms basically boils down to

• the observed `minimum` (theoretically possible `minimum` if you’re a lawyer) and
• the `CVaR` (expected loss) for some wide-ish (likely) swath of the bad outcomes.
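A minimal sketch of that two-number summary (the function name and the 25% cutoff are my own choices for illustration, not a standard):

```python
import numpy as np

def everyday_risk_summary(outcomes, tail=0.25):
    """Worst observed case, plus the average of the bad tail (lower CVaR)."""
    outcomes = np.sort(np.asarray(outcomes, dtype=float))
    cutoff = np.quantile(outcomes, tail)
    return {
        "minimum": outcomes[0],                       # worst case seen
        "cvar": outcomes[outcomes <= cutoff].mean(),  # average bad outcome
    }

# e.g. simulated net outcomes of some venture, in dollars (made up):
rng = np.random.default_rng(2)
outcomes = rng.normal(10_000, 40_000, size=50_000)
summary = everyday_risk_summary(outcomes, tail=0.25)
```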

The reason the `CVaR` is so intuitive is that it smoothly interweaves both

• egregiously bad, low-probability outcomes (“You could die with a .01% probability!” is actually a good reason to avoid something) and
• likely bad outcomes (“After you graduate you might not find a job in your field”).

Compare to: “Standard & Poor’s rated structured credit products based solely on the probability that they would pay off less than 100% of their principal plus interest, and not at all based on the expected loss if that happened.”
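The S&P contrast is just arithmetic. Two hypothetical bonds (all numbers invented) with identical shortfall probability look the same to a probability-only rating, while probability times severity, the expected-loss view that `CVaR` belongs to, separates them eighteen-fold:

```python
p_shortfall = 0.02         # both bonds: 2% chance of paying less than 100%
loss_given_short_a = 0.05  # bond A loses 5% of principal when it shortfalls
loss_given_short_b = 0.90  # bond B loses 90%

# Probability-only view: identical (0.02 vs 0.02).
# Expected-loss view: probability times severity.
expected_loss_a = p_shortfall * loss_given_short_a  # ~0.001, i.e. 0.1% of principal
expected_loss_b = p_shortfall * loss_given_short_b  # ~0.018, i.e. 1.8% of principal
```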

So obviously there isn’t “one best” question to ask. It depends what you want to know—if it’s the value of the gravitational constant, `median` may be a great statistic. On the other hand, if you’re looking at the salaries that might result from your law degree or MBA—that is, if you’re looking for a sensible measure of risk and downside—then I’d suggest a `CVaR`.

(I actually emailed Dr Kosko and got a response—but he linked to a 20-page paper he had written and I never got around to reading it and felt bad responding without reading all of his response.)

Gauging the frothiness of the webby/techy/san-fran VC market.

Source: Mark Suster. Propagated via one of tumblr’s owners, who added:

Based on the NVCA statistics on the venture capital industry, there are [approximately] 1,000 early stage financings every year….

And somewhere around 50 - 100 of them exit for more than \$100mm every year. So 5-10% of the companies financed by VCs end up exiting for more than \$100mm.

Mathematical PS: These are value-at-risk numbers, just upside-down.
