I argued that CVaR (expected shortfall) of personal income is a better indicator of a society’s success than GDP.
CVaR combines the basic statistical operations of
- subsetting and
- averaging.
In statistical analysis of the middle it’s useful to trim (or winsorise): set aside the upper and lower X% and look at those separately. With CVaR it’s almost the opposite: look at the upper or lower edge only. (Although you could also look at only the bottom 50%, which is not really an edge.)
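Here is a minimal sketch of that subset-then-average idea for the lower edge. The helper name `lower_cvar`, the rounding rule for the subset size, and the income figures are all made up for illustration; this is one simple way to compute it, not a standard definition from any library.

```python
def lower_cvar(values, fraction):
    """Mean of the lowest `fraction` of values: subset first, then average.

    Hypothetical helper for illustration only.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    ordered = sorted(values)
    # Size of the bottom subset, rounded to a whole count (at least one value).
    k = max(1, round(fraction * len(ordered)))
    tail = ordered[:k]
    return sum(tail) / len(tail)


# Made-up personal incomes, in thousands.
incomes = [8, 12, 15, 20, 22, 30, 35, 50, 90, 400]

print(lower_cvar(incomes, 0.2))     # mean of the two lowest incomes: 10.0
print(sum(incomes) / len(incomes))  # plain mean, pulled up by the 400
```

The single very large income moves the plain mean a lot but does not touch the bottom-20% average, which is the point of looking at the lower edge only.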
You could also use the same technique to look at the “top” rather than the “bottom”. Think about, for example, the apparent puzzle of
- rising life expectancy, with
- stagnant longevity.
Average lifespans rise as early causes of death (dysentery, childbirth, violence) decline.
But death by “natural causes” (getting old and all your body systems start to fail | telomere cutoffs | whatever “natural causes” means; it’s sort of vague) doesn’t get postponed by as much.
I can think of three ways to even go about defining what arithmetic we’re going to perform on the data to answer “Is longevity higher or lower?”.
- Perform a lot of subset operations on cause-of-death. Remove the violent ones, the childbirth ones, the cancer ones that also coincide with old age but not middle-age but maybe middle-old-age should count…, the narcotic ones (but not narcotics that are used for euthanasia in old people), the driving accidents, the young suicides, the wild animal attacks, the malaria, the starvation, the tuberculosis, the ebola, ….
- Perform just one subset operation on age. Pick some age like 70 over which you will consider all deaths to be “of old age”, even if they got hit by a car. Average all those ages together and the number you’re using now cuts out—roughly, not surgically—a lot of the deaths you aren’t interested in.
Just like subsetting to age at death WHERE age > 5 will pull out childhood illness deaths.
- Consider the upper 10% or 20% or 50% of ages at death. Average those together and now you’re comparing reasonable numbers across countries.
This last one is the CVaR approach. Clearly all three have flaws. But the third one needs the least data and the least data janitorship (imagine different languages, different fields/columns, or different coding choices).
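To make the second and third options concrete, here is a rough sketch on two invented countries. The ages at death, the helper names `mean_age_over` and `upper_cvar`, and the threshold of 70 are assumptions chosen only to show the arithmetic, not real data.

```python
def mean(xs):
    return sum(xs) / len(xs)


def mean_age_over(ages, threshold):
    """Second option: average age at death among deaths above a fixed age."""
    old = [a for a in ages if a > threshold]
    if not old:
        raise ValueError("no deaths above the threshold")
    return mean(old)


def upper_cvar(ages, fraction):
    """Third option: average of the highest `fraction` of ages at death."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    ordered = sorted(ages, reverse=True)
    # Size of the top subset, rounded to a whole count (at least one value).
    k = max(1, round(fraction * len(ordered)))
    return mean(ordered[:k])


# Invented ages at death. Country A has many early deaths, country B few.
country_a = [1, 2, 30, 45, 68, 72, 75, 80, 84, 88]
country_b = [55, 60, 66, 70, 71, 74, 76, 79, 83, 86]

for name, ages in [("A", country_a), ("B", country_b)]:
    print(name, mean(ages), mean_age_over(ages, 70), upper_cvar(ages, 0.2))
```

In this toy data the plain averages differ by 17.5 years (54.5 versus 72.0), while the top-20% averages differ by only 1.5 (86.0 versus 84.5). Neither subset needed a cause-of-death column, only the age at death.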
Just like using lower CVaR to compare only poor people’s incomes, if we used upper CVaR to compare only old people’s death ages, we’d get better numbers and talk more sense with only a bit more effort.