Posts tagged with linear

## Rank-Nullity Theorem

The rank-nullity theorem in linear algebra says that dimensions either get

• thrown in the trash
• or show up

after the mapping.

By “the trash” I mean the origin—that black hole of linear algebra, the /dev/null, the ultimate crisscross paper shredder, the ashpile, the wormhole to void and cancelled oblivion; that country from whose bourn no traveller ever returns.

The way I think about rank-nullity is this. I start out with all my dimensions lined up—separated, independent, not touching each other, not mixing with each other. ||||||||||||| like columns in an Excel table. I can think of the dimensions as separable, countable entities like this whenever it’s possible to rejigger the basis to make the dimensions linearly independent.

I prefer to think about linear maps in their nicely rejiggered state, and to treat the rejiggering itself as a separate issue.

So you’ve got your 81 row × 172 column matrix mapping 172→ separate dimensions into →81 dimensions. I’ll also forget about the fact that some of the resultant →81 dimensions might end up as linear combinations of the input dimensions. Just pretend that each input dimension is getting its own linear λ stretch. Now linear just means multiplication.

Linear stretches λ affect the entire dimension the same way. They turn a list like [1 2 3 4 5] into [3 6 9 12 15] (λ=3). They couldn’t turn it into [10 20 30 −42856712 50] (λ=10 except not everywhere the same; stretch=multiplication has to be uniform).

Also remember – everything has to stay centred on 0. (That’s why you always know there will be a zero subspace.) This is linear, not affine. Things stay in place and basically just stretch (or rotate).

So if my entire 18th input dimension [… −2 −1 0 1 2 3 4 5 …] has to get transformed the same, to [… −2λ −λ 0 λ 2λ 3λ 4λ 5λ …], then linearity has simplified this large thing full of possibility and data, into something so simple I can basically treat it as a stick |.
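The “same stretch everywhere” requirement is exactly the linearity axiom of additivity. A minimal Python check (my own illustration, reusing the numbers from the lists above):

```python
# A linear map on one dimension can only be x -> lambda * x.
# Check additivity for a uniform stretch, and watch it fail for a
# map that stretches one point differently from the rest.

def uniform_stretch(x, lam=3):
    return lam * x

def corrupted_stretch(x):
    # stretches everything by 10, except it mangles x == 4
    return -42856712 if x == 4 else 10 * x

def is_additive(f, xs):
    return all(f(a + b) == f(a) + f(b) for a in xs for b in xs)

xs = [1, 2, 3, 4, 5]
print(is_additive(uniform_stretch, xs))    # True
print(is_additive(corrupted_stretch, xs))  # False
```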

If that’s the case—if I can’t put dimensions together but just have to λ stretch them or nothing, and if what happens to an element of the dimension happens to everybody in that dimension exactly equal—then of course I can’t stick all the 172→ input dimensions into the →81 dimension output space. 172−81 of them have to go in the trash. (effectively, λ=0 on those inputs)

So then the rank-nullity theorem, at least in the linear context, has turned the huge concept of dimension (try to picture 11-D space again would you mind?) into something as simple as counting to 11 |||||||||||.
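Here is that counting in code, a sketch of my own (toy 2×4 matrix rather than 81×172, exact arithmetic to avoid float fuzz): rank counts the dimensions that show up, nullity counts the ones in the trash, and they always sum to the number of input dimensions.

```python
from fractions import Fraction

def rank(matrix):
    """Row-reduce and count pivot rows, using exact rational arithmetic."""
    m = [[Fraction(x) for x in row] for row in matrix]
    rk, rows, cols = 0, len(m), len(m[0])
    for col in range(cols):
        # find a pivot at or below the current pivot row
        pivot = next((r for r in range(rk, rows) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rk], m[pivot] = m[pivot], m[rk]
        for r in range(rows):
            if r != rk and m[r][col] != 0:
                factor = m[r][col] / m[rk][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[rk])]
        rk += 1
    return rk

# 2x4 matrix: maps 4 input dimensions into 2 output dimensions
A = [[1, 0, 2, 0],
     [0, 1, 3, 0]]
r = rank(A)
nullity = len(A[0]) - r     # rank-nullity: rank + nullity = number of columns
print(r, nullity)           # 2 2: two dimensions survive, two go in the trash
```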

## Blairthatcher

by and © Aude Oliva & Philippe G. Schyns

A hybrid face presenting Margaret Thatcher (in low spatial frequency) and Tony Blair (in high spatial frequency)

[I]f you … defocus while looking at the pictures, Margaret Thatcher should substitute for Tony Blair ( if this … does not work, step back … until your percepts change).

(Source: cvcl.mit.edu)

[Karol] Borsuk’s geometric shape theory works well because … any compact metric space can be embedded into the “Hilbert cube” [0,1] × [0,½] × [0,⅓] × [0,¼] × [0,⅕] × [0,⅙] ×  …

A compact metric space is thus an intersection of polyhedral subspaces of n-dimensional cubes …

We relate a category of models A to a category of more realistic objects B which the models approximate. For example polyhedra can approximate smooth shapes in the infinite limit…. In Borsuk’s geometric shape theory, A is the homotopy category of finite polyhedra, and B is the homotopy category of compact metric spaces.

—Jean-Marc Cordier and Timothy Porter, Shape Theory

(I rearranged their words liberally but the substance is theirs.)

in R do: prod( 1 / 1:1e4 ) to see that the volume of Hilbert’s cube → 0. (The volume is 1 · ½ · ⅓ · … = 1/n!, which underflows to 0 long before ten thousand factors.)

## The Nervous System

• dissections of live criminals’ brains
• animal spirits (psychic)
• neuron νεῦρον is Greek for cord
• Galen thought the body was networked together by three systems: arteries, veins, and nerves
• Descartes as the source of the theory of reflexive responses—fire stings hand, νευρώνες tugging on the brain, fluids in the brain tug on some other νευρώνες, and the hand pulls away—automatically.
• the analogy of a clock (…today we’re much smarter. We think of brains as being like computers, which is definitely not an outgrowth of today’s hot technology!)
• cogito ergo sum: sensation is what’s distinctive about our brains. How could a clock feel something? (Today again, we’re much smarter: we think it’s the ability to reflect on thought—anything with at least one “meta” term in it must be intelligent.)
• Muscles fire like bombs exploding (a chemical reaction of two mutually combustible elements)—and the fellow who came up with this theory had been spending a lot of time in the battlefield where bombs were the new technology.
• autonomic, peripheral, central nervous systems
• Willis, Harvey, Newton
• What makes nerves transmit information so fast?
• Galvani’s theory that electricity is only an organic phenomenon. (Hucksters arise!)
• The theory of the synapse—it’s the connections that matter.
• The discovery that nerves aren’t continuous connected strings, but rather made up of billions of individual parts.
• Activation thresholds—a classic and simple non-linear function!

(Source: BBC)

from C. H. Edwards, Jr.
come some examples of linear (vector) spaces less familiar than span{(1,0,0,0,0), (0,1,0,0,0), ..., (0,0,0,0,1)}.

• The infinite set of functions {1, cos θ, sin θ, ..., cos nθ, sin nθ, ...} is orthogonal in 𝓒[−π,+π]. This is the basis for Fourier series.
• Let 𝓟 denote the vector space of polynomials, with inner product of p,q ∈ 𝓟 given by ⟨p,q⟩ = ∫₋₁¹ p(x) • q(x) • dx. Applying Gram-Schmidt orthogonalisation gets us within constant factors of the Legendre polynomials 1, x, x²−⅓, x³−⅗x, x⁴−6/7 x²+3/35, ...
• (and, from M A Al-Gwaiz)
The set of all infinitely-smooth complex-valued functions that are zero outside a finite interval (i.e., have compact support). These test functions lead to generalised functions (distributions) and imprecision on purpose.
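That Gram-Schmidt computation can be carried out exactly. A sketch of my own in Python (polynomials as coefficient lists [c₀, c₁, …], exact rational arithmetic):

```python
from fractions import Fraction

def inner(p, q):
    """<p,q> = integral over [-1,1] of p(x)q(x) dx, for coefficient lists."""
    total = Fraction(0)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            if (i + j) % 2 == 0:            # odd powers integrate to zero
                total += Fraction(a) * Fraction(b) * Fraction(2, i + j + 1)
    return total

def gram_schmidt(n):
    """Orthogonalise 1, x, x^2, ..., x^n; returns monic orthogonal polynomials."""
    basis = []
    for k in range(n + 1):
        p = [Fraction(0)] * k + [Fraction(1)]   # the monomial x^k
        for q in basis:
            coef = inner(p, q) / inner(q, q)
            p = [a - coef * (q[i] if i < len(q) else 0) for i, a in enumerate(p)]
        basis.append(p)
    return basis

for p in gram_schmidt(4):
    print(p)
# the last polynomial is x^4 - (6/7) x^2 + 3/35, a constant multiple of
# the fourth Legendre polynomial
```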

Here’s a physically intuitive reason that rotations ↺

(which seem circular) are in fact linear maps.

If you have two independent wheels (say, on a piece of luggage) that can only roll straight forward and straight back, it is still possible to turn the luggage. By doing both linear maps at once (which is what a matrix
$\begin{pmatrix} a \rightsquigarrow a & a \rightsquigarrow b & a \rightsquigarrow c \\ b \rightsquigarrow a & b \rightsquigarrow b & b \rightsquigarrow c \\ c \rightsquigarrow a & c \rightsquigarrow b & c \rightsquigarrow c \end{pmatrix}$

or Lie action does) and opposite each other, two straights ↓↑ make a twist ↺.

Or if you could get a car | luggage | segway with split (= independent = disconnected) axles to roll the right wheel(s) independently and opposite to the left wheel(s), then you would spin around in place.
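That a rotation really satisfies the linearity axioms is easy to verify numerically. A small check of my own (rotation of the plane by the usual cosine/sine matrix):

```python
import math

def rotate(theta, v):
    """Rotate the 2-D vector v by theta radians about the origin."""
    x, y = v
    return (math.cos(theta) * x - math.sin(theta) * y,
            math.sin(theta) * x + math.cos(theta) * y)

# Linearity check: R(a*u + b*v) == a*R(u) + b*R(v)
theta, a, b = 0.7, 2.0, -3.0
u, v = (1.0, 2.0), (4.0, -1.0)

lhs = rotate(theta, (a * u[0] + b * v[0], a * u[1] + b * v[1]))
ru, rv = rotate(theta, u), rotate(theta, v)
rhs = (a * ru[0] + b * rv[0], a * ru[1] + b * rv[1])

print(all(math.isclose(l, r) for l, r in zip(lhs, rhs)))  # True
```

Despite looking circular, the rotation passes the same additivity-and-scaling test that a plain λ stretch does, which is why it gets to be written as a matrix at all.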

A piecewise linear function in two dimensions (top) and the convex polytopes on which it is linear (bottom).

(Source: Wikipedia)


## Regressions 101: “Significance”

###### SETUP (CAN BE SKIPPED)

We start with data (how was it collected?) and the hope that we can compare them. We also start with a question which is of the form:

• how much tax increase is associated with how much tax avoidance/tax evasion/country fleeing by the top 1%?
• how much traffic does our website lose (gain) if we slow down (speed up) the load time?
• how many of their soldiers do we kill for every soldier we lose?
• how much do gun deaths [suicide | gang violence | rampaging multihomicide] decrease with 10,000 guns taken out of the population?
• how much more fuel do you need to fly your commercial jet 1,000 metres higher in the sky?
• how much famine [to whom] results when the price of low-protein wheat rises by $1?
• how much vegetarian eating results when the price of beef rises by $5? (and again distributionally, does it change preferentially by people with a certain culture or personal history, such as they’ve learned vegetarian meals before or they grew up not affording meat?) How much does the price of beef rise when the price of feed-corn rises by $1?
• how much extra effort at work will result in how much higher bonus?
• how many more hours of training will result in how much faster marathon time (or in how much better heart health)?
• how much does society lose when a scientist moves to the financial sector?
• how much does having a modern financial system raise GDP growth? (here ∵ the X ~ branchy and multidimensional, we won’t be able to interpolate in Tufte’s preferred sense)
• how many petatonnes of carbon per year does it take to raise the global temperature by how much?
• how much does $1000 million spent funding basic science research yield us in 30 years?
• how much will this MBA raise my annual income?
• how much more money does a comparable White make than a comparable Black? (or a comparable Man than a comparable Woman?)
• how much does a reduction in child mortality decrease fecundity? (if it actually does)

• how much can I influence your behaviour by priming you prior to this psychological experiment?
• how much higher/lower do Boys score than Girls on some assessment? (the answer is usually “low |β|, with low p” — in other words “not very different, but because of the high volume of data, whatever difference we find has high statistical strength”)

bearing in mind that this response-magnitude may differ under varying circumstances. (Raising morning-beauty-prep time from 1 minute to 10 minutes will do more than raising 110 minutes to 120 minutes of prep. Also there may be interaction terms like you need both a petroleum engineering degree and to live in one of {Naija, Indonesia, Alaska, Kazakhstan, Saudi Arabia, Oman, Qatar} in order to see the income bump. Also many of these questions have a time-factor, like the MBA and the climate ones.)

As Trygve Haavelmo put it: using reason alone we can probably figure out which direction each of these responses will go. But knowing just that raising the tax rate will drive away some number of rich doesn’t push the debate very far—if all you lose is a handful of symbolic Eduardo Saverins who were already on the cusp of fleeing the country, then bringing up the Laffer curve is chaff. But if the number turns out to be large then it’s really worth discussing.

In less polite terms: until we quantify what we’re debating about, you can spit bollocks all day long. Once the debate is quantified, the discussion becomes way more intelligent, less derailed by irrelevant theoretically-possible-issues-which-are-not-really-worth-wasting-time-on.

So we change one variable over which we have control and measure how the interesting thing responds. Once we measure both we come to the regression stage where we try to make a statement of the form “A 30% increase in effort will result in a 10% increase in wage” or “5 extra minutes getting ready in the morning will make me look 5% better”. (You should agree from those examples that the same number won’t necessarily hold throughout the whole range. Like if I spend three hours getting ready the returns will have diminished from the returns on the first five minutes.)

Avoiding causal language, we say that a 10% increase in (your salary) is associated with a 30% increase in (your effort).



The two numbers that jump out of any regression table output (e.g., lm in R) are p and β.

• β is the estimated size of the linear effect
• p is, roughly, how sure we can be that the effect isn’t just noise: the probability of seeing an estimate this far from zero if the true effect were nil. (As in golf, a low p is better: more confident, more sure. Low p can also be stated as a high t.)
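For the mechanics behind those two numbers, here is a bare-bones simple regression in Python (my own sketch with made-up effort/wage data; a real analysis would use R’s lm, and the p here is a normal approximation rather than the exact t-distribution):

```python
import math

def simple_regression(x, y):
    """OLS fit y = alpha + beta*x; returns (beta, t, two-sided approx p)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    beta = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    alpha = my - beta * mx
    resid = [yi - (alpha + beta * xi) for xi, yi in zip(x, y)]
    se = math.sqrt(sum(r * r for r in resid) / (n - 2) / sxx)  # std error of beta
    t = beta / se
    p = math.erfc(abs(t) / math.sqrt(2))   # two-sided p, normal approximation
    return beta, t, p

# made-up data: wage rises about a third of a unit per unit of effort
effort = [1, 2, 3, 4, 5, 6, 7, 8]
wage = [1.1, 1.5, 2.2, 2.3, 2.8, 3.1, 3.2, 3.9]
beta, t, p = simple_regression(effort, wage)
print(round(beta, 2), round(p, 4))   # beta about 0.37 with a tiny p: a clear effect
```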

Bearing in mind that regression tables spit out many, many other numbers (like the Durbin-Watson statistic, the F statistic, the Akaike Information Criterion, and more) specifically to flag potential problems with interpreting β and p naïvely, here are pictures of the textbook situations where p and β can be interpreted in the straightforward way:

First, the standard cases where the regression analysis works as it should and how to read it is fairly obvious:
(NB: These are continuous variables rather than on/off switches or ordered categories. So instead of “Followed the weight-loss regimen” or “Didn’t follow the weight-loss regimen”, someone has quantified how much it was followed. Again, actual measurements (how they were coded) getting in the way of our gleeful playing with numbers.)

Second, the case I want to draw attention to: low statistical significance (a high p) doesn’t necessarily mean nothing’s going on there.

The code I used to generate these fake-data and plots.

If the regression measures a high β but with low confidence (a high p), that is still worth taking a look at. If the regression picks up wide dispersion in male-versus-female wages—let’s say the gap centres on double—but we’re not so confident (high p) that it’s exactly double, because it’s sometimes 95%, sometimes 180%, sometimes 310%, we’ve still picked up a significant effect.

The exact value of β would not be statistically significant or confidently precise, due to the high p, but this would still be a very significant finding. (Try the same with any of my other examples, or another quantitative-comparison scenario you think up. It’s either a serious opportunity or a serious problem that you’ve uncovered. It just needs further looking to see where the variation around double comes from.)

You can read elsewhere about how awful it is that p&lt;.05 is the password for publishable science, for many reasons that require some statistical vocabulary. But I think the most intuitive problem is the one I just stated. If your Geiger counter flips out to ten times the deadly level of radiation, it doesn’t matter if it sometimes reads 8, sometimes 0, and sometimes 15—the point is, you need to be worried and get the h*** out of there. (Unless the machine is whacked—but you’d still be spooked, wouldn’t you?)
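The wage example can be made concrete. A sketch with the made-up numbers from above (95%, 180%, 310%, treated as three noisy measurements of the male/female wage ratio): the estimated gap is large, but with three wildly dispersed observations the t-statistic never clears the usual significance bar.

```python
import math

# Three noisy measurements of the male/female wage ratio: 95%, 180%, 310%.
# (1.0 would mean no gap at all.)
ratios = [0.95, 1.80, 3.10]

n = len(ratios)
mean = sum(ratios) / n       # about 1.95: roughly double
sd = math.sqrt(sum((r - mean) ** 2 for r in ratios) / (n - 1))
se = sd / math.sqrt(n)
t = (mean - 1.0) / se        # test against H0: ratio == 1 (no gap)

print(round(mean, 2), round(t, 2))   # big estimated gap, but t around 1.5
# With n-1 = 2 degrees of freedom, |t| must exceed 4.30 for p < .05,
# so this "double wage gap" fails the significance test -- yet you'd be
# mad to ignore it.
```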


###### FOLLOW-UP (CAN BE SKIPPED)

The scale of β is the all-important thing that we are after. Small differences in βs of variables that are important to your life can make a huge difference.

• Think about getting a 3% raise (1.03) versus a 1% wage cut (.99).
• Think about twelve in every 1,000 births killing the mother versus four in every 1,000.
• Think about being 5 minutes late for the meeting versus 5 minutes early.

Order-of-magnitude differences (like 20 versus 2) are the difference between fly and dog; between life in the USA and near-famine; between oil tanker and gas pump; between Tibet’s altitude and Illinois’; between driving and walking. Even the Black Death was only a tenth of an order of magnitude of reduction in human population.

Keeping in mind that calculus tells us that nonlinear functions can be approximated in a local region by linear functions (unless the nonlinear function jumps), β is an acceptable measure of how the interesting thing responds around the current level of webspeed, or around the current level of taxation.
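That is just the first-order Taylor approximation. A tiny check with an example of my own, f(x) = x²:

```python
def f(x):
    return x ** 2

# Local linear approximation around x0: f(x) ~ f(x0) + f'(x0) * (x - x0)
x0 = 3.0
slope = 2 * x0            # f'(x) = 2x, the local "beta"

x = 3.01                  # a small step from x0
approx = f(x0) + slope * (x - x0)
print(f(x), approx)       # about 9.0601 vs 9.06: the local beta nails it nearby

x = 10.0                  # a big step: the same local beta now misleads
approx = f(x0) + slope * (x - x0)
print(f(x), approx)       # 100.0 vs 51.0: global response is not the local slope
```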

Linear response magnitudes can also be used to estimate global responses in a nonlinear function, but you will be quantifying something other than the local linear approximation.