Posts tagged with matrices

Here’s a physically intuitive reason that rotations ↺ (which seem circular) are in fact linear maps.

If you have two independent wheels that can only roll straight forward and straight back, it is possible to turn the luggage. By doing both linear maps at once (which is what a matrix or Lie action does) and opposite each other, two straights ↓↑ make a twist ↺.

$\large \dpi{300} \bg_white \begin{pmatrix} a \rightsquigarrow a & | & a \rightsquigarrow b & | & a \rightsquigarrow c \\ \hline b \rightsquigarrow a & | & b \rightsquigarrow b & | & b \rightsquigarrow c \\ \hline c \rightsquigarrow a & | & c \rightsquigarrow b & | & c \rightsquigarrow c \end{pmatrix}$

Or if you could get a car | luggage | segway with split (= independent = disconnected) axles to roll the right wheel(s) independently and opposite to the left wheel(s), then you would spin around in place.
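The wheel argument can be sketched in Python with NumPy. This assumes the standard differential-drive model: forward speed is the average of the two wheel speeds, and turning rate is their difference divided by the wheel separation. The helper name `wheel_speeds_to_motion` and the default `track_width=1.0` are illustrative and not from the post. The map from the two wheel speeds to (forward speed, turning rate) is itself a 2×2 matrix, so a spin in place is a linear combination of two straight rolls.

```python
import numpy as np

def wheel_speeds_to_motion(v_left, v_right, track_width=1.0):
    """Hypothetical helper: differential-drive kinematics.

    Maps the two wheel speeds to (forward speed, turning rate).
    Forward speed is the average of the wheels; turning rate is their
    difference divided by the distance between them (counterclockwise positive).
    """
    A = np.array([
        [0.5, 0.5],
        [-1.0 / track_width, 1.0 / track_width],
    ])
    return A @ np.array([v_left, v_right])

straight = wheel_speeds_to_motion(1.0, 1.0)   # both wheels forward: forward 1, turn 0
spin = wheel_speeds_to_motion(-1.0, 1.0)      # wheels opposite: forward 0, turn 2

# Linearity: the spin is just (-1) x "left wheel only" + (+1) x "right wheel only"
left_only = wheel_speeds_to_motion(1.0, 0.0)
right_only = wheel_speeds_to_motion(0.0, 1.0)
print(straight, spin, np.allclose(spin, -left_only + right_only))
```

With equal and opposite wheel speeds the forward component cancels exactly, and only the turning component survives.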

## ∄ inverse

• I cheated on you. ∄ way to restore the original pure trust of our early relationship.
• The broken glass. Even with glue we couldn’t put it back to be the same original glass.
• I got old. ∄ potion to restore my lost youth.
• Adam & Eve ate from the tree of the knowledge of good & evil. They could not unlearn what they learned.
• “Be … careful what you put in that head because you will never, ever get it out.” ― Thomas Cardinal Wolsey
• We polluted the lake with our sewage runoff. The algal blooms choked off the fish. ∄ way to restore it.
• Phase change. And the phase boundary can only be traversed in one direction (or the backwards direction costs vastly more energy). The marble rolls off the table, the leg poisoned by gangrene. The father dies at war. The unkind words can’t be unsaid.
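In matrix terms, "∄ inverse" means the map forgets something. Here is a minimal Python sketch using a projection onto the x-axis, which is our choice of example rather than the post's. Two different inputs land on the same output, so nothing can send that output back to a unique original. The determinant is 0, and NumPy refuses to invert the matrix.

```python
import numpy as np

# Projection onto the x-axis: keeps x, forgets y.
P = np.array([[1.0, 0.0],
              [0.0, 0.0]])

a = np.array([1.0, 2.0])
b = np.array([1.0, 5.0])

# Two different starting points land on the same image...
same_image = np.allclose(P @ a, P @ b)

# ...so the determinant is zero and no inverse matrix exists.
det = np.linalg.det(P)
try:
    np.linalg.inv(P)
    invertible = True
except np.linalg.LinAlgError:
    invertible = False

print(same_image, det, invertible)
```

Maps like this still compose with each other, but without inverses. That is the semigroup setting rather than the group setting.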

#semigroups

## High-dimensional Arrays in J

J is hott. Some highlights from the Wikipedia article and J's homepage:

• You can do a lot with just a few characters in J. Define a moving average in 8 characters, including spaces, for example.
• Have you ever felt like whether it’s Java or C, Python or Ruby, all these languages are just the Same Old Thing?

J makes thinking in high-dimensional arrays easy.

1. The sentence i. 7 8 means “Show me a 7×8 two-array” (OK, “matrix”, but … matrices are verbs and arrays are nouns).
2. The sentence i. 7 8 3 means “Show me a 7×8×3 three-array”.
3. The sentence i. 7 8 3 4 13 2 66 means “Show me a 7×8×3×4×13×2×66 seven-array” (seven dimensions).

I won’t reprint the long outputs but here’s a shorter one.

```
   i.4 5 3
 0  1  2
 3  4  5
 6  7  8
 9 10 11
12 13 14

15 16 17
18 19 20
21 22 23
24 25 26
27 28 29

30 31 32
33 34 35
36 37 38
39 40 41
42 43 44

45 46 47
48 49 50
51 52 53
54 55 56
57 58 59
```


And another for clarity:

```
   i.3 5 4
 0  1  2  3
 4  5  6  7
 8  9 10 11
12 13 14 15
16 17 18 19

20 21 22 23
24 25 26 27
28 29 30 31
32 33 34 35
36 37 38 39

40 41 42 43
44 45 46 47
48 49 50 51
52 53 54 55
56 57 58 59
```
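For readers without a J interpreter, here is a rough NumPy analogue. NumPy is our substitution, not something the post uses. `np.arange(n).reshape(shape)` fills an array with 0, 1, 2, … in row-major order (last axis fastest), as `i.` does, so the blocks should line up with the J output above. The last lines compute the rank and entry count of the seven-array from the list without building it.

```python
import math
import numpy as np

# Analogue of J's  i. 4 5 3 : the integers 0..59 laid out as 4 blocks of 5 rows x 3 columns
a = np.arange(4 * 5 * 3).reshape(4, 5, 3)

# Analogue of J's  i. 3 5 4 : 3 blocks of 5 rows x 4 columns
b = np.arange(3 * 5 * 4).reshape(3, 5, 4)

print(a[1])        # second block: rows 15 16 17 ... 27 28 29
print(b[2, 0])     # first row of the third block: 40 41 42 43

# The seven-array, described without allocating it
shape7 = (7, 8, 3, 4, 13, 2, 66)
rank = len(shape7)             # number of axes
atoms = math.prod(shape7)      # number of entries J would print
print(rank, atoms)
```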

This is reminiscent of using R's combn function to visualise higher-dimensional stuff, right?

I guess this is how computers think all the time! I wonder what they say about us when we’re not around.

## Robot Committing Suicide

I move my arm. [holonomy]

I move my arm, hand and shoulder. [holonomy]

I move my arm, hand, shoulders, but my fingers are still.

I move my arm. [Lie group]

I move my arm across the table. [embedded in a space]

I brush my laptop computer to the side. [Lie group A]

I brush my laptop computer to the other side. [Lie group A⁻¹]

I brush my laptop computer off the table. []

The laptop computer falls to the floor. [gravity = G·m·M ⁄ dist²]

The laptop computer splinters and cracks. [nonlinear PDEs]

The laptop computer can not be repaired. [entropy]

I brush my arm back and forth across the table. [AA⁻¹AA⁻¹]

The laptop computer, in pieces, lies still on the floor. [principle of least action]

I brush my arm across the table. There is nothing else on the table. [noncommutative]

I brush my arm across the table. No objects are moved. [phase change]

## How do I Create the Identity Matrix in R? Also a bit of group theory.

I googled for this once upon a time and nothing came up. Hopefully this saves someone ten minutes of digging about in the documentation.

You make identity matrices with the function diag, putting the number of rows (equivalently, columns) in parentheses.

```
> diag(3)
     [,1] [,2] [,3]
[1,]    1    0    0
[2,]    0    1    0
[3,]    0    0    1
```

That’s it.

```
> diag(11)
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11]
 [1,]    1    0    0    0    0    0    0    0    0     0     0
 [2,]    0    1    0    0    0    0    0    0    0     0     0
 [3,]    0    0    1    0    0    0    0    0    0     0     0
 [4,]    0    0    0    1    0    0    0    0    0     0     0
 [5,]    0    0    0    0    1    0    0    0    0     0     0
 [6,]    0    0    0    0    0    1    0    0    0     0     0
 [7,]    0    0    0    0    0    0    1    0    0     0     0
 [8,]    0    0    0    0    0    0    0    1    0     0     0
 [9,]    0    0    0    0    0    0    0    0    1     0     0
[10,]    0    0    0    0    0    0    0    0    0     1     0
[11,]    0    0    0    0    0    0    0    0    0     0     1
```

But while I have your attention, let’s do a couple mathematically interesting things with identity matrices.

First of all you may have heard of Tikhonov regularisation, or ridge regression. That’s a form of penalty to rule out overly complex statistical models. @benoithamelin explains on @johndcook’s blog that

• Tikhonov regularisation is also a way of puffing air on a singular matrix (det M = 0) so as to make the matrix invertible without altering the eigenvalues too much.
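That remark can be illustrated with a minimal NumPy sketch. The matrix and λ = 0.1 are arbitrary choices for illustration. Adding λ·I to a singular symmetric matrix shifts every eigenvalue up by exactly λ. The zero eigenvalue therefore becomes λ, and the matrix becomes invertible. In ridge regression the same move appears as solving (XᵀX + λI)β = Xᵀy.

```python
import numpy as np

# A singular symmetric matrix: the second row is twice the first, so det = 0.
M = np.array([[1.0, 2.0],
              [2.0, 4.0]])
lam = 0.1                         # regularisation strength (arbitrary choice)

M_reg = M + lam * np.eye(2)       # the Tikhonov / ridge "puff of air"

eig_before = np.linalg.eigvalsh(M)      # approximately [0, 5]
eig_after = np.linalg.eigvalsh(M_reg)   # each shifted by lam: [0.1, 5.1]

print(np.linalg.det(M), np.linalg.det(M_reg))
print(eig_before, eig_after)

M_reg_inv = np.linalg.inv(M_reg)  # succeeds now that no eigenvalue is zero
```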

Now how about a connection to group theory?

First take a 7×7 identity matrix, then rotate its top row off the top and down to the bottom.

```
> M.7 <- diag(7)[ c(2:7,1), ]
> M.7
     [,1] [,2] [,3] [,4] [,5] [,6] [,7]
[1,]    0    1    0    0    0    0    0
[2,]    0    0    1    0    0    0    0
[3,]    0    0    0    1    0    0    0
[4,]    0    0    0    0    1    0    0
[5,]    0    0    0    0    0    1    0
[6,]    0    0    0    0    0    0    1
[7,]    1    0    0    0    0    0    0
```

Inside the square brackets the indices are [row, column], and leaving the column slot empty means “all columns”. So the concatenated vector c(2,3,4,5,6,7,1) becomes the new row order.

Let’s call this matrix M.7 (a valid name in R) and look at its powers. Matrix multiplication in R is the %*% operator, not *. (* multiplies entry by entry, the elementwise or Hadamard product, which is not what we want here.)

Look what happens when you multiply M.7 by itself: it starts to cascade.

```
> M.7 %*% M.7
     [,1] [,2] [,3] [,4] [,5] [,6] [,7]
[1,]    0    0    1    0    0    0    0
[2,]    0    0    0    1    0    0    0
[3,]    0    0    0    0    1    0    0
[4,]    0    0    0    0    0    1    0
[5,]    0    0    0    0    0    0    1
[6,]    1    0    0    0    0    0    0
[7,]    0    1    0    0    0    0    0
> M.7 %*% M.7 %*% M.7
     [,1] [,2] [,3] [,4] [,5] [,6] [,7]
[1,]    0    0    0    1    0    0    0
[2,]    0    0    0    0    1    0    0
[3,]    0    0    0    0    0    1    0
[4,]    0    0    0    0    0    0    1
[5,]    1    0    0    0    0    0    0
[6,]    0    1    0    0    0    0    0
[7,]    0    0    1    0    0    0    0
```

If I wanted straight-up matrix powers rather than typing M.7 %*% M.7 %*% … %*% M.7 131 times, I would load the expm package with require(expm) and then use its %^% operator for the power.

Here are some more powers of M.7:

```
> M.7 %^% 4
     [,1] [,2] [,3] [,4] [,5] [,6] [,7]
[1,]    0    0    0    0    1    0    0
[2,]    0    0    0    0    0    1    0
[3,]    0    0    0    0    0    0    1
[4,]    1    0    0    0    0    0    0
[5,]    0    1    0    0    0    0    0
[6,]    0    0    1    0    0    0    0
[7,]    0    0    0    1    0    0    0
> M.7 %^% 5
     [,1] [,2] [,3] [,4] [,5] [,6] [,7]
[1,]    0    0    0    0    0    1    0
[2,]    0    0    0    0    0    0    1
[3,]    1    0    0    0    0    0    0
[4,]    0    1    0    0    0    0    0
[5,]    0    0    1    0    0    0    0
[6,]    0    0    0    1    0    0    0
[7,]    0    0    0    0    1    0    0
> M.7 %^% 6
     [,1] [,2] [,3] [,4] [,5] [,6] [,7]
[1,]    0    0    0    0    0    0    1
[2,]    1    0    0    0    0    0    0
[3,]    0    1    0    0    0    0    0
[4,]    0    0    1    0    0    0    0
[5,]    0    0    0    1    0    0    0
[6,]    0    0    0    0    1    0    0
[7,]    0    0    0    0    0    1    0
> M.7 %^% 7
     [,1] [,2] [,3] [,4] [,5] [,6] [,7]
[1,]    1    0    0    0    0    0    0
[2,]    0    1    0    0    0    0    0
[3,]    0    0    1    0    0    0    0
[4,]    0    0    0    1    0    0    0
[5,]    0    0    0    0    1    0    0
[6,]    0    0    0    0    0    1    0
[7,]    0    0    0    0    0    0    1
```


Look at the last one! It’s the identity matrix! Back to square one!

Or should I say square zero. If you multiplied again you would go through the cycle again. Likewise if you multiplied intermediate matrices from midway through, you would still travel around within the cycle. The exponent rule becomes M.7^x × M.7^y = M.7^((x+y) mod 7).
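The same experiment can be sketched in Python. NumPy's `matrix_power` stands in for expm's `%^%`, and `M7` is our name for M.7. The code builds the shifted identity and checks that its 7th power is the identity and that no smaller positive power is. It then checks the exponent rule modulo 7 for every pair of exponents.

```python
import numpy as np
from numpy.linalg import matrix_power

# Same as R's diag(7)[c(2:7, 1), ]: rows 2..7 of the identity, then row 1.
# (Python indexes from 0, so the new row order is 1, 2, ..., 6, 0.)
M7 = np.eye(7, dtype=int)[[1, 2, 3, 4, 5, 6, 0]]
I7 = np.eye(7, dtype=int)

print(np.array_equal(matrix_power(M7, 7), I7))   # back to square one

# Exponent rule: M^x M^y = M^((x + y) mod 7) for every pair of exponents
rule_holds = all(
    np.array_equal(matrix_power(M7, x) @ matrix_power(M7, y),
                   matrix_power(M7, (x + y) % 7))
    for x in range(7) for y in range(7)
)
print(rule_holds)
```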

What you’ve just discovered is the cyclic group P₇ (also called C₇ or Z₇). The powers of M.7 under %*% are one way of presenting the only group with 7 elements: because 7 is prime, every consistent group multiplication table for 7 things is this one, up to relabelling. Another way of presenting the group is with the pair {0,1,2,3,4,5,6}, + mod 7 (that’s where it gets the name Z₇, because ℤ = the integers). A third way of presenting the cyclic 7-group, which we can also do in R:

```
> w <- complex( modulus=1, argument=2*pi/7 )
> w
[1] 0.6234898+0.7818315i
> w^2
[1] -0.2225209+0.9749279i
> w^3
[1] -0.9009689+0.4338837i
> w^4
[1] -0.9009689-0.4338837i
> w^5
[1] -0.2225209-0.9749279i
> w^6
[1] 0.6234898-0.7818315i
> w^7
[1] 1-0i
```

Whoa! All of a sudden at the 7th step we’re back to “1” again. (A different one, but “the unit element” nonetheless.)

So three different number systems:

• counting numbers, going around a 7-hour clock (+ mod 7);
• matrix-blocks; and
• a ring of complex numbers spaced evenly around the unit circle (the 7th roots of unity)

are all demonstrating the same underlying logic.
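To make "the same underlying logic" concrete, here is a Python sketch. The helper names `as_matrix` and `as_root` are ours. It sends each k in {0, …, 6} to its three disguises: k mod 7, the matrix M.7^k, and the complex number w^k. It then checks that combining two elements in one disguise agrees with combining them in the others.

```python
import cmath
import numpy as np
from numpy.linalg import matrix_power

N = 7
M7 = np.eye(N, dtype=int)[list(range(1, N)) + [0]]  # the shifted identity M.7
w = cmath.exp(2j * cmath.pi / N)                     # same number as R's complex(modulus=1, argument=2*pi/7)

def as_matrix(k):
    return matrix_power(M7, k % N)

def as_root(k):
    return w ** (k % N)

consistent = True
for x in range(N):
    for y in range(N):
        s = (x + y) % N   # clock arithmetic: + mod 7
        # matrix product of the matrix disguises
        consistent &= np.array_equal(as_matrix(x) @ as_matrix(y), as_matrix(s))
        # complex product of the root-of-unity disguises
        consistent &= cmath.isclose(as_root(x) * as_root(y), as_root(s), abs_tol=1e-9)
print(consistent)
```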

Although each is merely an idea with only a spiritual existence, these are the kinds of “logical atoms” that build up the theories we use to describe the actual world scientifically. (Counting = money, or demography, or forestry; matrix = classical mechanics, or video game visuals; imaginary numbers = electrical engineering, or quantum mechanics.)

Three different number systems but they’re all essentially the same thing, which is this idea of a “cycle-of-7”. The cycle-of-7, when combined with other simple groups (also in matrix format), might model a biological system like a metabolic pathway.

Philosophically, P₇ is interesting because numbers—these existential things that seem to be around whether we think about them or not—have naturally formed into this “circular” shape. When a concept comes out of mathematics it feels more authoritative, a deep fact about the logical structure of the universe, perhaps closer to the root of all the mysteries.

In the real world I’d expect various other processes to hook into P₇—like a noise matrix, or some other groups. Other fundamental units should combine with it; I’d expect to see P₇ instantiated by itself rarely.

Mathematically, P₇ is interesting because three totally different number systems (imaginary, counting, square-matrix) are shown to have one “root cause” which is the group concept.

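John Rhodes got famous for arguing that everything, but EVERYTHING, is built up from a small stock of logical atoms: simple groups (the SNAGs, or simple non-abelian groups, plus the prime-order cyclic groups such as P₇=C₇=Z₇) together with “flip-flop” memory units. This is the Krohn–Rhodes decomposition; viz. algebraic engineering.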

Or, in the words of Olaf Sporns:

[S]imple elements organize into dynamic patterns … Very different systems can generate strikingly similar patterns—for example, the motions of particles in a fluid or gas and the coordinated movements of bacterial colonies, swarms of fish, flocks of birds, or crowds of commuters returning home from work. … While looking for ways to compute voltage and current flow in electrical networks, the physicist Gustav Kirchhoff represented these networks as graphs…. [His] contemporary, Arthur Cayley, applied graph theoretical concepts to … enumerating chemical isomers….

Graphs, then, can be converted into adjacency matrices by putting a 0 in the [row=a, column=b] slot where there is no connection between a and b, or a (±)1 where there is a (directed) link between the two nodes. The sparse 0/1 matrix M.7 above is a transition matrix of the cyclical C₇ picture: 1 → 2 → 3 → 4 → 5 …. A noun (C₇) converted into a verb (%*% M.7).
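Here is a short Python sketch of that conversion. The edge list is written out by us from the 1 → 2 → … → 7 → 1 picture. Putting a 1 at [row=a, column=b] for each directed link a → b reproduces M.7 exactly.

```python
import numpy as np

# Directed cycle 1 -> 2 -> ... -> 7 -> 1, with nodes numbered 1..7 as in the post
edges = [(a, a % 7 + 1) for a in range(1, 8)]

A = np.zeros((7, 7), dtype=int)
for a, b in edges:
    A[a - 1, b - 1] = 1    # row = from, column = to (shifted to 0-based indices)

M7 = np.eye(7, dtype=int)[[1, 2, 3, 4, 5, 6, 0]]   # R: diag(7)[c(2:7, 1), ]
print(np.array_equal(A, M7))

# Following one link from node 3: the 1 in row 3 sits in column 4
print(np.flatnonzero(A[3 - 1]) + 1)
```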

In short, groups are one of those things that make people think: Hey, man, maybe EVERYTHING is a matrix. I’m going to go meditate on that.

Gotcha.

@IgorCarron blogs recent applications of compressive sensing and matrix factorisation every week.

(Compressive sensing solves underdetermined systems of equations, for example trying to fill in missing data, by L₁-norm minimisation.)

This week: reverse-engineering biochemical pathways and complex systems analysis.

Once you’re comfortable with 2-arrays and 2-matrices, you can move up a dimension or two, to 4-arrays or 4-tensors.

You can move up to a 3-array / 3-tensor just by imagining a matrix which “extends back into the blackboard”. Like a 5 × 5 matrix. With another 5 × 5 matrix behind it. And another 5 × 5 matrix behind that with 25 more entries. Etc.

The other way is to imagine “Tables of tables of tables of tables … of tables of tables of tables.” This imagination technique is infinitely extensible.

$\large \dpi{150} \bg_white \begin{bmatrix} \begin{bmatrix} \begin{bmatrix} a & b \\ c & d \end{bmatrix} & \begin{bmatrix} e & f \\ g & h \end{bmatrix} \\ \\ \begin{bmatrix} j & k \\ l & m \end{bmatrix} & \begin{bmatrix} n & o \\ p & q \end{bmatrix} \end{bmatrix} & \begin{bmatrix} \begin{bmatrix} r & s \\ t & u \end{bmatrix} & \begin{bmatrix} v & w \\ x & y \end{bmatrix} \\ \\ \begin{bmatrix} z & a' \\ b' & c' \end{bmatrix} & \begin{bmatrix} d' & e' \\ f' & g' \end{bmatrix} \end{bmatrix} \\ \\ \begin{bmatrix} \begin{bmatrix} h' & j' \\ k' & l' \end{bmatrix} & \begin{bmatrix} m' & n' \\ o' & p' \end{bmatrix} \\ \\ \begin{bmatrix} q' & r' \\ s' & t' \end{bmatrix} & \begin{bmatrix} u' & v' \\ w' & x' \end{bmatrix} \end{bmatrix} & \begin{bmatrix} \begin{bmatrix} y' & z' \\ a'' & b'' \end{bmatrix} & \begin{bmatrix} c'' & d'' \\ e'' & f'' \end{bmatrix} \\ \\ \begin{bmatrix} g'' & h'' \\ j'' & k'' \end{bmatrix} & \begin{bmatrix} l'' & m'' \\ n'' & o'' \end{bmatrix} \end{bmatrix} \end{bmatrix}$

If that looks complicated, it’s just because simple recursion can produce convoluted outputs. Reading the LaTeX (alt text) is definitely harder than writing it was. (I just cut & paste \begin{bmatrix} stuff \end{bmatrix} inside other \begin{bmatrix} … \end{bmatrix}.)

(The technical difference between an array and a tensor: an array is a block which holds data. A tensor is a block of numbers which (linearly) transforms matrices / vectors / tensors. Array = noun. Tensor = verb.)

As the last picture — the most important one — demonstrates, a 4-array can be filled with completely plain, ordinary, pedestrian information like age, weight, height.

Inside each of the yellow or blue boxes in the earlier pictures is a datum. What calls for the high-dimensional array is the structure and inter-relationships of the information. Age, height, sex, and weight each belongs_to a particular person, in an object-oriented sense. And one can marginalise, in a statistical sense, over any of those variables: consider all the ages of the people surveyed, for example.
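Here is a hypothetical example in Python. The axes, bin counts and random counts are invented for illustration and are not the data in the pictures. The example is a 4-array of survey counts indexed by age band, sex, height band and weight band. Marginalising over a variable amounts to summing along its axis.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical survey: counts of people in each (age band, sex, height band, weight band) cell
n_age, n_sex, n_height, n_weight = 5, 2, 4, 6
counts = rng.integers(0, 10, size=(n_age, n_sex, n_height, n_weight))

# Marginalise out sex, height and weight: how many people fall in each age band?
by_age = counts.sum(axis=(1, 2, 3))

# Marginalise out age and sex: a 2-array (height x weight) cross-tabulation
height_by_weight = counts.sum(axis=(0, 1))

print(counts.ndim, counts.shape)
print(by_age)
print(height_by_weight.shape)
```

Each marginal still accounts for every surveyed person. Only the breakdown changes.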

One last takeaway:

• Normal, pedestrian, run-of-the-mill, everyday descriptions of things = high-dimensional arrays of varying data types.

Normal people speak about and conceive of information which fits high-D arrays all the time. “Attached” (in the fibre sense) to any person you know is a huge database of facts. Not to mention data-intensive visual information like parameterisations of the surface of their face, which we naturally process in an Augenblick.

(Source: slideshare.net)

### Linear Transformations will take you on a Trip Comparable to that of Magical Mushroom Sauce, And Perhaps cause More Lasting Damage

Long after I was supposed to “get it”, I finally came to understand matrices by looking at the above pictures. Staring and contemplating. I would come back to them week after week. This one is a stretch; this one is a shear; this one is a rotation. What’s the big F?

The thing is that mathematicians think about transforming an entire space at once. Any particular instance or experience must be of a point, but in order to conceive and prove statements about all varieties and possibilities, mathematicians think about “mappings of the entire possible space of objects”. (This is true in group theory as much as in linear algebra.)

So the change felt by individual ink-spots going from the original-F to the F-image would be the experience of an actual orbit in a dynamical system, of an actual feather blown by a bit of wind, an actual bullet passing through an actual heart, an actual droplet in the Mbezi River pulsing forward with the flow of time. But mathematicians consider the totality of possibilities all at once. That’s what “transforming the space” means.

$\large \dpi{300} \bg_white \begin{pmatrix} a \rightsquigarrow a & | & a \rightsquigarrow b & | & a \rightsquigarrow c \\ \hline b \rightsquigarrow a & | & b \rightsquigarrow b & | & b \rightsquigarrow c \\ \hline c \rightsquigarrow a & | & c \rightsquigarrow b & | & c \rightsquigarrow c \end{pmatrix}$

What do the slots in the matrix mean? In the matrix above, the slot in row a, column b is a ⇝ b. So reading across a row you stay at one “from” and sweep through all the “to”s, and reading down a column you fix one “to” and sweep through all the “from”s. This is the convention in Markov transition matrices, for example, and those combing motions correspond to basic matrix multiplication (row times column).

So there’s a hint of causation to this matrix business. Rows are the “causes” and columns are the “effects”. Second row, fifth column is the causal contribution of input B to the resulting output E and so on. But that’s not 100% correct, it’s just a whiff of a hint of a suggestion of a truth.
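Here is a minimal sketch of the "row = from, column = to" convention, using a made-up three-state Markov chain. The states a, b, c and all the probabilities are invented. Entry [a, b] is the probability of moving from a to b, and each row sums to 1. Multiplying a row vector of current probabilities by the matrix gives the probabilities one step later.

```python
import numpy as np

states = ["a", "b", "c"]
# Rows = "from", columns = "to"; each row is a probability distribution
P = np.array([
    [0.5, 0.3, 0.2],   # from a
    [0.1, 0.8, 0.1],   # from b
    [0.0, 0.4, 0.6],   # from c
])
assert np.allclose(P.sum(axis=1), 1.0)

x = np.array([1.0, 0.0, 0.0])   # start certainly in state a
x1 = x @ P                      # one step: the row vector combs down each column
x2 = x1 @ P                     # two steps

print(dict(zip(states, x1)))    # equals the "from a" row of P
print(x2, x2.sum())             # still a probability distribution
```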

The “domain and image” viewpoint in the pictures above (which come from about halfway through Flanigan & Kazdan) is a truer expression of the matrix concept.

• [ [1, 0], [0, 1] ] maps the Mona Lisa to itself,
• [ [.799, −.602], [.602, .799] ] has a determinant of (very nearly) 1, so it does not change the amount of paint, and it rotates the Mona Lisa by 37° counterclockwise,
• [ [1, 0], [0, 2] ] stretches the image northward (vertically, by a factor of 2),
• and so on.
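A quick numerical check of the list above in Python with NumPy, which is our choice of tool. The determinant measures how much the "amount of paint" (area) is scaled. The rotation angle can be read off the first column, which is where the x-axis lands.

```python
import math
import numpy as np

identity = np.array([[1, 0], [0, 1]])
rotation = np.array([[0.799, -0.602], [0.602, 0.799]])
stretch = np.array([[1, 0], [0, 2]])

for name, m in [("identity", identity), ("rotation", rotation), ("stretch", stretch)]:
    print(name, round(np.linalg.det(m), 3))   # area scaling: 1, about 1, 2

# Angle of the image of the x-axis, (0.799, 0.602), measured counterclockwise
angle = math.degrees(math.atan2(rotation[1, 0], rotation[0, 0]))
print(round(angle, 1))   # roughly 37 degrees
```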

MATRICES IN WORDS

Matrices aren’t* just 2-D blocks of numbers — that’s a 2-array. Matrices are linear transformations. Because “matrix” comes with rules about how the numbers combine (inner product, outer product), a matrix is a verb whereas a 2-array, which can hold any kind of data with any or no rules attached to it, is a noun.

* (NB: Computer languages like R, Java, and SAGE/Python have their own definitions. They usually treat vector == list && matrix == 2-array.)

Linear transformations in 1-D are incredibly restricted. They’re just proportional relationships, like “Buy 1 more carton of eggs and it will cost an extra $2.17. Buy 2 more cartons of eggs and it will cost an extra $4.34. Buy 3 more cartons of eggs and it will cost an extra $6.51….” Bo-ring.

In scary mathematical runes one writes:

$\large \dpi{200} \bg_white \begin{matrix} y \propto x \\ \textit{---or---} \\ y = \mathrm{const} \cdot x \end{matrix}$

And the property of linearity itself is written:

$\large \dpi{200} \bg_white \begin{matrix} a \cdot f(\cdots, \; \blacksquare , \; \cdots) + b \cdot f( \cdots, \; \square ,\; \cdots) \\ = \\ f( \cdots, \; a \cdot \blacksquare + b \cdot \square, \; \cdots) \end{matrix} \\ \\ \qquad \footnotesize{\bullet \; f \text{ is the linear mapping}} \\ \qquad \bullet \; a, b \in \text{the underlying number corpus } \mathbb{K} \\ \qquad \bullet \; \text{above holds for any terms } \blacksquare, \square$

Or say: rescaling or adding first, it doesn’t matter which order.
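Here is a minimal Python check of the linearity property. It uses the egg-price example for the 1-D case and an arbitrary 2×3 matrix (our choice) for a map between dimensions. Whether you rescale and add first or apply the map first, you get the same answer.

```python
import numpy as np

def egg_cost(cartons):
    return 2.17 * cartons          # 1-D linear map: y = const * x

print([round(egg_cost(n), 2) for n in (1, 2, 3)])   # 2.17, 4.34, 6.51

A = np.array([[1.0, -2.0, 0.5],
              [3.0, 0.0, 4.0]])    # an arbitrary linear map from 3 dimensions to 2

def f(v):
    return A @ v

a, b = 2.5, -1.0
u = np.array([1.0, 2.0, 3.0])
v = np.array([-4.0, 0.0, 7.0])

# a*f(u) + b*f(v)  ==  f(a*u + b*v)
print(np.allclose(a * f(u) + b * f(v), f(a * u + b * v)))
print(np.isclose(a * egg_cost(3) + b * egg_cost(5), egg_cost(a * 3 + b * 5)))
```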



The matrix revolution does so much generalisation of this simple concept it’s hard to imagine you’re still talking about the same thing. First of all, the insight that mathematically abstract vectors, including vectors of generalised numbers, can represent just about anything. Anything that can be “added” together.

And I put the word “added” in quotes because, as long as you define an operation that is commutative and associative, has a zero and negatives, and lets multiplication-by-a-scalar distribute over it, you get to call it “addition”! See the mathematical definitions of vector space and module.

• The blues scale has a different notion of “addition” than the diatonic scale.
• Something different happens when you add a spiteful remark to a pleased emotional state than when you add it to an angry emotional state.
• Modular and noncommutative things can be “added”. Clock time, food recipes, chemicals in a reaction, and all kinds of freaky mathematical fauna fall under these categories.
• Polynomials, knots, braids, semigroup elements, lattices, dynamical systems, networks, can be “added”. Or was that “multiplied”? Like, whatever.
• Quantum states (in physics) can be “added”.
• So “adding” is perhaps too specific a word—all we mean is “a two-place input, one-place output satisfying X, Y, Z”, where X,Y,Z are the properties from your elementary school textbook like identity, associativity, commutativity.

But that’s just vectors. Matrices also add dimensionality. Linear transformations can be from and to any number of dimensions:

• 1→7
• 4→3
• 1671 → 5
• 18 → 188
• and X→1 is a special case, the functional. Functionals comprise performance metrics, size measurements, your final grade in a class, statistical moments (kurtosis, skew, variance, mean) and other statistical metrics (Value-at-Risk, median), divergence (not gradient nor curl), risk metrics, the temperature at any point in the room, EBITDA, not function(x) { c( count(x), mean(x), median(x) ) }, and … I’ll do another article on functionals.
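The "from n dimensions to m dimensions" idea can be sketched in Python. The sizes follow the list, and the random entries are arbitrary. An m×n matrix takes n numbers in and gives m numbers out, and the X→1 case is a single row. The mean is one functional from the list that is genuinely linear, since it is the row (1/n, …, 1/n). Others, like the median, are X→1 maps but not linear ones.

```python
import numpy as np

rng = np.random.default_rng(1)

# (inputs, outputs) pairs from the list above; an outputs x inputs matrix does the mapping
for n_in, n_out in [(1, 7), (4, 3), (1671, 5), (18, 188)]:
    T = rng.standard_normal((n_out, n_in))
    x = rng.standard_normal(n_in)
    assert (T @ x).shape == (n_out,)

# X -> 1: the mean as a linear functional, i.e. a 1 x n row of weights 1/n
data = np.array([3.0, 1.0, 4.0, 1.0, 5.0, 9.0])
n = data.size
mean_row = np.full((1, n), 1.0 / n)
print(mean_row @ data, data.mean())   # the same number

# The median also maps X -> 1 but is not linear
x, y = np.array([0.0, 0.0, 1.0]), np.array([1.0, 0.0, 0.0])
print(np.median(x + y), np.median(x) + np.median(y))   # 1.0 versus 0.0
```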

In contemplating these maps from dimensionality to dimensionality, it’s a blessing that the underlying equation is so simple as linear (proportional). When thinking about information leakage, multi-parameter cause & effect, sources & sinks in a many-equation dynamical system, images and preimages and dual spaces, or when the objects being linearly transformed are systems of partial differential equations, being able to reduce the issue to mere multi-proportionalities is what makes the problems tractable at all.

So that’s why so much painstaking care is taken in abstract linear algebra to be absolutely precise — so that the applications which rely on compositions or repetitions or atlases or inversions of linear mappings will definitely go through.



Why would anyone care to learn matrices?

Understanding of matrices is the key difference between those who “get” higher maths and those who don’t. I’ve seen many grad students and professors reading up on linear algebra because they need it to understand some deep papers in their field.

• Linear transformations can be stitched together to create manifolds.
• If you add Fourier | harmonic | spectral techniques + linear algebra, you get really trippy — yet informative — views on things. Like spectral mesh compressions of ponies.
• The “linear basis” and “linear combination” metaphors extend far. For example, to eigenfaces or When Doves Cry Inside a Convex Hull.
• You can’t understand slack vectors or optimisation without matrices.
• JPEG, discrete wavelet transform, and video compression rely on linear algebra.
• A 2-matrix characterises graphs or flows on graphs. So that’s Facebook friends, water networks, internet traffic, ecosystems, Ising magnetism, Wassily Leontief’s vision of the economy, herd behaviour, network-effects in sales (“going viral”), and much, much more that you can understand — after you get over the matrix bar.
• The expectation operator of statistics (“average”) is linear.
• Dropping a variable from your statistical analysis is linear. Mathematicians call it “projection onto a lower-dimensional space” (second-to-last example at top).
• Taking-the-derivative is linear. (The differential, a linear approximation of a could-be-nonlinear function, is the noun that results from doing the take-the-derivative verb.)
• The composition of two linear functions is linear. The sum of two linear functions is linear. From these it follows that long linear differential equations, consisting of chains of “zoom-in-to-infinity” (via “take-the-derivative”), “do-a-proportional-transformation-there”, then “zoom-back-out” … long, long chains of this, can amount in total to no more than a linear transformation.
• If you line up several linear transformations with the proper homes and targets, you can make hard problems easy and impossible problems tractable. The more “advanced-mathematics” the space you’re considering, the more things become linear transformations.
• That’s why linear operators are used in both quantum mechanical theory and practical things like building helicopters.
• You can understand dynamical systems, attractors, and thereby understand love better through matrices.
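Here is a sketch of "taking-the-derivative is linear" in Python. We represent polynomials of degree at most 3 by their coefficient vectors, which is our choice of representation. Differentiation then becomes a 4×4 matrix D, and linearity is just matrix algebra. Composition of linear maps becomes a matrix product, so D·D is the second derivative.

```python
import numpy as np

# Polynomial c0 + c1 x + c2 x^2 + c3 x^3  <->  coefficient vector [c0, c1, c2, c3]
# d/dx sends x^k to k x^(k-1), so D has the entry k in row k-1, column k.
D = np.zeros((4, 4))
for k in range(1, 4):
    D[k - 1, k] = k

p = np.array([5.0, 0.0, 3.0, 2.0])   # 5 + 3x^2 + 2x^3
q = np.array([1.0, 4.0, 0.0, -1.0])  # 1 + 4x - x^3

print(D @ p)        # 6x + 6x^2  ->  [0, 6, 6, 0]

# Linear: differentiate a combination, or combine the derivatives
a, b = 2.0, -3.0
print(np.allclose(D @ (a * p + b * q), a * (D @ p) + b * (D @ q)))

# Composition of linear maps is a matrix product: D @ D is the second derivative
print((D @ D) @ p)  # 6 + 12x  ->  [6, 12, 0, 0]
```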