
Posts tagged with linear


A piecewise linear function in two dimensions (top) and the convex polytopes on which it is linear (bottom).


(Source: Wikipedia)






Slow and steady.





SETUP (CAN BE SKIPPED)

We start with data (how was it collected?) and the hope that we can compare them. We also start with a question of the form:

  • how much tax increase is associated with how much tax avoidance/tax evasion/country fleeing by the top 1%?
  • how much traffic does our website lose (gain) if we slow down (speed up) the load time?
  • how many of their soldiers do we kill for every soldier we lose?
  • how much do gun deaths [suicide | gang violence | rampaging multihomicide] decrease with 10,000 guns taken out of the population?
  • how much more fuel do you need to fly your commercial jet 1,000 metres higher in the sky?
  • how much famine [to whom] results when the price of low-protein wheat rises by $1?
  • how much vegetarian eating results when the price of beef rises by $5? (And again distributionally: does it change more among people with a certain culture or personal history, such as having learned vegetarian meals before or having grown up unable to afford meat?) How much does the price of beef rise when the price of feed-corn rises by $1?
  • how much extra effort at work will result in how much higher bonus?
  • how many more hours of training will result in how much faster marathon time (or in how much better heart health)?
  • how much does society lose when a scientist moves to the financial sector?
  • how much does having a modern financial system raise GDP growth? (here ∵ the X ~ branchy and multidimensional, we won’t be able to interpolate in Tufte’s preferred sense)
  • how many gigatonnes of carbon per year does it take to raise the global temperature by how much?
  • how much does $1000 million spent funding basic science research yield us in 30 years?
  • how much will this MBA raise my annual income?
  • how much more money does a comparable White make than a comparable Black? (or a comparable Man than a comparable Woman?)
  • how much does a reduction in child mortality decrease fecundity? (if it actually does)

  • how much can I influence your behaviour by priming you prior to this psychological experiment?
  • how much higher/lower do Boys score than Girls on some assessment? (the answer is usually “low |β|, with low p” — in other words “not very different but due to the high volume of data whatever we find is with high statistical strength”)

bearing in mind that this response-magnitude may differ under varying circumstances. (Raising morning-beauty-prep time from 1 minute to 10 minutes will do more than raising 110 minutes to 120 minutes of prep. Also there may be interaction terms like you need both a petroleum engineering degree and to live in one of {Naija, Indonesia, Alaska, Kazakhstan, Saudi Arabia, Oman, Qatar} in order to see the income bump. Also many of these questions have a time-factor, like the MBA and the climate ones.)

building up a nonlinear function from linear parts

As Trygve Haavelmo put it: using reason alone we can probably figure out which direction each of these responses will go. But knowing just that raising the tax rate will drive away some number of rich doesn’t push the debate very far—if all you lose is a handful of symbolic Eduardo Saverins who were already on the cusp of fleeing the country, then bringing up the Laffer curve is chaff. But if the number turns out to be large then it’s really worth discussing.

In less polite terms: until we quantify what we’re debating about, you can spit bollocks all day long. Once the debate is quantified, the discussion should become way more intelligent and less prone to derailing into irrelevant theoretically-possible-issues-which-are-not-really-worth-wasting-time-on.

So we change one variable over which we have control and measure how the interesting thing responds. Once we measure both we come to the regression stage where we try to make a statement of the form “A 30% increase in effort will result in a 10% increase in wage” or “5 extra minutes getting ready in the morning will make me look 5% better”. (You should agree from those examples that the same number won’t necessarily hold throughout the whole range. Like if I spend three hours getting ready the returns will have diminished from the returns on the first five minutes.)

Correlation

Avoiding causal language, we say that a 10% increase in (your salary) is associated with a 30% increase in (your effort).

 
MAIN PART (SKIP TO HERE IF SKIMMING)

The two numbers that jump out of any regression table output (e.g., lm in R) are p and β.

  • β is the estimated size of the linear effect
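  • p is how surprising an estimate this far from zero would be if the true effect were zero: the probability of seeing a β at least this large by chance alone. (As in golf, a low p is better: more confident, more sure. Low p can also be stated as a high |t|.)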
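
A minimal sketch of where those two numbers come from, in Python rather than R's lm. The data are made up for illustration, `simple_ols` and `approx_p` are hypothetical helper names, and the p-value uses a normal approximation in place of the exact t distribution, so treat it as rough for small samples.

```python
import math
from statistics import NormalDist


def simple_ols(x, y):
    """Return (beta, standard_error, t) for the slope in y = alpha + beta * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    beta = sxy / sxx
    alpha = mean_y - beta * mean_x
    # Residual variance with n - 2 degrees of freedom (two fitted parameters).
    residual_ss = sum((yi - alpha - beta * xi) ** 2 for xi, yi in zip(x, y))
    std_error = math.sqrt(residual_ss / (n - 2) / sxx)
    return beta, std_error, beta / std_error


def approx_p(t):
    """Two-sided p-value, using a normal approximation to the t distribution."""
    return 2 * (1 - NormalDist().cdf(abs(t)))


# Hypothetical data: page load time (seconds) vs. visits (thousands per day).
load_time = [1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5]
visits = [9.8, 9.1, 8.7, 7.9, 7.6, 6.8, 6.5, 5.9]

beta, std_error, t = simple_ols(load_time, visits)
p = approx_p(t)
print(f"beta = {beta:.3f}, t = {t:.1f}, p ~ {p:.4f}")
```

Here β is negative (slower pages, fewer visits) and the points sit close to a line, so |t| is large and p is tiny.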

Regression tables spit out many, many numbers (like the Durbin-Watson statistic, the F statistic, the Akaike Information Criterion, and more) specifically to flag potential problems with interpreting β and p naïvely. With that caveat, here are pictures of the textbook situations where p and β can be read in the straightforward way.

First, the standard cases where the regression analysis works as it should and how to read it is fairly obvious:
(NB: These are continuous variables rather than on/off switches or ordered categories. So instead of “Followed the weight-loss regimen” or “Didn’t follow the weight-loss regimen”, someone has quantified how closely it was followed. Again, the actual measurements (how they were coded) get in the way of our gleeful playing with numbers.)

image
image
image

Second, the case I want to draw attention to: low statistical significance doesn’t necessarily mean nothing’s going on there.

image
image

The code I used to generate these fake-data and plots.

If the regression measures a high β but low confidence (high p), that is still worth taking a look at. If regression picks up a wide gap in male-versus-female wages (let’s say double) but we’re not so confident (high p) in the estimate because it’s sometimes 95%, sometimes 180%, sometimes 310%, we’ve still picked up an effect that matters.

The estimated β would not be statistically significant, because the noise around it is large, but in the everyday sense this would be a very significant finding. (Try the same with any of my other examples, or another quantitative-comparison scenario you think up. It’s either a serious opportunity, or a serious problem, that you’ve uncovered. It just needs further looking to see where the variation around double comes from.)
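
To put rough numbers on that, here is a small sketch that treats the three ratios from the example as if they were a tiny sample. That is an assumption made purely for illustration (real wage data would have far more observations), and the critical value 4.303 is the standard two-sided 5% cutoff for Student's t with 2 degrees of freedom.

```python
import math
from statistics import mean, stdev

# The three ratios quoted in the text (95%, 180%, 310%), written as multiples.
# Treating them as a sample of size 3 is an illustrative assumption.
ratios = [0.95, 1.80, 3.10]

n = len(ratios)
estimate = mean(ratios)                     # point estimate, close to "double"
std_error = stdev(ratios) / math.sqrt(n)    # sample std dev over sqrt(n)

# t statistic against the "no gap" hypothesis (ratio = 1).
t_stat = (estimate - 1.0) / std_error

# Two-sided 5% critical value of Student's t with n - 1 = 2 degrees of freedom.
T_CRITICAL_DF2 = 4.303
is_significant = abs(t_stat) > T_CRITICAL_DF2

print(f"estimate = {estimate:.2f}, t = {t_stat:.2f}, significant: {is_significant}")
```

The estimate lands at about 1.95 with t around 1.5: nowhere near the conventional cutoff, yet a near-doubling is hardly something to shrug off.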

You can read elsewhere about how awful it is that p<.05 is the password for publishable science, for many reasons that require some statistical vocabulary. But I think the most intuitive problem is the one I just stated. If your geiger counter flips out to ten times the deadly level of radiation, it doesn’t matter if it sometimes reads 8, sometimes 5, and sometimes 15—the point is, you need to be worried and get the h*** out of there. (Unless the machine is wacked—but you’d still be spooked, wouldn’t you?)

 
FOLLOW-UP (CAN BE SKIPPED)

The scale of β is the all-important thing that we are after. Small differences in βs of variables that are important to your life can make a huge difference.

  • Think about getting a 3% raise (1.03) versus a 1% wage cut (.99).
  • Think about twelve in every 1000 births kill the mother versus four in every 1000.
  • Think about being 5 minutes late for the meeting versus 5 minutes early.

image
linear maps as multiplication
linear mappings -- notice they're ALL straight lines through the origin!


Order-of-magnitude differences (like 20 versus 2) are the difference between fly and dog; between life in the USA and near-famine; between oil tanker and gas pump; between Tibet’s altitude and Illinois’; between driving and walking; even the Black Death was only a tenth of an order of magnitude of reduction in human population.




Calculus tells us that a nonlinear function can be approximated in a local region by a linear function (unless the nonlinear function jumps there). So β is an acceptable measure of how the interesting thing responds “around the current levels of webspeed” or “around the current levels of taxation”.
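
A sketch of what "local" means here, reusing the morning-prep example. The curve `looks` is a hypothetical diminishing-returns function chosen only for illustration, and `local_beta` is just a central finite difference.

```python
import math


def looks(minutes):
    # Hypothetical diminishing-returns curve: each extra minute helps less.
    return math.log1p(minutes)


def local_beta(f, x, h=1e-4):
    """Slope of f near x, estimated by a central finite difference."""
    return (f(x + h) - f(x - h)) / (2 * h)


def linear_prediction(f, x0, x):
    """Predict f(x) from the tangent line fitted at x0."""
    return f(x0) + local_beta(f, x0) * (x - x0)


beta_early = local_beta(looks, 1.0)     # response around 1 minute of prep
beta_late = local_beta(looks, 110.0)    # response around 110 minutes of prep

# The tangent at 1 minute works nearby and fails far away.
near_error = abs(linear_prediction(looks, 1.0, 1.5) - looks(1.5))
far_error = abs(linear_prediction(looks, 1.0, 10.0) - looks(10.0))

print(f"beta near 1 min: {beta_early:.3f}, beta near 110 min: {beta_late:.3f}")
print(f"error at 1.5 min: {near_error:.3f}, error at 10 min: {far_error:.3f}")
```

The same function gives a very different β depending on where you stand, which is why a single β should be read as a statement about the neighbourhood of the current levels.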



Linear response magnitudes can also be used to estimate global responses in a nonlinear function, but you will be quantifying something other than the local linear approximation.

Anscombe’s quartet: the four data sets are different, yet they have the same “line of best fit” as computed by ordinary least squares regression.
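
The claim is easy to check by hand. The sketch below fits a least-squares line to each of the four data sets, using the values as they are commonly tabulated for Anscombe's quartet; `fit_line` is a hypothetical helper name.

```python
def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Data sets I-III share the same x values; set IV has its own.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
x4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]

quartet = {
    "I": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": (x4, [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

fits = {name: fit_line(x, y) for name, (x, y) in quartet.items()}
for name, (slope, intercept) in fits.items():
    print(f"{name}: y = {intercept:.2f} + {slope:.3f} x")
```

All four fits come out at roughly slope 0.5 and intercept 3, even though the scatterplots look nothing alike.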




going the long way
What does it mean when mathematicians talk about a bijection or homomorphism?
Imagine you want to get from X to X′ but you don’t know how. Then you find a “different way of looking at the same thing” using ƒ. (Map the stuff with ƒ to another space Y, then do something else over in image ƒ, then take a journey over there, and then return back with ƒ⁻¹.)
The fact that a bijection can show you something in a new way that suddenly makes the answer to the question so obvious, is the basis of the jokes on www.theproofistrivial.com.


In a given category the homomorphisms Hom ∋ ƒ preserve all the interesting properties. Linear maps, for example (except when det=0), barely change anything. It is like your government suddenly adding another zero to the end of all currency denominations: just a rescaling. So they preserve most interesting properties, and an invertible linear mapping to another domain can be inverted back, so anything you discover over in the new domain (image of ƒ) can be used on the original problem.
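These fancy-sounding maps are (with one exception, noted) linear:
  • Fourier transform
  • Laplace transform
  • taking the derivative (invertible only up to an added constant)
  • Box-Müller (strictly speaking not linear, but an invertible change of variables in the same spirit)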
They sound fancy because whilst they leave things technically equivalent in an objective sense, the result looks very different to people. So then we get to use intuition or insight that only works in say the spectral domain, and still technically be working on the same original problem.
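
As a concrete check of "linear and invertible", here is a sketch using a naive discrete Fourier transform. It is a discrete stand-in for the Fourier transform named above, not a production FFT, and `dft`, `inverse_dft` and `close` are hypothetical helper names.

```python
import cmath


def dft(signal):
    """Naive O(n^2) discrete Fourier transform."""
    n = len(signal)
    return [
        sum(signal[k] * cmath.exp(-2j * cmath.pi * m * k / n) for k in range(n))
        for m in range(n)
    ]


def inverse_dft(spectrum):
    """Inverse of dft: carries the spectrum back to the original domain."""
    n = len(spectrum)
    return [
        sum(spectrum[m] * cmath.exp(2j * cmath.pi * m * k / n) for m in range(n)) / n
        for k in range(n)
    ]


def close(u, v, tol=1e-9):
    return all(abs(a - b) < tol for a, b in zip(u, v))


x = [1.0, 2.0, 0.0, -1.0]
y = [0.5, -0.5, 3.0, 1.0]
a, b = 2.0, -3.0

# Linearity: transforming a mix equals mixing the transforms.
combined = dft([a * xi + b * yi for xi, yi in zip(x, y)])
separately = [a * fx + b * fy for fx, fy in zip(dft(x), dft(y))]
is_linear = close(combined, separately)

# Invertibility: go over to the spectral domain and come back unchanged.
round_trip_ok = close(inverse_dft(dft(x)), x)

print(is_linear, round_trip_ok)
```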

Pipe the problem somewhere else, look at it from another angle, solve it there, unpipe your answer back to the original viewpoint/space.
 
For example: the Gaussian (normal) cumulative distribution function is monotone, hence injective (one-to-one), hence invertible.

By contrast the Gaussian probability density function (the “default” way of looking at a “normal Bell Curve”) fails the horizontal line test, hence is many-to-one, hence cannot be totally inverted.

So in this case, integrating once ∫[pdf] = cdf made the function “mathematically nicer” without changing its interesting qualities or altering its inherent nature.
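
The Gaussian example can be checked with Python's standard library. This is a small sketch using `statistics.NormalDist`; the point 1.3 is an arbitrary choice.

```python
from statistics import NormalDist

normal = NormalDist(mu=0.0, sigma=1.0)
x = 1.3

# The cdf is monotone, so we can go there and come back.
p = normal.cdf(x)
recovered = normal.inv_cdf(p)

# The pdf is many-to-one: two different inputs share the same height,
# so the height alone cannot tell us which input we started from.
height_left = normal.pdf(-x)
height_right = normal.pdf(x)

print(f"cdf round trip: {x} -> {p:.4f} -> {recovered:.4f}")
print(f"pdf(-x) = {height_left:.4f}, pdf(x) = {height_right:.4f}")
```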
 
“Going the long way” can be easier than trying to solve a problem directly.





Just playing with z² / (z² + 2z + 2)

g(z)=\frac{z^2}{z^2+2z+2}

on WolframAlpha. That’s Wikipedia’s example of a function with two poles (= two singularities = two infinities). Notice how “boring” the line-only pictures are compared to the 3-D ℂ→ℝ picture of the mapping (the one with the poles=holes). That’s why mathematicians say ℂ uncovers more of “what’s really going on”.

As opposed to normal differentiability, ℂ-differentiability of a function implies:

  • infinite descent into derivatives is possible (one complex derivative gets you all of them, so there is no ladder C¹ ⊃ C² ⊃ C³ ⊃ … ⊃ Cω to climb like usual)

  • nice Green’s-theorem type shortcuts make many, many ways of doing something equivalent. (So you can take a complicated real-world situation and validly do easy computations to understand it, because a squibbledy path computes the same as a straight path.)
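
The "squibbledy path computes the same as a straight path" point can be tried numerically. The sketch below integrates f(z) = z², chosen only because it is ℂ-differentiable everywhere, from 0 to 1+i along a straight line and along a two-leg detour; `integrate_along` is a hypothetical helper using a simple midpoint rule.

```python
def integrate_along(f, start, end, steps=2000):
    """Midpoint-rule line integral of f along the straight segment start -> end."""
    dz = (end - start) / steps
    return sum(f(start + (k + 0.5) * dz) for k in range(steps)) * dz


def f(z):
    return z * z


a, b = 0j, 1 + 1j

# Path 1: straight from a to b.
straight = integrate_along(f, a, b)

# Path 2: along the real axis to 1, then straight up to 1 + i.
corner = 1 + 0j
detour = integrate_along(f, a, corner) + integrate_along(f, corner, b)

# The antiderivative z^3 / 3 gives the exact answer for comparison.
exact = b ** 3 / 3 - a ** 3 / 3

print(straight, detour, exact)
```

Both paths agree with the antiderivative up to numerical error, so you are free to pick whichever route is easiest to compute.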
  

Pretty interesting to just change things around and see how the parts work.

  • The roots of the denominator are −1+i and −1−i (of course the conjugate of a root is always a root when the coefficients are real, since i and −i are indistinguishable)
  • you can see how the denominator twists
  • a fraction in ℂ space maps lines to circles, because lines and circles are turned inside out (they are just flips of each other: see also projective geometry)
  • if you change the z^2/ to a z/ or a 1/ you can see that.
  • then the Wikipedia picture shows the poles (infinities) 
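
If you would rather poke at g(z) in code than on WolframAlpha, here is a minimal sketch: the quadratic formula locates the poles, and evaluating |g| next to one of them shows the blow-up. The offsets 1e-6 and the far-away point 10 are arbitrary choices.

```python
import cmath


def g(z):
    return z ** 2 / (z ** 2 + 2 * z + 2)


# Roots of the denominator z^2 + 2z + 2 via the quadratic formula.
a, b, c = 1, 2, 2
discriminant = cmath.sqrt(b * b - 4 * a * c)   # sqrt(-4) = 2i
root_plus = (-b + discriminant) / (2 * a)      # -1 + i
root_minus = (-b - discriminant) / (2 * a)     # -1 - i

# |g| explodes near a pole and settles toward 1 far from both poles.
near_pole = abs(g(root_plus + 1e-6))
far_away = abs(g(10))

print(root_plus, root_minus)
print(f"|g| near pole: {near_pole:.3g}, |g| at z = 10: {far_away:.3f}")
```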

Complex ℂ→ℂ maps can be split into four parts: the input “real” ⊎ “imaginary”, and the output “real” ⊎ “imaginary”. Of course splitting them up like that hides the holistic truth of what’s going on, which comes from the perspective of a “twisted” plane where the elements z are |z| • exp(i • arg z).
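
A short sketch of that "twisted" view using Python's `cmath`: in polar form, multiplying two complex numbers multiplies the lengths and adds the angles. The two sample numbers are arbitrary, picked so the summed angle stays inside (−π, π] and no wrap-around is needed.

```python
import cmath

z = 1 + 1j     # length sqrt(2), angle pi/4
w = 2j         # length 2, angle pi/2

r_z, theta_z = cmath.polar(z)
r_w, theta_w = cmath.polar(w)
r_zw, theta_zw = cmath.polar(z * w)

# Rebuild z from its modulus and argument: |z| * exp(i * arg z).
rebuilt = r_z * cmath.exp(1j * theta_z)

print(f"lengths: {r_z:.4f} * {r_w:.4f} = {r_zw:.4f}")
print(f"angles:  {theta_z:.4f} + {theta_w:.4f} = {theta_zw:.4f}")
```

Seen as (real, imaginary) pairs, the product rule looks like bookkeeping; seen as stretch-and-rotate, it is one gesture.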

a conformal map (angle-preserving map)

ℂ→ℂ mappings mess with my head…and I like it.










What is the best interpretive program for making sense of quantum mechanics? Here is the way I would put it now. The question is completely backward. It acts as if there is this thing called quantum mechanics, displayed and available for everyone to see as they walk by it—kind of like a lump of something on a sidewalk. The job of interpretation is to find the right spray to cover up any offending smells. The usual game of interpretation is that an interpretation is always something you add to the pre-existing, universally recognized quantum theory.


What has been lost sight of is that physics [theory] is a dynamic interplay between storytelling and equation writing. Neither one stands alone, not even at the end of the day. But which has the more fatherly role? If you ask me, it’s the storytelling…. An interpretation is powerful if it gives guidance, and I would say the very best interpretation is the one whose story is so powerful it gives rise to the mathematical formalism itself (the part where nonthinking can take over)….


Take the nearly empty imagery of the many-worlds interpretation(s). Who could derive the specific structure of complex Hilbert space out of it if one didn’t already know the formalism? Most present-day philosophers of science just don’t seem to get this: If an interpretation is going to be part of physics, instead of a self-indulgent ritual to the local god, it had better have some cash value for physical practice itself.




I learned about Zadeh’s fuzzy logic when I was a graduate student…despite the intrinsic interest of the idea, there didn’t seem to be any really impressive results….

When I first heard about “fuzzy logic” control systems (…about 20 years ago — before Google or Wikipedia), I was puzzled. What exactly does the degree of truth of statements have to do with algorithms for controlling trains or elevators? When I asked this question after a dog-and-pony show at a Japanese research lab in the mid-1980s, I got answers … repeating what I already knew about fuzzy logic, without adding anything convincing about the application to control theory.

It sounded to me like technological double-talk. I was sure that the engineers were doing something relevant to control in complicated situations, but the “fuzzy logic” label seemed like a flack’s evocative slogan for a variety of different technologies that didn’t seem to have anything much to do with logic, fuzzy or otherwise.
 
A friend with a background in chemical engineering set me straight. His explanation went something like this: Standard control systems are linear. That means that controllable outputs (heating, accelerating, braking, whatever) are calculated as a linear function of available inputs (time series of temperature, velocity, and so on).

Linearity makes it easy to design such systems with specified performance characteristics, to guarantee that the system is stable and won’t go off into wild oscillations, and so on. However, the underlying mechanisms may be highly non-linear, and therefore the optimal coefficient choices for a linear control system may be quite different in different regions of a system’s space of operating parameters.

One possible solution is to use different sets of control coefficients for different ranges of input parameters. However, the transition from one control regime to another may not be a smooth one, and a system might even hover at the boundary for a while, switching back and forth.

So the “fuzzy control” idea is to interpolate among the recipes for action given by different linear control systems. If the measured input variables put us halfway between the center of state A and the center of state B, then we should use output parameters that are halfway between state A’s recipe and state B’s recipe. If we’re 2/3 of the way from A to B, then we mix 1/3 of A’s recipe with 2/3 of B’s; and so on.
 
In the case of the four stages of rice cooking, I suppose that a fuzzy logic controller is able to treat the process as a series of fuzzy or gradient transitions rather than a series of hard, stepwise transitions. … a vaguely analogous method to fit a smoothed piecewise linear model to data about oil recovery as a function of various independent variables, including oil field “age”.

In both cases, the fuzzy approach might well be appropriate, under whatever name (though here’s an alternative story about heating control…).

… And indeed even plain fuzzy is by no means an entirely positive word. When George Bush famously accused Al Gore of “disparaging my [tax] plan with all this Washington fuzzy math”, it was not a warm fuzzy moment.



[Update: Fernando Pereira emailed

Petroleum geologists have been pioneers on pretty sophisticated spatiotemporal estimation and smoothing techniques, for instance kriging (aka Gaussian process regression for statisticians). There are tight connections between GP regression and spline smoothing (via the theory of reproducing kernel Hilbert spaces). Either the Saudis are not hiring the best petroleum geologists, or they are being deliberately obfuscating with marketroid talk. I can’t think of any situation in which fuzzy ideas (pun intended) would be preferable to Bayesian statistics for inference.

…]

[Update 2: A review article by David Abramowitch, with slides.]

Mark Liberman, in When “Fuzzy” Means “Smoothed Piecewise Linear”

One cool thing to imagine: the multi-dimensional space of parameters of the control system, i.e. the space of all possible tunings of the knobs, and how a few multi-dimensional charts link together. How do they meet up in this high-dimensional space?
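
The interpolation Liberman describes can be sketched in a few lines. Everything here is hypothetical: the one-dimensional measurement, the two linear "recipes" and their centers are made up for illustration, and real fuzzy controllers blend many rules with more elaborate membership functions.

```python
def linear_controller(gain, offset):
    """A hypothetical linear recipe: output = gain * measurement + offset."""
    def controller(measurement):
        return gain * measurement + offset
    return controller


def blended_output(measurement, center_a, center_b, controller_a, controller_b):
    """Mix two linear recipes according to how far we are from A toward B."""
    fraction = (measurement - center_a) / (center_b - center_a)
    fraction = min(1.0, max(0.0, fraction))   # clamp outside the A..B range
    return (1 - fraction) * controller_a(measurement) + fraction * controller_b(measurement)


# Regime A is centered at 20, regime B at 80 (made-up operating points).
recipe_a = linear_controller(gain=1.0, offset=0.0)
recipe_b = linear_controller(gain=0.5, offset=10.0)

# At 60 we are 2/3 of the way from A to B: mix 1/3 of A with 2/3 of B.
output = blended_output(60.0, 20.0, 80.0, recipe_a, recipe_b)
print(f"blended output at 60: {output:.3f}")
```

Because the weight slides continuously from one recipe to the other, there is no hard boundary for the system to hover at and flip back and forth across.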