
Posts tagged with calculus

Define the derivative to be the thing that makes the fundamental theorem of calculus work.










So, you never went to university… or you assiduously avoided all maths whilst at university… or you started but were frightened away by the epsilons and deltas. But you know the calculus is one of the pinnacles of human thought, and it would be nice to know just a bit of what they’re talking about.

Both thorough and brief intro-to-calculus lectures can be found online. I think I can explain differentiation and integration—the two famous operations of calculus—even more briefly.

 

Let’s talk about sequences of numbers. Sequences that make sense next to each other, like your child’s height at different ages

image

not just an unrelated assemblage of numbers which happen to be beside each other. If you have handy a sequence of numbers that’s relevant to you, that’s great.

 

Differentiation and integration are two ways of transforming the sequence to see it differently-but-more-or-less-equivalently.

Consider the sequence 1, 2, 3, 4, 5. If I look at the differences I could rewrite this sequence as [starting point of 1], +1, +1, +1, +1. All I did was look at the difference between each number in the sequence and its neighbour. If I did the same thing to the sequence 1, 4, 9, 16, 25, the differences would be [starting point of 1], +3, +5, +7, +9.

image
image

That’s the derivative operation. It’s basically first-differencing, except in real calculus you would have an infinite, continuous thickness of data—as many numbers between 1, 4, and 9 as you want. In R you can use the diff operation on a sequence of related data to automate what I did above. For example:

seq <- 1:5        # 1 2 3 4 5
diff(seq)         # 1 1 1 1
seq2 <- seq*seq   # 1 4 9 16 25: the squares
diff(seq2)        # 3 5 7 9

A couple of things you may notice:

  • I could have started at a different point and talked about a sequence with the same changes but a different initial value. For example 5, 6, 7, 8, 9 does the same +1, +1, +1, +1 but starts at 5.
  • I could second-difference the numbers, differencing the first-differences: +3, +5, +7, +9 (the differences in the sequence of square numbers) gets me ++2, ++2, ++2.
  • I could third-difference the numbers, differencing the second-differences: +++0, +++0 (see the R sketch just after this list).
  • Every time I diff I lose one of the observations. This isn’t a problem in the infinitary version although sometimes even infinitely-thick sequences can only be differentiated a few times, for other reasons.
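
A minimal check of the repeated differencing above, using base R’s diff and its differences argument:

squares <- (1:5)^2               # 1 4 9 16 25
diff(squares)                    # 3 5 7 9 : the +3, +5, +7, +9
diff(squares, differences = 2)   # 2 2 2 : the ++2’s
diff(squares, differences = 3)   # 0 0 : the +++0’s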

The other famous tool for looking differently at a sequence is to look at cumulative sums: cumsum in R. This is integration. Looking at “total so far” in the sequence.

Consider again the sequence 1, 2, 3, 4, 5. If I added up the “total so far” at each point I would get 1, 3, 6, 10, 15. This is telling me the same information, just in a different way. The fundamental theorem of calculus says that diff( cumsum( 1:5 )) gets me back to 2, 3, 4, 5: the original sequence, except that the first element survives only as the starting point, 1. You can verify this without a calculator by subtracting neighbours—looking at differences—amongst 1, 3, 6, 10, 15. (Go ahead, try it; I’ll wait.)
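
You can also make R do the subtraction for you; a one-line check of the claim above:

diff( cumsum(1:5) )   # 2 3 4 5 : the running totals 1 3 6 10 15, differenced back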

Let’s look back at the square sequence 1, 4, 9, 16, 25. If I cumulatively sum I’d have 1, 5, 14, 30, 55. Pick any sequence of numbers that’s relevant to you and do cumsum and diff on it as many times as you like.

 

Those are the basics.

Why are people so interested in this stuff?

Why is it useful? Why did it make such a splash and why is it considered to be in the canon of human progress? Here are a few reasons:

  • If the difference in a sequence goes from +, +, +, +, … to −, −, −, −, …, then the numbers climbed a hill and started going back down. In other words the sequence reached a maximum. We like to maximise things, like efficiency and profit.
  • A corresponding statement could be made for valley-bottoms. We like to minimise things like cost, waste, usage of valuable materials, etc.
  • The diff verb takes you from position → velocity → acceleration, so this mathematics relates fundamental stuff in physics.
  • The cumsum verb takes you from acceleration → velocity → position, which allows you to calculate stuff like work. Therefore you can pre-plan, for example, the energy cost of doing something at a large scale that would be too expensive to simply try.
  • What’s the difference between income and wealth? Well if you define net income to be what you earn less what you spend,
    image
    then wealth = cumsum(net income) and net income = diff(wealth). Another everyday relationship made absolutely crystal clear. (A toy R sketch follows this list.)
    http://philwendt.com/wp-content/uploads/2012/03/Figure1-wealth-and-Income-percent-share-of-1percent-Vol-III.jpg
  • In higher-dimensional or more-abstract versions of the fundamental theorem of calculus, you find out that complicated questions, like the sum of the forces a paramecium experiences along a curved path, can sometimes be reduced to just the start and the finish (i.e., the complicatedness may live in one dimension less than you thought).
    image
  • Further-abstracted versions also allow you to optimise surfaces (including “surfaces” in phase-space) and therefore build bridges or do rocket-science.
    image
  • With the fluidity that comes with being able to diff and cumsum, you can do statistics on continuous variables like height or angle, rather than just on count variables like number of people satisfying condition X.
    kernel density plot of Oxford boys' heights.
  • At small enough scales, calculus (specifically Taylor’s theorem) tells you that "most" nonlinear functions can be linearised: i.e., approximated by repeated addition of a constant +const+const+const+const+const+.... That’s just about the simplest mathematical operation I can think of. It’s nice to be able to talk at least locally about a complicated phenomenon in such simple terms.
    linear maps as multiplication
  • In the infinitary version, symbolic formulae diff and cumsum to other symbolic formulae. For example diff( x² ) = 2x (look back at the square sequence above if you didn’t notice this the first time). This means instead of having to try (or make your computer try) a lot of stuff to see what’s going to work, you can just-plain-understand something.
  • Also because of the symbolic nicety: post-calculus, if you only know how, e.g., diff( diff( diff( x ))) relates to x – but don’t know a formula for x itself – you’re not totally up a creek. You can use calculus tools to establish relationships between the various diff levels of a sequence that work just as well as a normal formula, thus expanding the landscape of things you can mathematise and solve.
  • In fact diff( diff( x )) = − x is the equation of harmonic motion (the source of the waves and oscillations pictured here), and therefore of the physical properties of all materials (hardness, conductivity, density, why the sky is blue, etc.), which derive from chemistry, which derives from Schrödinger’s Equation, which is solved by the “harmonic” diff( diff( x )) = − x.
    image
    image
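
Returning to the wealth/income bullet above, a toy sketch with purely hypothetical figures:

net_income <- c(500, 700, -200, 1000)   # made-up monthly net income
wealth <- cumsum(net_income)            # 500 1200 1000 2000 : running wealth
diff(wealth)                            # 700 -200 1000 : net income back, minus the first month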

Calculus isn’t “the end” of mathematics. It’s barely even before or after other mathematical stuff you may be familiar with. For example it doesn’t come “after” trigonometry, although the two do relate to each other if you’re familiar with both. You could apply the “differencing” idea to groups, topology, imaginary numbers, or other things. Calculus is just a tool for looking at the same thing in a different way.




A fun exercise/problem/puzzle introducing function space.





Transgressing boundaries, smashing binaries, and queering categories are important goals within certain schools of thought.

https://upload.wikimedia.org/wikipedia/commons/3/33/Anna_P.jpg

Reading such stuff the other week-end I noticed (a) a heap of geometrical metaphors and (b) limited geometrical vocabulary.

What I dislike about the word #liminality http://t.co/uWCczGDiDj : it suggests the ∩ is small or temporary.
— isomorphismes (@isomorphisms) July 7, 2013

In my opinion functional analysis (as in, precision about mathematical functions—not practical deconstruction) points toward more appropriate geometries than just the [0,1] of fuzzy logic. If your goal is to escape “either/or” then I don’t think you’ve escaped very much if you just make room for an “in between”.

image

By contrast ℝ→ℝ functions (even continuous ones; even smooth ones!) can wiggle out of definitions you might naïvely try to impose on them. The space of functions naturally lends itself to different metrics that are appropriate for different purposes, rather than “one right answer”. And even trying to define a rational means of categorising things requires a lot—like, Terence Tao level—of hard thinking.

In harmonic analysis and PDE, one often wants to place a function ƒ:ℝᵈ→ℂ on some domain (let’s take a Euclidean space ℝᵈ for simplicity) in one or more function spaces in order to quantify its “size”…. [T]here is an entire zoo of function spaces one could consider, and it can be difficult at first to see how they are organised with respect to each other. … For function spaces X on Euclidean space, two such exponents are the regularity s of the space, and the integrability p of the space. …
—Terence Tao. Hat tip: @AnalysisFact

I’ll illustrate my point with the arbitrary function ƒ pictured at the top of this post. Suppose that ƒ∈𝒞². So it does make sense to talk about whether ƒ′′≷0.

But in the case I drew above, ƒ′′≹0. In fact “most” 𝒞² functions on that same interval wouldn’t fully fit into either “concave” or “convex”.

So “fits the binary” is rarer than “doesn’t fit the binary”. The “borderlands” are bigger than the staked-out lands. And it would be very strange to even think about trying to shoehorn generic 𝒞² functions into

  • one type,
  • the other,
  • or “something in between”.

Beyond “false dichotomy”, ≶ in this space doesn’t even pass the scoff test. I wouldn’t want to call the ƒ I drew a “queer function”, but I wonder if a geometry like this isn’t more what queer theorists want than something as evanescent as “liminal”, something as thin as “boundary”.






http://2.bp.blogspot.com/-jTsy6D2Kc-E/T9Qm0CVmOnI/AAAAAAAABfs/YHSWk-j95Kc/s1600/tomato+cam.jpg

Cylinder = line-segment × disc

C = | × ●

The “product rule” from calculus works as well with the boundary operator ∂ as with the differentiation operator d.

∂C  =   ∂| × ●   +   | × ∂●
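
In symbols—and hedging that this is the general boundary-of-a-product formula, not anything special to cylinders—for reasonable spaces A and B,

∂(A × B) = (∂A × B) ∪ (A × ∂B)

Applied to C = I × D² (segment × disc): ∂I is the two endpoints and ∂D² is the circle S¹, so ∂C = ({0,1} × D²) ∪ (I × S¹), i.e. the two end discs plus the lateral tube.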

image

image

image

Oops. Typo. Sorry, I did this really late at night! cos and sin need to be swapped.

image

image

image

image

image

image

image

image

image

Oops. Another typo. Wrong formula for circumference.

image




I was re-reading Michael Murray’s explanation of cointegration:

and marvelling at the calculus.

Of course it’s not just any subtraction. It’s subtracting a function from a shifted version of itself. Still doesn’t sound like a universal revolution.

(But of course the observation that the lagged first-difference will be zero around an extremum (turning point), along with symbolic formulæ for (infinitesimal) first-differences of a function, made a decent splash.)

definition of derivative
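
A minimal R sketch of that parenthetical, with made-up numbers: around a hill-top, the first differences change sign.

x <- c(1, 3, 4, 4.5, 4, 3, 1)         # a little hill
diff(x)                               # 2 1 0.5 -0.5 -1 -2 : signs flip at the peak
which(diff(sign(diff(x))) != 0) + 1   # 4 : the index of the maximum, x[4] = 4.5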

Jeff Ryan wrote some R functions that make it easy to first-difference financial time series.

image

Here’s how to do the first differences of Goldman Sachs’ share price:

require(quantmod)
getSymbols("GS")        # fetch Goldman Sachs price data into the variable GS
gs <- Ad(GS)            # adjusted closing prices
plot(  gs - lag(gs)  )  # first differences: each day's change from the day before

image

Look how much more structured the result is! Now all of the numbers are within a fairly narrow band. With length(gs) I found 1570 observations. Here are 1570 random normals, plot(rnorm(1570, sd=10), type="l"), for comparison:

image

Not perfectly similar, but very close!

Looking at the first differences compared to a Gaussian brings out what’s different between public equity markets and a random walk. What sticks out to me is the vol leaping up aperiodically in the $GS time series.

I think I got even a little closer by drawing the stdevs from a Poisson process: plot(rnorm(1570, sd=rpois(1570, lambda=5)), type="l")

image

but I’ll end there with the graphical futzing.

What’s really amazing to me is how much difference a subtraction makes.




differential topology lecture by John W. Milnor from the 1960’s: Topology from the Differentiable Viewpoint

  • A function that’s problematic for analytic continuations:
    exp( − 1 / t )
    image
  • Definitions of smooth manifold, diffeomorphism, category of smooth manifolds
  • bicontinuity condition
  • two Euclidean spaces are diffeomorphic iff they have the same dimension
  • torus ≠ sphere but compact manifolds are equivalence-classable by genus
  • Moebius band is not compact
  • Four categories of topology, which were at first thought to be the same, but by the 60’s seen to be really different (and the maps that keep you within the same category):
    diffeomorphisms on smooth manifolds
    (Again I say: STRING THEORY MOTHAF**KAAAAAAAAAAS);
    piecewise-linear maps on simplicial complexes;
    homeomorphisms on sets (point-set topology).
  • Those three examples of categories helped in understanding category and functor in general. You could work for your whole career in one category—for example if you work on fluid dynamics, you’re doing fundamentally different stuff than computer scientists working on type theory—and this would filter through to your vocabulary and the assumptions you take for granted. E.g. “maps” might mean “smooth bicontinuous maps” in fluid dynamics, but non-surjective, discontinuous maps show up all the time in logic or theoretical computer science. Functor is the comparison between the different subjects.
  • The fourth, homotopy theory, was invented in the 1930’s because topology itself was too hard.

    image
  • Minute 38-40. A pretty slick proof. I often have a hard time following, but this is an exception.
  • Minute 43. He misspeaks! In defining the hypercube.
  • Minute 47. Homology groups relate the category of topological-spaces-with-homotopy-classes-of-mappings, to the category of groups-with-homomorphisms.

That’s the first of three lectures. Also Milnor’s thoughts almost half a century later on how differential topology had evolved since the lectures:

Hat tip to david a edwards.

What I really loved about this talk was the categorical perspective. The talks are really structured so that three categories — smooth things, piecewise things, and points/sets — are developed in parallel. Better than a development of the theory of categories in the abstract, I like having these specific examples of categories and of how “sameness” differs from category to category.

(Source: simonsfoundation.org)





going the long way

What does it mean when mathematicians talk about a bijection or homomorphism?

Imagine you want to get from X to X′ but you don’t know how. Then you find a “different way of looking at the same thing” using ƒ. (Map the stuff with ƒ to another space Y, do something else over in image ƒ, take a journey over there, and then return back with ƒ⁻¹.)

The fact that a bijection can show you something in a new way that suddenly makes the answer to the question so obvious, is the basis of the jokes on www.theproofistrivial.com.

image
image
image



In a given category the homomorphisms Hom ∋ ƒ preserve all the interesting properties. Linear maps, for example, barely change anything (except when det = 0)—like if your government suddenly added another zero to the end of all currency denominations: just a rescaling. So they preserve most interesting properties, and an invertible linear mapping to another domain can be inverted back, so anything you discover over in the new domain (the image of ƒ) can be used on the original problem.

All of these fancy-sounding maps are linear:

  • Fourier transform
  • Laplace transform
  • taking the derivative
  • Box–Müller

They sound fancy because whilst they leave things technically equivalent in an objective sense, the result looks very different to people. So then we get to use intuition or insight that only works in, say, the spectral domain, and still technically be working on the same original problem.

image

Pipe the problem somewhere else, look at it from another angle, solve it there, unpipe your answer back to the original viewpoint/space.
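
A tiny instance of the pipe–solve–unpipe pattern: the logarithm maps multiplication over to addition, and exp maps the answer back.

a <- 123; b <- 456
exp( log(a) + log(b) )   # pipe with log, add over there, unpipe with exp
a * b                    # 56088 : same answer (up to floating point) by the direct route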

 

For example: the Gaussian (normal) cumulative distribution function is monotone, hence injective (one-to-one), hence invertible.

image

By contrast the Gaussian probability distribution function (the “default” way of looking at a “normal Bell Curve”) fails the horizontal line test, hence is many-to-one, hence cannot be totally inverted.

image

So in this case, integrating once ∫[pdf] = cdf made the function “mathematically nicer” without changing its interesting qualities or altering its inherent nature.
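
R’s built-in Gaussian functions make the contrast concrete: the cdf pnorm has a genuine inverse, qnorm, while the pdf dnorm is two-to-one.

qnorm( pnorm(1.96) )    # 1.96 : the cdf is monotone, so it inverts cleanly
dnorm(-1) == dnorm(1)   # TRUE : the pdf sends two inputs to one output, so no total inverse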

 

Or here’s an example from calc 101: u-substitution. You’re essentially saying “Instead of solving this integral, how about if I solve a different one which is exactly equivalent?” The →ƒ in the top diagram is the u-substitution itself. The “main verb” is doing the integral. U-substituters avoid doing the hard integral, go the long way, and end up doing something much easier.

Problem: integrate ∫ (8x⁷ − 6x²) / (x⁸ − 2x³ + 13587) dx

Clever person: How about instead I integrate ∫ (1/u) du ?

Question asker: Huh?

Clever person: They’re equivalent, you see? Watch! (applies basis isomorphism φ: x ↦ u, as well as the chain rule for d∘φ: dx ↦ du) (gets easier integral) (does easier integral) (laughs) (transforms it back, φ⁻¹: u ↦ x) (laughs again)

Question asker: Um. (thinks) Unbelievable. That worked. You must be some kind of clever person.

 

Or in physics—like tensors and Schrödinger solving and stuff.
|3,2,1>+|3,1,-1> Orbital Animation
Physicists look for substitutions that make the computation they have to do more tractable. Try solving the Schrödinger PDE for hydrogen’s lone electron (the 1s orbital) in xyz coordinates (square grid)—then try solving it in spherical coordinates (longitude & latitude on expanding shells). Since the natural symmetry of the 1s orbital is spherical, changing basis to spherical coordinates makes life much easier.

polar coordinates "at sea" versus rectangular coordinates "in the city"

 

Likewise one of the goals of tensor analysis is to not be tied to any particular basis—so long as the basis doesn’t trip over itself, you should be free to switch between bases to get different jobs done. Terry Tao talks about something like this under the keyword “spending symmetry”—if you use up your basis isomorphism, you need to give it back before you can use it again.

"Going the long way" can be easier than trying to solve a problem directly.




Saying derivative is “slope” is a nice pedant’s lie, like the Bohr atom

image

which misses out on a deeper and more interesting later viewpoint:

|6,4,1> Orbital Animation|3,2,1>+|3,1,-1> Orbital Animation

 

The “slope” viewpoint—and what underlies it, the “charts” or “plots” view of functions as ƒ(x)–vs–x—is like training wheels that eventually need to come off. The “slope” metaphor fails

  • for pushforwards,
  • on surfaces,
    image 
  • on curves γ that double back on themselves
    image 
  • in my vignettes about integrals,
  • and, in my opinion, it’s harder to “see” derivatives or calculus in a statistical or business application, if you think of “derivative = slope”. Since you’re presented with reams of numbers rather than pictures of ƒ(x)–vs–x, where is the “slope” there?

"Really" it’s all about diff’s. Derivatives are differences (just zoomed in…this is what lim ∆x↓0 was for) and that viewpoint works, I think, everywhere.

I half-heartedly tried making the following illustrations in R with the barcode package but they came out ugly. Even uglier than my handwriting—so now enjoy the treat of my ugly handwriting.

 

Step back to Descartes’ definition of a function: an association between two sets.

image

And the language we use sounds “backwards” to that of English. If I say “associate a temperature number to every point over the USA”

US temperatures

then that should be written as a function ƒ: surface → temperature

(or we could say ƒ: ℝ²→ℝ with ℝ² = (lat, long) ).

The → arrow and the “maps to” phrasing are backwards of the way we speak.

  • “Assign a temperature to the surface” —versus— “Map each surface point to a temperature element from the set of possible temperatures”.

a function is an association between sets

{elf, book, Kraken, 4^π^e} … no, I’m not sure where that came from either. But I think we can agree that such a set is unstructured.

Cartesian function from non-space to weird space

Great. I drew above a set “without other structure” as the source (domain) and a branched, partially ordered weirdy thing as the target (codomain). It’s possible, with some work, to come up with a calculus for such mappings, like the infinitesimal one on ℝ→ℝ functions that’s taught to many 19-year-olds, but for right now my point is to make that look ridiculous and impossible. Newton’s calculus is something we do only with a specific kind of Cartesian mapping: one where both the from and the to have Euclidean concepts of straight-line-ness, and where distance has the usual meaning from maths class. In other words the Newtonian derivative applies only to smooth mappings from ℝ to ℝ.

 

Let’s stop there and think about examples of mappings.

(Not from the real world—I’ll do another post on examples of functions from the real world. For now just accept that numbers describe the world and let’s consider abstractly some mappings that associate, not arbitrarily but in a describable pattern, some numbers to other numbers.)

successor function and square function

sin function

(I didn’t have a calculator at the time but the circle values for [1,2,3,4,5,6,7] are [57°,114°,172°,229°,286°,344°,401°=41°].)

I want to contrast the “map upwards” pictures to both the Cartesian pictures for structure-less sets

image

and to the normal graphical picture of a “chart” or “plot”.

image

image

Notice what’s obscured and what’s emphasised in each of the picture types. The plots certainly look better—but we lose the Cartesian sense that the “vertical” axis is no more vertical than the horizontal one. Both ℝ’s in ƒ: ℝ→ℝ stand on the same footing.

And if I want to compose mappings? As in the parabola picture above (first the square function, then an affine recentering), I can only show the end result of g∘ƒ rather than the intermediate result.

image

Whereas I could line up a long vertical of successive transformations (like one might do in Excel except that would be column-wise to the right) and see the results of each “input-output program”.
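
A sketch of that vertical-of-transformations idea, with a hypothetical square-then-recenter pipeline:

x <- -3:3
f <- function(t) t^2                 # the square function
g <- function(t) t - 4               # an affine recentering
rbind(x, fx = f(x), gfx = g(f(x)))   # each row is one stage of the input-output program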

(Besides, I have a languishing draft post called “How I Got to Gobbledegook” which shows how much simpler a sequence of transforms can be than “a forbidding formula from a textbook”.)

Another weakness of the “charts” approach is that whereas the “stay the same” command ought to be the simplest one (it’s a null command), it gets mapped to the 45° line:

image

Here’s the familiar parabola / plot “my way”: with the numbers written out so as to equalise the target space and the source space.

Parabola with the domain and codomain on the same footing.

 

Now that the “new” tool is in hand, let’s go back to the calculus. I’m going to say “derivative = pulse”, and that’s the main point of this essay.

linear approximations (differentials) of a parabola (x²)

Considering both the source ℝ→ and the target →ℝ on the same footing, I’ll call the length of the arrows the “mapping strength”. In a convex mapping like the square function, the diffs increase as you go to the right.

image

OK, now, in the middle of the piece, here is the main point I want to make about derivatives and calculus: looking at numbers written on the paper, rather than at plots, makes understanding a pushforward possible. And, in my opinion, since in business gigantic databases of numbers are commoner than charts that draw themselves, and in life we just experience stimuli rather than having someone chart them for us, this perspective is the more practical one.

differences on a scalar field (California)

I’m deliberately conflating the concepts of diff as

  • difference,
  • R’s diff function,
  • differential (as in differential calculus, or as in linear approximation),

because they’re all related.
differentials on a surface (Where is the Slope?)
a U-neighbourhood of Los Angeles
In my example of an open set around Los Angeles, a surface diff could be: you measure the temperature on your rooftop in Los Feliz, then measure the temperature down the block. Or across the city. Or, if you want to be infinitesimal and truly calculus-ish about it, the difference between the temperature of one fraction of an atom in your room and its nearby neighbour. (How could that be coherent? There are ways, but let’s just stick with the cross-city differential and pretend you could zoom in for more detail if you liked.)
 

Linear

I’m still not quite done with “my style of pictures”, because there’s another insight you can get from writing these mappings as a barcode rather than as a “chart”. Indeed, this is exactly what a rug plot does alongside a histogram.

a rug plot or carpet plot is like a barcode on the bottom of your plot to show the marginal (one-dimension only) distribution of data

Here are some strip plots = rug plots = carpet plots = barcode plots of nonlinear functions for comparison.

 image

image

The main conclusion of calculus is that nonlinear functions can be approximated by linear functions. The approximation only works “locally” at small scales, but still if you’re engineering the screws holding a plane together, it’s nice to know that you can just use a multiple (linear function) rather than some complicated nonlineary thingie to estimate how much the screws are going to shake and come loose.

For me, at least, way too many years of solving y = mx + b obscured the fact that linear functions are just multiples. (Strictly, y = mx + b with b ≠ 0 is affine rather than linear; linear maps are the straight lines through the origin.) You take the space and stretch or shrink it by a constant multiple. Like converting a currency: take pesos, divide by 8, get dollars. The multiple doesn’t change whether you have 10,000 pesos or 10,000,000 pesos; it’s still the same conversion rate.
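
A sketch of the currency example in R, showing why a linear map is just a multiple:

to_dollars <- function(pesos) pesos / 8         # one constant conversion rate
to_dollars(10000)                               # 1250
to_dollars(10000000)                            # 1250000 : same rate at any scale
to_dollars(3 * 10000) == 3 * to_dollars(10000)  # TRUE : scaling commutes with the map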

image

image

linear maps as multiplication

linear mappings -- notice they're ALL straight lines through the origin!

the flip function

So in a neighborhood or locality a linear approximation is enough. That means that a collection of linear functions can approximate a nonlinear one to arbitrary precision.
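
A minimal sketch of that locality in R: approximate sin near a point by a zoomed-in diff, and watch the approximation hold nearby but fail far away.

f <- function(x) sin(x)
x0 <- 0.5
slope <- ( f(x0 + 1e-6) - f(x0) ) / 1e-6    # a zoomed-in first difference, ≈ cos(0.5)
lin <- function(x) f(x0) + slope * (x - x0)
f(0.51) - lin(0.51)   # about -2e-5 : tiny error close to x0
f(2.0)  - lin(2.0)    # about -0.89 : the linear story is only local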

building up a nonlinear function from linear parts

That means we can use computers!

Calculus says: smooth functions can be approximated, around a local neighborhood of a point, with straight lines.

 

Square

I can’t use the example of self times self so many times without exploring the concept a bit. Squares to me seem so limited and boring. No squizzles, no funky shapes, just boring chalkboard and rulers.

But that’s probably too judgmental.

image

recursive "Square" function

After all there’s something self-referential and almost recursive about repeated applications of the square function. And it serves as the basis for Euclidean distance (and standard deviation formula) via the Pythagorean theorem.

How those two are connected is a mystery I still haven’t wrapped my head around. But a cool connection I have come to understand is that between:

  • a variety of inverse square laws in Nature
  • a curve that is equidistant from a point and a line
  • and the area of a rectangle which has both sides equal.

inverse square laws

what does self times self have to do with the geometric figure of a parabola?

parabola

I guess first of all one has to appreciate that “parabola” shouldn’t necessarily have anything to do with x•x. Hopefully that’s become more obvious if you read the sections above where I point out that the target ℝ isn’t any more “vertical” than is the source ℝ.

image

The inverse-square laws show up everywhere because our universe is 3-dimensional. The surface of a 3-dimensional ball (like an expanding wave of gravitons, or an expanding wave of photons, or an expanding wave of sound) is 2-dimensional, so whatever “force” or “energy” is “painted on” that surface gets diluted as the square of the radius (the surface area grows like r²) while the radius increases at a constant rate. Oh. Thanks, Universe, for being 3-dimensional.
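
A one-line check of that dilution, with a hypothetical point source of power P spread over a sphere of radius r:

intensity <- function(P, r) P / (4 * pi * r^2)   # energy painted per unit of spherical surface
intensity(100, 2) / intensity(100, 1)            # 0.25 : double the distance, a quarter the intensity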

inverse square laws  why, why, why, WHY?!?!

What’s most amazing about the parabola–gravity connection is that it’s a metaphor spanning both space and time. The curvature that looks like a-plane-figure-equidistant-to-a-line-and-a-point is curving in time.




a smooth field of 1-vectors in 3-D

(Source: thievess)

