Posts tagged with **calculus**

So, you never went to university…or you assiduously avoided all maths whilst at university…or you started but were frightened away by the epsilons and deltas…. But you know the calculus is one of the pinnacles of human thought, and it would be nice to know just a bit of what they’re talking about……

Both thorough and brief intro-to-calculus lectures can be found online. I think I can explain **differentiation** and **integration**—the two famous operations of calculus—even more briefly.

Let’s talk about **sequences of numbers:** sequences that make sense next to each other, like your child’s height at different ages, not just an unrelated assemblage of numbers which happen to be beside each other. If you have handy a sequence of numbers that’s relevant to you, that’s great.

**Differentiation and integration** are two ways of transforming the sequence to see it differently-but-more-or-less-equivalently.

Consider the sequence **1, 2, 3, 4, 5**. If I look at the differences I could rewrite this sequence as `[starting point of 1]`**, +1, +1, +1, +1**. All I did was look at the difference between each number in the sequence and its neighbour. If I did the same thing to the sequence **1, 4, 9, 16, 25**, the differences would be `[starting point of 1]`**, +3, +5, +7, +9**.

That’s the derivative operation. It’s basically first-differencing, except in real calculus you would have an infinite, continuous thickness of data—as many numbers between 1, 4, and 9 as you want. In **R** you can use the `diff` operation on a sequence of related data to automate what I did above. For example, `diff(c(1, 4, 9, 16, 25))` returns `3 5 7 9`.
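The same first-differencing is easy to sketch in any language. Here’s a minimal Python version mirroring R’s `diff` (the helper name is mine):

```python
def diff(xs):
    """First differences: each element minus its left-hand neighbour."""
    return [b - a for a, b in zip(xs, xs[1:])]

print(diff([1, 2, 3, 4, 5]))     # → [1, 1, 1, 1]
print(diff([1, 4, 9, 16, 25]))   # → [3, 5, 7, 9]
```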

A couple of things you may notice:

- I could have started at a different starting point and talked about a sequence with the same changes, changing from a different initial value. For example **5, 6, 7, 8, 9** does the same **+1, +1, +1, +1** but starts at **5**.
- I could second-difference the numbers, differencing the first-differences: **+3, +5, +7, +9** (the differences in the sequence of square numbers) gets me **++2, ++2, ++2**.
- I could third-difference the numbers, differencing the second-differences: **+++0, +++0**.
- Every time I `diff` I lose one of the observations. This isn’t a problem in the infinitary version, although sometimes even infinitely-thick sequences can only be differentiated a few times, for other reasons.
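Those bullet points can be checked mechanically. A small Python sketch (the `diff` helper is mine, redefined here so the snippet stands alone):

```python
def diff(xs):
    return [b - a for a, b in zip(xs, xs[1:])]

squares = [1, 4, 9, 16, 25]
d1 = diff(squares)   # first differences:  [3, 5, 7, 9]
d2 = diff(d1)        # second differences: [2, 2, 2]
d3 = diff(d2)        # third differences:  [0, 0]

# each pass through diff costs one observation:
print([len(squares), len(d1), len(d2), len(d3)])   # → [5, 4, 3, 2]
```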

The other famous tool for looking differently at a sequence is to look at cumulative sums: **cumsum** in **R**. This is **integration**. Looking at “total so far” in the sequence.

Consider again the sequence **1, 2, 3, 4, 5**. If I added up the “total so far” at each point I would get **1, 3, 6, 10, 15**. This is telling me the same information – just in a different way. The **fundamental theorem of calculus** says that if I `diff( cumsum( 1:5 ))` I will get back **2, 3, 4, 5** – the original sequence, minus the first observation that `diff` always eats. You can verify this without a calculator by subtracting neighbours—looking at differences—amongst **1, 3, 6, 10, 15**. (Go ahead, try it; I’ll wait.)
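The same round trip in Python (the stdlib’s `itertools.accumulate` plays the role of R’s `cumsum`; `diff` is my helper, redefined here):

```python
from itertools import accumulate

def diff(xs):
    return [b - a for a, b in zip(xs, xs[1:])]

totals = list(accumulate([1, 2, 3, 4, 5]))   # running "total so far": [1, 3, 6, 10, 15]
print(diff(totals))                          # → [2, 3, 4, 5]: the original, minus its first term
```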

Let’s look back at the square sequence **1, 4, 9, 16, 25**. If I cumulatively sum I’d have **1, 5, 14, 30, 55**. Pick any sequence of numbers that’s relevant to you and do `cumsum` and `diff` on it as many times as you like.

Those are the basics.

**Why are people so interested in this stuff?**

Why is it useful? Why did it make such a splash and why is it considered to be in the canon of human progress? Here are a few reasons:

- If the difference in a sequence goes from **+, +, +, +, …** to **−, −, −, −, …**, then the numbers climbed a hill and started going back down. In other words the sequence reached a maximum. We like to maximise things like efficiency, profit, etc.
- A corresponding statement could be made for valley-bottoms. We like to minimise things like cost, waste, usage of valuable materials, etc.
- The `diff` verb takes you from position → velocity → acceleration, so this mathematics relates fundamental stuff in physics.
- The `cumsum` verb takes you from acceleration → velocity → position, which allows you to calculate stuff like work. Therefore you can pre-plan, for example, what the energy cost would be to do something at a large scale that’s too costly to just try.
- What’s the difference between **income** and **wealth**? Well, if you define `net income` to be what you earn less what you spend, then `wealth = cumsum(net income)` and `net income = diff(wealth)`. Another everyday relationship made absolutely crystal clear.
- In higher-dimensional or more-abstract versions of the fundamental theorem of calculus, you find out that, sometimes, complicated questions like the sum of forces a paramecium experiences all along a sequential curved path can be reduced to merely the start and finish (i.e., the complicatedness may be one dimension less than what you thought).
- Further-abstracted versions also allow you to optimise surfaces (including “surfaces” in phase-space) and therefore build bridges or do rocket-science.
- With the fluidity that comes with being able to `diff` and `cumsum`, you can do statistics on continuous variables like height or angle, rather than just on count variables like number of people satisfying condition X.
- At small enough scales, calculus (specifically Taylor’s theorem) tells you that “most” nonlinear functions can be linearised: i.e., approximated by repeated addition of a constant `+const+const+const+const+const+...`. That’s just about the simplest mathematical operation I can think of. It’s nice to be able to talk at least locally about a complicated phenomenon in such simple terms.
- In the infinitary version, symbolic formulae `diff` and `cumsum` to other symbolic formulae. For example `diff( x² ) = 2x` (look back at the square sequence above if you didn’t notice this the first time). This means instead of having to try (or make your computer try) a lot of stuff to see what’s going to work, you can just-plain-understand something.
- Also because of the symbolic nicety: post-calculus, if you only know how, e.g., `diff( diff( diff( x )))` relates to `x` – but don’t know a formula for `x` itself – you’re not totally up a creek. You can use calculus tools to make relationships between varying `diff` levels of a sequence, just as good as a normal formula – thus expanding the landscape of things you can mathematise and solve.
- In fact `diff( diff( x )) = − x` is the source of this, this, and this – and therefore the physical properties of all materials (hardness, conductivity, density, why the sky is blue, etc.) – which derive from chemistry, which derives from Schrödinger’s equation, which is solved by the “harmonic” `diff( diff( x )) = − x`.
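The income/wealth bullet above is easy to make concrete. A Python sketch (the monthly figures are invented for illustration; `accumulate` stands in for `cumsum` and `diff` is my helper):

```python
from itertools import accumulate

def diff(xs):
    return [b - a for a, b in zip(xs, xs[1:])]

net_income = [200, -50, 300, 100, -20]   # hypothetical monthly earn-minus-spend
wealth = list(accumulate(net_income))    # wealth = cumsum(net income)
print(wealth)                            # → [200, 150, 450, 550, 530]
print(diff(wealth))                      # → [-50, 300, 100, -20]: net income back, minus month one
```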

Calculus isn’t “the end” of mathematics. It’s barely even before or after other mathematical stuff you may be familiar with. For example it doesn’t come “after” trigonometry, although the two do relate to each other if you’re familiar with both. You could apply the “differencing” idea to groups, topology, imaginary numbers, or other things. Calculus is just a tool for looking at the same thing in a different way.

Transgressing boundaries, smashing binaries, and **queering categories** are important goals within certain schools of thought.

Reading such stuff the other week-end I noticed (a) a heap of geometrical metaphors and (b) limited geometrical vocabulary.

> What I dislike about the word #liminality http://t.co/uWCczGDiDj : it suggests the ∩ is small or temporary.
>
> — isomorphismes (@isomorphisms)

In my opinion **functional analysis** (as in, precision about mathematical functions—not practical deconstruction) points toward more appropriate geometries than just the `[0,1]` of fuzzy logic. If your goal is to escape “either/or” then I don’t think you’ve escaped very much if you just make room for an “in between”.

By contrast `ℝ→ℝ` functions (even continuous ones; even smooth ones!) can wiggle out of definitions you might naïvely try to impose on them. The space of functions naturally lends itself to different metrics that are appropriate for different purposes, rather than “one right answer”. And even trying to define a rational means of categorising things requires a lot—like, Terence Tao level—of hard thinking.

I’ll illustrate my point with the arbitrary function ƒ pictured at the top of this post. Suppose that ƒ∈𝒞². So it does make sense to talk about whether ƒ′′≷0.

But in the case I drew above, **ƒ′′≹0**. In fact “most” 𝒞² functions on that same interval wouldn’t fully fit into either “concave” or “convex”.
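Since I can’t reproduce the drawing here, a numerical stand-in: `sin` is a 𝒞² function whose second difference (computed the zoomed-in way) takes both signs on the interval, so neither “concave” nor “convex” fits it globally. A Python check (the helper is mine):

```python
import math

def second_diff(f, x, h=1e-4):
    """Numerical second difference: a diff of diffs, zoomed in."""
    return (f(x + h) - 2 * f(x) + f(x - h)) / h**2

print(second_diff(math.sin, 1.0))   # negative: concave near x = 1
print(second_diff(math.sin, 4.0))   # positive: convex near x = 4
```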

So “fits the binary” is rarer than “doesn’t fit the binary”. The “borderlands” are bigger than the staked-out lands. And it would be very strange to even think about trying to shoehorn generic 𝒞² functions into

- one type,
- the other,
- or “something in between”.

Beyond “false dichotomy”, ≶ in this space doesn’t even pass the scoff test. I wouldn’t want to call the ƒ I drew a “queer function”, but I wonder if a geometry like this isn’t more what queer theorists want than something as evanescent as “liminal”, something as thin as “boundary”.


`Cylinder = line-segment × disc`

`C = | × ●`

The “product rule” from calculus works as well with the boundary operator `∂` as with the differentiation operator `∂`.

`∂C = ∂| × ● + | × ∂●`
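Spelling the symbols out (my reading of the formula above, in words):

```latex
\partial(\mathrm{segment} \times \mathrm{disc})
  = \underbrace{\partial\,\mathrm{segment} \times \mathrm{disc}}_{\text{2 endpoints} \,\times\, \text{disc} \;=\; \text{two end-caps}}
  \;+\;
  \underbrace{\mathrm{segment} \times \partial\,\mathrm{disc}}_{\text{segment} \,\times\, \text{circle} \;=\; \text{lateral tube}}
```

So the boundary of the cylinder is its two end-caps plus the tube around the side, exactly as the product rule predicts.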

Oops. Typo. Sorry, I did this really late at night! `cos` and `sin` need to be swapped.

Oops. Another typo. Wrong formula for circumference.

I was re-reading Michael Murray’s explanation of cointegration and marvelling at the calculus.

> Calculus blows my mind sometimes. Like, hey guess how much we can do with subtraction.
>
> — protëa (@isomorphisms) March 28, 2013

Of course it’s not *any* subtraction. It’s subtracting a function from a shifted version of itself. Still doesn’t sound like a universal revolution.

(But of course the observation that the lagged first-difference will be zero around an extremum (turning point), along with symbolic formulæ for (infinitesimal) first-differences of a function, made a decent splash.)

Jeff Ryan wrote some R functions that make it easy to first-difference financial time series.

Here’s how to do the first differences of Goldman Sachs’ share price:

```r
require(quantmod)
getSymbols("GS")
gs <- Ad(GS)
plot(gs - lag(gs))
```

Look how much more structured the result is! Now all of the numbers are within a fairly narrow band. With `length(gs)` I found 1570 observations. Here are 1570 random normals `plot(rnorm(1570, sd=10), type="l")` for comparison:

Not perfectly similar, but very close!

Looking at the first differences compared to a Gaussian brings out what’s different between public equity markets and a random walk. What sticks out to me is the vol leaping up aperiodically in the $GS time series.

I think I got even a little closer by drawing the stdevs from a Poisson process `plot(rnorm(1570, sd=rpois(1570, lambda=5)), type="l")` but I’ll end there with the graphical futzing.
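If you don’t have quantmod data handy, the same structure shows up with simulated data. A Python sketch (a seeded random walk in place of $GS; the numbers are synthetic):

```python
import random

random.seed(0)
shocks = [random.gauss(0, 10) for _ in range(1570)]   # i.i.d. "daily" moves

walk, total = [], 0.0
for s in shocks:                                      # cumulative-sum the shocks
    total += s
    walk.append(total)                                # the walk wanders far from zero

diffs = [b - a for a, b in zip(walk, walk[1:])]       # first-differencing recovers the shocks
print(max(abs(d - s) for d, s in zip(diffs, shocks[1:])))   # ~0: diff undoes cumsum
```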

What’s really amazing to me is how much difference a subtraction makes.

differential topology lecture by John W. Milnor from the 1960’s: *Topology from the Differentiable Viewpoint*

- A function that’s problematic for analytic continuations
- Definitions of smooth manifold, diffeomorphism, category of smooth manifolds
- bicontinuity condition
- two Euclidean spaces are diffeomorphic iff they have the same dimension
- torus ≠ sphere, but compact manifolds are equivalence-classable by genus
- Moebius band is not compact
- **Four categories of topology**, which were at first thought to be the same, but by the 60’s seen to be really different (and the maps that keep you within the same category): diffeomorphisms on smooth manifolds; piecewise-linear maps on simplicial complexes; homeomorphisms on sets (point-set topology); and the fourth, homotopy theory, invented in the 1930’s because topology itself was too hard.
- The first three examples of categories helped understand category and functor in general. You could work for your whole career in one category—for example if you work on fluid dynamics, you’re doing fundamentally different stuff than computer scientists on type theory—and this would filter through to your vocabulary and the assumptions you take for granted. E.g. “maps” might mean “smooth bicontinuous maps” in fluid dynamics, but non-surjective, discontinuous maps are possible all the time in logic or theoretical computer science. Functor is the comparison between the different subjects.

- Minute 38-40. A pretty slick proof. I often have a hard time following, but this is an exception.
- Minute 43. He misspeaks! In defining the hypercube.
- Minute 47. Homology groups relate the category of topological-spaces-with-homotopy-classes-of-mappings, to the category of groups-with-homomorphisms.

That’s the first of three lectures. Also worth reading: Milnor’s thoughts, almost half a century later, on how differential topology had evolved since the lectures.

Hat tip to david a edwards.

What I really loved about this talk was the categorical perspective. The talks are really structured so that three categories — smooth things, piecewise things, and points/sets — are developed in parallel. Better than development of the theory of categories in the abstract, I like having these specific examples of categories and how “sameness” differs from category to category.

(Source: simonsfoundation.org)

**going the long way**

*What does it mean when mathematicians talk about a bijection or homomorphism?*

Imagine you want to get from `X` to `X′` but you don’t know how. Then you find a “different way of looking at the same thing” using ƒ. (Map the stuff with ƒ to another space `Y`, then do something else over in `image ƒ`, then take a journey over there, and then return back with ƒ⁻¹.)

The fact that a bijection can show you something in a new way that suddenly makes the answer to the question so obvious, is the basis of the jokes on www.theproofistrivial.com.

In a given category the homomorphisms `Hom` ∋ ƒ preserve all the interesting properties. Linear maps, for example (except when `det=0`), barely change anything—like if your government suddenly added another zero to the end of all currency denominations, just a rescaling—so they preserve most interesting properties, and therefore any linear mapping to another domain can be inverted back, so anything you discover over in the new domain (`image of ƒ`) can be used on the original problem.

All of these fancy-sounding maps are linear:

- Fourier transform
- Laplace transform
- taking the derivative
- Box-Müller

They sound fancy because whilst they leave things technically equivalent in an objective sense, the result looks very different to people. So then we get to use intuition or insight that only works in say the spectral domain, and still technically be working on the same original problem.

Pipe the problem somewhere else, look at it from another angle, solve it there, unpipe your answer back to the original viewpoint/space.

For example: the Gaussian (normal) cumulative distribution function is monotone, hence injective (one-to-one), hence invertible.

By contrast the Gaussian probability distribution function (the “default” way of looking at a “normal Bell Curve”) fails the horizontal line test, hence is many-to-one, hence cannot be totally inverted.

So in this case, integrating once `∫[pdf] = cdf` made the function “mathematically nicer” without changing its interesting qualities or altering its inherent nature.
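A quick numerical check of that invertibility claim, in Python (the standard normal cdf written via `math.erf`):

```python
import math

def pdf(x):
    """Standard normal density: fails the horizontal line test."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def cdf(x):
    """Its integral: strictly increasing, hence one-to-one (invertible)."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

print(pdf(-1.3) == pdf(1.3))   # → True: two different inputs, one output
xs = [-2, -1, 0, 1, 2]
print(all(cdf(a) < cdf(b) for a, b in zip(xs, xs[1:])))   # → True: monotone
```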

Or here’s an example from calc 101: **u-substitution**. You’re essentially saying “Instead of solving this integral, how about if I solve a different one which is exactly equivalent?” The `→ƒ` in the top diagram is the u-substitution itself. The “main verb” is doing the integral. U-substituters avoid doing the hard integral, go the long way, and end up doing something much easier.
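You can watch u-substitution “not change the answer” numerically. Below, u = x² turns ∫₀¹ 2x·cos(x²) dx into ∫₀¹ cos(u) du = sin(1); a Python midpoint-rule check (the integral helper is mine):

```python
import math

def midpoint_integral(f, a, b, n=100_000):
    """Crude midpoint-rule Riemann sum of f on [a, b]."""
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

hard = midpoint_integral(lambda x: 2 * x * math.cos(x * x), 0.0, 1.0)
easy = midpoint_integral(math.cos, 0.0, 1.0)   # the substituted, easier integral
print(abs(hard - easy))          # ~0: the two integrals agree
print(abs(easy - math.sin(1)))   # ~0: both equal sin(1)
```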

Or in physics—like tensors and Schrödinger solving and stuff.

Physicists look for substitutions that make the computation they *have* to do more tractable. Try solving a Schrödinger PDE for hydrogen’s first electron `s¹` in `xyz` coordinates (square grid)—then try solving it in spherical coordinates (longitude & latitude on expanding shells). Since the natural symmetry of the `s¹` orbital is spherical, changing basis to polar coords makes life much easier.

Likewise one of the goals of tensor analysis is to not be tied to any particular basis—so long as the basis doesn’t trip over itself, you should be free to switch between bases to get different jobs done. Terry Tao talks about something like this under the keyword “spending symmetry”—if you use up your basis isomorphism, you need to give it back before you can use it again.

“Going the long way” can be easier than trying to solve a problem directly.

Saying derivative is “slope” is a nice pedant’s lie, like the Bohr atom: a simplification that misses out on a deeper and more interesting later viewpoint.

The “slope” viewpoint—and what underlies it: the “charts” or “plots” view of functions as `ƒ(x)–vs–x`—is like training wheels that eventually need to come off. The “slope” metaphor fails

- for pushforwards,
- on surfaces,
- on curves γ that double back on themselves,
- in my vignettes about integrals,
- and, in my opinion, in statistical or business applications: it’s harder to “see” derivatives or calculus there if you think “derivative = slope”, since you’re presented with reams of numbers rather than pictures of `ƒ(x)–vs–x`. Where is the “slope” there?

“Really” it’s all about **diff’s**. Derivatives are differences (just zoomed in… this is what `lim ∆x↓0` was for) and *that* viewpoint works, I think, everywhere.
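In code, the “zoomed-in difference” is literal. Shrinking Δx in Python (for ƒ(x) = x², the diffs settle on 2x; the helper name is mine):

```python
f = lambda x: x * x

def diff_quotient(x, dx):
    """diff(f) over diff(x): a difference, just not yet fully zoomed in."""
    return (f(x + dx) - f(x)) / dx

for dx in [1.0, 0.1, 0.001]:
    print(dx, diff_quotient(3.0, dx))   # heads toward 6.0 = 2·3 as Δx shrinks
```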

I half-heartedly tried making the following illustrations in R with the barcode package but they came out ugly. Even uglier than my handwriting—so now enjoy the treat of my ugly handwriting.

Step back to **Descartes’** definition of a function. It’s an **association between two sets.**

And the language we use sounds “backwards” to that of English. If I say “associate a temperature number *to* every point over the USA”, then that should be written as a function `ƒ: surface → temp` (or we could say `ƒ: ℝ²→ℝ` with `ℝ² = (lat, long)`).

The `\to` arrow and the *“maps to”* phrasing are backwards of the way we speak.

- “Assign a temperature to the surface” —versus— “Map each surface point to a temperature element from the set of possible temperatures”.

`{elf, book, Kraken, 4^π^e}` … no, I’m not sure where that came from either. But I think we can agree that such a set is unstructured.

Great. I drew above a set “without other structure” as the source (domain) and a branched, partially ordered weirdy thing as the target (codomain). Now it’s possible, with some work, to come up with a calculus on these like the infinitesimal one on ℝ→ℝ functions that’s taught to many 19-year-olds. But for right now my point is that it looks ridiculous and impossible. Newton’s calculus is something we do only with a specific kind of Cartesian mapping: where both the `from` and the `to` have Euclidean concepts of straight-line-ness, and distance has the usual meaning from maths class. In other words the Newtonian derivative applies only to smooth mappings from ℝ to ℝ.

Let’s stop there and think about **examples of mappings.**

(Not from the real world—I’ll do another post on examples of functions from the real world. For now just accept that numbers describe the world and let’s consider abstractly some mappings that associate, not arbitrarily but in a describable pattern, some numbers to other numbers.)

(I didn’t have a calculator at the time, but the circle values for `[1,2,3,4,5,6,7]` are `[57°, 114°, 172°, 229°, 286°, 344°, 401° = 41°]`.)

I want to contrast the “map upwards” pictures to both the Cartesian pictures for structure-less sets

and to the normal graphical picture of a “chart” or “plot”.

Notice what’s obscured and what’s emphasised in each of the picture types. The plots certainly *look* better—but we lose the Cartesian sense that the “vertical” axis is no more vertical than is the horizontal. Both ℝ’s in ƒ: ℝ→ℝ are just the same as the other.

And if I want to compose mappings? As in the parabola picture above (first the `square` function, then an affine recentering), I can only show the end result of g∘ƒ rather than the intermediate result.

Whereas I could line up a long vertical of successive transformations (like one might do in Excel except that would be column-wise to the right) and see the results of each “input-output program”.

(Besides, I have a languishing draft post called “How I Got to Gobbledegook” which shows how much simpler a sequence of transforms can be than “a forbidding formula from a textbook”.)

Another weakness of the “charts” approach is that whereas the `"Stay the same"` command ought to be the simplest one (it’s a null command), it gets mapped to the 45° line:

Here’s the familiar parabola / `x²` plot “my way”: with the numbers written out so as to equalise the target space and the source space.

Now the “new” tool is in hand, let’s go back to the calculus. Now I’m going to say **“derivative = pulse”** and that’s the main point of this essay.

Considering both the source ℝ→ and the target →ℝ on the same footing, I’ll call the length of the arrows the “mapping strength”. In a convex mapping like `square` the diffs are going to increase as you go to the right.

OK, now in the middle of the piece, here is the main point I want to make about derivatives and calculus: looking at *numbers* written on the paper, rather than *plots*, makes understanding a pushforward possible. And, in my opinion, since in business gigantic databases of numbers are commoner than charts that make themselves, and in life we just experience stimuli rather than someone making a chart to explain them to us, this perspective is the more practical one.

I’m deliberately conflating the concepts of diff as

- difference,
- `R`’s `diff` function,
- differential (as in differential calculus, or as in linear approximation).

**Linear**

I’m still not quite done with the “my style of pictures” because there’s another insight you can get from writing these mappings as a bar code rather than as a “chart”. Indeed, this is exactly what a rug plot does when it shows histograms.

Here are some strip plots = rug plots = carpet plots = barcode plots of nonlinear functions for comparison.

The main conclusion of calculus is that nonlinear functions can be approximated by linear functions. The approximation only works “locally” at small scales, but still if you’re engineering the screws holding a plane together, it’s nice to know that you can just use a multiple (linear function) rather than some complicated nonlineary thingie to estimate how much the screws are going to shake and come loose.

For me, at least, way too many years of solving `y = mx + b` obscured the fact that *linear functions are just multiples*. You take the space and stretch or shrink it by a constant multiple. Like converting a currency: take pesos, divide by 8, get dollars. The multiple doesn’t change if you have 10,000 pesos or 10,000,000 pesos; it’s still the same conversion rate.

So in a neighborhood or locality a linear approximation is enough. That means that a collection of linear functions can approximate a nonlinear one to arbitrary precision.

That means we can use computers!
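Here’s the “locally, a linear function is enough” claim in action (Python; √x near x = 100, with the tangent multiple 1/(2√100) = 1/20 — numbers chosen just for illustration):

```python
import math

a = 100.0
linear = lambda h: math.sqrt(a) + h / (2 * math.sqrt(a))   # local linear stand-in for sqrt(a + h)

for h in [10.0, 1.0, 0.01]:
    print(h, abs(math.sqrt(a + h) - linear(h)))   # the error collapses as the neighbourhood shrinks
```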

**Square**

I can’t use the example of `self times self` so many times without exploring the concept a bit. Squares to me seem so limited and boring. No squizzles, no funky shapes, just boring chalkboard and rulers.

But that’s probably too judgmental.

After all there’s something self-referential and almost recursive about repeated applications of the `square` function. And it serves as the basis for Euclidean distance (and the standard deviation formula) via the Pythagorean theorem.

How those two are connected is a mystery I still haven’t wrapped my head around. But a cool connection I have come to understand is that between:

- a variety of inverse square laws in Nature
- a curve that is equidistant from a point and a line
- and the area of a rectangle which has both sides equal.

I guess first of all one has to appreciate that “parabola” shouldn’t necessarily have anything to do with `x•x`. Hopefully that’s become more obvious if you read the sections above where I point out that the target ℝ isn’t any more “vertical” than is the source ℝ.

The inverse-square laws show up everywhere *because our universe is 3-dimensional*. The surface of a 3-dimensional ball (like an expanding wave of gravitons, or an expanding wave of photons, or an expanding wave of sound) is 2-dimensional, which means that whatever “force” or “energy” is “painted on” the surface gets diluted by the square of the radius (the surface area) as the wave expands at a constant rate. Oh. Thanks, Universe, for being 3-dimensional.
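That dimension-counting is one line of arithmetic, in Python (`power` and `r` are arbitrary illustrative values):

```python
import math

def intensity(power, r):
    """Energy spread over an expanding sphere of area 4πr²."""
    return power / (4 * math.pi * r ** 2)

print(intensity(1.0, 2.0) / intensity(1.0, 1.0))   # → 0.25: double the radius, quarter the strength
```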

What’s most amazing about the parabola—gravity connection is that it’s a metaphor that spans across both space *and* time. The curvature that looks like a-plane-figure-equidistant-to-a-line-and-a-point is curving *in time*.