
Lucas’ “rational expectations” revolution in macroeconomics has been tied to the ending of stagflation in the world’s largest economy, and to the reintroduction of “psychology” into finance and economics. However, the models of “expectation” I’ve seen in economics never felt like my own personal experience of living in ignorance. I’d like to share the sketch of an idea that feels more lifelike to me.


First, let me disambiguate: the unfortunate term-overlap with “statistical expectation” (= mean = average = total over count = (1/N) ∑ᵢ₌₁ᴺ xᵢ = a map from N dimensions to 1 dimension) indicates nothing psychological whatsoever. It doesn’t even correspond to “what you should expect”.

If I find out someone is a white non-Hispanic American (somehow not getting any hints of which state, which race, which accent, which social class, which career track…so it’s an artificial scenario), I shouldn’t “expect” the family to be worth $630,000. I “expect” (if indeed my expectation is not a distribution but rather just one number) them to be worth $155,000.
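This is a minimal sketch of the gap between the statistical expectation and the typical case, written in Python as an illustration. The wealth figures are synthetic. They are lognormal draws with made-up parameters (μ = 12, σ = 1.5), not survey data. In a right-skewed distribution like this one, the mean lands far above the median, which is the same shape of gap as the two figures above.

```python
import random
import statistics

def skewed_sample(n: int = 10_000, seed: int = 1) -> list:
    """Synthetic right-skewed 'household wealth': lognormal with made-up parameters."""
    rng = random.Random(seed)
    return [rng.lognormvariate(12, 1.5) for _ in range(n)]

if __name__ == "__main__":
    wealth = skewed_sample()
    print(f"mean (the statistical expectation): {statistics.mean(wealth):>12,.0f}")
    print(f"median (the typical case):          {statistics.median(wealth):>12,.0f}")
```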

Nor does it correspond to what I should expect if I go to a casino with a 99% chance of losing €10,000 and a 1% chance of winning €1,000,000 (the break-even prize is €990,000). “On average” this is a great bet. But that ignores convergence to the average, which would be slow. I’d need to play this game a lot to get the statistics working in my favour, and I mightn’t stay solvent (I’d need tens of millions in AUM, with lock-up conditions, to even consider this game). No, the “statistical expectation” refers to a long-run or wide-space convergence number, not to “what’s typical”.
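The casino arithmetic can be checked exactly. This is a sketch that assumes independent plays at the stated odds:

- `expected_value()` is the per-play mean.
- `break_even_prize()` solves 0.01·W = 0.99·€10,000 for W.
- `prob_ahead(n)` is the binomial probability of being net positive after n plays.

The positive mean sits alongside a large chance of being behind for a long time. After 10 plays you are ahead only about 9.6% of the time.

```python
from fractions import Fraction
from math import comb

P_WIN = Fraction(1, 100)   # 1% chance of the big prize
PRIZE = 1_000_000          # euros won
STAKE = 10_000             # euros lost otherwise

def expected_value() -> Fraction:
    """Mean payoff of one play: the 'statistical expectation'."""
    return P_WIN * PRIZE - (1 - P_WIN) * STAKE

def break_even_prize() -> Fraction:
    """The prize W at which P_WIN * W == (1 - P_WIN) * STAKE."""
    return (1 - P_WIN) * STAKE / P_WIN

def prob_ahead(n: int) -> Fraction:
    """Exact probability of being net positive after n independent plays."""
    total = Fraction(0)
    for k in range(n + 1):                      # k = number of wins
        if k * PRIZE - (n - k) * STAKE > 0:
            total += comb(n, k) * P_WIN ** k * (1 - P_WIN) ** (n - k)
    return total

if __name__ == "__main__":
    print("EV per play:", expected_value())          # 100
    print("break-even prize:", break_even_prize())   # 990000
    for n in (1, 10, 100, 1000):
        print(f"P(ahead after {n} plays) = {float(prob_ahead(n)):.3f}")
```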

Not only is the statistical expectation quite reductive, it doesn’t resemble what I’ve introspected about uncertainty, information, disinformation, beliefs, and expectations in my life.

Coloured Voronoi diagram, 3-D slice

A better idea, I think, comes from the definition of Riemann integration over 2+ dimensions. Imagine covering a surface with a coarse mesh. The mesh partitions the surface. A scalar is assigned to each of the interior regions inscribed by the mesh. The mesh is then refined (no lines taken away, only some more added, so some regions get smaller/more precise and no regions get larger/less precise), and new scalars are computed with more precise information about the scalar field on the surface.
a scalar field

NB: The usual Expectation operator 𝔼 is little more than an integral over “possibilities” (whatever that means!).

(In the definitions of the Riemann integral I’ve seen, the mesh is square, but Voronoi pictures look awesomer & more suggestive of topological generality. Plus I’m not going to be talking about infinitary convergence—no one ever becomes fully knowledgeable of everything—so why do I need the convenience of squares?)
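Here is a minimal sketch of the refinement picture. It assumes square cells, the convenient case the parenthetical mentions, and uses Python as the illustration language.

- Each region gets one scalar, the field's value at its centre, weighted by the region's area.
- `refine` only ever adds lines: it splits selected cells into four.
- If (x, y) is uniformly distributed on the unit square, the same sum estimates 𝔼[f(x, y)], so the NB above holds literally here.
- The field f(x, y) = x² + y² is an arbitrary example whose exact integral is 2/3.

```python
from typing import Callable, List, Tuple

Cell = Tuple[float, float, float, float]   # (x0, y0, x1, y1): an axis-aligned rectangle

def riemann_sum(cells: List[Cell], f: Callable[[float, float], float]) -> float:
    """One scalar per region (f at the centre), weighted by the region's area."""
    total = 0.0
    for x0, y0, x1, y1 in cells:
        total += f((x0 + x1) / 2, (y0 + y1) / 2) * (x1 - x0) * (y1 - y0)
    return total

def refine(cells: List[Cell], should_split: Callable[[Cell], bool] = lambda c: True) -> List[Cell]:
    """Add lines, never remove them: chosen cells split into four, the rest are kept as they are."""
    out: List[Cell] = []
    for cell in cells:
        x0, y0, x1, y1 = cell
        if should_split(cell):
            mx, my = (x0 + x1) / 2, (y0 + y1) / 2
            out += [(x0, y0, mx, my), (mx, y0, x1, my), (x0, my, mx, y1), (mx, my, x1, y1)]
        else:
            out.append(cell)
    return out

if __name__ == "__main__":
    f = lambda x, y: x * x + y * y         # example field; exact integral on the unit square is 2/3
    mesh: List[Cell] = [(0.0, 0.0, 1.0, 1.0)]
    for _ in range(5):
        print(len(mesh), riemann_sum(mesh, f))
        mesh = refine(mesh)
```

With uniform refinement, the k-th estimate works out to 2/3 − 1/(6·4ᵏ), giving 0.5, 0.625, 0.65625, and so on. Passing a different `should_split` refines only some regions, which is closer to the uneven refinement the post describes.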

I want to make two changes to the Riemannian-integral mesh.



First I’d like to replace the scalars with some more general kind of fibre. Let’s say a bundle of words and associations.

(You can tell a lot about someone’s perspective from the words they use. I’ll have to link up “Obverse Words”, which has been in my drafts folder for over a year, once I finish it—but you can imagine examples of people using words with opposite connotation to denote the same thing, indicating their attitude toward the thing.)


Second, I’d like to use the topology of covering maps to encode the ignorance somehow. In my example below: at a certain point I knew “Rails goes with Ruby” and “Django goes with Python” and “Git goes with Github” but didn’t really understand the lay of the land. I didn’t know about git’s competitors, that you can host your own github, that Github has competitors, the more complex relationship between ruby and python (it’s not just two disjoint sets), and so on.

When I didn’t know about Economics or Business or Accounting or Finance, I classed them all together. But now they’re so clearly very, very different. I don’t even see Historical Economists or Bayesian Econometricians or Instrumental Econometricians or Dynamical Macroeconomists or Monetary Economists or Development Economists as being very alike. (Which must imply that my perspective has narrowed relative to everyone else! Tattoo artists and yogi masters and poppy farmers must all be quite different from the entire class of Economists, and look, even in my own words, how much coarse generalisation I use to describe the non-econs versus the refinement among the econs.
These meshes can have negative curvature (with, perhaps, a memory) if you like. When you think that property actuaries are nothing at all like health actuaries, you know your frame of reference has become very refined at distinguishing actuaries. Which might mean a coarse partitioning of all the other people! Like Bobby Fischer’s use of the term “weakies” for any non-chess player: they must all be the same! Or at least they’re the same to me.)


Besides the natural embedding of negatively-curved judgment grids, here are some more pluses to the “refinement regions” view of ignorance:

  1. You could derive a natural “conservation law”: some combination of, e.g., ability, difficulty, how good your teachers are, and time put into learning determines how many “refinements” you get to make. No one can know everything.

    (Yet somehow we all are supposed to function in a global economy together—how do we figure out how to fit ourselves together efficiently?

    And what if people use your lack of perspective to suggest you should pay them to teach you something which “evaluates to valuable” from your coarse refinement, but upon closer inspection, doesn’t integrate to valuable?)
  2. Maybe this can relate to the story of Tony—how we’re always in a state of ignorance even as we choose what to become less ignorant about. It would be nice to be able to model the fact that one can’t escape one’s biases or context or history.
  3. And we could get a fairly nice representation of “incompatible perspectives”. If the topology of your covering maps is “very hard” to match up to mine because you speak dialectics and power structures but I speak equilibria and optima, that sounds like an accurate depiction. Or when you talk to someone who’s just so noobish in something you’re so expert in, it can feel like a very blanket statement over so many refinements that you don’t want to generalise over (and from “looking up to” an expert it can also feel like they “see” much more detail of the interesting landscape.)
  4. Ignorance of one’s own ignorance is already baked into the pie! As is the beginner’s luck. If I “integrate over the regions” to get my expected value of a certain coarse region, my uninformed answer may have a lot of correctness to it. At the same time, the topological restrictions mean that my information and my perspective on it aren’t “over there” in some L2-distance sense, rather they’re far away in a more appropriately incompatible-with-others sense.
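Point 4's “my uninformed answer may have a lot of correctness to it” has a simple arithmetic core. A coarse region's value, computed as a weighted average, equals the weighted average of its refinements (the law of total expectation). This small sketch uses made-up weights and values. The coarse mesh reports 2 for a region whose parts are worth 10 and −6, yet the overall answer is the same at both resolutions.

```python
from fractions import Fraction
from typing import List, Tuple

# (weight, value): how much of the space a region covers and what it is worth there
Region = Tuple[Fraction, Fraction]

def integrate(regions: List[Region]) -> Fraction:
    """Weighted average over regions: the 'expected value' a mesh at this resolution reports."""
    total_weight = sum(w for w, _ in regions)
    return sum(w * v for w, v in regions) / total_weight

def coarsen(regions: List[Region], groups: List[List[int]]) -> List[Region]:
    """Merge fine regions into coarse ones; each coarse value is the weighted average of its parts."""
    return [(sum(regions[i][0] for i in g), integrate([regions[i] for i in g])) for g in groups]

# Made-up numbers for illustration only.
fine = [(Fraction(1, 4), Fraction(10)), (Fraction(1, 4), Fraction(-6)), (Fraction(1, 2), Fraction(1))]
coarse = coarsen(fine, [[0, 1], [2]])

if __name__ == "__main__":
    print("coarse regions:", coarse)            # the +10 / -6 split is invisible here
    print("fine answer:  ", integrate(fine))
    print("coarse answer:", integrate(coarse))  # same number: the tower property
```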

In conclusion, I’m sure everyone on Earth can agree that this is a Really Nifty and Cool Idea.



I’ll try to give a colourful example using computers and internet stuff since that’s an area I’ve learned a lot more about over the past couple years.

A tiny portion of Doug Hofstadter’s “semantic network”.  via jewcrew728, structure of entropy

First, what does ignorance sound like?

  • (someone who has never seen or interacted with a computer—let’s say from a non-technological society or a non-computery elderly rich person. I’ve never personally seen this)
  • "Sure, programming, I know a little about that. A little HMTL, sure!”
  • "Well, of course any programming you’re going to be doing, whether it’s for mobile or desktop, is going to use HTML. The question is how.”

OK, but I wasn’t that bad. In workplaces I’ve been the person to ask about computers. I even briefly worked in I.T. But the distance from “normal people” (no computer knowledge) to me seems very small now compared to the distance between me and people who really know what’s up.

A few years ago, when I started seriously thinking about trying to make some kind of internet company (sorry, I refuse to use the word “startup” because it’s perverted), I considered myself a “power user” of computers. I used keyboard shortcuts, I downloaded and played with lots of programs, I had taken a C++ course in the 90s, I knew about C:\progra~1 and how to get to the hidden files in the App packages on a Mac.

My knowledge of internet business was a scatty array of:

  • Mark Zuckerberg
  • "venture capital"
  • programmer kid internet millionaires
  • Kayak.com — very nice interface!
  • perl (images: “Regular Expressions”, “11th Grade”)
  • mIRC
  • TechCrunch
  • There seems to be way more programming going on to impress other programmers than to make the stuff I wanted!
  • I had used Windows, Mac, and Linux (!! Linux! Dang I must be good)
  • I knew that “Java and Javascript are alike the way car and carpet are alike”—but didn’t know a bit of either language.
  • I used Alpine to check my gmail. That’s a lot of confusing settings to configure! And plus I’m checking email in text mode, which is not only faster but also way more coolly nerdy sexy screeny.
  • Object-Oriented, that’s some kind of important thing. Some languages are Object-Oriented and some aren’t.
  • "Python is for science; Ruby is for web"
  • sudo apt-get install
  • I had run at least a few programs from the command line.
  • I had done a PHP tutorial at W3Schools … that counts as “knowing a little PHP”, right?

So I knew I didn’t know everything, but it was very hard to quantify how much I did know, how far I had to go.


A mediocre picture of some things I knew about at various levels. It’s supposed to get across a more refined knowledge of, for example, econometrics, than of programming. Programming is lumped in with Linux and rich programmer kids and “that kind of stuff” (a coarse mesh). But statistical things have a much richer set of vocabulary and, if I could draw the topology better, refined “personal categories” those words belong to.

Which is why it’s easier to “quantify” my lack of knowledge by simply listing words from the neighbourhood of my state of knowledge.

Unfortunately, knowing how long a project should take, its chances of success, and its potential pitfalls is crucial to making an organised plan to complete it. “If you have no port of destination, there is no favourable wind”. (Then again, no adverse wind either. But in an entropic environment—with more ways to screw up than to succeed—turning the Rubik’s cube randomly won’t help you at all. Your “ship” might run out of supplies, or the backers murder you, etc.)


Here are some of the words I learned early on (and many more refinements since then):

  • Rails
  • Django
  • IronPython
  • Jython
  • JSLint
  • MVC
  • Agile
  • STL
  • pointers
  • data structures
  • frameworks
  • SDKs
  • Apache
  • /etc/httpd
  • Hadoop
  • regex
  • nginx
  • memcached
  • JVM
  • RVM
  • vi, emacs
  • sed, awk
  • gdb
  • screen
  • tcl/tk, cocoa, gtk, ncurses
  • GPG keys
  • PPAs
  • lspci
  • decorators
  • virtual functions
  • ~/.bashrc, ~/.bash_profile, ~/.profile
  • echo $SHELL, echo $PATH
  • "scripting languages"
  • "automagically"
  • sprintf
  • xargs
  • strptime, strftime
  • dynamic allocation
  • parser, linker, lexer
  • /env, /usr, /dev, /sbin
  • virtual consoles
  • Xorg
  • cron
  • ssh, X forwarding
  • UDP
  • CNAME, A record
  • LLVM
  • curl.haxx.se
  • the difference between jQuery and JSON (they’re not even the same kind of thing, despite the “J” actually referring to Javascript in both cases)
  • OAuth2
  • XSLT, XPath, XML



This is only—as they say—“the tip of the iceberg”. I didn’t know a ton of server admin stuff. I didn’t understand that libraries and frameworks are super crucial to real-world programming. (Imagine if you “knew English” but had a vocabulary of 1,000 words. Except libraries and frameworks are even better than a large vocabulary because they actually do work for you. You don’t need to “learn all the vocabulary” to use it—just enough words to call the library’s much larger program that, say, writes to the screen, or scrapes from the web, or does machine learning, for you.)
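Here is a concrete case of the vocabulary point. In Python's standard library, a handful of “words” hands the real work of parsing HTML to `html.parser`. This is a minimal sketch. The HTML snippet is made up, and nothing is fetched from the web.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag; the library does the actual parsing."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs with lowercased names
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)

def extract_links(html: str) -> list:
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links

if __name__ == "__main__":
    page = '<p>See <a href="https://www.python.org/">Python</a> and <a href="/docs">docs</a>.</p>'
    print(extract_links(page))  # ['https://www.python.org/', '/docs']
```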

The path should go something like: at first knowing programming languages ⊃ ruby. Then knowing programming languages ⊃ ruby ⊃ rubinius, groovy, JRuby. At some point uncovering topological connections (neighbourhood relationships) to other things (a comparison to node.js; a comparison to perl; a lack of comparability to machine learning; etc.)

I could make some analogies to maths as well. I think there are some identifiable points across some broad range of individuals’ progress in mathematics, such as:

  • when you learn about distributions and realise this is so much better than single numbers!

    a rug plot or carpet plot is like a barcode on the bottom of your plot to show the marginal (one-dimension only) distribution of data

    who is faster, men or women?
  • when you learn about Gaussians and see them everywhere
    Central Limit Theorem
    A nice illustration of the Central Limit Theorem by convolution, in R:

        # unit step: 0 for x <= 0, 1 for x > 0
        Heaviside <- function(x) {
          ifelse(x > 0, 1, 0)
        }
        x <- seq(-1, 1, length.out = 201)   # evaluation grid (any grid spanning the step)
        # convolve(a, rev(b), type = "open") is R's idiom for ordinary convolution
        HH       <- convolve(Heaviside(x), rev(Heaviside(x)), type = "open")
        HHHH     <- convolve(HH, rev(HH), type = "open")
        HHHHHHHH <- convolve(HHHH, rev(HHHH), type = "open")
        # etc.

    What I really like about this demonstration is that it’s not a proof, rather an experiment carried out on a computer. This empiricism is especially cool since the Bell Curve, the 80/20 Rule, etc., have become such a religion.

    NERD NOTE: Which weapon is better, a 1d10 longsword or a 2d4 oaken staff? Sometimes the damage is written as 1-10 longsword and 2-8 quarterstaff. However, these ranges disregard the greater likelihood of the quarterstaff scoring 4, 5, or 6 damage than 2, 7, or 8. The longsword’s damage, 1d10, is uniform on {1, …, 10}, while 2d4 looks like a Λ. (To see this another way, think of the combinatorics.)
  • when you learn that Gaussians are not actually everywhere
    kernel density plot of Oxford boys' heights.

    histogram of Oxford boys' heights, drawn with ggplot.A (bimodal) probability distribution with distinct mean, median, and mode.
  • in talking about probability and randomness, you get stuck on discussions of “what is true randomness?” “Does randomness come from quantum mechanics?” and such whilst ignorant of stochastic processes and probability distributions in general.
  • (not saying the more refined understanding is the better place to be!)
  • A brilliant fellow (who now works for Google) was describing his past ignorance to us one time. He remembered the moment he realised “Space could be discrete! Wait, what if spacetime is discrete?!?!?! I am a genius and the first person who has ever thought of this!!!!” Humility often comes with the refinement.
  • when you start understanding symbols like ∫ , ‖•‖, {x | p} — there might be a point at which chalkboards full of multiple integrals look like the pinnacle of mathematical smartness—
  • but then, notice how real mathematicians’ chalkboards in their offices never contain a restatement of Physics 103!
    Kirby topology 2012
    A parsimonious statement like “a Noetherian local ring is regular iff its global dimension is finite” is so, so much higher on the maths ladder than a tortuous sequence of u-substitutions.
  • and so on … I’m sure I’ve tipped my hand well enough all over isomorphismes.tumblr.com that those who have a more refined knowledge can place me on the path. (E.g. it’s clear that I don’t understand sheaves or topoi, but I expect they hold some awesome perspectives.) And it’s no judgment, because everyone has to go through some “lower” levels to get to “higher” levels. It’s not a race and no one’s born with infinite knowledge.
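The NERD NOTE's dice can be worked out exactly. The same code doubles as a discrete version of the convolution experiment from the Central Limit Theorem bullet, because summing independent dice is convolving their distributions. This is a sketch in Python with exact fractions. The 8d4 histogram at the end is there only to show the shape heading toward a bell.

```python
from fractions import Fraction
from typing import Dict

Pmf = Dict[int, Fraction]   # total rolled -> probability

def die(sides: int) -> Pmf:
    """A fair die with faces 1..sides."""
    return {face: Fraction(1, sides) for face in range(1, sides + 1)}

def convolve(a: Pmf, b: Pmf) -> Pmf:
    """Distribution of the sum of two independent rolls (discrete convolution)."""
    out: Pmf = {}
    for x, px in a.items():
        for y, py in b.items():
            out[x + y] = out.get(x + y, Fraction(0)) + px * py
    return out

def mean(p: Pmf) -> Fraction:
    return sum(x * px for x, px in p.items())

def prob_between(p: Pmf, lo: int, hi: int) -> Fraction:
    return sum(p.get(k, Fraction(0)) for k in range(lo, hi + 1))

longsword = die(10)                       # 1d10: flat
quarterstaff = convolve(die(4), die(4))   # 2d4: the Λ shape, peak at 5

if __name__ == "__main__":
    for name, pmf in (("1d10", longsword), ("2d4", quarterstaff)):
        print(name, "mean", mean(pmf), "P(4..6)", prob_between(pmf, 4, 6))
    # More dice, more convolutions: the shape heads toward a bell curve.
    many = die(4)
    for _ in range(7):
        many = convolve(many, die(4))     # 8d4 after seven convolutions
    for total in sorted(many):
        print(f"{total:2d} {'#' * int(many[total] * 200)}")
```

The two weapons compare as follows:

- 1d10 has mean 11/2 and puts 3/10 of its mass on 4 to 6.
- 2d4 has mean 5 and puts 5/8 of its mass there.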

I think you’ll agree with me here: the more one learns, the more one finds out how little one knows. One can’t leave one’s context or have knowledge one doesn’t have. And all choices are embedded in this framework.

Robert Sapolsky on Language and schizophrenia

  • importance of FOXP2
  • Take away FOXP2 from mice and they talk less complexly.
  • Give mice our human FOXP2 and they talk more.
  • Humans missing FOXP2 can’t do they no talkin be wrongly.
  • Babel → pidgin → creole
  • all creoles have the same grammar
  • …smells like…one inherent human language???
  • ecological factors: rainforest & biodiverse ecosystems tend to produce polytheistic cultures (more linguistic diversity, “more diversity” in many areas)
  • 90% of Earth’s languages will be extinct before long.
  • hunter-gatherers have a higher frequency of click languages
  • "Language is how we outsmart plants" —Steven Pinker
  • language is sequential; toolmaking is sequential
  • cooperation — game theory — kin selection — and, lying.
  • Dogs put the lid on their fear pheromones by tucking their tails.
  • A lot of the brain controls facial expressions. (important if you want to lie)
  • Game theory with communication, with semanticity, with syntax, with grammar — all traits of our language — improve outcomes in the game.

Minute 23 — Schizophrenia

  • Sequential thinking is impaired. (Can’t tell a story in an order that will make sense to others.) (Actually that sounds like me.)
  • Loose associations. (Can’t keep straight within one sentence whether “boxer” refers to dog or occupation. Golf caddy vs Cadillac)
  • (So I guess homophones differ among languages, and thus schizophrenics who speak different languages go off on predictably different tangents based on their language?)
  • Difficulties with abstraction. (Fact vs parable vs rumour) Always interpret as concrete reality.
  • "Apple, banana, orange. What do these words have in common?" "They’re all multisyllabic words." "OK, that’s true. Anything else?" "Yes. They all have letters with closed loops." Symbolic function of language not working for them.
  • "What’s on your mind?" "My hair." "Can I take your picture?" "I don’t have a picture to give." "Can you write a sentence for me?" "A sentence for me."
  • Belief that they participated in historical events.
  • "What do apples, oranges, and bananas have in common?" "They’re all wired for sound."
  • Hallucinations. The defining feature.
  • Most hallucinations are auditory but we don’t know why.
  • People experience very structured hallucinations, not random ones. But neurologically it looks random.
  • In fact papers have been published about the most common hallucinations. Commonest voices, in order: Jesus, Satan, the political leader.
  • The story of a schizophrenic Maasai.
  • After a really abhorrent violation of social convention, they locked her away and she died. Sound familiar? Oh well, I guess she knew what was coming to her and ∴ tacitly rationally agreed to her punishment, right?
  • Neuropharmacology evolving from trying to cure hallucinations to trying to cure disordered thought.
  • Elderly schizophrenics lose the positive symptoms (hallucinations, delusions, loose associations) and the negative symptoms (flat affect and withdrawal) dominate.
  • Schizophrenia sets in during late adolescence/early adulthood—make it to 30 without it, you’re probably safe.
  • Anchored in the frontal cortex.

(by StanfordUniversity)


Let’s reflect for 11 minutes on a specific market, rather than on “free markets” in the abstract, for a real-world perspective on economic theory.

Simple fact that’s apparently obvious to everyone who trades muni bonds but not to me: 1 December is a common payout date, meaning that January & February usually have lower yields as there’s more money (tagged “for investing in bonds”) looking for its next home. So much for optimal selection of the best securities throughout all time or statistical arbitrage of the rates.

Not that there isn’t an efficient-markets explanation for this, or that Marshallian S&D is irrelevant. But there are some obvious kinks to it, right? You might thumbnail this as “institutions” if you have an economic-theory label gun in your holster.

  • Besides the regular cyclic dependence (I can easily imagine some theorist who doesn’t participate in the market assuming it must “obviously” be arbed away. Some papers on asset switching come to mind),
  • in this market we talk about fixed supply coming onto the market at a given time window, very different than a posted-offer (retail) or 
  • so your econ 101 S&D picture wouldn’t quite capture let’s say $AMZN’s decision of how to list its bonds. There’s some invisible demand curve
    (I drew this for an older post, not gonna re-draw an S&D curve.) that $AMZN is going to try to guess at (and they might hire some “banksters” to help them). Then they can delay or move forward that fixed vertical supply curve (they have probably pre-decided how much they want to borrow based on their internal models and decision processes). So there’s a time element (try to list in Jan/Feb, if you “can” wait! Do or don’t list along with this other major issuance. Competing for the analysts’ attention. Etc.) and much less of a price element, since that’s decided before listing.

    Sure, maybe there’ll be an OTC secondary market that changes the price afterwards. But now already we’ve split an adze through the “perfect efficiency” idea, simply by questioning whether it’s the primary auction or secondary trading we’re calling efficient. Now it sounds much more plausible that markets with more volume and smarter participants are going to price more “accurately” (which we haven’t defined and couldn’t, since defaults are one-off probabilities) and simply by questioning the axiom we’ve undermined the theorists’ go-to assumptions. If you wanted to model this market, you might start with something quite different to “Assume fully informed traders and no arbitrage condition”. You might start with, like, actual facts or data on trade records, look at legal documents, record phone calls, ….
  • And what about the ubiquitous government-versus-markets dichotomy? Oh, snap! This is private investors lending small (not federal) governments money to build highways. Hmm, I guess that “four legs good, two legs baaaaad” dichotomy is incoherent.
  • As far as politics goes, GARVEEs must be a really important political issue (how to arrange for surface transport across the United States), but as it’s not a sex scandal or on either American party’s agenda to trash the other, I guess it’s unfit for discussion in the news.

In any market I look at a little more closely, if you squint you can see the EMH in the outlines. But the details are more complicated and much more like a transaction you can imagine real people (who can afford lawyers) engaging in. Very head-to-head, can-we-make-a-deal, size-matters, quantity-over-price, get-it-done-rather-than-optimise-the-exact-details, ….

Equally as much as I might want to show this to overzealous, oversimple free-marketeers, I’d show it to the anti-capitalist zealots too. Does this sound like a monopoly of power by either large corporations or plutocrats? There’s a competitive playing field gunning for however much money is out there; you have to argue your case (marketing) for why the people with investors’ money (let’s say a pension fund) should trust you’ll pay them back; in general there’s a lot of “power” floating around, but it sounds like they keep each other in check. It’s not like $AMZN can force people to subscribe to its bond issuance. And I can even imagine a legitimate, contributive role for high-powered lawyers and investment “banksters”. Do the computer programmers at $AMZN know how to market and list securities, track down and convince the people safeguarding the big sacks of money to lend it to them? Other than a prejudice toward large size, it doesn’t sound very plutocratic to me.

Anyway. I listened to this and got the feeling I have many times on learning just some basic obvious stuff about real markets. Like wow, grand economic theory is missing details that are obvious to actual market participants, mired in overgeneralisations and simplifications, and the theorist who gets all tooth-and-claw about their holy assumptions needs to get more exposure to the real world.

Just from the merest actual facts about this market regime I’m already thrust into a “middle ground” where, sure, the price system, self-interest, and competition seem like they’re going to be pretty good and robust-over-time and so on. But certainly not “optimal”! And certainly not full Intrade-worshipper style, where price corresponds to exact probabilities and markets are a crystal ball | prediction engine. So the extremists on both ends lose, on this story.

Amazing what you can learn about the world when you actually observe it before writing the theory.

"To generalize is to be an idiot. To particularize alone is a distinction of merit." —William Blake

Hat tip @munilass.

PS Obviously I don’t know anything about munis or corporate bonds. All of my “you”s and “I”s above can be interpreted in a strict sense as “Now that I’ve learned the very first thing about this real market, how plausible do various abstract economic-theory ideas sound?” If you work in these markets and I said something wrong, please correct me.

The classic red/green colouring scheme for trading screens seems too alarmist.





Conceptually, the red/green distinction makes sense as corresponding to stop/go in traffic signals. But traffic signals need to be neon and striking in a hectic 3-D environment where it’s paramount for everyone to definitely not-miss the stop command.

But in a sheltered 2-D environment where goals commonly include mastering emotion, controlling passive reactivity, keeping a long-term head in the middle of short-term volatility, and (calmly) digesting massive amounts of information simultaneously, neon red/green seems too grating.

yellow and blue trading screen (GVZ)

I made the above picture with R of course, like this:

    library(quantmod)   # reChart() modifies the most recently drawn chartSeries() chart
    reChart(up.col = "lightblue", dn.col = "yellow")

(GVZ is the gold volatility index.)

It’s not a perfect colour scheme—I would use Lab to do better—but it already improves on #FF0000 versus #00FF00.


One theory of the evolution of trichromacy in primates says that

  • red/green dichotomy tells us whether meat or fruit is rotten or ripe (especially in dappled light)
  • blue/yellow dichotomy tells us how cool/warm something is
  • light/dark (value) is the most basic kind of vision.

If we take that as a starting point, a less alarmist colour scheme for trading software could use the blue/yellow dichotomy to indicate whether a security price went up or down. Use a neutral chroma for “small” moves (this depends upon one’s time-frame, but properly the definition of “big move” should be calibrated to an exponential moving average with some width depending on one’s market telescope). Intensity of the move could be signalled with lightness, so that most figures on a screen are a readable lightness of a neutral colour, but “big moves” are tinged with convexly more chroma and very-convexly more lightness.
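This is a rough sketch of that scheme, and several of its choices are assumptions of my own:

- Moves are normalised by an exponential moving average of absolute moves (span 20).
- Anything beyond 3× that typical move counts as fully “big”.
- Blue (hue 0.60) means up and yellow (hue 0.15) means down.
- The convexity is modelled as saturation x² and a lightness boost of x³.

It uses HLS from Python's `colorsys` rather than Lab, so it only approximates perceptual uniformity.

```python
import colorsys
from typing import List, Tuple

BLUE_HUE = 0.60    # assumption: hue (fraction of a turn) for up moves
YELLOW_HUE = 0.15  # assumption: hue for down moves

def typical_move(moves: List[float], span: int = 20) -> float:
    """Exponential moving average of absolute moves: the calibration for a 'big move'."""
    alpha = 2 / (span + 1)
    level = abs(moves[0])
    for m in moves[1:]:
        level = alpha * abs(m) + (1 - alpha) * level
    return level

def move_colour(move: float, typical: float, cap: float = 3.0) -> Tuple[float, float, float]:
    """RGB in [0, 1]: near-neutral grey for small moves, convexly more chroma and
    very-convexly more lightness as the move approaches `cap` typical moves."""
    if typical > 0:
        x = min(abs(move) / (cap * typical), 1.0)
    else:
        x = 0.0 if move == 0 else 1.0
    hue = BLUE_HUE if move >= 0 else YELLOW_HUE
    saturation = x ** 2                  # convex chroma
    lightness = 0.45 + 0.35 * x ** 3     # very convex lightness
    return colorsys.hls_to_rgb(hue, lightness, saturation)

if __name__ == "__main__":
    recent = [0.4, -0.2, 0.3, -0.5, 0.1, 0.2, -0.3]   # made-up recent moves
    t = typical_move(recent)
    for mv in (0.05, -0.1, 0.6, -0.9, 1.5):
        r, g, b = move_colour(mv, t)
        print(f"{mv:+.2f} -> #{round(r * 255):02X}{round(g * 255):02X}{round(b * 255):02X}")
```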


The definition of “up/down” might be refigured as whether the trader is short/long the security in question, or perhaps redness/greenness could be used in conjunction with the “market view” of cold/hot, to indicate whether a security is moving for or against one’s strategy. That too could be seen as overly alarming, but a (pseudo)convex coding of redness might again solve the problem, only invoking “panic mode” when there’s really something to worry about.

(Source: twitter.com)

If you’ve read Stats 101 at your local institution of schooling and refinement, you know the difference between false positives and false negatives.

  • False positive: Oncologist, to patient: “Oh my God. This is terrible. Just terrible.” Patient: “What? Is it bad, Doc?” Oncologist: “Oh, not you. My son’s handwriting. It’s terrible! Practically illegible.”
  • False negative: Pregnancy test: negative. Eight months later: Waaaah!

A false negative is when the canary has spent several years building up an immunity to iocaine mine gas; you stroll in and die. A false positive is when the canary dies of canary-pneumonia in a gas-free mine; you scurry away and miss out on $500bn worth of shale coal.
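For reference, here are the two error types as a tally, in a small Python sketch. “Positive” means the test says the condition is present. The false positive rate is FP / (FP + TN), and the false negative rate is FN / (FN + TP).

```python
from typing import Dict, Sequence

def confusion_counts(predicted: Sequence[bool], actual: Sequence[bool]) -> Dict[str, int]:
    """Tally a test's calls against the truth. 'Positive' = the test says the condition is present."""
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for said_yes, is_yes in zip(predicted, actual):
        if said_yes and is_yes:
            counts["TP"] += 1
        elif said_yes:
            counts["FP"] += 1   # alarm raised, nothing there
        elif is_yes:
            counts["FN"] += 1   # something there, no alarm
        else:
            counts["TN"] += 1
    return counts

def error_rates(c: Dict[str, int]) -> Dict[str, float]:
    """Share of actual negatives flagged positive, and of actual positives missed."""
    negatives = c["FP"] + c["TN"]
    positives = c["FN"] + c["TP"]
    return {
        "false_positive_rate": c["FP"] / negatives if negatives else float("nan"),
        "false_negative_rate": c["FN"] / positives if positives else float("nan"),
    }
```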



For algorithmic traders, a “signal” is the switch that tells your software “Buy! Buy!” or “Sell! Sell!”

  • Computer: Just give me the signal, boss, and I’ll buy 10,000 shares of the company that makes IcyHot.
  • Trader: Let’s see … gold is up 50% on the year … the underlying is retracing between the third and fourth Fibonacci levels  … the volatility of the DOW is below 30 … it’s raining in Moscow … and my Alabama state government newsfeed just flashed the word “indubitably” … throw down the iron condor! Hard!!!

If I think about looking for a signal, I think about: when should I do this trade?


The idea behind this exercise is to have a computer search through data streams for you and tell you what’s a good time to trade.

If you take the perspective that the only thing you can control is your bet size (and not what the market will do), then it becomes clear that the choice is not only about {yes, no} but also about [£0, £100bn].

Accordingly, something that tells you when not to trade can be just as valuable as something that tells you when to trade.

The most obvious non-trading scenario is a Federal Open Market Committee announcement. Say you normally trade forex intraday, close out all your positions when you leave the screen, and that’s your game. Right after the FOMC announcement, market movements may be drastic and will have little to do with what you normally bet on — unless you trade FOMC announcements specifically. But the point is it’s a separate modelling problem.

In theory at least, any strategy should be improvable if you can accurately identify conditions when the strategy fails. Removing losers will add to your PnL just as surely as adding winners.

I’ll make up a fake example with fake data (aka, lies). Say your strategy is to trade in the direction of momentum of S&P 500 E-minis iff the directionality has been sustained for at least 70% of the last five minutes, and to pull out of the trade iff the fraction of price movements in your direction falls below 70%.

Looking at each hour of the past year, the least profitable hour for this trade, statistically, has been 12:00-1:00 New York time. So if you had followed the exact same instructions but closed out all positions and never traded during lunchtime, your PnL during 2011 would have been higher than the strategy as originally stated. (Of course, this example works only because one hour had to be the least profitable. But the same difficulty—distinguishing real patterns from numerical mirages—inheres in signal identification as well as anti-signal identification. If you identify a real cause & effect then the anti-signal should work.)
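This is a sketch of the made-up strategy and its lunchtime anti-signal. It rests on assumptions the text doesn't pin down:

- one-minute bars, with timestamps already in New York time
- one contract, with no costs or slippage
- “directionality” measured as the fraction of up-ticks among the last five one-minute moves

The demo data is a seeded random walk (more lies), so the two PnL numbers it prints say nothing about real E-minis. Only the mechanics carry over.

```python
import random
from datetime import datetime, timedelta
from typing import List

def fraction_up(prices: List[float]) -> float:
    """Share of the nonzero one-step moves in this window that went up (0.5 if none moved)."""
    moves = [b - a for a, b in zip(prices, prices[1:]) if b != a]
    return sum(m > 0 for m in moves) / len(moves) if moves else 0.5

def is_lunch_hour(t: datetime) -> bool:
    """Anti-signal: 12:00-1:00 New York time (timestamps assumed to be NY local time)."""
    return t.hour == 12

def run_strategy(times: List[datetime], prices: List[float], window: int = 5,
                 threshold: float = 0.70, use_anti_signal: bool = False) -> float:
    """Momentum rule from the text on one-minute bars; returns PnL in index points
    for one contract, ignoring costs. With the anti-signal, positions are closed
    and no new ones opened during the lunch hour."""
    position, entry, pnl = 0, 0.0, 0.0   # position: +1 long, -1 short, 0 flat
    for i in range(window, len(prices)):
        up = fraction_up(prices[i - window:i + 1])   # last `window` one-minute moves
        blocked = use_anti_signal and is_lunch_hour(times[i])
        if position != 0:
            with_me = up if position > 0 else 1 - up
            if with_me < threshold or blocked:
                pnl += position * (prices[i] - entry)
                position = 0
        if position == 0 and not blocked:
            if up >= threshold:
                position, entry = 1, prices[i]
            elif 1 - up >= threshold:
                position, entry = -1, prices[i]
    if position != 0:                                 # close out at the last bar
        pnl += position * (prices[-1] - entry)
    return pnl

if __name__ == "__main__":
    rng = random.Random(2011)                         # fake data, as the text says
    start = datetime(2011, 3, 1, 9, 30)
    times = [start + timedelta(minutes=i) for i in range(390)]   # one fake session
    prices = [1300.0]
    for _ in times[1:]:
        prices.append(prices[-1] + rng.choice([-0.25, 0.25]))
    print("as stated:        ", run_strategy(times, prices))
    print("with lunch filter:", run_strategy(times, prices, use_anti_signal=True))
```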

Any Statistical Model

Say you are trying to calculate when, where, and how much an advertiser should bid in a DMP for internet ad space. You take as inputs known or presumed data about site visitors, indexed by cookie, and produce as (eventual) output a list of which ads to buy, when, and how much to bid for them.

Here the same anti-signal concept could apply.

Instead of thinking “What are some damned good characteristics in this space?” or “Should I try another algorithm? This other paper says RandomForests aren’t as good as Breiman says,” think “What data is the AI really failing on?” You can remove those data from the training set and decline to make recommendations about cookies within that hull.
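Here is one way to sketch “decline to make recommendations within that hull”, with a simplification of my own. Instead of a geometric hull, cookies are grouped into hypothetical discrete segments. The model abstains in any segment where its validation error rate is too high. The segment names, the thresholds, and `min_count` are illustrative assumptions.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Optional, Set, Tuple

def failing_segments(results: Iterable[Tuple[Hashable, bool]],
                     max_error: float = 0.5, min_count: int = 20) -> Set[Hashable]:
    """results: (segment, model_was_right) pairs from held-out validation data.
    Returns the segments whose error rate exceeds max_error, ignoring segments
    with fewer than min_count examples (too little data to judge)."""
    seen: Dict[Hashable, int] = defaultdict(int)
    wrong: Dict[Hashable, int] = defaultdict(int)
    for segment, was_right in results:
        seen[segment] += 1
        if not was_right:
            wrong[segment] += 1
    return {s for s, n in seen.items() if n >= min_count and wrong[s] / n > max_error}

def recommend(segment: Hashable, bid: float, blocked: Set[Hashable]) -> Optional[float]:
    """Anti-signal: abstain (return None) inside a failing segment, else pass the bid through."""
    return None if segment in blocked else bid

if __name__ == "__main__":
    validation = [("mobile-night", False)] * 30 + [("desktop-day", True)] * 30   # hypothetical
    blocked = failing_segments(validation)
    print(blocked)                                   # {'mobile-night'}
    print(recommend("mobile-night", 0.42, blocked))  # None
    print(recommend("desktop-day", 0.42, blocked))   # 0.42
```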

Say you are scanning a number of text resumes on a site like Indeed and trying to figure out whose application you should invite for a geomodelling job. Just as much as searching for positive keywords like “Petrel”, you might want to filter out negative keywords like “definately”.

Say you are training your machine to learn when tweets will be effective and when not. Instead of shoving every tweet through the lingpipe, first filter out the non-English-language tweets.
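Even the obvious filter can be sketched. This one is deliberately crude and entirely an assumption: it calls a tweet English if at least a fifth of its words come from a tiny, hand-picked list of English function words. A real pipeline would use a trained language identifier instead.

```python
import re
from typing import List

# Tiny, hand-picked list; illustrative only.
ENGLISH_FUNCTION_WORDS = {
    "the", "a", "an", "and", "or", "is", "are", "was", "to", "of", "in",
    "on", "for", "with", "it", "this", "that", "i", "you", "my",
}


def looks_english(text: str, min_ratio: float = 0.2) -> bool:
    """Crude test: the share of function words among the tokens reaches min_ratio."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return False
    hits = sum(token in ENGLISH_FUNCTION_WORDS for token in tokens)
    return hits / len(tokens) >= min_ratio


def english_only(tweets: List[str]) -> List[str]:
    """Keep only the tweets the heuristic judges to be English."""
    return [tweet for tweet in tweets if looks_english(tweet)]
```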

OK, that last example is really obvious. I am not claiming that anti-signals are novel; it’s just a word I made up for something that’s common sense. But coining the word reminds me, when I look at a modelling problem, to turn it upside-down and ask whether there’s any low-hanging fruit on the other side.

In the beginning, at the birth of computing, there were no programming languages. Programs looked something like this:
00110001 00000000 00000000
00110001 00000001 00000001
00110011 00000001 00000010
01010001 00001011 00000010
00100010 00000010 00001000
01000011 00000001 00000000
01000001 00000001 00000001
00010000 00000010 00000000
01100010 00000000 00000000

That is a program to add the numbers from one to ten together, and print out the result (1 + 2 + ... + 10 = 55). It could run on a very simple kind of computer.
Marijn Haverbeke

(Source: eloquentjavascript.net)
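For contrast, here is the same computation in a high-level language: a short Python sketch written as a plain loop with a counter and a running total. It does not attempt to decode the binary instructions above.

```python
def sum_one_to_ten() -> int:
    """Add the numbers 1 through 10 with an explicit counter and running total."""
    total = 0
    count = 1
    while count <= 10:
        total += count
        count += 1
    return total


print(sum_one_to_ten())  # 55
```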

What I sensed was that while the laws of supply and demand governed everything on earth, the easy money was in demand—manufacturing it, manipulating it, sending it forth to multiply, etc. As a rule of thumb (and with some notable exceptions), the profit margins you could achieve selling a good or service were directly correlated to the total idiocy and/or moral bankruptcy of the demand you drummed up for it.

This was easier to grasp if you were in the business of peddling heroin, Internet stocks, or celebrity gossip; journalists, on the other hand, [did not] understand … their role in this [charade].

In the past, newspapers had made respectable margins selling a non-inane product largely because people had little choice but to herald their sublets and white sales alongside the journalists’ tales of human suffering/corporate corruption/government ineptitude.

The times were prosperous enough that much of the print media even chose to abstain from taking a share of the demand-creation campaigns of liquor and tobacco brands…. Indeed, journalism … was about delivering important information about the world—information … democracy … needed, whether [people] knew it or not.

That journalism’s ability to deliver that information—to fill that need—ultimately depended, to an unsettling degree, on the ability to create artificial demand for a lot of stuff that people didn’t actually need—luxury condos, ergonomically correct airplane seats, the latest celebrity-endorsed scent—was an afterthought at best, at least in the newsroom.

Maureen Tkacik

(Source: cjr.org)

We showed in Chapter 6 that side information Y for the horse race X can be used to increase the growth rate by the mutual information I(X;Y). We now extend this result to the stock market.

Here, I(X;Y) is an upper bound on the increase in the growth rate, with equality if X is a horse race. We first consider the decrease in growth rate incurred by believing in the wrong distribution.

Thomas A. Cover & Joy A. Thomas, Elements of Information Theory
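The horse-race claim can be checked numerically. A minimal sketch, assuming a hypothetical two-horse race at even (2-for-1) odds with the weather as side information Y: bet in proportion to p(x) without side information and to p(x|y) with it (proportional, or Kelly, betting), and the gain in doubling rate (log base 2) comes out equal to I(X;Y).

```python
from math import log2
from typing import Dict, Hashable, Tuple

Joint = Dict[Tuple[Hashable, Hashable], float]  # p(x, y)


def marginals(joint: Joint) -> Tuple[Dict[Hashable, float], Dict[Hashable, float]]:
    """Return p(x) and p(y) from the joint distribution."""
    px: Dict[Hashable, float] = {}
    py: Dict[Hashable, float] = {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    return px, py


def mutual_information(joint: Joint) -> float:
    """I(X;Y) in bits."""
    px, py = marginals(joint)
    return sum(p * log2(p / (px[x] * py[y]))
               for (x, y), p in joint.items() if p > 0)


def doubling_rate(joint: Joint, odds: Dict[Hashable, float],
                  use_side_info: bool) -> float:
    """Expected log2 wealth growth per race under proportional betting.

    odds[x] is the payout per unit bet (o-for-1) if horse x wins.
    """
    px, py = marginals(joint)
    rate = 0.0
    for (x, y), p in joint.items():
        if p == 0:
            continue
        bet = p / py[y] if use_side_info else px[x]  # b(x|y) or b(x)
        rate += p * log2(bet * odds[x])
    return rate


race: Joint = {("A", "wet"): 0.4, ("B", "wet"): 0.1,
               ("A", "dry"): 0.1, ("B", "dry"): 0.4}
even_odds = {"A": 2.0, "B": 2.0}
gain = doubling_rate(race, even_odds, True) - doubling_rate(race, even_odds, False)
print(round(gain, 4), round(mutual_information(race), 4))  # 0.2781 0.2781
```

Without the weather, betting half on each horse at 2-for-1 just preserves wealth (rate 0); knowing the weather adds 1 - H(0.2) ≈ 0.278 bits per race.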

Peter Todd has been misquoted about the mathematics of dating here, here, here (here), here, here, here, here, here, here, here, and in at least five trillion issues of Cosmo. (Surprisingly, this and this did not misquote him.) It’s enough to make me want to write a strongly worded DEAR SIR to the Hearst Tower.

Here is what they say:

  • Only after you’ve dated twelve people, are you ready to decide who’s “The One”!

An even wronger version of the story goes like this:

  • The twelfth guy you date — he’s The One! Science says so! No pressure!!!!!!!

Not only is this wrong, but I’ve heard Peter rant in person, specifically about these misquotations. The problem he studies is known colloquially as “The Search for a Parking Space”.

  1. When you arrive at the movie theatre, you circle around the car park until you see an opening. (Let’s assume it’s below freezing outside.)
  2. When you see that opening, you can immediately tell how far away it is from the theatre. So you know how far you will have to walk in the cold.
  3. At that moment, you have to decide whether to drive on (keep looking for somewhere closer) or accept the probably-imperfect husband — oops, I mean parking space — that you’re staring straight in the face (oops, I mean tarmac).
  4. You can’t back up; you can’t see ahead; all you can do is remember the past, guess about the future, and assess the situation you’re in. That’s all you’ve got to go on. Try to solve that problem optimally.

The paper that’s being referenced (though apparently not read) in these magazines deals with an even stricter problem, known as “The Vizier Wants to Keep His Head”:

  1. In this version of the blind forward-search problem, the greedy, vindictive, lazy Prince has to choose a wife.
  2. Being lazy, he tasks the Vizier with solving his problem. Being vindictive, if the Vizier gets it wrong, the Vizier loses his head. Being greedy, the Prince wants the Vizier to find him the wife with the richest dowry.

    (I believe dowry is chosen because it’s seen as a one-dimensional, objectively valuable quantity — as opposed to beauty, which is multifaceted and arguable. If we’re talking about various land holdings, I think dowry would also be multifaceted; that things have a single price is an illusion of simplistic economic thinking.

    Imagine a woman whose family had holdings in modern-day Lebrija, Huelva, Palma del Condado, Aracena, and Ayamonte. Each taxable area will bring in unpredictable revenues year upon year, and the natural beauty of each estate is just as disputable as a woman’s face. So how is that a one-dimensional value? Oh, well. The point is to assign a scalar to each woman.)

  3. The debutantes enter the Prince’s chamber one at a time; as each enters, a courtier reads her name and family holdings. So the Prince and Vizier assign a scalar to that maiden. Then the Prince either proposes marriage or declines.

  4. Once an heiress has been declined, the Prince can’t call her back. In other words, even if he thinks to himself: “Crap! B_tch Number 37 had a nice rack and a fabulous estate in Milano. I should have gone with her!”, that’s just too bad. Even a handsome, powerful, jerk of a Prince can’t un-dump a ladyfriend.

  5. So the Vizier faces a problem similar to, but more constrained than, the Car Park Dilemma: the Prince can’t circle around the way a driver could.

  6. Also, this is important: exactly one hundred dames will appear before the Prince. The solution would change if an infinite progression of dames (or even just all the singles in your greater metropolitan region of choice) paraded before him.

  7. If a richer girl is to be found in either the post-wife sequence or the pre-wife sequence of heiresses, off with the Vizier’s head.

Given that problem (pick the highest scalar from a forward-blind, one-by-one sequence of scalars), here is the twelve-woman strategy the magazines are mangling. It lets the Vizier live past the ritual roughly a quarter of the time (about 26%):

  1. Observe, and turn down, the first 12 women, noting each one’s wealth / beauty / scalar value.
  2. The highest wealth / beauty / scalar in that group becomes your “aspiration level” A.
  3. As soon as you see an heiress with wealth / beauty / scalar ≥ A, tell the Prince to marry her.

Again, that strategy doesn’t make the Vizier win (i.e., it doesn’t make you pick the perfect boyfriend every time). Within this narrowly specified problem it isn’t even the rule that maximises the chances of maximisation: with 100 heiresses, that classical rule observes the first 37 (about 100/e) and succeeds about 37% of the time.
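The Vizier’s odds are easy to check. A minimal sketch, assuming the 100 scalars are distinct and arrive in uniformly random order: `success_probability` uses the standard closed form P(r) = (r/n) Σ_{j=r}^{n-1} 1/j for the chance that the observe-r-then-beat-them rule lands the single best candidate, and `simulate` checks it by brute force. Here r counts the women observed and turned down before the aspiration level is fixed.

```python
import random


def success_probability(n: int, r: int) -> float:
    """Exact chance of ending with the single best of n candidates when the first
    r are observed and rejected, then the first one beating all of them is taken."""
    if r == 0:
        return 1.0 / n  # no observation phase: take the first candidate
    return (r / n) * sum(1.0 / j for j in range(r, n))


def simulate(n: int, r: int, trials: int = 20_000, seed: int = 0) -> float:
    """Monte Carlo estimate of the same probability."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        values = list(range(n))  # distinct scalars; n - 1 is the richest
        rng.shuffle(values)
        aspiration = max(values[:r]) if r > 0 else -1
        # First later candidate above the aspiration level, or None if nobody is.
        chosen = next((v for v in values[r:] if v > aspiration), None)
        wins += chosen == n - 1
    return wins / trials


best_r = max(range(100), key=lambda r: success_probability(100, r))
print(best_r, round(success_probability(100, best_r), 3))  # 37 0.371
print(round(success_probability(100, 12), 3))              # 0.259
```

The curve is fairly flat near its peak, which is one reason a shorter observation phase still gives the Vizier a respectable chance.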

So here are the reasons the magazines & blogs are wrong:

  • A boyfriend is not a scalar.
  • Who says that a date equals a sample? I’ve been getting to know the human race my whole life. Every day I spend single, married, or it’s complicated — I am learning more information that can be used to set my aspiration level for a partner.
  • You can go back sometimes — either to rekindle a relationship that, in retrospect, was red-hot, or to revisit a crush you didn’t get far enough with to make things awkward.
  • There aren’t just 100 boys to look through. Let’s face it, there might as well be an infinite number of fish in the sea.
  • Um: a boyfriend is not a scalar. Love depends on you as well; even if you could reduce your feelings to a scalar, you’d still want to model the relationship as a 2-equation dynamical system. Interplay; choices; reactions.


The original paper is called “Satisficing in Mate Search”. (I couldn’t find it online.) Here is much, much more material by Dr. Todd, on both dating data and the science of thinking smarter.

You can also read Simple Heuristics That Make Us Smart (it’s on my to-read list — and it contains “Satisficing in Mate Search”), and if you look at Amazon’s similar books for the title you’ll come across all kinds of fascinating stuff: about Bayes’ rule, thinking from the gut, less is more, why it’s good to be stupid, willpower, and even an intro to game theory. (I haven’t read that particular treatment, but I do recommend reading just-a-little-bit of game theory as an awesome way to expand your imagination.)

You can get instant gratification with a free chapter of each, so these popular treatments are just as candy-like as Wired or Cosmo.


Just like with modern physics, this modern psychological science is super interesting. Way too interesting to justify wasting time on false and farcical narratives that totally miss the point.

To gild refined gold, to paint the lily, to throw perfume on a violet, … is wasteful and ridiculous excess. —King John