
If you’ve read Stats 101 at your local institution of schooling and refinement, you know the difference between false positives and false negatives.

  • False positive: Oncologist, to patient: “Oh my God. This is terrible. Just terrible.” Patient: “What? Is it bad, Doc?” Oncologist: “Oh, not you. My son’s handwriting. It’s terrible! Practically illegible.”
  • False negative: Pregnancy test: Negative. Eight months later: Waaaah!

False positive is when the canary dies of canary-pneumonia in a gas-free mine; you scurry away and miss out on $500bn worth of shale coal. False negative is when the canary has spent several years building up an immunity to iocaine mine gas; you stroll in and die.

Signals

For algorithmic traders, a “signal” is the switch that tells your software “Buy! Buy!” or “Sell! Sell!”

  • Computer: Just give me the signal, boss, and I’ll buy 10,000 shares of the company that makes IcyHot.
  • Trader: Let’s see … gold is up 50% on the year … the underlying is retracing between the third and fourth Fibonacci levels … the volatility of the Dow is below 30 … it’s raining in Moscow … and my Alabama state government newsfeed just flashed the word “indubitably” … throw down the iron condor! Hard!!!

If I think about looking for a signal, I think about: when should I do this trade?

Anti-Signals

The idea behind this exercise is to have a computer search through data streams for you and tell you what’s a good time to trade.

If you take the perspective that the only thing you can control is your bet size (and not what the market will do), then it becomes clear that the choice is not only about {yes, no} but also about [£0, £100bn].

Accordingly, something that tells you when not to trade can be just as valuable as something that tells you when to trade.

The most obvious non-trading scenario is the Federal Open Market Committee. Say you normally trade forex intraday, close out all your positions when you leave the screen, and that’s your game. Right after the FOMC announcement, market movements may be drastic and will have little to do with what you normally bet on — unless you trade FOMC announcements specifically. But the point is it’s a separate modelling problem.

In theory at least, any strategy should be improvable if you can accurately identify conditions when the strategy fails. Removing losers will add to your PnL just as surely as adding winners.
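
Here is the arithmetic behind “removing losers” as a minimal sketch. The trade PnLs and the `blocked` flags are invented for illustration, and `total_pnl` is a hypothetical helper, not anyone’s real backtester.

```python
def total_pnl(trade_pnls, blocked):
    """Sum PnL over trades, skipping any trade the anti-signal blocked."""
    return sum(pnl for pnl, is_blocked in zip(trade_pnls, blocked) if not is_blocked)

# Made-up PnL per trade, plus a made-up anti-signal flag per trade.
pnls = [120.0, -80.0, 45.0, -200.0, 60.0]
blocked = [False, False, False, True, False]

baseline = total_pnl(pnls, [False] * len(pnls))  # take every trade
filtered = total_pnl(pnls, blocked)              # sit out the flagged trade

print(baseline, filtered)
```

The catch is hidden in `blocked`: the flag has to be known before the trade is placed, not after you have seen the loss.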

I’ll make up a fake example with fake data (aka, lies). Say your strategy is to trade in the direction of momentum of S&P 500 E-minis iff the directionality has been sustained for at least 70% of the last five minutes, and to pull out of the trade iff the fraction of price movements in your direction falls below 70%.
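
One way to read that rule as code, as a sketch only. I am assuming “sustained for at least 70%” means 70% of the tick-to-tick price changes inside the five-minute window, and that unchanged ticks are ignored; both are my assumptions, and the function names are hypothetical.

```python
def direction_fraction(prices, direction):
    """Fraction of non-zero price changes that go in `direction` (+1 up, -1 down)."""
    moves = [b - a for a, b in zip(prices, prices[1:])]
    moves = [m for m in moves if m != 0]  # assumption: flat ticks do not count
    if not moves:
        return 0.0
    hits = sum(1 for m in moves if m * direction > 0)
    return hits / len(moves)

def target_position(window_prices, current_position, threshold=0.70):
    """Return +1 (long), -1 (short) or 0 (flat) given the last five minutes of prices."""
    if current_position != 0:
        # Already in a trade: exit iff support for our direction drops below threshold.
        if direction_fraction(window_prices, current_position) < threshold:
            return 0
        return current_position
    # Flat: enter iff one direction has at least `threshold` of the moves.
    for direction in (+1, -1):
        if direction_fraction(window_prices, direction) >= threshold:
            return direction
    return 0

trending = [100, 101, 102, 101, 103, 104, 105, 106, 107, 108, 109]
choppy = [100, 101, 100, 101, 100, 101]
print(target_position(trending, 0), target_position(choppy, 1))
```

With a 70% threshold the up and down fractions cannot both qualify at once, so the order of the loop over directions does not matter.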

Looking at each hour of the past year, the least profitable hour for this trade, statistically, has been 12:00-1:00 New York time. So if you had followed the exact same instructions but closed out all positions and never traded during lunchtime, your PnL during 2011 would have been higher than the strategy as originally stated. (Of course, this example works only because one hour had to be the least profitable. But the same difficulty—distinguishing real patterns from numerical mirages—inheres in signal identification as well as anti-signal identification. If you identify a real cause & effect then the anti-signal should work.)
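
The mirage problem is easy to see in a toy version. In this sketch the worst hour is picked from one made-up sample and then applied to a second made-up sample; every number is invented, and the helper names are hypothetical.

```python
from collections import defaultdict

def pnl_by_hour(trades):
    """trades: iterable of (hour_of_day, pnl) pairs. Returns {hour: total pnl}."""
    totals = defaultdict(float)
    for hour, pnl in trades:
        totals[hour] += pnl
    return dict(totals)

def worst_hour(trades):
    """Hour of day with the lowest total PnL in this sample."""
    totals = pnl_by_hour(trades)
    return min(totals, key=totals.get)

def pnl_excluding(trades, excluded_hour):
    """Total PnL if we had never traded during `excluded_hour`."""
    return sum(pnl for hour, pnl in trades if hour != excluded_hour)

# Fake data: (hour in New York time, PnL of one trade).
in_sample = [(9, 50.0), (10, 20.0), (12, -40.0), (12, -10.0), (14, 30.0)]
out_of_sample = [(9, 10.0), (10, -5.0), (12, 15.0), (14, 25.0)]

hour = worst_hour(in_sample)  # chosen using the first sample only
for name, trades in [("in-sample", in_sample), ("out-of-sample", out_of_sample)]:
    total = sum(pnl for _, pnl in trades)
    print(name, "all hours:", total, "skipping", hour, ":", pnl_excluding(trades, hour))
```

In this fake data, skipping the lunch hour helps on the sample it was chosen from and hurts on the other one, which is what you would expect when the “pattern” is only there because some hour had to come last.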

 
Any Statistical Model

Say you are trying to calculate when, where, and how much an advertiser should bid in a DMP for internet ad space. You take as inputs known or presumed data about site visitors, indexed by cookie, and produce as (eventual) output a list of which ads to buy, when, and how much to bid for them.

Here the same anti-signal concept could apply.

Instead of thinking “What are some damned good characteristics in this space?” or “Should I try another algorithm? This other paper says RandomForests aren’t as good as Breiman says”, think “What data is the AI really failing on?” You can remove those data from the training set and decline to make recommendations about cookies within that hull.
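
A rough sketch of “decline to make recommendations within that hull”. To keep it dependency-free I swap the convex hull for an axis-aligned bounding box around the points where the model failed, which is a cruder region than a hull. The feature vectors, the toy model, and all names here are made up.

```python
def bounding_box(points):
    """Axis-aligned box around `points`: a crude stand-in for a convex hull."""
    dims = range(len(points[0]))
    lows = [min(p[d] for p in points) for d in dims]
    highs = [max(p[d] for p in points) for d in dims]
    return lows, highs

def inside(point, box):
    """True if `point` lies within the box on every coordinate."""
    lows, highs = box
    return all(lo <= x <= hi for x, lo, hi in zip(point, lows, highs))

def recommend(point, model, failure_box):
    """Return the model's output, or None (abstain) inside the failure region."""
    if inside(point, failure_box):
        return None
    return model(point)

# Hypothetical feature vectors (one per cookie) where the model did badly.
failures = [(0.1, 0.2), (0.3, 0.4), (0.2, 0.1)]
box = bounding_box(failures)

def toy_model(point):
    return "bid"  # placeholder for whatever the real model would say

print(recommend((0.2, 0.3), toy_model, box), recommend((0.9, 0.9), toy_model, box))
```

The same box can be used twice: once to drop training rows that fall inside it, and once at prediction time to abstain.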

Say you are scanning a number of text resumes on a site like Indeed <aff link> and trying to figure out whose application you should invite for a geomodelling job. Just as much as searching for positive keywords like “Petrel”, you might want to filter out negative keywords like “definately”.
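
As a sketch, the resume screen is a positive keyword set plus a negative one. The keyword sets below contain only the two words from the example, and the resumes are invented.

```python
import re

POSITIVE = {"petrel"}      # signal: evidence the applicant fits the job
NEGATIVE = {"definately"}  # anti-signal: reason to skip regardless

def words(text):
    """Lower-cased set of word tokens in `text`."""
    return set(re.findall(r"[a-z']+", text.lower()))

def shortlist(resumes):
    """Names whose resume has a positive keyword and no negative keyword."""
    keep = []
    for name, text in resumes.items():
        tokens = words(text)
        if tokens & POSITIVE and not (tokens & NEGATIVE):
            keep.append(name)
    return keep

resumes = {
    "a": "Five years of Petrel modelling.",
    "b": "I am definately a Petrel expert.",
    "c": "Enjoys hiking.",
}
print(shortlist(resumes))
```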

Say you are training your machine to learn when tweets will be effective and when not. Instead of shoving every tweet through the lingpipe, first filter out the non-English-language tweets.
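
A deliberately crude sketch of that pre-filter. Real language identification needs a proper library; here I just assume that a tweet is English if enough of its tokens are common English stopwords. The stopword list and the 20% cutoff are arbitrary choices of mine.

```python
# Tiny, hand-picked stopword list: an assumption, not a real language model.
ENGLISH_STOPWORDS = {
    "the", "a", "an", "and", "is", "of", "to", "in",
    "it", "this", "that", "for", "on", "you",
}

def looks_english(tweet, min_share=0.2):
    """True if at least `min_share` of the tokens are common English stopwords."""
    tokens = [t.strip(".,!?:;\"'()").lower() for t in tweet.split()]
    tokens = [t for t in tokens if t]
    if not tokens:
        return False
    hits = sum(1 for t in tokens if t in ENGLISH_STOPWORDS)
    return hits / len(tokens) >= min_share

tweets = [
    "This is the best day of the year",
    "Das ist der beste Tag des Jahres",
]
english_only = [t for t in tweets if looks_english(t)]
print(english_only)
```

Short tweets with no stopwords at all will be thrown out by this heuristic, so it errs toward declining to model a tweet rather than modelling it badly.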

OK, that last example is really obvious. I am not claiming that anti-signals are novel. It’s just a word I made up for something that’s common sense. But coining the word reminds me, when I look at a modelling problem, to turn the problem upside-down and ask if there’s any low-hanging fruit on the other side.

isomorphismes posted this