Computer scientists use filters, ≥ signs, intersections (SQL), and other forms of what I would call “hard boundaries”.
- Grep either finds what you’re looking for, or it doesn’t.
- The condition inside your while(){ loop either trips true and the interior code runs, or it trips false and it’s skipped.
- You either follow someone on twitter, or you don’t.
- You either crawled a webpage, or you didn’t.
- In exploring a code tree or other graph, you either look at the node, or you don’t.
- Two people either are Facebook friends, or they aren’t.
- The tweet either included a word from this list, or it didn’t.
But, one needn’t be so conceptually constrained. Thinking in a fuzzy logic sense, it’s possible to create a “soft” boundary.
To use a classic example from Bart Kosko’s book, although the American legal system imposes a “hard boundary” on adulthood (OK, a series of hard boundaries—16, 18, 21, 25), one really passes into adulthood gradually over time. (Unless you have your first kid at 16, in which case you grow up real quick. But talking about the upper-middle-class college-enrolled set here: most of them grow up slowly.)
That’s nice in a philosophical, contemplative way. But can we use the soft-boundary concept for anything useful? I think so.
For example, in this neo4j video (minute 5) Marko Rodriguez gives us the following line of Gremlin code:
// count the 'knows' edges leaving vertex 1 whose 'since' property is after 2006
g.v(1).outE.filter{it.label=='knows' && it.since > 2006}.count()
We could either be naïve about this and treat 2006 as a hard boundary, or make it a variable and perform sensitivity analysis. In fact, any time we see a number we could turn it into a parameter — ending with a hull of list. We could poke about in that parameter space and by doing so get a better idea of the shape of things than setting a naïve tripwire.
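Here is a minimal sketch of that poking-about, in Python rather than Gremlin. The edge list is toy data I made up, and `count_knows_since` and `sweep` are hypothetical helper names, not anything from the video. The point is only that sweeping the year gives you a whole curve instead of one tripwire reading.

```python
# Hypothetical toy data: (label, since) pairs for the edges leaving one vertex.
edges = [
    ("knows", 2003),
    ("knows", 2005),
    ("knows", 2007),
    ("knows", 2009),
    ("created", 2008),
]

def count_knows_since(edges, year):
    """Count 'knows' edges whose 'since' value is strictly greater than year."""
    return sum(1 for label, since in edges if label == "knows" and since > year)

def sweep(edges, years):
    """Return {year: count} so the whole response curve is visible at once."""
    return {year: count_knows_since(edges, year) for year in years}

# Treat 2006 as just one point in a parameter range instead of a fixed constant.
curve = sweep(edges, range(2002, 2011))
for year, count in curve.items():
    print(year, count)
```

If the count barely moves when the year shifts, the choice of 2006 was harmless. If it falls off a cliff, the “answer” was mostly an artefact of where the boundary was put.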
Is there a design pattern for this?
Notice also his gremlins can “be” on multiple nodes at once. That’s certainly not a binary data structure to the codomain. Other non-binary aspects to his graphs:
- different words (“coloured edges” in graph parlance) like “speaks”, “has worked with”, “had a child with” — all of the richness and drama of Quine’s ontology of language wrought in the connectome of the graph
- the network structure itself
- and of course edge weights
Here’s an example from Unix for Poets:
cat bible | grep Abel | uniq -c
So-called “bright lines” appear also in the law (married vs not), statistical regression (dummy/indicator variables), and tax brackets (under $15,000.00 or ≥ $15,000.01).
They’re frustrating because they’re discontinuous. (Actually, with ordinary marginal tax brackets the tax owed is continuous in income; it is the first derivative, the marginal rate, that is discontinuous.)
Imagine the following (non-existent, stupid) tax system:
- If you make under £30,000/year you pay no tax.
- If you make ≥ £30,000.01/year you pay 50% tax on every pound you made (all the way down to £0.01).
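The cliff in this made-up system is easy to see numerically. The sketch below is my own illustration: `cliff_tax` follows the two rules above, while `marginal_tax` is an assumed comparison (50% only on the part of income above £30,000) that is not part of the system described.

```python
RATE = 0.50
TAXED_FROM = 30_000.01   # pounds per year: the first taxed income in the rules above
ALLOWANCE = 30_000.00    # used only by the comparison bracket (an assumption)

def cliff_tax(income):
    """The made-up system: no tax below the line, 50% of everything once over it."""
    return RATE * income if income >= TAXED_FROM else 0.0

def marginal_tax(income):
    """Comparison bracket (assumed): 50% only on income above the allowance."""
    return RATE * max(0.0, income - ALLOWANCE)

for income in (29_999.99, 30_000.01, 40_000.00):
    print(f"{income:>10.2f}  cliff={cliff_tax(income):>9.2f}  marginal={marginal_tax(income):>9.2f}")
```

Under the cliff rule, earning two pence more takes the bill from nothing to roughly £15,000. Under the comparison bracket the bill only creeps up from zero, and it is the rate that jumps.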
It’s frustrating because it’s discontinuous. I might not go as far as to say that continuity, smoothness, holomorphicity, analyticity and so on are “natural to the human mind” — if in fact we can just take a monolithic view on “the” mind — but continuity and smoothness certainly seem—to me and to other mathematical writers I’m thinking of—like they’re more fair, just, or sensible.
Imagine you’re trying to catch an email spammer, and you’ve determined that the character ! is a good trigger for spams. You could either
- set a hard boundary: more than 3 !’s, flagged for spam; ≤ 3 !’s, not flagged
- or you could count the number of !’s in the text
The latter approach is more flexible:
- you can change the parameter 3 to something else
- you can pass the count through a function (like a sigmoid, monotone convex or monotone concave function, or the cumulative-prospect-theory function)
- As in minute 14 of this d3.js video you can add (something like a) “blending” parameter
- you can set a known algorithm (like logistic regression) to find the optimal parameter value for you
- you can combine the ! count with other variables (like counts of the word herbal or counts of the forenames of people in the mail user’s address book)
- you can combine the ! count with other variables and use a known algorithm (like a backprop net) to set all the optimal values for you
- maybe you can find a way to half-instantiate your desired response when the count is “at half mast” or “in a middling range”.
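A rough sketch of the difference, using only the Python standard library. Everything numeric here (the midpoint, the steepness, the weights, the bias) is a made-up illustrative value, and the function names are my own. A real filter would fit those numbers from labelled mail, for instance with logistic regression.

```python
import math

def hard_flag(text, limit=3):
    """Hard boundary: flag only when there are more than `limit` '!' characters."""
    return text.count("!") > limit

def sigmoid(x):
    """Squash any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def soft_score(text, midpoint=3.0, steepness=1.0):
    """Soft boundary: a spamminess score that rises smoothly with the '!' count.

    midpoint and steepness are made-up knobs; the score is 0.5 at the midpoint.
    """
    return sigmoid(steepness * (text.count("!") - midpoint))

def combined_score(text, w_bang=0.8, w_herbal=1.5, bias=-3.0):
    """Mix the '!' count with a second signal (occurrences of 'herbal').

    The weights are hand-picked for illustration, not fitted to any data.
    """
    bangs = text.count("!")
    herbal = text.lower().count("herbal")
    return sigmoid(bias + w_bang * bangs + w_herbal * herbal)

for mail in ("See you at noon!", "Wow!!! Really!", "HERBAL pills!!!! Buy herbal now!!"):
    print(hard_flag(mail), round(soft_score(mail), 3), round(combined_score(mail), 3), mail)
```

The hard flag can still be recovered at any time by comparing the score against a cutoff, so nothing is lost by keeping the count around. What you gain is somewhere to slide the cutoff.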
Back to catching spammers, I drew up an idea for tumblr to catch its spammers a while ago. I noticed a few telltale markers of spam accounts:
- quick liking in succession
- squatting on a hashtag
- high number of likes
- no / low content in the title
- at first the spammers were not reblogging stuff (now they not only reblog but post fakey “original” looking text posts … that’s counter-evolution for ya) so they usually had no posts on their blog page
- exist ads on the sidebar
They opted for social proof (let people “block” spammy likers from their dashboard and flag them as suspected spammers), which seems to have worked out very well. So I’m not saying “soft boundaries are always better” or something — just that if a “hard boundary” is preventing you from thinking about a problem like you want to, you can get around it pretty easily!
I think computer scientists do use soft boundaries, although they might not draw the same analogy to the “crisp” > sign as I am.
- tag clouds don’t just count words; they increase the display size of the word depending on how large the count is (maybe the sqrt of the count?). That tag clouds count many different words could also be construed as a “coloured” codomain.
- you don’t just return a webpage or not return a webpage in your crawler. You might get a 404, or you might get a 302. Or you might get a 200, 500, 303, 504, and so on. Additionally the page might be in HTML, JSON, or might simply flip a switch (“turn on my remote TV recording device”).
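For the tag-cloud case, here is a small sketch of what “size depends on the count” might look like. The square-root scaling is only my guess from above, and `base_px` and `scale_px` are invented parameters, not any real tag-cloud library’s settings.

```python
import math
from collections import Counter

def tag_sizes(words, base_px=10.0, scale_px=6.0):
    """Map each distinct word to a font size that grows with sqrt(count)."""
    counts = Counter(words)
    return {word: base_px + scale_px * math.sqrt(n) for word, n in counts.items()}

sizes = tag_sizes(["graph", "graph", "graph", "graph", "fuzzy"])
for word, px in sizes.items():
    print(word, px)
```

A word seen four times comes out bigger than a word seen once, but not four times bigger, which is the kind of in-between response a plain “show it or don’t” filter can’t give.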
Business people (I’ve found) think naturally in terms of soft boundaries as well. If your client / boss is using the word “score” you can mimic that directly with what I’m calling a “soft boundary”.
All you’ve got to do is make up a functional that “measures stuff” any way you want, and slide your > sign along the resulting smooth scale.







![Fuzzy Logic
Not everything is so simple as true or false. Even declarative statements may evaluate outside {0,1}. So let’s introduce the kind-of: truth ∈ [0,1].
Examples of non-binary declarative statements:
Shooting trap, my bullet nicked the clay pigeon but didn’t smash it. I 30%-hit the mark.
I’m not exactly a vegetarian. I purposely eat ⅔ of my meals without meat, but — like yogini Sadie Nardini — I feel weak if I go 100% vegetarian. So I’m ⅔ contributing to the social cause of non-animal-eating, and I’m a ⅔ vegetarian.
I’m sixteen years old. Am I a child, or an adult? Well, I don’t have a career or a mortgage, but I do have a serious boyfriend. This one is going to be hard to assign a single number as a percentage.
So that’s the motivation for Fuzzy Logic. It sounds compelling. But the academic field of fuzzy logic seems to have achieved not-very-much, although there are practical applications. Hopefully it’s just not-very-much-yet (Steven Vickers and Ulrich Höhle have two interesting-looking papers I want to read).
I see three problems which a Sensible Fuzzy Logic must overcome:
Implication. Classical logic (“the propositional calculus”) uses a screwed up version of “If A, then B”. It equates “if” to “Either not A, or else B is true, or else both.” Fuzzy logic inherits this problem, but also lacks one clear, convincing implication operator: each choice of “t-norm” (the fuzzy logic word for a fuzzy AND) induces a different fuzzy implication. Can you come up with a sensible rule for how this should work?:
A implies B, and A is 70% true. How true is B?
Furthermore, should there be different numbers attached to “implies” ? Should we have “strongly implies” and “weakly implies” or “strongly implies if Antecedent is above 70% and does not imply at all otherwise” ?
You can see where I’m going here. There is an ℵ2 of choices for the number of possible curves / distributions which could be used to define “A implies B”.
Too specific. Fuzzy logic uses real numbers, which include transcendental numbers, which are crazy. Bart Kosko’s book explains FL with familiar two-digit percentages, which are for the most part intuitive. So I can accept that something might be 79% true, but what does it mean for something to be π/4 % true? Or e^e^π^e / 22222222222 % true? We’re encumbering the theory with all of these unneeded, unintuitive numbers.
One-dimensional. For all of the space, breadth, depth, and spaceship adventures contained in the interval [0,1], it’s still quite limited in terms of the directions it can go. That is, [0,1] comprises a total order with an implied norm. Again, why assume distance exists and why assume unidimensionality, if you don’t actually mean to? There are alternatives. Unidimensionality excludes survey answers like
- N/A
- I don’t know
- Sort of
- Yes and no
- It’s hard to say
- I’m in a delicate superposition
or rather, it maps effectively different answers onto the same number. Sometimes things are both good and bad;
sometimes they are neither good nor bad;
sometimes things are not up for evaluation;
sometimes a generalised function (distribution) expresses the membership better than a single number;
sometimes the ideas are topologically related or order related but not necessarily distance related;
sometimes an incomplete lattice might be best.
So those are my gripes with fuzzy logic. At the same time, Kosko’s book was my introduction to an interesting, new way of thinking. It definitely set my mind spinning. For the logical mind that wants a rigorous framework for understanding ambiguity, vagueness, and gray areas, fuzzy logic is a good start.](http://25.media.tumblr.com/tumblr_lkhkpjwevx1qc38e9o1_500.png)
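To make the implication gripe in the caption concrete, here is a sketch of four textbook fuzzy implication operators side by side (Kleene-Dienes, Łukasiewicz, Gödel, Goguen). The function names and the example truth values 0.7 and 0.4 are my own choices. The formulas are the standard ones as far as I know, but treat this as an illustration rather than a survey.

```python
def kleene_dienes(a, b):
    """max(1 - a, b): the direct fuzzy reading of 'not A, or else B'."""
    return max(1.0 - a, b)

def lukasiewicz(a, b):
    """min(1, 1 - a + b): fully true whenever B is at least as true as A."""
    return min(1.0, 1.0 - a + b)

def goedel(a, b):
    """1 if a <= b, otherwise just the truth of B."""
    return 1.0 if a <= b else b

def goguen(a, b):
    """1 if a <= b, otherwise the ratio b / a (a > 0 is guaranteed on that branch)."""
    return 1.0 if a <= b else b / a

a, b = 0.7, 0.4  # A is 70% true, B is 40% true
for rule in (kleene_dienes, lukasiewicz, goedel, goguen):
    print(f"{rule.__name__:>14}: truth of 'A implies B' = {rule(a, b):.3f}")
```

The same pair of inputs comes out as roughly 0.4, 0.7, 0.4 and 0.57 depending on which rule is picked, which is the “no one convincing answer” problem in miniature.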





