
## Dummyisation

Statisticians are crystal clear on human variation. They know that not everyone is the same. When they speak about groups in general terms, they know that they are reducing an N-dimensional reality to a single, one-dimensional parameter.

Nevertheless, statisticians permit, in their regression models, variables that take only a handful of discrete values, such as `{0,1}` for `male/female` or `{a,b,c,d}` for `married/never-married/divorced/widowed`. Each person gets exactly one of those values.
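Here is a minimal sketch of what the four-level marital-status variable looks like once it is dummy-coded. It is written in plain Python rather than R. The helper name `dummy_code` is hypothetical, and so is the choice of `married` as the reference level. R's `model.matrix` and pandas' `get_dummies` do a similar job.

```python
def dummy_code(values, levels, reference):
    """Treatment (reference-level) coding: one 0/1 column per non-reference level.

    Every respondent falls into exactly one category; the reference level is
    the row of all zeros.
    """
    columns = [lvl for lvl in levels if lvl != reference]
    rows = []
    for v in values:
        if v not in levels:
            raise ValueError(f"unknown level: {v!r}")
        rows.append([1 if v == col else 0 for col in columns])
    return columns, rows


levels = ["married", "never-married", "divorced", "widowed"]
# Hypothetical respondents, made up for illustration.
responses = ["married", "divorced", "widowed", "never-married", "married"]

columns, rows = dummy_code(responses, levels, reference="married")
print(columns)
for r in rows:
    print(r)
```

A rocky marriage and a happy one produce the same row of zeros. The coding throws that nuance away by design.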

No one doing this believes that all such people are the same. And anyone who’s done the least bit of data cleaning knows that there will be `NA`s, wrongly coded cases, mistaken observations, ill-defined measures, and aberrations of other kinds. It can still be convenient to use binary or n-ary dummies to speak simply. Maybe the marriages of some people coded as `currently married` are on the rocks, and therefore they are more like `divorced`, or like a new category of people in the midst of watching their lives fall apart. Yes, we know. But what are you going to do: ask respondents to rate their marriage on a scale of one to ten? That would introduce false precision and model error, and might put respondents in such a strange mood that they answer other questions strangely. Better to just live with being wrong.

Any statistician who uses the `cut` function in R knows that the variable didn’t really stop being continuous once it was put into baskets. But a `facet_wrap` plot is easier to interpret than a 3D wireframe or point-cloud plot.
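To show what `cut` throws away, here is a rough Python analogue. The function name mirrors R's, but this is not R's implementation. It assumes sorted `breaks` and R's default right-closed intervals `(a,b]`, and it returns `None` where R would give `NA`. The heights are made up.

```python
import bisect


def cut(x, breaks):
    """Rough analogue of R's cut() with right-closed intervals (a, b].

    Values outside (breaks[0], breaks[-1]] get None, standing in for R's NA.
    """
    labels = [f"({lo},{hi}]" for lo, hi in zip(breaks, breaks[1:])]
    out = []
    for v in x:
        if v <= breaks[0] or v > breaks[-1]:
            out.append(None)
        else:
            # bisect_left finds the first break >= v, so v lies in (breaks[i-1], breaks[i]]
            i = bisect.bisect_left(breaks, v)
            out.append(labels[i - 1])
    return out


heights_cm = [164.9, 165.0, 165.1, 179.0]  # made-up values
print(cut(heights_cm, [150, 165, 180, 195]))
```

In this example, 164.9 and 165.1 are only 0.2 cm apart but land in different baskets. Meanwhile, 165.1 and 179.0 are about 14 cm apart and share one.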

To the precise mind, there’s a world of difference between saying

• "the mean height of men > the mean height of women", and saying
• "men are taller than women".
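The two statements can be checked separately on the same data. The heights below are invented for illustration only and are not real measurements.

```python
from statistics import mean

# Made-up heights in cm, for illustration only; not real survey data.
men = [165, 172, 178, 185]
women = [158, 163, 168, 176]

# "the mean height of men > the mean height of women"
mean_claim = mean(men) > mean(women)
# "each man is taller than each woman"
every_claim = all(m > w for m in men for w in women)
# Share of (man, woman) pairs in which the man is taller
share_taller = sum(m > w for m in men for w in women) / (len(men) * len(women))

print(mean_claim, every_claim, share_taller)  # True False 0.8125
```

The mean statement holds while the "each man" statement fails. In 3 of the 16 pairs, the woman is taller.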

Of course one can interpret the second statement as just a vaguer, simpler inflection of the first. But some people understand statements like the second to mean “each man is taller than each woman”. Or, perniciously, they take “Blacks have lower IQ than Whites” to mean “every Black is mentally inferior to every White.”

I want to live somewhere between pedantry and ignorance. We can give each other a break on the precision as long as the precise idea behind the words is mutually understood.


Dummyisation is different to stereotyping because:

• stereotypes deny variability in the group being discussed
• dummyisation acknowledges that it’s incorrect, before even starting
• stereotyping relies on familiar categories or groupings like skin colour
• dummyisation can be applied to any partition of a set, such as one based on height, or even a random grouping

It’s the world of difference between taking on a hypothetical for the purpose of reaching a valid conclusion, and bludgeoning someone who doesn’t accept your version of the facts.

So this is a word I want to coin (unless a better one already exists—does it?):

• dummyisation is assigning one value to a group or region
• for convenience of the present discussion,
• recognising fully that other groupings are possible
• and that, in reality, not everyone from the group is alike.
• Instead, we apply some ∞→1 function or operator to the truly variable, unknown, and variform distribution or manifold of reality, and talk about the results of that function.
• We do this knowing it’s technically wrong, as a (hopefully productive) way of mulling over the facts from different viewpoints.
• In other words, dummyisation is purposely doing something wrong for the sake of discussion.
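The definition above can be sketched as a minimal Python function, assuming numeric data. It uses the group mean as the ∞→1 operator, though a median or any other summary would do. The function name `dummyise` is hypothetical. The same procedure accepts a partition by height or a completely random one, matching the claim that dummyisation works on any partitioning of a set.

```python
import random
from statistics import mean, pstdev


def dummyise(values, key, summarise=mean):
    """Partition `values` with `key`, then collapse each group to a single number.

    `summarise` plays the role of the ∞→1 operator. The population standard
    deviation and group size are returned alongside, so the variation being
    ignored stays visible.
    """
    groups = {}
    for v in values:
        groups.setdefault(key(v), []).append(v)
    return {g: (summarise(vs), pstdev(vs), len(vs)) for g, vs in sorted(groups.items())}


# Made-up heights in cm, for illustration only.
heights = [150, 155, 160, 165, 170, 175, 180, 185]

# A partition based on height...
by_height = dummyise(heights, key=lambda h: "tall" if h >= 168 else "short")

# ...or a completely random one; the procedure does not care which.
rng = random.Random(0)
by_random = dummyise(heights, key=lambda h: rng.choice(["A", "B"]))

print(by_height)
print(by_random)
```

The single numbers for `short` and `tall` say nothing about any individual. Each of those groups still spans 15 cm.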

Yesterday’s #SB5 conflict is an opportunity to talk about the algebra of sets. (Commonly understood through Venn diagrams.)

The algebra of sets is the way ∪ (union), ∩ (intersection), ∁ (complement), and some composite operations work. This vocabulary is useful for reminding yourself to keep separate the things that are logically separate.

The relative complement of A (left circle) in B (right circle):
$A^c \cap B = B \setminus A$
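Here is a quick check of the identity using Python's built-in `set` type, with a made-up universe `U`. The absolute complement $A^c$ only makes sense relative to some universe. $B \setminus A$ does not need one.

```python
# A toy universe; the elements are arbitrary placeholders.
U = set(range(10))
A = {1, 2, 3, 4, 5}
B = {4, 5, 6, 7}

A_c = U - A   # complement of A, relative to the universe U
lhs = A_c & B  # A^c ∩ B
rhs = B - A    # B \ A, the relative complement of A in B

print(sorted(lhs), sorted(rhs))  # [6, 7] [6, 7]
```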

For example:

• Not all feminists are women: `Feminist ∩ ∁{Woman} ≠ ∅`
• Not all women are pro-choice: `∁{Pro-Choice} ∩ Woman ≠ ∅`
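In code, "not all X are Y" is a test for a non-empty intersection of X with the complement of Y. Python sets express this as a difference. The memberships below are invented, and the helper names `not_all` and `no_x_is_y` are hypothetical. The contrast between the two helpers is the point: "not all women are pro-choice" is a much weaker claim than "no women are pro-choice".

```python
def not_all(xs, ys):
    """'Not all X are Y' holds exactly when X ∩ ∁Y is non-empty, i.e. xs - ys is non-empty."""
    return bool(xs - ys)


def no_x_is_y(xs, ys):
    """'No X is Y' is the much stronger claim that X ∩ Y is empty."""
    return not (xs & ys)


# Hypothetical people identified by letter; the memberships are invented for illustration.
woman = {"a", "b", "c", "d"}
feminist = {"b", "c", "e"}  # "e" is a feminist who is not a woman
pro_choice = {"b", "c", "e", "f"}

print(not_all(feminist, woman))      # True: {"e"}
print(not_all(woman, pro_choice))    # True: {"a", "d"}
print(no_x_is_y(woman, pro_choice))  # False: "b" and "c" are in both
```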

Sets can overlap in different ways.
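For two sets, the possibilities reduce to a short list: equal, disjoint, one strictly inside the other, or a partial overlap. Here is a small classifier that makes these cases explicit. The name `overlap_kind` is hypothetical, and the order of the checks is a design choice.

```python
def overlap_kind(a, b):
    """Classify how two finite sets relate to each other.

    The order of the checks is a design choice: an empty set paired with a
    non-empty one is reported as "disjoint", even though it is also a subset.
    """
    if a == b:
        return "equal"
    if not (a & b):
        return "disjoint"
    if a < b:
        return "proper subset"
    if a > b:
        return "proper superset"
    return "partial overlap"


for a, b in [({1, 2}, {3, 4}), ({1, 2}, {1, 2, 3}), ({1, 2, 3}, {2}), ({1, 2}, {2, 3}), ({1}, {1})]:
    print(sorted(a), sorted(b), overlap_kind(a, b))
```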