Neuron Makers · Ch. 15 · Bigotry

The machines saw everyone but her

In 2018 Joy Buolamwini and Timnit Gebru audited three commercial systems that classify gender from faces. On lighter-skinned men all three were nearly flawless, erring on fewer than one face in a hundred. On darker-skinned women the error rate climbed as high as 34.7%, more than one wrong guess in three. The systems failed worst on the people least represented in their training data.

[Chart: error rate by group, for lighter men, lighter women, darker men, and darker women]

Across all three vendors the worst-served group is always the same. At its widest, the gap between the best-served group (lighter men) and the worst (darker women) runs to more than 30 percentage points, on the same task, inside the same product.
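The comparison above rests on disaggregated evaluation: the error rate is computed separately for each group, and the gap is the difference between the highest and lowest rates. A minimal Python sketch of that arithmetic follows, assuming each record is a (group, true label, predicted label) triple. The function names and the toy records are invented for illustration; they are not the Gender Shades data or the auditors' code.

```python
from collections import defaultdict


def error_rates_by_group(records):
    """Return {group: error rate in percent} for (group, truth, predicted) records."""
    totals = defaultdict(int)
    errors = defaultdict(int)
    for group, truth, predicted in records:
        totals[group] += 1
        if predicted != truth:
            errors[group] += 1
    return {g: 100.0 * errors[g] / totals[g] for g in totals}


def widest_gap(rates):
    """Return (worst group, best group, gap in percentage points)."""
    worst = max(rates, key=rates.get)
    best = min(rates, key=rates.get)
    return worst, best, rates[worst] - rates[best]


# Toy records, made up for illustration only (NOT the audit's data).
toy_records = (
    [("lighter men", "M", "M")] * 10      # 10 correct predictions
    + [("darker women", "F", "F")] * 7    # 7 correct predictions
    + [("darker women", "F", "M")] * 3    # 3 misclassifications
)

rates = error_rates_by_group(toy_records)
worst, best, gap = widest_gap(rates)
print(rates)
print(f"{worst} vs {best}: {gap:.1f} percentage points")
```

A single overall accuracy figure would average these groups together and hide the gap, which is why the rates are kept separate here.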

SOURCE · NEURON MAKERS, CH. 15. Headline figures (34.7% on darker-skinned women; lighter-skinned men under 1%; >30-point gap) are stated in the chapter. The full per-vendor breakdown by group comes from the underlying paper: Buolamwini & Gebru, "Gender Shades: Intersectional Accuracy Disparities in Commercial Gender Classification," PMLR 81 (FAT* 2018), Table 4. The chart plots gender-classification error rate (%) on an axis of 0–40%.