Google learns to smile, because AI's bad at it
Biased models mean bad decisions for women and some races. Google boffins think they've improved things a bit
Google's taken a small step towards addressing the persistent problem of bias in artificial intelligence, setting its boffins to work on equal-opportunity smile detection.
In a paper published on arXiv on December 1, Mountain View trio Hee Jung Ryu, Margaret Mitchell and Hartwig Adam laid out the results of research designed to handle the twin problems of gender and race diversity when machine learning is applied to images.
Biased models have become a contentious issue in AI over the course of the year, with study after study documenting both the extent of algorithmic bias and its real-life impacts, such as women seeing ads for low-paying jobs and African-Americans being sent more ads about being arrested. In spite of this, researchers are still comfortable making phrenology-like claims about identifying criminal faces, or claiming that their AI can spot beautiful women.
Google's authors agreed that bias is an issue, and wrote “users have noticed a troubling gap between how well some demographics are recognised compared with others”. Problems they noted included mis-gendering a woman simply because she's not wearing makeup, or being unable to classify a black face at all.
The paper stated that Google is not seeking to classify people by race (since that's both unethical and arbitrary), and the authors noted that using AI to classify race or gender needs the individual's consent.
Nonetheless, training race and gender recognition into the model is necessary if the AI is going to reliably identify a smile, and that's how the researchers approached the problem: “At the core of this work lies the idea that faces look different across different races and genders, and that it is equally important to do well on each demographic group”, the researchers wrote.
First, the researchers applied a more granular view of misclassifications: “we report values for accuracy per subgroup … [and] we also introduce a metric that evaluates false positive rates and false negatives rates in a way that is robust to label and subgroup imbalances”.
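The paper's own imbalance-robust metric isn't spelled out here, so the following is only a minimal sketch of the general idea, assuming binary smile labels (1 = smiling) and one subgroup key per face; the function name and encoding are hypothetical. It reports accuracy, false positive rate and false negative rate separately for each subgroup. Because each rate is divided by that subgroup's own count of negatives or positives, a subgroup with few smiling (or few non-smiling) examples isn't drowned out the way it would be in a single pooled accuracy figure.

```python
from collections import defaultdict


def subgroup_rates(labels, preds, groups):
    """Per-subgroup accuracy, false positive rate and false negative rate.

    labels, preds: sequences of 0/1 (1 = smiling; a hypothetical encoding).
    groups: one subgroup key per example, e.g. a (gender, race) tuple.
    """
    counts = defaultdict(lambda: {"tp": 0, "tn": 0, "fp": 0, "fn": 0})
    for y, p, g in zip(labels, preds, groups):
        c = counts[g]
        if y == 1 and p == 1:
            c["tp"] += 1
        elif y == 0 and p == 0:
            c["tn"] += 1
        elif y == 0 and p == 1:
            c["fp"] += 1
        else:
            c["fn"] += 1

    report = {}
    for g, c in counts.items():
        total = c["tp"] + c["tn"] + c["fp"] + c["fn"]
        negatives = c["tn"] + c["fp"]
        positives = c["tp"] + c["fn"]
        report[g] = {
            "accuracy": (c["tp"] + c["tn"]) / total,
            # Each rate uses the subgroup's own class counts as denominator,
            # so label imbalance within a subgroup doesn't skew it.
            "fpr": c["fp"] / negatives if negatives else float("nan"),
            "fnr": c["fn"] / positives if positives else float("nan"),
        }
    return report
```

Comparing the per-subgroup `fpr` and `fnr` values side by side exposes the kind of demographic gap that a single headline accuracy number hides.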
That helped them correct for a common sampling bias in training datasets: many of them contain a preponderance of white European faces.
With that classification in hand, the researchers then applied over-sampling to groups under-represented in the dataset. For subgroups too small for that to work, they controlled the make-up of each training batch directly (an “off-line oversampling method in order to make sure each training batch contains faces across all race × gender”, as they wrote).
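The article doesn't describe how that batch construction works, so here is a minimal sketch of one way to guarantee coverage, assuming each face carries a (race, gender) key; the function and parameter names are hypothetical, not the authors' code. Every batch first takes one example from each subgroup, then fills the remaining slots by choosing a subgroup uniformly at random and drawing an example from it with replacement, which over-samples the small subgroups.

```python
import random
from collections import defaultdict


def balanced_batches(keys, batch_size, num_batches, seed=0):
    """Yield lists of example indices in which every subgroup appears.

    keys: one subgroup key per example, e.g. a (race, gender) tuple.
    Small subgroups are over-sampled by drawing with replacement.
    """
    rng = random.Random(seed)
    by_group = defaultdict(list)
    for idx, key in enumerate(keys):
        by_group[key].append(idx)
    groups = sorted(by_group)
    if batch_size < len(groups):
        raise ValueError("batch_size must be at least the number of subgroups")

    for _ in range(num_batches):
        # One guaranteed example from every subgroup...
        batch = [rng.choice(by_group[g]) for g in groups]
        # ...then fill the rest: pick a subgroup uniformly, then an example.
        while len(batch) < batch_size:
            g = rng.choice(groups)
            batch.append(rng.choice(by_group[g]))
        rng.shuffle(batch)
        yield batch
```

Calling `list(balanced_batches(keys, 32, 1000))` before training fixes every batch's composition up front, loosely in the spirit of an off-line approach; sampling subgroups uniformly is a design choice made here for illustration, and weighting them differently would be equally valid.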
The results: up to 99 per cent gender accuracy (on the 200,000-image CelebA dataset). On the Faces of the World (FotW) dataset, gender and race accuracy was above 90 per cent for most subgroups. On a dataset collected by scraping 100,000 celebrity images from the Web, the researchers wrote that their model reached “98 per cent or greater” area under the curve (AUC).
Which brings us to smile detection: the more granular pre-processing yielded smile detection accuracy over 90 per cent across the whole dataset, by gender, by race, or subgrouped by both gender and race.