
Sony AI has introduced a new benchmark designed to evaluate how fairly artificial intelligence systems treat people, setting what it calls a new standard for ethical AI development. The dataset, known as the Fair Human-Centric Image Benchmark (FHIBE) — pronounced “Phoebe” — is described as the first publicly available, globally diverse, consent-based human image dataset for testing bias across a broad range of computer vision tasks.
Announced alongside a paper published in Nature on Wednesday, FHIBE is built from images of nearly 2,000 paid participants representing more than 80 countries. Every individual provided consent for the use of their likeness, and participants retain the right to withdraw their images at any time — a notable departure from the AI industry’s common practice of scraping large amounts of web data without permission. Each image includes annotations describing demographic and physical characteristics, environmental context, and even camera settings, enabling fine-grained bias analysis.
Sony said the benchmark aims to address ongoing concerns about algorithmic bias and fairness in AI. In Sony's evaluations, FHIBE “affirmed previously documented biases” in major AI models while offering deeper insight into their causes. For instance, the benchmark revealed reduced accuracy for individuals using “she/her/hers” pronouns, with hairstyle variability emerging as a contributing factor — a nuance most prior evaluations had overlooked.
The dataset also identified stereotype reinforcement when AI models were prompted with neutral questions about occupations. In several cases, models described individuals from certain pronoun and ancestry groups as sex workers, drug dealers, or thieves. When asked to infer what crimes a person might have committed, the systems were more likely to produce toxic or harmful outputs for individuals of African or Asian ancestry, those with darker skin tones, and those identifying as “he/him/his.”
Sony AI said FHIBE demonstrates that ethical, diverse, and fair data collection is achievable, providing researchers and developers with a transparent foundation for improving AI behavior. The benchmark is freely available to the public and will be updated over time as new data and findings emerge.
Featured image credits: xsix via Flickr
