Research
This laboratory primarily investigates how bias in datasets affects the results produced by machine learning algorithms. So, what is bias? In machine learning, an algorithm takes a dataset as input and extracts its statistical regularities. In supervised learning, the dataset consists of pairs of features and corresponding labels. The algorithm learns the relationship between the features (e.g., images or text documents) and the labels (e.g., the names of objects in the images or the categories of the documents) in the form of a mapping from features to labels. With this mapping, often referred to as a predictor, it becomes possible to predict labels for new, unlabeled features. Crucially, the data fed into a machine learning algorithm can be flawed for various reasons. These distortions in the dataset are what we refer to as bias.
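As a concrete illustration of this workflow, the sketch below trains a predictor on a labeled dataset and then applies it to new, unlabeled features. The use of scikit-learn and a synthetic dataset here is purely illustrative; neither is a tool or dataset referred to in the text.

```python
# A minimal sketch of the supervised-learning workflow described above.
# scikit-learn and the synthetic dataset are illustrative choices only.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Features X and labels y form the (feature, label) pairs of the dataset.
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
X_train, X_new, y_train, _ = train_test_split(X, y, test_size=0.2, random_state=0)

# The learning algorithm extracts statistical regularities from the pairs
# and returns a predictor: a mapping from features to labels.
predictor = LogisticRegression().fit(X_train, y_train)

# The predictor assigns labels to new, previously unlabeled features.
predicted_labels = predictor.predict(X_new)
```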
Fairness
One specific type of bias that our laboratory explores relates to fairness. Fairness concerns arise when the data collection or generation process is influenced by discriminatory human behavior, whether conscious or unconscious. Such skewed data in turn affects the machine learning algorithm, leading to the construction of a biased predictor. Discriminatory outcomes of machine learning algorithms have been observed empirically in real-world systems. A notable example is the risk assessments produced by the COMPAS algorithm, which ProPublica examined. COMPAS is a machine learning algorithm used by U.S. courts to estimate a defendant’s risk of recidivism on a ten-point scale. According to ProPublica’s analysis, African Americans were almost twice as likely as Whites to be assigned a high risk score without actually re-offending, while Whites were almost twice as likely as African Americans to be labeled low-risk despite having re-offended. This evidence suggests that the COMPAS algorithm disproportionately favors White defendants while disadvantaging African American defendants.
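The disparity described above can be understood as a gap in group-conditional error rates. The sketch below, which is purely illustrative and not ProPublica's actual analysis, shows one way such false positive and false negative rates could be computed per group; all names and data are hypothetical.

```python
# Illustrative only (not ProPublica's analysis code): group-conditional
# false positive and false negative rates of a binary "high-risk" prediction.
import numpy as np

def group_error_rates(y_true, y_pred, group):
    """Return {group_value: (false_positive_rate, false_negative_rate)}."""
    rates = {}
    for g in np.unique(group):
        mask = group == g
        yt, yp = y_true[mask], y_pred[mask]
        fpr = np.mean(yp[yt == 0] == 1)  # flagged high-risk, did not re-offend
        fnr = np.mean(yp[yt == 1] == 0)  # flagged low-risk, did re-offend
        rates[g] = (fpr, fnr)
    return rates

# Hypothetical toy data: y_true = re-offended or not, y_pred = high-risk flag.
rng = np.random.default_rng(0)
group = rng.integers(0, 2, size=1000)   # binary sensitive attribute
y_true = rng.integers(0, 2, size=1000)  # actual recidivism
y_pred = rng.integers(0, 2, size=1000)  # predicted high risk
print(group_error_rates(y_true, y_pred, group))
```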
To address these fairness concerns, our laboratory develops machine learning algorithms that mitigate bias in predictors across a variety of settings (a generic illustration of this idea appears after the list below). Specific studies conducted by our laboratory include:
- Development of a fair learning algorithm that operates with a limited number of observable labels for sensitive attributes (e.g., race and gender). (missing reference)
- Exploration into the detectability of deceptive practices related to fairness. [Fukuchi, Hara, and Maehara(2020)]
- Development of a fair learning algorithm that offers a theoretical guarantee of fairness at test time. [Fukuchi and Sakuma(2014)]
- Construction of a fair learning algorithm that requires no labels for the sensitive attributes and instead uses only a predictive model of those attributes. [Fukuchi, Sakuma, and Kamishima(2013), Fukuchi, Kamishima, and Sakuma(2015)]
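As a rough, generic illustration of fairness-aware learning, and not an implementation of any of the specific algorithms listed above, the sketch below trains a logistic model with an added regularizer that penalizes the covariance between the model's scores and a sensitive attribute; all names and data are hypothetical.

```python
# Generic sketch of fairness-aware learning via regularization: logistic loss
# plus a penalty on the covariance between model scores and a sensitive
# attribute. Not an implementation of the laboratory's published algorithms.
import numpy as np

def fit_fair_logreg(X, y, s, lam=1.0, lr=0.1, epochs=500):
    """Gradient descent on logistic loss + lam * (score/attribute covariance)^2."""
    w = np.zeros(X.shape[1])
    s_centered = s - s.mean()
    for _ in range(epochs):
        scores = X @ w
        p = 1.0 / (1.0 + np.exp(-scores))
        grad_loss = X.T @ (p - y) / len(y)            # logistic-loss gradient
        cov = s_centered @ scores / len(y)            # score/attribute covariance
        grad_fair = 2.0 * lam * cov * (X.T @ s_centered) / len(y)
        w -= lr * (grad_loss + grad_fair)
    return w

# Hypothetical toy data: features X, labels y, binary sensitive attribute s.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
s = rng.integers(0, 2, size=500)
y = (X[:, 0] + 0.5 * s + rng.normal(scale=0.5, size=500) > 0).astype(int)

w = fit_fair_logreg(X, y.astype(float), s.astype(float), lam=5.0)
y_hat = (X @ w > 0).astype(int)
print("positive rate by group:", y_hat[s == 0].mean(), y_hat[s == 1].mean())
```

Driving the covariance toward zero pushes the predictor's output to be statistically less dependent on the sensitive attribute, typically at some cost in accuracy; the penalty weight controls this trade-off.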
Related publications
- Fukuchi, Hara, and Maehara. Faking Fairness via Stealthily Biased Sampling. In The Thirty-Fourth AAAI Conference on Artificial Intelligence, Special Track on AI for Social Impact, Vol. 34, No. 01, pp. 412–419, 2020. doi: 10.1609/aaai.v34i01.5377. arXiv
- Fukuchi, Kamishima, and Sakuma. Prediction with Model-Based Neutrality. IEICE Transactions on Information and Systems, Vol. E98.D, No. 8, pp. 1503–1516, 2015. doi: 10.1587/transinf.2014EDP7367.
- Fukuchi and Sakuma. Neutralized Empirical Risk Minimization with Generalization Neutrality Bound. In Machine Learning and Knowledge Discovery in Databases, Vol. 8724, pp. 418–433, 2014. doi: 10.1007/978-3-662-44848-9_27. arXiv
- Fukuchi, Sakuma, and Kamishima. Prediction with Model-Based Neutrality. In Machine Learning and Knowledge Discovery in Databases, Vol. 8189, pp. 499–514, 2013. doi: 10.1007/978-3-642-40991-2_32.