Our laboratory investigates how bias in datasets affects the outputs of machine learning algorithms. So, what is bias? A machine learning algorithm takes a dataset as input and extracts its statistical regularities. In supervised learning, the dataset consists of pairs of features and corresponding labels. The algorithm learns the relationship between the features (e.g., images or text documents) and the labels (e.g., the objects shown in the images or the categories of the documents) in the form of a mapping from features to labels. With this mapping, often called a predictor, we can predict labels for new, unlabeled features. It is crucial to recognize that the data fed into a machine learning algorithm can be flawed for various reasons. Such distortions in the dataset are what we call bias.
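To make the feature-to-label mapping concrete, here is a minimal sketch of a supervised-learning predictor. It uses a 1-nearest-neighbor rule in pure Python on a toy synthetic dataset; the data, function names, and the choice of algorithm are illustrative assumptions, not part of our laboratory's methods.

```python
# Minimal sketch of supervised learning: learn a mapping from
# features to labels, then predict labels for new features.
# All data below is synthetic and purely illustrative.

def nearest_neighbor_predictor(train_features, train_labels):
    """Return a predictor: a function mapping a new feature vector
    to the label of its closest training example (1-nearest-neighbor)."""
    def predict(x):
        def dist(a):
            # Squared Euclidean distance between a and x.
            return sum((ai - xi) ** 2 for ai, xi in zip(a, x))
        best = min(range(len(train_features)),
                   key=lambda i: dist(train_features[i]))
        return train_labels[best]
    return predict

# Toy dataset: 2-D features paired with binary labels.
features = [(0.0, 0.0), (0.2, 0.1), (1.0, 1.0), (0.9, 0.8)]
labels = [0, 0, 1, 1]

predict = nearest_neighbor_predictor(features, labels)
print(predict((0.1, 0.0)))   # near the first cluster  -> 0
print(predict((0.95, 0.9)))  # near the second cluster -> 1
```

If the training data is distorted, the learned predictor inherits that distortion, which is exactly the concern raised above.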
One specific type of bias our laboratory explores concerns fairness. Fairness issues arise when the data collection or generation process is influenced by discriminatory human actions, whether conscious or unconscious. Such skewed data in turn degrades the performance of machine learning algorithms, leading to a biased predictor. Discriminatory outcomes of machine learning algorithms have been empirically observed in real-world systems. A notable example is the risk assessments produced by the COMPAS algorithm, which were examined by ProPublica. COMPAS is a machine learning algorithm used by U.S. courts to estimate a defendant's risk of recidivism on a ten-point scale. According to ProPublica's analysis, African Americans are almost twice as likely as Whites to be assigned a high risk score despite not re-offending. Conversely, Whites are almost twice as likely as African Americans to be rated low-risk despite having re-offended. This empirical evidence suggests that COMPAS disproportionately favors White defendants while disadvantaging African American defendants.
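The disparity ProPublica reported can be phrased as a gap in group-wise error rates: the false positive rate (labeled high-risk but did not re-offend) and the false negative rate (labeled low-risk but did re-offend), computed per group. The sketch below computes these rates on hypothetical records; the data, group names, and record format are invented for illustration and are not the actual COMPAS data.

```python
# Hedged sketch: group-wise false positive / false negative rates,
# the kind of disparity measure used in ProPublica's COMPAS analysis.
# Records are synthetic, not real defendant data.

def group_error_rates(records):
    """records: list of (group, predicted_high_risk, reoffended),
    with the last two fields encoded as 0/1. Returns per-group
    FPR (high-risk among non-reoffenders) and FNR (low-risk
    among reoffenders)."""
    rates = {}
    for g in {grp for grp, _, _ in records}:
        rows = [(p, y) for grp, p, y in records if grp == g]
        neg = [p for p, y in rows if y == 0]  # did not re-offend
        pos = [p for p, y in rows if y == 1]  # did re-offend
        fpr = sum(neg) / len(neg) if neg else 0.0
        fnr = sum(1 - p for p in pos) / len(pos) if pos else 0.0
        rates[g] = {"FPR": fpr, "FNR": fnr}
    return rates

# Hypothetical records: (group, predicted high-risk?, re-offended?)
records = [
    ("A", 1, 0), ("A", 1, 0), ("A", 0, 0), ("A", 1, 1), ("A", 0, 1),
    ("B", 1, 0), ("B", 0, 0), ("B", 0, 0), ("B", 1, 1), ("B", 0, 1),
]
print(group_error_rates(records))
```

A large gap in FPR or FNR between groups, as in ProPublica's findings, indicates that the predictor's mistakes are not evenly distributed across groups even if overall accuracy looks acceptable.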
To achieve fairness, our laboratory develops machine learning algorithms that mitigate bias in predictors across various settings. Specific studies conducted by our laboratory include:
- Development of a fair learning algorithm that operates with a limited number of observable labels for sensitive attributes (e.g., race and gender). (missing reference)
- Exploration into the detectability of deceptive practices related to fairness. [Fukuchi, Hara, and Maehara(2020)]
- Development of a fair learning algorithm that offers a theoretical guarantee of fairness at test time. [Fukuchi and Sakuma(2014)]
- Construction of a fair learning algorithm that functions without labels for sensitive attributes but only uses a predictive model of those attributes. [Fukuchi, Sakuma, and Kamishima(2013), Fukuchi, Kamishima, and Sakuma(2015)]
- Fukuchi, Sakuma, and Kamishima. Prediction with Model-Based Neutrality. In Machine Learning and Knowledge Discovery in Databases, Vol. 8189, pp. 499–514, 2013. doi: 10.1007/978-3-642-40991-2_32.
- Fukuchi, Kamishima, and Sakuma. Prediction with Model-Based Neutrality. IEICE Transactions on Information and Systems, Vol. E98.D, No. 8, pp. 1503–1516, 2015. doi: 10.1587/transinf.2014EDP7367.