Welcome to MDL
The Machine Learning and Data Mining Laboratory (MDL) is affiliated with the Master’s/Doctoral Program in Computer Science in the Degree Programs in Systems and Information Engineering at the University of Tsukuba. We develop algorithms and conduct theoretical analyses, the latter grounded in mathematical statistics, to advance the trustworthiness of machine learning and data mining. We focus in particular on challenges caused by data bias. Learn about our research directions.
We welcome applications from prospective Master’s and PhD students interested in developing algorithms for trustworthy machine learning under data bias, or in the theoretical analysis of such algorithms grounded in mathematical statistics. See the Opportunities page for more information.
Research Overview
Machine learning is a technology that enables computers to make predictions by learning from large amounts of data, and now powers applications ranging from chatbots to medical image analysis. However, training data can deviate from what the model trainer intends and become inadvertently skewed. This skew of data is referred to as data bias. Models trained on biased data are strongly influenced by this bias, potentially producing unreliable outcomes. Our research develops algorithms that behave as intended even in the presence of data bias. Three concrete research topics are as follows:
- Fairness: Prediction target data can be skewed with respect to socially sensitive attributes such as race, gender, and age, reflecting historical and cultural biases in how the data was collected. This skew may be inherited by the trained model, causing it to make discriminatory predictions against groups defined by those attributes. Fair machine learning algorithms are designed to produce accurate, non-discriminatory predictions even when trained on such biased data. Our recent work includes a post-hoc algorithm that efficiently controls the fairness-accuracy trade-off (M. Sakata et al., 2026) and the theoretical characterization of algorithms achieving the best accuracy under fairness constraints (K. Fukuchi, 2025; K. Fukuchi et al., 2023).
- Transfer Learning: Data observed at training time is often distributed differently from data encountered at prediction time. For example, patient data from one hospital may not match another, and a model trained on simulation data may fail in the real world. This discrepancy in data distribution can cause the trained model to perform poorly on the target data. Transfer learning addresses this bias by adapting the model to the target distribution using a small amount of target data. Our recent work includes provably successful transfer learning in settings where source and target domains do not overlap (M. Fujikawa et al., 2025) and a provable scaling law for pre-training (K. Fukuchi et al., 2026).
- Out-of-Distribution Generalization: Prediction target data can be spuriously correlated with non-causal attributes. For example, in object detection, if cows nearly always appear in grassy fields in the training data, the model may associate the label with the background rather than with the animal itself. Such a spurious correlation may cause the trained model to fail when the same object appears in an unfamiliar context. The aim of out-of-distribution generalization is to develop methods that remain accurate even when such spurious correlations do not hold at test time. Our recent work includes algorithms that use vision-language models to mitigate spurious correlations even when the spurious attributes are unknown (下坂, 2024).
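As a rough illustration of the fairness topic above, the sketch below measures how strongly a set of binary predictions depends on a sensitive attribute, using the demographic parity gap (the difference in positive-prediction rates between groups). The function name `demographic_parity_gap` and the toy data are hypothetical and chosen only for this example; this is not the algorithm from any of the cited papers.

```python
def demographic_parity_gap(predictions, groups):
    """Return the largest difference in positive-prediction rate between groups.

    predictions: iterable of 0/1 model outputs.
    groups: iterable of group labels (the sensitive attribute), same length.
    """
    totals = {}
    positives = {}
    for pred, group in zip(predictions, groups):
        totals[group] = totals.get(group, 0) + 1
        positives[group] = positives.get(group, 0) + (1 if pred == 1 else 0)
    # Positive-prediction rate within each group.
    rates = {g: positives[g] / totals[g] for g in totals}
    return max(rates.values()) - min(rates.values())


# Hypothetical toy data: group "a" receives positive predictions far more often.
predictions = [1, 1, 1, 0, 1, 0, 0, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

gap = demographic_parity_gap(predictions, groups)
print(f"demographic parity gap: {gap:.2f}")
```

A gap of zero means every group receives positive predictions at the same rate. A fair learning algorithm would aim to keep this gap small while losing as little accuracy as possible.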
Current Funded Projects
Clarification of Minimax Optimality of Out-of-Distribution Generalizable Fair Regression Algorithms via Foundation Models (KAKENHI Grant-in-Aid for Scientific Research (B); 2026–2030; PI: Kazuto Fukuchi)
- This project aims to characterize the minimax optimal fair learning algorithm (the best fair learning algorithm) in the out-of-distribution generalization setting. To achieve out-of-distribution generalization, we will use foundation models to identify the task common to different domains.
Understanding Attack Mechanisms against AI through Causal Structures of Classification and Building Countermeasures (KAKENHI Grant-in-Aid for Scientific Research (A); 2023–2027; Co-PI: Kazuto Fukuchi (PI: Jun Sakuma at Institute of Science Tokyo))
- This project aims to understand the attack mechanisms against AI through the causal structures of classification and to build countermeasures.
References
- Kazuto Fukuchi and Jun Sakuma. Demographic Parity Constrained Minimax Optimal Regression under Linear Model. Advances in Neural Information Processing Systems, vol. 36, pp. 8653-8689, 2023. arXiv.
- Kazuto Fukuchi. Meta Optimality for Demographic Parity Constrained Regression via Post-Processing. Forty-Second International Conference on Machine Learning, vol. 267, pp. 18024-18046, 2025. arXiv.
- Kazuto Fukuchi, Ryuichiro Hataya, and Kota Matsui. Provable Target Sample Complexity Improvements as Pre-Trained Models Scale. Proceedings of The 29th International Conference on Artificial Intelligence and Statistics, 2026, to appear. arXiv.
- M. Sakata and Kazuto Fukuchi. Fair Classification with Efficient and Post-hoc Controllable Fairness-Accuracy Trade-off. Forty-Third International Conference on Machine Learning, 2026.
- M. Fujikawa, Youhei Akimoto, Jun Sakuma, and Kazuto Fukuchi. Harnessing the Power of Vicinity-Informed Analysis for Classification under Covariate Shift. Proceedings of The 28th International Conference on Artificial Intelligence and Statistics, vol. 258, pp. 392-410, 2025. arXiv.
- 下坂 and Kazuto Fukuchi. Generalization to Missing Groups in Mitigating Spurious Correlations Using Vision-Language Models (in Japanese). The 27th Information-Based Induction Sciences Workshop (IBIS2024), 2024 (poster only).