Welcome to MDL

The Machine Learning and Data Mining Laboratory (MDL) is affiliated with the Master’s/Doctoral Program in Computer Science in the Degree Programs in Systems and Information Engineering at the University of Tsukuba. We develop algorithms and conduct theoretical analyses, the latter grounded in mathematical statistics, to advance the trustworthiness of machine learning and data mining. We focus in particular on challenges caused by data bias.

We welcome applications from prospective Master’s and PhD students interested in developing algorithms for trustworthy machine learning under data bias, or in the theoretical analysis of such algorithms grounded in mathematical statistics. See the Opportunities page for more information.

News

Apr 9, 2026 Our ICML 2025 paper received the 2025 Institute of Systems and Information Engineering Best Paper Award

Feb 3, 2026 Our papers have been accepted at AISTATS 2026 and ICLR 2026

Dec 22, 2025 Members of our laboratory gave presentations at the 58th Information-Based Induction Sciences and Machine Learning (IBISML) meeting

Nov 12, 2025 Members of our laboratory gave presentations at the 28th Information-Based Induction Sciences Workshop (IBIS2025)

Jul 13, 2025 Our paper has been accepted at ICML 2025


Research Overview

Machine learning enables computers to make predictions by learning from large amounts of data, and it now powers applications ranging from chatbots to medical image analysis. However, training data can deviate from what the model developer intends and become inadvertently skewed; this skew is called data bias. Models trained on biased data are strongly influenced by the bias and may produce unreliable results. Our research develops algorithms that behave as intended even in the presence of data bias, focusing on the following three topics:

Fairness: Training data can be skewed with respect to socially sensitive attributes such as race, gender, and age, reflecting historical and cultural biases in how the data was collected. A trained model may inherit this skew and make discriminatory predictions against the affected groups. Fair machine learning algorithms are designed to make accurate, non-discriminatory predictions even when trained on such biased data. Our recent work includes a post-hoc algorithm that efficiently controls the fairness-accuracy trade-off (M. Sakata and K. Fukuchi, 2026) and theoretical characterizations of the algorithms that achieve the best accuracy under fairness constraints (K. Fukuchi, 2025; K. Fukuchi and J. Sakuma, 2023).
Transfer Learning: Data observed at training time is often distributed differently from data encountered at prediction time. For example, patient data from one hospital may differ from that of another, and a model trained on simulated data may fail in the real world. Such a distribution shift can cause the trained model to perform poorly on the target data. Transfer learning addresses this bias by adapting the model to the target distribution using a small amount of target data. Our recent work includes a theoretical guarantee that transfer learning can succeed even when the source and target domains do not overlap (M. Fujikawa et al., 2025) and a provable scaling law showing how the target sample complexity improves as pre-trained models scale (K. Fukuchi et al., 2026).
Out-of-Distribution Generalization: Prediction targets can be spuriously correlated with non-causal attributes. For example, in image classification, if a cow nearly always appears in a grassy field in the training data, the model may learn to associate the label with the background rather than with the animal itself. Such spurious correlations can cause the trained model to fail when the same object appears in an unfamiliar context. Out-of-distribution generalization aims to develop methods that remain accurate even when these spurious correlations do not hold at test time. Our recent work includes algorithms that use vision-language models to mitigate spurious correlations even when the spurious attributes are unknown (下坂, 2024).

Current Funded Projects

Clarification of Minimax Optimality of Out-of-Distribution Generalizable Fair Regression Algorithms via Foundation Models (KAKENHI Grant-in-Aid for Scientific Research (B); 2026–2030; PI: Kazuto Fukuchi)

This project aims to clarify fair minimax optimality, that is, to identify the fair learning algorithm with the best worst-case performance, in the out-of-distribution generalization setting. To achieve out-of-distribution generalization, we will use foundation models to identify a common task shared across different domains.

Understanding Attack Mechanisms against AI through Causal Structures of Classification and Building Countermeasures (KAKENHI Grant-in-Aid for Scientific Research (A); 2023–2027; Co-PI: Kazuto Fukuchi (PI: Jun Sakuma at Institute of Science Tokyo))

This project aims to understand the attack mechanisms against AI through the causal structures of classification and to build countermeasures.

References

Kazuto Fukuchi and Jun Sakuma. Demographic Parity Constrained Minimax Optimal Regression under Linear Model. Advances in Neural Information Processing Systems, vol. 36, pp. 8653-8689, 2023. arXiv
Kazuto Fukuchi. Meta Optimality for Demographic Parity Constrained Regression via Post-Processing. Forty-second International Conference on Machine Learning, vol. 267, pp. 18024-18046, 2025. arXiv
Kazuto Fukuchi, Ryuichiro Hataya, and Kota Matsui. Provable Target Sample Complexity Improvements as Pre-Trained Models Scale. Proceedings of The 29th International Conference on Artificial Intelligence and Statistics, 2026 (to appear). arXiv
Maaya Sakata and Kazuto Fukuchi. Fair Classification with Efficient and Post-hoc Controllable Fairness-Accuracy Trade-off. Forty-third International Conference on Machine Learning, 2026.
Mitsuhiro Fujikawa, Youhei Akimoto, Jun Sakuma, and Kazuto Fukuchi. Harnessing the Power of Vicinity-Informed Analysis for Classification under Covariate Shift. Proceedings of The 28th International Conference on Artificial Intelligence and Statistics, vol. 258, pp. 392-410, 2025. arXiv
下坂知広 and 福地一斗. Generalization to Missing Groups in Reducing Spurious Correlations Using Vision-Language Models (in Japanese). The 27th Information-Based Induction Sciences Workshop (IBIS2024), 2024 (poster only).