Date of Award
6-2-2025
Degree Type
Thesis
Degree Name
Master of Science (M.S.)
Department
Computer Science
First Advisor
Manar Samad
Abstract
Handling missing values is a pervasive challenge in tabular data sets, particularly in electronic health records (EHR), where incomplete data can hinder predictive modeling. Data with missing values are unfit for machine learning, yet imputing those values affects data quality and data-driven outcomes. Traditional statistical and machine learning-based imputation techniques often struggle with high missing rates and complex missing patterns. This thesis investigates deep learning of attention between features and between samples for missing value imputation. Together, these two attention mechanisms capture the row-column structure of tabular data. The thesis presents a novel deep learning framework that integrates between-feature and between-sample attention within a contrastive framework to reconstruct missing values. In addition, the framework incorporates CutMix-based data augmentation, enhancing the model’s ability to generalize across diverse missingness patterns. The proposed approach is evaluated on 13 data sets, including real-world EHR data, and benchmarked against ten state-of-the-art imputation methods, including autoencoder-based, diffusion-based, and deep generative imputation models. The experimental results show that the proposed joint imputation method outperforms the state-of-the-art baseline models in terms of normalized root mean squared error (NRMSE) under varying missing value types and missing value rates (10% to 90%). This research highlights the structural variability of tabular data and provides actionable recommendations for selecting effective imputation strategies based on data set characteristics.
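As a rough illustration of the evaluation protocol summarized in the abstract, the sketch below masks entries of a complete table at a chosen missing rate, fills them in, and scores the result with NRMSE on the masked entries only. It is not the thesis's method: the imputer is a plain column-mean baseline standing in for any imputation model, the missingness is assumed to be completely at random, and NRMSE is assumed to be normalized by the variance of the true masked values (other normalizations, such as by range, are also in use). All function names and the synthetic data are hypothetical.

```python
import numpy as np

def mask_completely_at_random(X, rate, rng):
    """Hide roughly `rate` of the entries; returns the masked copy and the boolean mask."""
    mask = rng.random(X.shape) < rate
    X_missing = X.copy()
    X_missing[mask] = np.nan
    return X_missing, mask

def mean_impute(X_missing):
    """Baseline stand-in for an imputation model: fill each gap with its column mean."""
    col_means = np.nanmean(X_missing, axis=0)
    return np.where(np.isnan(X_missing), col_means, X_missing)

def nrmse(X_true, X_imputed, mask):
    """RMSE over the masked entries, normalized by the variance of the true masked values
    (an assumed definition; the thesis may normalize differently)."""
    diff = X_true[mask] - X_imputed[mask]
    return float(np.sqrt(np.mean(diff ** 2) / np.var(X_true[mask])))

# Synthetic, fully observed table used only for illustration.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))

# One example run at a 50% missing rate.
X_missing, mask = mask_completely_at_random(X, 0.5, rng)
X_imputed = mean_impute(X_missing)

# Sweep a few missing rates from the 10% to 90% range mentioned in the abstract.
for rate in (0.1, 0.5, 0.9):
    Xm, m = mask_completely_at_random(X, rate, rng)
    print(f"rate={rate:.1f}  NRMSE={nrmse(X, mean_impute(Xm), m):.3f}")
```

Scoring only the masked entries keeps observed values, which any imputer copies through unchanged, from inflating the apparent accuracy.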
Recommended Citation
Kowsar, Ibna, "DEEP IMPUTATION OF MISSING VALUES USING FEATURE AND SAMPLE ATTENTION" (2025). Tennessee State University Alumni Theses and Dissertations. 259.
https://digitalscholarship.tnstate.edu/alumni-etd/259
