MIDA Multiple Imputation using Denoising Autoencoders

Abstract

Missing data is a significant problem impacting all domains. State-of-the-art framework for minimizing missing data bias is multiple imputation, for which the choice of an imputation model remains nontrivial. We propose a multiple imputation model based on overcomplete deep denoising autoencoders. Our proposed model is capable of handling different data types, missingness patterns, missingness proportions and distributions. Evaluation on several real life datasets show our proposed model significantly outperforms current state-of-the-art methods under varying conditions while simultaneously improving end of the line analytics.

Publication
In The 22nd Pacific-Asia Conference on Knowledge Discovery and Data Mining
Lovedeep Gondara
Lovedeep Gondara
Research Scientist

My research interests include machine learning (deep learning) and statistics, with current research focus on large language models, differential privacy, and their applications to healthcare.