These projects aim to impute the missing values in the given datasets. You have to write code in a programming language of your choice (e.g., MATLAB, Python, R, C, or C++) to read some Excel data (step 1), identify the missing data (step 2), impute the missing values using the technique given in the reference for this project (step 3), and finally return the imputed data and compare it with the complete data to measure the accuracy and reliability of your results (step 4).
In step 1, do not limit your code to a specific data size or dimension: it must be able to read or load datasets of different sizes and dimensions. You will receive some datasets with numerical and/or categorical attributes in XLS and/or CSV format.
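A minimal Python sketch of a size-agnostic loader for step 1 follows. The function name `load_table` and the list of missing-value markers are assumptions for illustration, since the datasets may encode missing cells differently. Reading Excel files with pandas also needs an Excel engine installed (for example `xlrd` for .xls or `openpyxl` for .xlsx).

```python
from pathlib import Path

import pandas as pd


def load_table(path, missing_markers=("", "NA", "N/A", "?", "NaN")):
    """Load a CSV or Excel file of any shape into a DataFrame.

    `missing_markers` is an assumed list of strings that should be
    treated as missing values; adjust it to the datasets you receive.
    """
    path = Path(path)
    suffix = path.suffix.lower()
    markers = list(missing_markers)
    if suffix == ".csv":
        return pd.read_csv(path, na_values=markers)
    if suffix in {".xls", ".xlsx"}:
        return pd.read_excel(path, na_values=markers)
    raise ValueError(f"Unsupported file type: {suffix}")
```

Because pandas infers the number of rows and columns and the column types, the same call works for numerical and categorical attributes alike.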
In step 2, determine the number and location of the missing values. For instance, if you return the missing indices, you can identify the missing-data pattern (univariate, monotone, or arbitrary). This not only helps you handle the next step successfully but also earns you more points!
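One possible way to locate the missing cells and label the pattern is sketched below in Python. The helper names are hypothetical, and the definitions are assumptions: "univariate" means exactly one column has missing values, and "monotone" means the columns can be ordered so that a row missing in one column is also missing in every later column. The mask can come from `np.isnan` on a numeric array or from pandas `DataFrame.isna().to_numpy()` for mixed data.

```python
import numpy as np


def missing_indices(mask):
    """Return the (row, column) positions of the missing cells."""
    return [(int(r), int(c)) for r, c in np.argwhere(mask)]


def classify_pattern(mask):
    """Classify a boolean missingness mask (True = missing)."""
    mask = np.asarray(mask, dtype=bool)
    cols_with_missing = np.flatnonzero(mask.any(axis=0))
    if cols_with_missing.size == 0:
        return "complete"
    if cols_with_missing.size == 1:
        return "univariate"
    # Order columns from fewest to most missing values, then check that
    # each column's missing rows are a subset of the next column's.
    order = np.argsort(mask.sum(axis=0), kind="stable")
    ordered = mask[:, order]
    nested = bool(np.all(ordered[:, :-1] <= ordered[:, 1:]))
    return "monotone" if nested else "arbitrary"
```

For example, `classify_pattern(np.isnan(X))` labels a numeric array `X`, and `len(missing_indices(np.isnan(X)))` gives the number of missing values.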
In step 3, read the reference paper for the proposed method, understand the algorithm, and write code that imputes the missing data (by single or multiple imputation) using the given approach.
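The reference for this project names predictive mean matching (PMM). The Python sketch below shows only the core idea for a single numeric column under simplifying assumptions, and it is not the implementation from the R package "mice" or from the paper: it fits an ordinary least-squares regression on the observed rows, and for each missing row it copies the observed value of a donor drawn at random from the `k` observed rows with the closest predicted means. It assumes the predictor columns are complete and at least one target value is observed, and it omits the random draw of regression parameters that full PMM procedures typically include. The function name and the parameters `k` and `rng` are illustrative.

```python
import numpy as np


def pmm_impute_column(X, y, k=5, rng=None):
    """Impute NaNs in y from complete predictors X (simplified PMM)."""
    rng = np.random.default_rng(rng)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    missing = np.isnan(y)
    observed = ~missing

    # Least-squares fit on observed rows, with an intercept column.
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A[observed], y[observed], rcond=None)
    predicted = A @ beta

    y_imputed = y.copy()
    observed_idx = np.flatnonzero(observed)
    k = min(k, observed_idx.size)
    for i in np.flatnonzero(missing):
        # Donor pool: the k observed rows whose predicted mean is closest.
        distance = np.abs(predicted[observed_idx] - predicted[i])
        donors = observed_idx[np.argsort(distance)[:k]]
        # Copy the observed value of one randomly chosen donor.
        y_imputed[i] = y[rng.choice(donors)]
    return y_imputed
```

Calling the function several times with different `rng` seeds gives several completed columns, which is one simple route from single to multiple imputation. Every imputed value is a value that was actually observed, which is the defining property of PMM.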
In step 4, make your code return the imputed values. You can then compare the imputed values with the original complete data to compute the error (NRMS). You may generate diagrams, automatically or manually, to present your results and compare them with the original complete datasets.
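The brief does not define NRMS. The Python sketch below assumes a definition commonly used for imputation error: the Frobenius norm of the difference between the imputed and complete data, divided by the Frobenius norm of the complete data. Check the reference paper for the exact normalization before reporting results. The function name `nrms` is illustrative, and the inputs are assumed to be numeric arrays of the same shape.

```python
import numpy as np


def nrms(imputed, complete):
    """Assumed NRMS: ||imputed - complete||_F / ||complete||_F."""
    imputed = np.asarray(imputed, dtype=float)
    complete = np.asarray(complete, dtype=float)
    if imputed.shape != complete.shape:
        raise ValueError("imputed and complete must have the same shape")
    # np.linalg.norm defaults to the Frobenius norm for 2-D arrays.
    return float(np.linalg.norm(imputed - complete) / np.linalg.norm(complete))
```

A value of 0 means the imputed data match the complete data exactly, and larger values mean larger imputation error.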
REFERENCE: Yaohui Ding and Arun Ross, “A comparison of imputation methods for handling missing scores in biometric fusion,” Pattern Recognition, vol. 45, no. 3, pp. 919-933, 2012. Method: predictive mean matching (PMM) [package “mice” in R].
Dataset link: [login to view URL]