This needs to be done in R, so the deliverable is R code. The input file is attached: "[login to view URL]"
The attached file has 12 columns and 1000 rows (including the header). The header labels the columns ID, cohort, and PC1 to PC10. The cohort column contains one of 6 unique values ("Control", "set_a", "set_b", "set_c", "set_d", "set_e").
Take each "set" one by one and, for each case in it, find the closest match on the PCs: the control that minimizes the sum over i = 1, 2, ..., 10 of (PCi_control - PCi_case)^2, where the difference is between the PCi value of the "Control" row and that of the "set" row. These sums need to be calculated for every case/control pair. Once a control has been selected, it must be removed completely so that it cannot be selected for another case.
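The distance criterion above can be sketched in R as follows. This is only an illustration: the helper name sq_dist_to_controls and the toy values (3 PCs instead of 10) are made up for the example and are not taken from the attached file.

```r
# Hypothetical helper: squared Euclidean distance from one case to every control.
# `case_pcs` is a numeric vector of PC values for the case; `control_pcs` is a
# numeric matrix with one row per control and the same PCs as columns.
sq_dist_to_controls <- function(case_pcs, control_pcs) {
  # Subtract the case's PC values from each control row (column-wise sweep)
  diffs <- sweep(control_pcs, 2, case_pcs, FUN = "-")
  # Sum of squared differences per control
  rowSums(diffs^2)
}

# Toy data (made up): three controls, three PCs
controls <- rbind(c1 = c(0, 0, 0),
                  c2 = c(1, 1, 1),
                  c3 = c(2, 0, 1))
case <- c(1, 1, 0)

d <- sq_dist_to_controls(case, controls)
best <- names(which.min(d))  # control with the smallest squared distance
```

For these toy values the distances are 2, 1 and 3, so the closest control is c2.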
Start with a case chosen from one set and determine its best control. Then switch to another set and find the best control for a case chosen from that set. Continue until all cases have been processed. Then start again, finding a new control for each case, and repeat until each case has the required number of controls.
Our goal is to select 5 controls for each "set" that are closely matched.
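A minimal sketch of the whole procedure, under stated assumptions, might look like the code below. The function name match_controls, the output column names, and the reading that "5 controls" means 5 per case (the n_controls argument) are assumptions, not part of the brief; the column names ID, cohort and PC1 to PC10 come from the file description. It is a greedy match, so the result depends on the order in which cases are visited.

```r
# Hypothetical sketch: greedy round-robin matching without replacement.
# `df` must have columns ID, cohort, PC1..PC10; controls have cohort "Control".
match_controls <- function(df, n_controls = 5) {
  pc_cols   <- paste0("PC", 1:10)
  is_ctrl   <- df$cohort == "Control"
  ctrl_ids  <- df$ID[is_ctrl]
  ctrl_pcs  <- as.matrix(df[is_ctrl, pc_cols])
  available <- rep(TRUE, length(ctrl_ids))  # FALSE once a control is used

  cases <- df[!is_ctrl, ]
  # Interleave the sets: first case of each set, then second case of each set, ...
  rank_in_set <- ave(seq_len(nrow(cases)), cases$cohort, FUN = seq_along)
  cases <- cases[order(rank_in_set, cases$cohort), ]

  matches <- list()
  for (round in seq_len(n_controls)) {      # one new control per case per round
    for (i in seq_len(nrow(cases))) {
      if (!any(available)) stop("Ran out of controls")
      case_pcs <- unlist(cases[i, pc_cols])
      # Squared Euclidean distance from this case to every control
      d <- rowSums(sweep(ctrl_pcs, 2, case_pcs, FUN = "-")^2)
      d[!available] <- Inf                  # exclude controls already selected
      j <- which.min(d)
      available[j] <- FALSE                 # remove the chosen control from the pool
      matches[[length(matches) + 1]] <- data.frame(
        case_id = cases$ID[i], cohort = cases$cohort[i], round = round,
        control_id = ctrl_ids[j], sq_dist = d[[j]],
        stringsAsFactors = FALSE
      )
    }
  }
  do.call(rbind, matches)
}

# Usage (file name is hypothetical; the real attachment name is not shown):
# df  <- read.csv("matching_input.csv", stringsAsFactors = FALSE)
# res <- match_controls(df, n_controls = 5)
```

The function stops with an error if the control pool is exhausted before every case has its n_controls matches. If 5 controls per set rather than per case is intended, the loop would need to iterate over sets instead of individual cases.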
8 freelancers are bidding an average of $139 for this job
I am a data scientist by profession with more than 4 years of programming experience in R, and I have completed more than 35 projects in R. I can finish the task within 24 hours.