The project consists of exploring a small data set and loading the data properly.
Next, derive a few new variables.
Then fit several logistic regression models and a k-nearest-neighbours (kNN) model.
These models should be validated "using 10-fold cross-validation, repeated 5 times, with the area-under-the-ROC-curve metric (the ROC statistic in caret). Where appropriate, select the best settings for tuning parameters using the one-standard error rule. Use the same folds for all models (for example, by resetting the random seed appropriately)."
Finally, the best model should be selected.
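A minimal sketch of this validation set-up in caret, for orientation only. It assumes simulated data from caret's `twoClassSim()` as a hypothetical stand-in for the real data set (outcome column `Class`); the seed values and the grid of k values are arbitrary assumptions. The fold indices are built once and passed to every model through `index`, which is a more explicit alternative to resetting the seed before each `train()` call.

```r
# Sketch only: simulated stand-in data, arbitrary seeds and tuning grid.
library(caret)

set.seed(2024)
dat <- twoClassSim(500)   # hypothetical stand-in for the client's data; outcome is `Class`

# 10-fold CV repeated 5 times = 50 resamples. Building the fold indices once
# and passing them via `index` guarantees every model uses identical folds.
set.seed(2024)
folds <- createMultiFolds(dat$Class, k = 10, times = 5)

ctrl <- trainControl(
  method = "repeatedcv", number = 10, repeats = 5,
  index = folds,                      # same folds for all models
  classProbs = TRUE,                  # class probabilities are needed for the ROC curve
  summaryFunction = twoClassSummary,  # reports ROC (AUC), Sens and Spec
  selectionFunction = "oneSE"         # one-standard-error rule when choosing tuning values
)

# Plain logistic regression (no tuning parameters)
fit_glm <- train(Class ~ ., data = dat, method = "glm", family = binomial,
                 metric = "ROC", trControl = ctrl)

# kNN, tuning k; predictors are centred and scaled before computing distances
fit_knn <- train(Class ~ ., data = dat, method = "knn",
                 preProcess = c("center", "scale"),
                 tuneGrid = data.frame(k = seq(5, 51, by = 2)),
                 metric = "ROC", trControl = ctrl)

# Because the folds are shared, the 50 AUC values can be compared model to model
res <- resamples(list(logistic = fit_glm, knn = fit_knn))
summary(res, metric = "ROC")
```

With shared folds, `diff(res)` can also be used to look at the paired differences in AUC when choosing the best model.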
The whole project should conclude with three write-ups:
1. A summary from a business perspective, intended to be shared with "the client";
2. A more technical summary of the work done so far, for your data scientist colleagues; and
3. A list of priorities to investigate in subsequent iterations of the project, also for your colleagues.
Everything should be delivered as an R Notebook together with its HTML output. I will share more details once I receive offers. The project is relatively urgent, so if you are available, please send an offer.