R Code to Calculate Random Forest Out-of-Bag Estimate of Error (Revised Price)

I will pay $50 for this project, which is to start immediately and be completed within 24 hours (by June 18th, 11:30pm GMT). There is a $10 bonus if it is completed within the next 4.5 hours (by June 18th, 4:00am GMT). I am available to work with you by chat until then.

Preference will be given to freelancers who have R and Random Forest experience. You will need to be very familiar with Random Forests and R, as I am not and cannot provide much assistance.

Essentially, I am looking for a small enhancement of the Random Forest process in the R GUI called Rattle. From what I can tell by looking at the R Add-In called Party, there are a number of functions included, which might mean adding perhaps 5-15 additional lines of code to what I already have (although I could certainly be off on that estimate).

Using the R GUI called Rattle, I can easily select my dataset (see below), choose a single Y and the random seed, and choose the ratio of training to testing data. Next, I execute the RF (Random Forest) model, choosing only the number of trees (default is 500) and the number of predictors sampled at each split (default is the integer part of the square root of m, the total number of predictors). From this, R (through Rattle's code) gives me the Out-of-Bag Error and the traditional 2x2 classification grid (predicted vs. actual) for both training and testing data. Not including the 5 seconds it takes R to run the code, I can set up this scenario from scratch in less than 1 minute. Due to Rattle’s limitations, I can only execute for a single Y at a time. This issue, as well as the inability to aggregate those Out-of-Bag results, is my problem.

The algorithm above is outlined very succinctly at [url removed, login to view]~dzeng/BIOS740/[url removed, login to view] on the first page under the title “The algorithm” and is covered in the listed points 1, 2, 3 and 1. Essentially, what I need done is the very next point they list that says:

2. Aggregated the OOB predictions. (On the average, each data point would be out-of-bag around 36% of the times, so aggregate these predictions.) Calculate the error rate, and call it the OOB estimate of error rate.
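As a rough illustration of this aggregation step, the sketch below assumes the `randomForest` package (an assumption; the posting itself only names Rattle and Party) and uses synthetic stand-in data. With `norm.votes = FALSE`, the fitted object's `votes` matrix holds raw out-of-bag vote counts per class for each training observation, and `oob.times` holds how many trees left each observation out-of-bag.

```r
library(randomForest)

set.seed(42)  # assumed seed, only for reproducibility of the sketch

# Synthetic stand-in data: four binary Xs, one integer X (0-30), one binary Y.
n <- 200
X <- data.frame(
  x1 = rbinom(n, 1, 0.5),
  x2 = rbinom(n, 1, 0.5),
  x3 = rbinom(n, 1, 0.5),
  x4 = rbinom(n, 1, 0.5),
  x5 = sample(0:30, n, replace = TRUE)
)
y <- factor(rbinom(n, 1, plogis(-1.5 + 2 * X$x1)), levels = c(0, 1))

# norm.votes = FALSE keeps raw OOB vote counts instead of vote fractions.
rf <- randomForest(x = X, y = y, ntree = 500, norm.votes = FALSE)

# Per observation: how many OOB trees voted 0, how many voted 1,
# and how many trees had the observation out-of-bag in total.
oob_counts <- data.frame(
  oob_pred_0 = as.numeric(rf$votes[, "0"]),
  oob_pred_1 = as.numeric(rf$votes[, "1"]),
  oob_times  = rf$oob.times
)

head(oob_counts)

# Average share of trees for which an observation is out-of-bag
# (expected to be a little over one third).
oob_share <- mean(rf$oob.times) / rf$ntree
```

These are tree-level counts within a single forest; whether tree-level or forest-level counts are wanted is a design choice the posting leaves open.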

However, because my data are skewed towards y-values of 0, I am really after the PPV (Positive Predictive Value, i.e. the share of observations predicted as 1 for Yn that are actually 1) rather than the global OOB error of the models. I am therefore more interested in the raw prediction counts, so that I can calculate error rates myself.
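To make the difference between PPV and the global error rate concrete, here is a minimal base-R sketch. The helper name `ppv`, the toy counts, and the majority-vote rule used to turn counts into a predicted class are all illustrative assumptions, not part of the requested deliverable.

```r
# Hypothetical helper: positive predictive value from 0/1 actual and predicted labels.
ppv <- function(actual, predicted) {
  tp <- sum(predicted == 1 & actual == 1)  # predicted 1, truly 1
  fp <- sum(predicted == 1 & actual == 0)  # predicted 1, truly 0
  if (tp + fp == 0) return(NA_real_)       # no positive predictions at all
  tp / (tp + fp)
}

# Made-up example: 10 observations, skewed towards 0, with aggregated OOB counts.
actual     <- c(0, 0, 0, 0, 0, 0, 1, 1, 0, 1)
oob_pred_0 <- c(180, 175, 60, 190, 170, 185, 40, 95, 160, 30)
oob_pred_1 <- c(5, 10, 120, 2, 15, 3, 150, 90, 20, 155)

# Assumed decision rule: predict 1 when the 1-votes outnumber the 0-votes.
predicted <- as.integer(oob_pred_1 > oob_pred_0)

error_rate <- mean(predicted != actual)  # 0.2 (2 of 10 misclassified)
ppv_value  <- ppv(actual, predicted)     # 2/3 (2 of 3 predicted positives are real)
```

With the raw counts available, a different threshold than a simple majority can be applied afterwards without rerunning the forests.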

I will supply a CSV data sample of ~4000 observations (~50/50 training/testing split) with multiple binary Y's and multiple binary X's and one continuous X (an integer ranging from 0 to ~30) for each observation. I can even supply the R code from Rattle for the procedure I am currently using.

I would like your R code to be able to accept the following inputs from me:

-observations in the format: Observation #, Y1…Yn, X1…Xm

-random seed value

-number of trees value (default is 500)

-number of predictors to be randomly sampled (default is the integer of the square root of m total predictors)

-number of rows at bottom of data list for holdout data (to be scored each round)

-number of rounds (which will be ~1,000 – 1,000,000)

I would like your R code to be able to supply the following outputs to me:

-CSV file with full original data plus the aggregated OOB prediction totals (for both training and testing data) for each observation for each Y (i.e. the number of times the OOB prediction was 0 for each observation for each Y and the number of times the OOB prediction was 1 for each observation for each Y)
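The inputs and outputs listed above could be wired together roughly as follows. This is only a sketch under stated assumptions: it uses the `randomForest` package, the function name `tally_predictions` and all of its argument names are hypothetical, and it interprets a "round" as fitting one new forest and recording one predicted class per observation (the OOB class for training rows, the ordinary predicted class for holdout rows).

```r
library(randomForest)

# Hypothetical helper: every name and default here is an assumption.
tally_predictions <- function(data, y_cols, x_cols, seed = 1, ntree = 500,
                              mtry = floor(sqrt(length(x_cols))),
                              n_holdout = 0, rounds = 10) {
  set.seed(seed)
  n <- nrow(data)
  train_idx <- seq_len(n - n_holdout)          # top rows: training data
  test_idx  <- setdiff(seq_len(n), train_idx)  # bottom rows: holdout data
  x_train <- data[train_idx, x_cols, drop = FALSE]
  x_test  <- data[test_idx,  x_cols, drop = FALSE]
  out <- data

  for (y_col in y_cols) {
    pred0 <- integer(n)  # times each observation was predicted 0
    pred1 <- integer(n)  # times each observation was predicted 1
    y_train <- factor(data[train_idx, y_col], levels = c(0, 1))

    for (r in seq_len(rounds)) {
      rf <- randomForest(x = x_train, y = y_train, ntree = ntree, mtry = mtry)

      # rf$predicted holds the OOB class for each training row
      # (NA if a row was never out-of-bag in this forest).
      pred <- rep(NA_character_, n)
      pred[train_idx] <- as.character(rf$predicted)
      if (length(test_idx) > 0) {
        pred[test_idx] <- as.character(predict(rf, newdata = x_test))
      }
      pred0 <- pred0 + as.integer(!is.na(pred) & pred == "0")
      pred1 <- pred1 + as.integer(!is.na(pred) & pred == "1")
    }
    out[[paste0(y_col, "_pred0")]] <- pred0
    out[[paste0(y_col, "_pred1")]] <- pred1
  }
  out
}

# Usage on small synthetic data (real data would come from read.csv()).
set.seed(7)
n <- 120
dat <- data.frame(obs = seq_len(n),
                  Y1 = rbinom(n, 1, 0.3), Y2 = rbinom(n, 1, 0.2),
                  X1 = rbinom(n, 1, 0.5), X2 = rbinom(n, 1, 0.5),
                  X3 = rbinom(n, 1, 0.5), X4 = sample(0:30, n, replace = TRUE))

result <- tally_predictions(dat, y_cols = c("Y1", "Y2"),
                            x_cols = c("X1", "X2", "X3", "X4"),
                            seed = 123, ntree = 50, n_holdout = 40, rounds = 3)

# Original data plus two count columns per Y, written out as CSV.
out_file <- file.path(tempdir(), "oob_tally.csv")
write.csv(result, out_file, row.names = FALSE)
```

Run time grows linearly with the number of rounds and the number of Ys, so the upper end of the requested range of rounds would likely need far smaller forests or parallel execution.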

If you happen to be aware of an open source R GUI that will already do all of the above for me (and that I can understand and use), you can just help me install it and will not need to supply the R code. As long as it works for me, the project will be considered completed.

Skills: Algorithms, Machine Learning, Mathematics, R Programming Language, Statistics


About the Employer:
(11 reviews) Charlottetown, Canada

Project ID: #10800664

6 freelancers are bidding an average of $236 for this job


I have been a statistics tutor for the last 5 years. I have expertise in statistical analysis. I can show you some of my previous analyses. I have an excellent grasp of random variables, probability distributions, sampling and di…

$300 USD in 5 days
(77 Reviews)

Dear Sir, I have more than 8 years of experience in R programming. I can provide you with the most suitable solution for your requirement. Looking forward to further discussion.

$222 USD in 1 day
(8 Reviews)

I am Herilalaina RASOLONJATOVO from Madagascar. I am an expert in data analysis using R programming, and I can help you do this project according to your specification. I have done several projects in this field in…

$200 USD in 1 day
(2 Reviews)

I am an expert in data structures, algorithms and so on. If you select me, you will be lucky. Please keep in touch.

$150 USD in 1 day
(3 Reviews)

Hi, I have extensive experience in R and I am the author of muRandomForest (a product for the leading analytics player Mu Sigma). I can complete this in the next 5 hours. Thanks, Atul

$250 USD in 3 days
(0 Reviews)

I have 3 years of experience in the same. Most of my previous work experience is in the field of healthcare, or more precisely in analyzing patient data to develop algorithms for automated disorder detection. I have dev…

$222 USD in 3 days
(0 Reviews)

A proposal has not yet been provided

$222 USD in 1 day
(0 Reviews)