I have a Support Vector Machine algorithm that is already coded and tested in C++. I am certain that it works correctly. I also have an NVIDIA GT 430 graphics card.
I have ported the existing algorithm to CUDA and it still works in a serial fashion (without kernels implemented). I have confirmed this by comparing the output files from each version.
I simply need a few small sections of the serial code rewritten as kernels (with threads operating in parallel to achieve the same result faster) as a proof of concept that GPU computing is faster than traditional CPU computing. I do not need every parallelizable part of the code to be parallelized and optimized; rather, I need the 5-7 most time-consuming parts converted and optimized to the point where they strengthen my argument (i.e. the overall execution time must be less than that of the current version), while maintaining the accuracy of the output file. I suggest the [url removed, login to view] as an excellent place to start. The grids, blocks and threads must be set up so that the program still operates correctly on the aforementioned card; I believe it has Compute Capability 2.x.
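For illustration, converting one such hot loop into a kernel typically follows the pattern sketched below. The loop, function, and array names here are hypothetical (an element-wise weighted product of the kind that appears in SVM decision-function evaluation), not taken from the attached project; the launch configuration is a conservative choice known to be valid on Compute Capability 2.x devices such as the GT 430:

```cuda
#include <cuda_runtime.h>

// Hypothetical serial hot loop:
//   for (int i = 0; i < n; ++i) out[i] = alpha[i] * y[i] * k[i];
// rewritten as a CUDA kernel, one thread per element.
__global__ void weightedKernelRow(const float* alpha, const float* y,
                                  const float* k, float* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n)                       // guard: last block may overshoot n
        out[i] = alpha[i] * y[i] * k[i];
}

// Host-side launch wrapper; all pointers are assumed to already
// reside in device memory (copied in with cudaMemcpy beforehand).
void launchWeightedKernelRow(const float* d_alpha, const float* d_y,
                             const float* d_k, float* d_out, int n)
{
    const int threads = 256;                      // safe block size on CC 2.x
    const int blocks  = (n + threads - 1) / threads;  // round up to cover n
    weightedKernelRow<<<blocks, threads>>>(d_alpha, d_y, d_k, d_out, n);
}
```

Note that a conversion like this only pays off for the genuinely time-consuming loops: host-to-device transfer overhead can outweigh the speedup for small or infrequently executed sections, which is why restricting the work to the 5-7 hottest parts is sensible.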
For an experienced CUDA coder, this should be very basic and relatively simple to implement in 3-4 days. I have attached the VS2010 project files containing all the necessary code files, settings, input files, and a reference output file generated by the C++ version, against which the improved version's output should be matched.
3 freelancers are bidding on average $200 for this job
Hi! I would like to work on this project and can finish it over the coming weekend. Please note that my completion rate was lowered by an unscrupulous employer.