Let's say you're working for a researcher who's analyzing social networks on Web forums. They want to compare people who post lots of answers with people whose answers are often accepted by the users who asked the questions. As a starting point, they would like a program that prints the top 10 users by number of questions answered and the top 10 users by number of accepted answers. They've already written a program that generates these lists, but it takes too long, uses too much memory, and generally works poorly.
I've attached the existing code that does just this, as well as some sample data files. The data is from StackOverflow (technically from its affiliate site ServerFault). If you're interested, you can also download a larger data set from the link below and run the code against it.
[login to view URL]
The code we've provided is very slow, and it even has some bugs. There are versions in multiple languages; pick the one you're most comfortable with, read through it, ask us if anything is confusing, and make sure you understand it. Once you've done that, spend no more than two hours making it better. (Okay, you don't need a stopwatch, but you should be able to make significant improvements in that time.) How you make it better is up to you.
When you're done, please reply with the following: your modified code (make sure it still works!) and a short write-up summarizing what you did and what you would do if you had more time. In our evaluation, the write-up can be just as important as the code.
There are many substantial improvements you can make to this code, and you won't be able to implement all of them. So dig in, do what you can, and show us your style. And don't forget to write about what you would do next!
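For reference, the core task described at the top (top users by questions answered and by accepted answers) is a counting problem. The sketch below is not the provided code and assumes the posts have already been parsed into in-memory records. The `Post` class, its field names, and `top_users` are hypothetical, loosely modeled on the Id, PostTypeId, ParentId, OwnerUserId and AcceptedAnswerId fields of the public Stack Exchange data dump. It makes one pass to collect accepted-answer ids and a second pass to tally per-user counts, so `posts` must be a list rather than a one-shot iterator.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Post:
    # Hypothetical record shape; adapt to however the real data is parsed.
    id: int
    post_type: str                            # "question" or "answer" (assumption)
    owner: Optional[int]                      # user id; None for deleted users
    parent_id: Optional[int] = None           # answers only: the question answered
    accepted_answer_id: Optional[int] = None  # questions only


def top_users(posts: List[Post], k: int = 10) -> Tuple[list, list]:
    """Return (top k by distinct questions answered, top k by accepted answers)."""
    # Pass 1: ids of every answer that some question marked as accepted.
    accepted_ids = {
        p.accepted_answer_id
        for p in posts
        if p.post_type == "question" and p.accepted_answer_id is not None
    }

    # Pass 2: per-user tallies over answers only.
    questions_answered = defaultdict(set)  # user -> set of question ids answered
    accepted = Counter()                   # user -> number of accepted answers
    for p in posts:
        if p.post_type != "answer" or p.owner is None:
            continue
        questions_answered[p.owner].add(p.parent_id)
        if p.id in accepted_ids:
            accepted[p.owner] += 1

    answered = Counter({user: len(qs) for user, qs in questions_answered.items()})
    # most_common(k) returns (user, count) pairs, highest count first.
    return answered.most_common(k), accepted.most_common(k)


if __name__ == "__main__":
    sample = [
        Post(1, "question", owner=10, accepted_answer_id=3),
        Post(2, "answer", owner=20, parent_id=1),
        Post(3, "answer", owner=30, parent_id=1),
        Post(4, "question", owner=10),
        Post(5, "answer", owner=20, parent_id=4),
        Post(6, "answer", owner=20, parent_id=4),  # second answer, same question
    ]
    by_answered, by_accepted = top_users(sample)
    print(by_answered)
    print(by_accepted)
```

Counting distinct question ids (rather than raw answers) follows the wording "number of questions answered"; if answer count is what the researcher means, replacing the set with a plain `Counter` would be a simpler variant.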