I need a program to do the following:
1. The user types in a search term and a file extension, and selects whether to crawl only within each search result's domain or to follow every site linked to within the tree.
2. The code will then perform a Google search on the search term entered by the user.
3. For each resulting URL, follow all links on every page from that URL down (staying within the same domain as the search result or not, as specified by the user) and record which pages contain files of the type entered by the user, and how many such files each page contains. As the program runs, progress statistics (number of pages crawled, number of files found, etc.) should be displayed.
4. Display every URL found, along with the number of files of the given type on that page, in descending order by that number.
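
To make the crawl-and-count steps above concrete, here is a rough sketch in Python. It is only an illustration under some assumptions, not a required design: the Google search step is left out (it depends on which search API ends up being used), a "file" is taken to mean a link whose path ends in the requested extension, and the names `crawl`, `ranked`, `fetch`, and `max_pages` are made up for this example. `fetch` is a function supplied by the caller that returns a page's HTML, or None on failure.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, extension, fetch, same_domain_only=True, max_pages=100):
    """Breadth-first crawl from start_url; returns {page_url: file_count}.

    fetch is a hypothetical caller-supplied function: URL in, HTML string
    (or None on failure) out. max_pages is an assumed safety cap.
    """
    suffix = "." + extension.lower().lstrip(".")
    start_host = urlparse(start_url).netloc
    queue = deque([start_url])
    seen = {start_url}
    counts = {}
    while queue and len(counts) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        parser = LinkCollector()
        parser.feed(html)
        files = 0
        for href in parser.links:
            # Resolve relative links and drop any #fragment.
            target = urldefrag(urljoin(url, href)).url
            parsed = urlparse(target)
            if parsed.scheme not in ("http", "https"):
                continue
            if parsed.path.lower().endswith(suffix):
                files += 1  # a link to a file of the requested type
                continue    # do not crawl into the file itself
            if same_domain_only and parsed.netloc != start_host:
                continue
            if target not in seen:
                seen.add(target)
                queue.append(target)
        counts[url] = files
        # Progress statistics, printed as the crawl runs.
        print(f"pages crawled: {len(counts)}, files found: {sum(counts.values())}")
    return counts


def ranked(counts):
    """Pages sorted by file count, highest first."""
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)
```

Passing `same_domain_only=False` corresponds to the "follow every linked site" option; in that mode a page cap such as `max_pages` matters a great deal, since the crawl could otherwise grow without bound.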
Although this could be implemented on the server side, one concern is that the traffic generated would exceed the limits imposed by the web hosting company. Thus, a PC-side application would be preferable, though a server-side script that could limit and "chunk" the work to fit within bandwidth constraints might also be acceptable.
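
For the "chunked" server-side option, one possible approach (again only a sketch, with made-up names such as `run_chunk`, `byte_budget`, and the layout of the `state` dict) is to process pending URLs until a per-run byte budget is spent, save what is left, and resume on the next run. `fetch` is assumed to be a caller-supplied function that returns the page body as bytes.

```python
import json


def run_chunk(state, fetch, byte_budget):
    """Process pending URLs until byte_budget is spent; returns a new state.

    state is a plain dict (assumed layout):
        {"pending": [urls], "done": [urls], "bytes_used": int}
    The budget is checked before each fetch, so a run can overshoot it by
    at most one page.
    """
    spent = 0
    pending = list(state["pending"])
    done = list(state["done"])
    while pending and spent < byte_budget:
        url = pending.pop(0)
        body = fetch(url)
        spent += len(body)
        done.append(url)
    return {
        "pending": pending,
        "done": done,
        "bytes_used": state["bytes_used"] + spent,
    }


def save_state(state, path):
    """Write the crawl state to disk so a later run can pick it up."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(state, handle)


def load_state(path):
    """Read back a state previously written by save_state."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)
```

A scheduled job could call `load_state`, then `run_chunk` with whatever budget the host allows per run, then `save_state`, repeating until `pending` is empty.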