I need a program that searches the web domain by domain, starting with [url removed, login to view], [url removed, login to view], [url removed, login to view], [url removed, login to view], [url removed, login to view] ... [url removed, login to view], [url removed, login to view], etc., incrementing one alphanumeric character at a time until every domain up to a certain name length has been checked for its [url removed, login to view] and [url removed, login to view]. I want to compile a list of all the classified sites on the internet: not just newspapers, but buying and selling type classifieds. I want the list of classified sites written to a file. The output must be a CSV file with one field for the URL, one field for the HTML page name, and one field for the classified software and version the site is running, like phpboard or phpBB. You could make this either for Linux with a shell script or for Windows. Either way is fine. You must be able to set the delay between queries and the timeout values. I would like to max out my computer and connection, but without crashing the computer. It should run for several days until it has covered every domain up to the length I specify, e.g. for 20-character domains it will try every combination of the alphanumeric characters and dashes. I must also be able to specify which top-level domains it searches: .com, .net, .us, .org, .in, etc.
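To help choose a realistic maximum domain length, the size of the search space can be estimated up front. The sketch below counts candidate names per top-level domain using the 37 characters named in this posting (a-z, 0-9 and the dash). It assumes, as DNS hostname rules require but this posting does not state, that a name may not begin or end with a dash. The function name `count_labels` is made up for illustration.

```python
ALPHABET_SIZE = 37  # a-z, 0-9 and "-"
EDGE_SIZE = 36      # assumption: first and last character may not be "-"

def count_labels(max_length):
    """Count candidate names of length 1..max_length for one TLD."""
    total = 0
    for n in range(1, max_length + 1):
        if n == 1:
            total += EDGE_SIZE
        else:
            # first char * free middle chars * last char
            total += EDGE_SIZE * ALPHABET_SIZE ** (n - 2) * EDGE_SIZE
    return total

# Print how fast the candidate count grows with the maximum length.
for length in range(1, 7):
    print(length, count_labels(length))
```

Dividing the count by the number of queries per second the server can sustain gives a rough run time for a chosen length.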
It must be able to run in a console on a Linux machine, so options include C, C++, Perl, Python, and a few others. I am not sure which one would be the best and fastest choice. I DO NOT want a Windows version of this. I want to run it on a Linux server somewhere with a fast internet connection so the program runs as quickly as possible.
The goal is to compile a complete list of all buying and selling classified sites, message boards, forums, and bulletin boards on the internet. NO blogs. If you go to google.com and search for "miami classifieds" or "free classifieds", the results are good examples of the sites I want to gather with this program.
The program will do the following:
Increment alphabetically and numerically, one letter or number at a time, and check each domain for the presence of an index.html, index.php, index.asp or similar page to detect whether the website is a classified site or message board. It will use the characters a-z, 0-9 and - when trying to find domains. It will start with a.com, b.com, c.com, d.com, e.com ... aa.com, a1.com, etc. It will attempt to identify the PHP, ASP or CGI software and version the classified or message board site is running, if possible. It will create a CSV file with the output. It will also append to a log file the current time, the current action it is performing, which domain was the last or current one, any errors, and anything else important. If it can automatically resume later, that would be great.
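The enumeration described above can be sketched as an odometer over the character set, which also makes resuming from a start point straightforward. This is only an illustration, not a finished design. The names `next_label` and `domains` are made up. The ordering (letters, then digits, then the dash) is inferred from the a.com ... aa.com, a1.com example. Skipping names that begin or end with a dash is an added assumption based on DNS hostname rules.

```python
import itertools

ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789-"

def next_label(label):
    """Return the name that follows `label`, odometer style."""
    chars = list(label)
    i = len(chars) - 1
    while i >= 0:
        pos = ALPHABET.index(chars[i])
        if pos + 1 < len(ALPHABET):
            chars[i] = ALPHABET[pos + 1]
            return "".join(chars)
        chars[i] = ALPHABET[0]  # roll over and carry to the left
        i -= 1
    return ALPHABET[0] * (len(label) + 1)  # all positions rolled over

def domains(start="a", max_length=3, tld=".com"):
    """Yield candidate domains from `start` up to `max_length` characters."""
    label = start
    while len(label) <= max_length:
        # Assumption: names starting or ending with "-" are skipped.
        if not (label.startswith("-") or label.endswith("-")):
            yield label + tld
        label = next_label(label)

print(list(itertools.islice(domains(), 5)))
# ['a.com', 'b.com', 'c.com', 'd.com', 'e.com']
```

To resume, the last name written to the log would be passed back in as `start`, for example `domains(start="aaa2", max_length=4)`.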
field #1 - type of site - classified or forum
field #2 - URL of classified site
field #3 - title bar from header of page
field #4 - software type of classified or message board
field #5 - time and date searched
field #6 - Alexa page ranking
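A minimal sketch of how one fetched page could be turned into a row with the six fields above. The detection rules are assumptions for illustration only: it looks for a "Powered by" footer naming one of two example forum packages, and otherwise treats any page mentioning "classified" as a classified site. A real signature list would need to be much longer. The Alexa field is left empty because no ranking lookup is shown here. `build_row` and `write_rows` are made-up names.

```python
import csv
import re
import sys
from datetime import datetime

FORUM_SOFTWARE = ("phpBB", "vBulletin")  # illustrative, not exhaustive

TITLE_RE = re.compile(r"<title[^>]*>(.*?)</title>", re.I | re.S)
TAG_RE = re.compile(r"<[^>]+>")
POWERED_RE = re.compile(
    r"powered by\s+(" + "|".join(FORUM_SOFTWARE) + r")\s*v?(\d+(?:\.\d+)*)?",
    re.I,
)

def build_row(url, html, now=None):
    """Return the six output fields for a page, or None if it is not a match."""
    match = TITLE_RE.search(html)
    title = " ".join(match.group(1).split()) if match else ""
    text = " ".join(TAG_RE.sub(" ", html).split())  # visible text, tags removed
    powered = POWERED_RE.search(text)
    if powered:
        site_type = "forum"
        software = " ".join(part for part in powered.groups() if part)
    elif "classified" in text.lower():
        site_type, software = "classified", ""
    else:
        return None
    stamp = (now or datetime.now()).strftime("%Y-%m-%d %H:%M:%S")
    return [site_type, url, title, software, stamp, ""]  # field 6 left blank

def write_rows(fileobj, rows):
    """Write rows as CSV to an already opened file object."""
    csv.writer(fileobj).writerows(rows)

page = "<title>Free Classifieds</title><body>Buy and sell locally</body>"
write_rows(sys.stdout, [build_row("http://example.com/", page)])
```

For the real output file, it would be opened in append mode with `newline=""` so that an interrupted run can continue writing to the same CSV.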
The user input for the program will include:
start point - e.g. aaa2.com or killooo-1.com, so I can resume where it left off
type of site to search - classified, message board or both
timeout period (default 10 seconds) - how long to wait for a response
concurrent connections - how many queries at once
domain length - max number of characters long to search for
output file name - (default current dir/$DATE$STARTTIME) output CSV file location and name
output log name - (default current dir/log) output log file, it will just append to it
top level domain - e.g. .com, .net, .us, .org, etc.
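The inputs listed above map naturally onto command-line options. The following is a sketch using Python's standard argparse module. The option names are made up for illustration, and the defaults other than the 10 second timeout and the log file name are assumptions. The `--delay` option reflects the delay between queries requested earlier in this posting.

```python
import argparse
from datetime import datetime

def build_parser():
    """Build a parser for the user inputs listed above."""
    parser = argparse.ArgumentParser(description="Classified and forum site finder")
    parser.add_argument("--start", default="a",
                        help="name to start or resume from, e.g. aaa2")
    parser.add_argument("--site-type", choices=["classified", "forum", "both"],
                        default="both", help="type of site to search for")
    parser.add_argument("--timeout", type=float, default=10.0,
                        help="seconds to wait for a response")
    parser.add_argument("--concurrency", type=int, default=10,
                        help="how many queries to run at once")
    parser.add_argument("--delay", type=float, default=0.0,
                        help="seconds to pause between queries")
    parser.add_argument("--max-length", type=int, default=3,
                        help="maximum number of characters in the name")
    parser.add_argument("--output",
                        default=datetime.now().strftime("%Y%m%d-%H%M%S") + ".csv",
                        help="output CSV file location and name")
    parser.add_argument("--log", default="log",
                        help="log file, always appended to")
    parser.add_argument("--tld", nargs="+", default=[".com"],
                        help="one or more top level domains, e.g. .com .net")
    return parser

# Example invocation with an explicit argument list.
args = build_parser().parse_args(["--start", "aaa2", "--tld", ".com", ".net"])
print(args.start, args.tld, args.timeout)
```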
Most importantly, the program will let me change the timeout and the concurrent connection count, so I can set a rate that almost maxes out the computer but does not crash it.
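One possible way to make the timeout and concurrent connection count adjustable is a thread pool that works through the domains in batches. This is a sketch under assumptions, not a tuned implementation: `fetch` and `probe_all` are made-up names, pages are requested over plain HTTP at the site root, and only the first 200,000 bytes of each page are read. Batching keeps memory use flat, at the cost that each batch waits for its slowest request.

```python
import http.client
import itertools
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor

def fetch(url, timeout):
    """Return the start of the page as text, or None on any network error."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.read(200_000).decode("utf-8", errors="replace")
    except (OSError, ValueError, http.client.HTTPException):
        return None

def probe_all(domains, timeout=10.0, concurrency=10, delay=0.0, fetcher=fetch):
    """Yield (domain, page_or_None) pairs, `concurrency` requests at a time."""
    iterator = iter(domains)
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        while True:
            # Take only one batch at a time so huge domain lists fit in memory.
            batch = list(itertools.islice(iterator, concurrency))
            if not batch:
                break
            urls = ["http://" + domain + "/" for domain in batch]
            pages = pool.map(lambda url: fetcher(url, timeout), urls)
            yield from zip(batch, pages)
            time.sleep(delay)  # user-set pause between batches
```

Raising `concurrency` or lowering `timeout` and `delay` speeds the scan up; the values would be increased step by step while watching CPU, memory and bandwidth on the server.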