I need a multithreaded link checker, including source code (so I can make small changes), to check links (URLs) on a Linux server.
The program will be a console application that can run 300+ threads simultaneously (I don't want memory leaks). It will check 500,000+ URLs from a text file specified by a command-line option.
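A minimal sketch of the threading side, assuming POSIX threads: a fixed number of workers share one mutex-guarded `FILE *` and each pulls one URL per line, so memory use stays flat however long the list is. `run_pool`, `url_check_fn` and the demonstration callback `count_url` are hypothetical names; the 256 KiB thread stack and the 4 KiB line limit are assumptions.

```c
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical callback: performs the actual check of one URL. */
typedef void (*url_check_fn)(const char *url, void *ctx);

struct pool {
    FILE *in;             /* URL list, one URL per line */
    pthread_mutex_t lock; /* serialises reads from `in` */
    url_check_fn check;
    void *ctx;
};

/* Copies the next non-empty line into buf; returns 0 at end of file.
   Lines longer than buf are split (assumption: URLs fit in 4 KiB). */
static int next_url(struct pool *p, char *buf, size_t len)
{
    int got = 0;
    pthread_mutex_lock(&p->lock);
    while (!got && fgets(buf, (int)len, p->in) != NULL) {
        buf[strcspn(buf, "\r\n")] = '\0'; /* strip LF or CRLF */
        got = buf[0] != '\0';
    }
    pthread_mutex_unlock(&p->lock);
    return got;
}

static void *worker(void *arg)
{
    struct pool *p = arg;
    char url[4096]; /* reused stack buffer: no per-URL malloc to leak */
    while (next_url(p, url, sizeof url))
        p->check(url, p->ctx);
    return NULL;
}

/* Starts up to `nthreads` workers over every URL in `in`, joins them all
   and releases every resource. Returns 0 if at least one worker ran. */
int run_pool(FILE *in, int nthreads, url_check_fn check, void *ctx)
{
    struct pool p = { .in = in, .check = check, .ctx = ctx };
    pthread_attr_t attr;
    pthread_t *tids;
    int started = 0;

    if (nthreads < 1 || pthread_mutex_init(&p.lock, NULL) != 0)
        return -1;
    tids = calloc((size_t)nthreads, sizeof *tids);
    pthread_attr_init(&attr);
    /* Smaller stacks keep 300+ threads cheap; 256 KiB is an assumption. */
    pthread_attr_setstacksize(&attr, 256 * 1024);
    while (tids != NULL && started < nthreads &&
           pthread_create(&tids[started], &attr, worker, &p) == 0)
        started++;
    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);
    pthread_attr_destroy(&attr);
    free(tids);
    pthread_mutex_destroy(&p.lock);
    return started > 0 ? 0 : -1;
}

/* Demonstration callback only: counts the URLs it is handed. */
struct url_counter { pthread_mutex_t lock; long n; };

static void count_url(const char *url, void *ctx)
{
    struct url_counter *c = ctx;
    (void)url;
    pthread_mutex_lock(&c->lock);
    c->n++;
    pthread_mutex_unlock(&c->lock);
}
```

In `main`, the list path would come from the command line, e.g. `FILE *in = fopen(argv[1], "r");` followed by `run_pool(in, 300, check_url, ctx);`, where `check_url` is your own checker. Every buffer either lives on a worker's stack or is freed before `run_pool` returns, which a tool such as Valgrind can confirm. Build with `gcc -O2 -pthread`.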
It will check pages for malicious HTML code, using patterns loaded from a file (for example, [url removed]). If a URL is broken, redirects, or contains one of the patterns from [url removed], it will be saved to [url removed], [url removed] or [url removed], depending on the situation.
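The three-way sort could look like this sketch. PCRE is the stated preference; to stay dependency-free this version swaps in POSIX `<regex.h>` extended regexes (one per line of the pattern file, matched case-insensitively), and PCRE2's `pcre2_compile`/`pcre2_match` could replace `regcomp`/`regexec` where Perl syntax is needed. The output file names, `MAX_PATTERNS` and the status-code thresholds are assumptions, since the real file names were removed from the post.

```c
#include <regex.h>
#include <stdio.h>
#include <string.h>

#define MAX_PATTERNS 256 /* hypothetical limit on the bad-code list */

enum verdict { URL_OK, URL_BROKEN, URL_REDIRECT, URL_BADCODE };

struct patterns {
    regex_t re[MAX_PATTERNS];
    size_t n; /* number of successfully compiled entries */
};

/* Compiles one extended regex per non-empty line of `f`.
   Returns 0 on success; on failure call free_patterns() anyway. */
int load_patterns(FILE *f, struct patterns *p)
{
    char line[1024];
    p->n = 0;
    while (fgets(line, sizeof line, f) != NULL) {
        line[strcspn(line, "\r\n")] = '\0';
        if (line[0] == '\0')
            continue;
        if (p->n == MAX_PATTERNS ||
            regcomp(&p->re[p->n], line,
                    REG_EXTENDED | REG_ICASE | REG_NOSUB) != 0)
            return -1;
        p->n++;
    }
    return 0;
}

/* Releases every compiled pattern, so reloading never leaks. */
void free_patterns(struct patterns *p)
{
    for (size_t i = 0; i < p->n; i++)
        regfree(&p->re[i]);
    p->n = 0;
}

/* Status < 200 or >= 400 (including -1 for "no reply") counts as broken,
   3xx as a redirect; otherwise the body is scanned for bad code. */
enum verdict classify(int status, const char *body, const struct patterns *p)
{
    if (status < 200 || status >= 400)
        return URL_BROKEN;
    if (status >= 300)
        return URL_REDIRECT;
    for (size_t i = 0; body != NULL && i < p->n; i++)
        if (regexec(&p->re[i], body, 0, NULL, 0) == 0)
            return URL_BADCODE;
    return URL_OK;
}

/* Hypothetical output names; the originals were removed from the post. */
const char *verdict_file(enum verdict v)
{
    switch (v) {
    case URL_BROKEN:   return "broken.txt";
    case URL_REDIRECT: return "redirect.txt";
    case URL_BADCODE:  return "badcode.txt";
    default:           return NULL; /* OK URLs are not logged */
    }
}
```

Because each stdio call locks its `FILE`, writing one `fprintf(out, "%s\n", url)` per result to a shared, already-open output stream keeps lines from different threads intact.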
The bad-HTML check could be done with PCRE (Perl Compatible Regular Expressions), which would make my job easier. Socket connections must be native and fast, using plain C socket calls (the BSD/POSIX sockets API, since ANSI C itself defines no socket functions); speed is important.
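A sketch of the socket side, assuming plain `http://` URLs only (`https://` links would need a TLS library and are rejected here): resolve with `getaddrinfo`, connect with blocking BSD sockets bounded by `SO_RCVTIMEO`/`SO_SNDTIMEO`, send a `HEAD` request and parse the status line. `HEAD` is enough to spot broken or redirecting links; scanning for bad HTML needs the body, so that path would send `GET` and keep reading. `parse_url`, `parse_status` and `http_head` are hypothetical helpers, and reading the status line with a single `recv` is a simplification.

```c
#define _POSIX_C_SOURCE 200809L
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/* Splits "http://host[:port]/path" into parts; returns 0 on success. */
int parse_url(const char *url, char *host, size_t hlen,
              char *port, size_t plen, char *path, size_t pathlen)
{
    if (strncmp(url, "http://", 7) != 0)
        return -1; /* https:// would need a TLS library */
    const char *h = url + 7;
    const char *slash = strchr(h, '/');
    size_t hp = slash ? (size_t)(slash - h) : strlen(h); /* host[:port] */
    const char *colon = memchr(h, ':', hp);
    size_t hl = colon ? (size_t)(colon - h) : hp;
    const char *rest = slash ? slash : "/";
    if (hl == 0 || hl >= hlen || strlen(rest) >= pathlen)
        return -1;
    memcpy(host, h, hl);
    host[hl] = '\0';
    if (colon) {
        size_t pl = hp - hl - 1;
        if (pl == 0 || pl >= plen)
            return -1;
        memcpy(port, colon + 1, pl);
        port[pl] = '\0';
    } else if (plen > 2) {
        strcpy(port, "80");
    } else {
        return -1;
    }
    strcpy(path, rest);
    return 0;
}

/* Extracts the code from a status line such as "HTTP/1.1 301 Moved". */
int parse_status(const char *line)
{
    int code;
    return sscanf(line, "HTTP/%*d.%*d %d", &code) == 1 ? code : -1;
}

/* Sends HEAD and returns the status code, or -1 on any failure. */
int http_head(const char *url, int timeout_sec)
{
    char host[256], port[8], path[2048], req[2600], resp[512];
    struct addrinfo hints, *res = NULL, *ai;
    int fd = -1, status = -1;

    if (parse_url(url, host, sizeof host, port, sizeof port,
                  path, sizeof path) != 0)
        return -1;
    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;
    hints.ai_socktype = SOCK_STREAM;
    if (getaddrinfo(host, port, &hints, &res) != 0)
        return -1;
    for (ai = res; ai != NULL; ai = ai->ai_next) {
        fd = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
        if (fd < 0)
            continue;
        /* Bound send/recv (and, on Linux, connect) so a slow host
           cannot stall a worker forever. */
        struct timeval tv = { .tv_sec = timeout_sec, .tv_usec = 0 };
        setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof tv);
        setsockopt(fd, SOL_SOCKET, SO_SNDTIMEO, &tv, sizeof tv);
        if (connect(fd, ai->ai_addr, ai->ai_addrlen) == 0)
            break;
        close(fd);
        fd = -1;
    }
    freeaddrinfo(res); /* always freed: no per-URL leak */
    if (fd < 0)
        return -1;
    int n = snprintf(req, sizeof req, "HEAD %s HTTP/1.1\r\nHost: %s\r\n"
                     "Connection: close\r\n\r\n", path, host);
    /* MSG_NOSIGNAL: a peer reset must not kill the process via SIGPIPE. */
    if (n > 0 && (size_t)n < sizeof req &&
        send(fd, req, (size_t)n, MSG_NOSIGNAL) == (ssize_t)n) {
        ssize_t got = recv(fd, resp, sizeof resp - 1, 0);
        if (got > 0) {
            resp[got] = '\0';
            status = parse_status(resp);
        }
    }
    close(fd);
    return status;
}
```

A worker would call something like `http_head(url, 10)` and treat `-1` (DNS, connect or timeout failure) as a broken link. Each socket is closed and each `addrinfo` list freed on every path, which matters when the same 300 threads cycle through 500,000 URLs.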