I have a list of about 1.4 million URLs (Sites A) that need to be checked against another 1 million sites (Sites B). Sites B is the authoritative list, taken from Alexa's top 1 million sites.
I would like a program or script that checks whether each URL in Sites A (1.4 million URLs) is listed in Sites B, in a reasonable time (say, less than 1 hour, but the faster the better). The program can be written in C, Ruby, Perl, Python, etc.
When the URL of a site in Sites A is listed in Sites B, I need a flag put up for it, and the flag must be saved to a text file along with the URL, one site per line. When a site in Sites A is not listed in Sites B, its flag should be put down, and that should also be saved to a text file along with the URL.
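The check described above can be sketched in Python by loading Sites B into a set, which makes each lookup roughly constant time, so 1.4 million lookups against 1 million entries should finish well within the one-hour limit on ordinary hardware. This is only a minimal sketch under assumptions the post does not state: the function names, one entry per line in both input files, the exact-string comparison, and the output format (flag `1` for up or `0` for down, a tab, then the URL) are all hypothetical.

```python
def flag_sites(sites_a, sites_b):
    """Yield (flag, url) for each URL in sites_a.

    flag is 1 ("up") when the URL is listed in sites_b, else 0 ("down").
    The 1/0 encoding is an assumption; the post does not define it.
    """
    known = set(sites_b)  # set membership tests are O(1) on average
    for url in sites_a:
        yield (1 if url in known else 0, url)


def read_lines(path):
    """Read non-empty, stripped lines (assumes one entry per line)."""
    with open(path, encoding="utf-8") as fh:
        return [line.strip() for line in fh if line.strip()]


def write_flags(path_a, path_b, path_out):
    """Write one 'flag<TAB>url' line per Sites A entry (hypothetical format)."""
    rows = flag_sites(read_lines(path_a), read_lines(path_b))
    with open(path_out, "w", encoding="utf-8") as out:
        for flag, url in rows:
            out.write(f"{flag}\t{url}\n")
```

A call such as `write_flags("sites_a.txt", "sites_b.txt", "flags.txt")` would produce the flagged file; the three file names are placeholders. Exact-string comparison only works if both lists use the same form for each site.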
Hi. I would like to work on your project. I'm ready to write the code without a milestone and show you a demo result to confirm my skills. Also, the URLs need to be normalized before they are processed.
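What "normalized" means is not spelled out here, so the following is only one possible reading: reduce every URL to a lowercase hostname with any leading `www.` removed, so that entries in Sites A and Sites B can be compared as plain strings. The function name `normalize` and the rule of dropping `www.` are assumptions, not part of the project description.

```python
from urllib.parse import urlsplit


def normalize(url):
    """Reduce a URL to a lowercase hostname without a leading 'www.'.

    Assumption: scheme, port, path, and query are irrelevant for matching.
    """
    url = url.strip()
    if "://" not in url:
        # Without a scheme, urlsplit would treat the host as a path,
        # so mark the text as a network location first.
        url = "//" + url
    host = urlsplit(url).hostname or ""  # hostname is already lowercased
    if host.startswith("www."):
        host = host[len("www."):]
    return host
```

Applying the same function to both lists before comparing them keeps entries such as `http://WWW.Example.com/page` and `example.com` from being treated as different sites.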
4 freelancers are bidding an average of $40 for this job
Hi Sir/Madam, I'm an expert in Python programming and I can help you with this script. Can you send me examples of those 2 files (I suppose they are too big to be sent whole)? Best regards, Fejs.