URL & Email Scraping Program Needed
Must run on Windows XP & Vista
1. Enter one or more keywords.
2. The program searches for websites that have the keyword(s) in their meta tag section. ONLY the meta tag section is checked.
3. The program collects the home page URL (not subpages) of each matching website. When the search finishes, or once 20,000 matching home page URLs have been collected, the program must remove any duplicate URLs and then save the URL list.
We want ONLY the home page URLs of websites that have the keyword(s) in their meta tag section.
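To illustrate the URL collection steps above, here is a rough Python sketch of the three core checks: testing the meta tags for the keywords, reducing a URL to its home page, and removing duplicates up to the 20,000 limit. It is only one possible approach, not a required design. The names (MetaTextParser, meta_has_keywords, home_page_url, dedupe_home_pages) are made up for this example. It assumes ALL entered keywords must appear in the meta tags (swap all for any if one match is enough), and it does not cover how candidate pages are found or downloaded.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit

MAX_URLS = 20000  # collection limit from step 3


class MetaTextParser(HTMLParser):
    """Collects the content attribute of every <meta> tag."""

    def __init__(self):
        super().__init__()
        self.meta_text = []

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            content = dict(attrs).get("content")
            if content:
                self.meta_text.append(content.lower())


def meta_has_keywords(html, keywords):
    """True if every keyword appears in the page's meta tag content.

    Assumption: all keywords must match; text outside <meta> is ignored.
    """
    parser = MetaTextParser()
    parser.feed(html)
    text = " ".join(parser.meta_text)
    return all(k.lower() in text for k in keywords)


def home_page_url(url):
    """Reduce any page URL to the home page of its site."""
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc.lower()}/"


def dedupe_home_pages(urls, limit=MAX_URLS):
    """Return unique home page URLs in first-seen order, up to limit."""
    seen = set()
    result = []
    for url in urls:
        home = home_page_url(url)
        if home not in seen:
            seen.add(home)
            result.append(home)
            if len(result) >= limit:
                break
    return result
```

For example, dedupe_home_pages(["http://a.com/x", "http://a.com/y"]) keeps a single entry, http://a.com/, because both pages belong to the same home page.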
1. The program loads the saved URL list and scrapes all email addresses from each URL's domain, including any subpages of that domain.
2. The scraping must not follow any links away from the original searched domain.
3. When all emails have been scraped, the program must remove duplicate and invalid email addresses.
4. A filter section must let us enter search filters. Any email address containing text we type into the filter section is removed from the list.
5. Save the clean email list.
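The email steps above could be sketched in Python roughly as follows. This is a simplified illustration under stated assumptions, not a finished design. The names (EMAIL_RE, same_domain, extract_emails, is_valid, clean_emails) are invented for this example. "Invalid" is assumed to mean "fails a basic syntax check", emails are compared case-insensitively, and www.example.com is treated as a different host from example.com, which may or may not be what is wanted.

```python
import re
from urllib.parse import urljoin, urlsplit

# Simplified pattern; it does not cover every address the email RFCs allow.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def same_domain(link, base_url):
    """True if link (absolute or relative) stays on base_url's host."""
    target = urljoin(base_url, link)
    return urlsplit(target).hostname == urlsplit(base_url).hostname


def extract_emails(text):
    """Return every email-like string found in a page's text or HTML."""
    return EMAIL_RE.findall(text)


def is_valid(email):
    """Basic syntax check (assumed meaning of 'invalid' in step 3)."""
    local, _, domain = email.rpartition("@")
    return (
        bool(EMAIL_RE.fullmatch(email))
        and ".." not in email
        and not local.startswith(".")
        and not local.endswith(".")
        and not domain.startswith((".", "-"))
    )


def clean_emails(emails, filters=()):
    """Drop duplicates, invalid addresses, and anything matching a filter.

    filters: strings typed into the filter section; an address is removed
    if it contains any of them (case-insensitive).
    """
    lowered_filters = [f.lower() for f in filters if f]
    seen = set()
    result = []
    for email in emails:
        key = email.strip().lower()
        if key in seen or not is_valid(key):
            continue
        if any(f in key for f in lowered_filters):
            continue
        seen.add(key)
        result.append(key)
    return result
```

A crawler built on this would only queue links for which same_domain(link, home_url) is true, which keeps the scrape on the original domain as step 2 requires. The list returned by clean_emails can then be written out one address per line.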