I'm looking to build a URL scraper with built-in support for rotating HTTP/HTTPS proxy IP addresses, plus the option to use just a single proxy IP.
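To make the two proxy modes concrete, here is a rough Python sketch of what I have in mind. It is only an illustration, not a required design. The name `make_proxy_picker` and the proxy addresses are made up, and I'm assuming the returned dict would be passed to an HTTP client that takes a `{"http": ..., "https": ...}` mapping (the `requests` library accepts this shape for its `proxies` argument).

```python
from itertools import cycle


def make_proxy_picker(proxies, rotate=True):
    """Return a function that gives the proxy settings for the next request.

    proxies: list of proxy URLs (the addresses used below are placeholders).
    rotate:  True  -> step through the list round-robin, one proxy per request.
             False -> always use the first proxy in the list.
    """
    if not proxies:
        raise ValueError("at least one proxy is required")
    source = cycle(proxies) if rotate else cycle(proxies[:1])

    def next_proxy():
        proxy = next(source)
        # Route both plain and TLS traffic through the same proxy.
        return {"http": proxy, "https": proxy}

    return next_proxy


# Hypothetical proxy pool, for illustration only.
pool = ["http://203.0.113.10:8080", "http://203.0.113.11:8080"]
rotating = make_proxy_picker(pool, rotate=True)   # rotating mode
single = make_proxy_picker(pool, rotate=False)    # single-proxy mode
```

Each call to `rotating()` moves on to the next proxy in the pool, while `single()` keeps returning the first one.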
This scraper must fetch random URLs from a top-million-websites Excel spreadsheet and let me choose how many random URLs I want back, in lots of e.g. 5, 10, 20, 50, etc. I also want it to format the output. If you can get one random internal page of each randomly scraped URL, that would be great too. I don't need all the internal pages scraped from the URL, but you may want to keep this option open for a future revision. In other words, the program should be able to scrape not just the main URL but, alternately, a random internal page address for some of the URLs as well, so it looks more natural. Understand? (shown below)
Example of randomly scraped URLs:
1. [login to view URL]
2. [login to view URL]
3. [login to view URL]
4. http://etc...
5. etc...
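The random selection and the "one random internal page" idea could look roughly like the Python sketch below. This is just my guess at one workable approach. The function names are made up, and I'm assuming the spreadsheet has already been loaded into a plain list of URLs and that the page HTML has already been downloaded.

```python
import random
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


def pick_random_urls(urls, count, rng=random):
    """Return `count` distinct URLs chosen at random (fewer if the list is short)."""
    return rng.sample(urls, min(count, len(urls)))


class _LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def pick_internal_page(base_url, html, rng=random):
    """Return one random same-host link found in `html`, or None if there is none."""
    collector = _LinkCollector()
    collector.feed(html)
    host = urlparse(base_url).netloc
    internal = set()
    for href in collector.hrefs:
        # Resolve relative links and drop any #fragment part.
        absolute = urljoin(base_url, href).split("#")[0]
        parsed = urlparse(absolute)
        same_site = parsed.scheme in ("http", "https") and parsed.netloc == host
        is_home = absolute.rstrip("/") == base_url.rstrip("/")
        if same_site and not is_home:
            internal.add(absolute)
    return rng.choice(sorted(internal)) if internal else None
```

For a lot of 10, that would be `pick_random_urls(all_urls, 10)`, followed by `pick_internal_page(url, html)` for whichever of those URLs should show an internal page instead of the main one.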
------------------------
I have another set of output parameters to follow for exporting the data. Please PMB me on this.
Let me know if you could automate it even more by fetching each random URL's meta title, meta description, and meta tags/keywords.
Please organize these (meta title, meta description, meta keywords/tags) in cells alongside their corresponding URL so they can be exported as described below.
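As a rough idea of the meta fetching, here is a small Python sketch that pulls the three fields out of a page and keeps them together with their URL as one row. The names `extract_meta` and `_MetaParser` and the row layout are my own assumptions, and it expects the page HTML to have been downloaded already.

```python
from html.parser import HTMLParser


class _MetaParser(HTMLParser):
    """Pick up <title> text and the description/keywords <meta> tags."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.meta = {"title": "", "description": "", "keywords": ""}

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True
        elif tag == "meta":
            attributes = dict(attrs)
            name = (attributes.get("name") or "").lower()
            if name in ("description", "keywords"):
                self.meta[name] = (attributes.get("content") or "").strip()

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.meta["title"] += data


def extract_meta(url, html):
    """Return one row: the URL plus its title, description and keywords."""
    parser = _MetaParser()
    parser.feed(html)
    row = {"url": url}
    row.update(parser.meta)
    row["title"] = row["title"].strip()
    return row
```

A page with no description or keywords tag simply gets empty cells for those fields.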
I would like an export feature as well, to popular document formats: Excel/CSV and plain text (Notepad).
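For the export, something along the lines of this Python sketch would do. It assumes each result is a dict with `url`, `title`, `description` and `keywords` keys; that layout and the function names are my own placeholders, and the final column layout is still open. The CSV output opens directly in Excel and the text output in Notepad.

```python
import csv
import io

# Column order for the export (an assumption, not a final spec).
FIELDS = ["url", "title", "description", "keywords"]


def to_csv(rows):
    """Return the rows as CSV text, one URL per line, with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def to_text(rows):
    """Return the rows as a numbered plain-text list."""
    lines = []
    for number, row in enumerate(rows, start=1):
        lines.append(f"{number}. {row['url']}")
        lines.append(f"   Title: {row.get('title', '')}")
        lines.append(f"   Description: {row.get('description', '')}")
        lines.append(f"   Keywords: {row.get('keywords', '')}")
    return "\n".join(lines) + "\n"
```

To save to disk, write the returned string to a file, e.g. `open("results.csv", "w", newline="", encoding="utf-8")` for the CSV or a `.txt` file for the text version.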
Please PM me with any questions.