I need to scrape my competitors' websites on a daily basis; there are over 100 sites.
My spiders use regex to parse fields, extracting only the data I require from tables whose structure varies from page to page within the same website.
I also parse <p> tags with regex to find and record a range of keywords.
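For context, a minimal sketch of this kind of regex-based extraction, using hypothetical HTML, field names, and keyword list (the real sites, fields, and keywords are not specified here):

```python
import re

# Hypothetical HTML as it might appear in a scraped response body.
html = """
<table><tr><td>Price</td><td>$19.99</td></tr></table>
<p>Free shipping on all orders. Discount codes available in summer.</p>
"""

# Pull a value out of a table whose layout varies: anchor on the label text,
# then capture the value in the following cell.
price_match = re.search(r"Price</td>\s*<td>\$?([\d.]+)", html)
price = float(price_match.group(1)) if price_match else None

# Record which of a list of keywords appear inside <p> tags (case-insensitive).
keywords = ["free shipping", "discount", "clearance"]  # hypothetical list
paragraphs = " ".join(re.findall(r"<p>(.*?)</p>", html, re.S))
found = [kw for kw in keywords if re.search(re.escape(kw), paragraphs, re.I)]

print(price)  # 19.99
print(found)  # ['free shipping', 'discount']
```

Anchoring on a label rather than on table position is what lets one pattern survive tables whose row order changes between pages.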
I struggle on some sites to navigate from one page to the next, as the site uses infinite scroll or other tricks to load new data.
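Infinite-scroll pages usually fetch their data from a JSON endpoint behind the scenes; requesting that endpoint directly (and yielding a follow-up scrapy.Request for each page from the spider's parse callback) sidesteps the scroll UI entirely. A sketch of the core pagination logic, assuming a hypothetical endpoint and payload shape:

```python
import json

def next_page_url(base_url, payload):
    """Return the URL of the next page, or None when the feed is exhausted.

    Assumes a hypothetical API that returns {"page": N, "has_more": bool, ...};
    the real endpoint and field names must be found in the browser's network tab.
    """
    data = json.loads(payload)
    if not data.get("has_more"):
        return None
    return f"{base_url}?page={data['page'] + 1}"

page1 = json.dumps({"page": 1, "has_more": True, "items": [1, 2, 3]})
page9 = json.dumps({"page": 9, "has_more": False, "items": []})

print(next_page_url("https://example.com/api/items", page1))
# https://example.com/api/items?page=2
print(next_page_url("https://example.com/api/items", page9))
# None
```

Each real site will paginate differently (page numbers, offsets, or cursor tokens), so this function is only a template for what the developer would adapt per spider.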
Additionally, I apply automatic replacements to many extracted fields, checking each value against a well-defined list and substituting the correct replacement text.
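In Scrapy this kind of cleanup fits naturally into an Item Pipeline or an ItemLoader processor; the lookup itself can be sketched as below (the replacement table here is hypothetical, standing in for the project's own list):

```python
# Hypothetical normalisation table: raw extracted value -> canonical replacement.
REPLACEMENTS = {
    "N/A": None,
    "in stock": "In Stock",
    "out-of-stock": "Out of Stock",
}

def clean_field(value):
    """Trim whitespace, then apply the replacement list;
    unknown values pass through unchanged."""
    stripped = value.strip()
    return REPLACEMENTS.get(stripped, stripped)

print(clean_field("  in stock "))  # In Stock
print(clean_field("blue"))         # blue
```

Keeping the table in one dict means new replacement rules are a one-line change rather than an edit to every spider.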
Many sites are straightforward and easy to extract data from (I am a novice); others require XPath expressions, as the HTML tags alone do not yield the desired data.
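A small sketch of the XPath case, using hypothetical markup where the target value has no distinctive tag of its own and must be selected by attribute (stdlib ElementTree is used here so the example is self-contained; in Scrapy itself the equivalent would be response.xpath('//span[@class="value"]/text()').get(), which supports full XPath 1.0):

```python
import xml.etree.ElementTree as ET

# Hypothetical snippet: both spans are plain <span> tags, so only the
# class attribute distinguishes the value we want.
html = """
<div>
  <span class="label">Price</span>
  <span class="value">19.99</span>
</div>
"""

root = ET.fromstring(html)
# ElementTree supports a subset of XPath, including attribute predicates.
price = root.find('.//span[@class="value"]').text
print(price)  # 19.99
```

This is exactly the situation where tag-based extraction fails and an XPath anchored on an attribute or a neighbouring label succeeds.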
All data is output to a JSON file.
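In Scrapy, JSON output is typically configured with the FEEDS setting (available since Scrapy 2.1) rather than hand-written file code; a sketch of a settings.py fragment, with a hypothetical output path:

```python
# Hypothetical Scrapy settings.py fragment for JSON feed export.
# %(name)s expands to the spider's name, so 100+ spiders can share this
# one setting without overwriting each other's output files.
FEEDS = {
    "output/%(name)s.json": {
        "format": "json",
        "encoding": "utf8",
        "overwrite": True,
    },
}
```

Using the built-in feed exporter also gives JSON Lines, CSV, and other formats for free if requirements change later.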
This project requires an experienced Python/Scrapy developer who can resolve all of the current requirements and remain on retainer to handle updates and new spiders as and when the need arises.
If you feel you can meet all the requirements, you are invited to discuss the project in greater detail.