• Build a web scraper that collects data from the Financial Times (FT) equity screener website, extracts the relevant data from the HTML and writes it to a structured CSV file for download and processing in tabular form in Excel.
• The website URL is as follows: [login to view URL]
• The system should be based on the Scrapy web scraping framework, with the spider files written in Python. ([login to view URL])
• The spider should run at fixed intervals, such as once per week. It shall be possible to set and change the frequency later on.
• It should be possible to run several spiders in parallel, each collecting a specific set of data attributes (e.g. market cap, ROI). For each spider there shall be a separate Python file that can easily be replicated from the code of the initial spider.
• The attributes and target limits (e.g. market cap USD 100M – 1B) are to be set manually. It should be possible to add and delete specific attributes later on as well as change the corresponding target limits. All attributes and target limits are based upon the features of the FT website.
• What might be tricky is that the FT website uses HTTPS. Also, the attributes and data ranges cannot be set as parameters in the URL. When using the website I have to enter the parameters by hand and then submit the form to get the results list.
• The system shall run on Amazon Web Services (AWS), maybe as an EC2 instance. The files with scraped data shall be stored within a bucket in AWS S3.
• The scope of the work shall include programming all required code and setting up the live system on AWS for a single initial spider. A login will be provided.
• The scope shall also include one page of documentation that describes the structure of the system and gives guidance on how to make the changes described above.
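The requirements above (manually set attributes and limits, one easily replicated file per spider, Excel-readable CSV output, storage in S3) could be organised roughly as follows. This is only a sketch under stated assumptions: the class names, the `_min`/`_max` field naming, the S3 key layout and the helper names are all hypothetical, and the real form field names would have to be read from the FT screener page itself. The upload helper assumes the third-party `boto3` package is installed on the EC2 instance.

```python
import csv
import io
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class Criterion:
    """One screener attribute with optional target limits (hypothetical model)."""
    name: str
    minimum: Optional[float] = None
    maximum: Optional[float] = None


@dataclass
class ScreenConfig:
    """Everything that differs between spiders: a name and a list of criteria."""
    spider_name: str
    criteria: list = field(default_factory=list)

    def form_fields(self) -> dict:
        """Flatten the criteria into key/value pairs for a form submission.

        The "<name>_min" / "<name>_max" keys are placeholders, not FT's real fields.
        """
        fields = {}
        for c in self.criteria:
            if c.minimum is not None:
                fields[f"{c.name}_min"] = str(c.minimum)
            if c.maximum is not None:
                fields[f"{c.name}_max"] = str(c.maximum)
        return fields


def rows_to_csv(header, rows) -> str:
    """Serialise scraped rows with the csv module's Excel dialect."""
    buf = io.StringIO(newline="")
    writer = csv.writer(buf, dialect="excel")
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


def s3_key(spider_name: str, run_date: date) -> str:
    """Build one object key per spider and run date (assumed layout)."""
    return f"{spider_name}/{run_date.isoformat()}.csv"


def upload_csv(local_path: str, bucket: str, key: str) -> None:
    """Upload a finished CSV file to S3; needs boto3 and AWS credentials."""
    import boto3  # imported lazily so the rest of the sketch runs without it

    boto3.client("s3").upload_file(local_path, bucket, key)
```

With this layout, adding a spider means copying the file and editing its `ScreenConfig`; changing a target limit means editing a single `Criterion`.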
The first spider shall be as follows:
• Interval: Once per week every Thursday
• Website: [login to view URL]
• Attributes for screen and target limits: Countries (Europe – all, America – USA, Canada), Sectors (all), Market cap (USD 500M+), ROI 5 year, ROI current, ROE 5 year, ROE current, Net profit margin 5 year, P/B, P/E, Interest cover, Price change 52 weeks
• Data collected: All columns from the results list, all pages with results sorted alphabetically
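As a rough illustration of the first spider's settings, the sketch below records the screen as plain data, expresses the Thursday interval both as a cron line and as a date calculation, and walks through result pages until an empty one is returned. Several things here are assumptions rather than part of the brief: the dictionary keys, the 06:00 run time, the `fetch_page` callable (a stand-in for whatever actually requests one results page), and the reading of "sorted alphabetically" as sorting by the first column, presumed to be the company name.

```python
from datetime import date, timedelta

THURSDAY = 3  # date.weekday() counts Monday as 0

# Standard cron fields: minute hour day-of-month month day-of-week (4 = Thursday).
# The 06:00 run time is an arbitrary placeholder.
WEEKLY_CRON = "0 6 * * 4"

# Screen definition taken from the brief; the key names are hypothetical.
FIRST_SCREEN = {
    "countries": {"Europe": "all", "America": ["USA", "Canada"]},
    "sectors": "all",
    "market_cap_usd_min": 500_000_000,
    "attributes": [
        "ROI 5 year", "ROI current", "ROE 5 year", "ROE current",
        "Net profit margin 5 year", "P/B", "P/E", "Interest cover",
        "Price change 52 weeks",
    ],
}


def next_run(today: date, weekday: int = THURSDAY) -> date:
    """Return the next date falling on `weekday`, today included."""
    days_ahead = (weekday - today.weekday()) % 7
    return today + timedelta(days=days_ahead)


def collect_all_pages(fetch_page):
    """Gather rows from page 1 onwards until a page comes back empty.

    `fetch_page(n)` must return a list of rows (lists of strings) for page n.
    Rows are then sorted case-insensitively by their first column.
    """
    rows = []
    page = 1
    while True:
        batch = fetch_page(page)
        if not batch:
            break
        rows.extend(batch)
        page += 1
    return sorted(rows, key=lambda row: row[0].lower())
```

Changing the interval later would then mean editing only `WEEKLY_CRON` (or the `weekday` argument), without touching the scraping logic.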
19 freelancers are bidding an average of $417 for this job
Hi, employer. I am a Python expert with good experience in web scraping, and I have built many scrapers before. If you award this project to me, I can certainly complete it. I hope you will ping me ASAP. Thanks.
Ready to start work on the script to scrape the data from the website. We can discuss more over chat. Thanks and regards, Arjun S.
Hello, sir. I read your proposal and understand everything you need. Scrapy is my favorite Python library. If you hire me, I will finish your task on time. Thanks.