I want to download the content of a website, but it is a single-page site (like Instagram), so I can't just download an HTML file. We have to access the website programmatically through a browser (preferably in a headless environment, such as a Linux machine running in AWS), use XPath to find the information we need, and save that information somewhere (text files would be OK as long as the data is consistent).
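To illustrate the "XPath, then save as consistent text" part, here is a minimal sketch that uses only the XPath support built into the JDK. It assumes the rendered page has already been obtained from a browser and is well-formed XML, which real pages usually are not; in practice a browser automation tool would run the XPath queries against the live page. The sample markup, the class names in it, and the `key=value` output format are all hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class ProfileXPathSketch {

    // Hypothetical, well-formed markup standing in for a rendered profile header.
    // The real page structure is different and changes over time.
    static final String SAMPLE =
          "<header>"
        + "<span class='posts'>42</span>"
        + "<span class='followers'>1,234</span>"
        + "<span class='following'>56</span>"
        + "<div class='bio'>Coffee and code</div>"
        + "</header>";

    // Returns the text of the first node matching the XPath expression
    // (an empty string if nothing matches).
    static String textAt(String markup, String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            InputSource source = new InputSource(new StringReader(markup));
            return xpath.evaluate(expression, source).trim();
        } catch (XPathExpressionException e) {
            // Wrap the checked exception so callers stay simple.
            throw new IllegalArgumentException("Could not evaluate: " + expression, e);
        }
    }

    // Builds one "key=value" line per field, always in the same order,
    // so that snapshots written to text files stay consistent.
    static List<String> snapshotLines(String markup) {
        List<String> lines = new ArrayList<>();
        lines.add("posts=" + textAt(markup, "//span[@class='posts']"));
        lines.add("followers=" + textAt(markup, "//span[@class='followers']"));
        lines.add("following=" + textAt(markup, "//span[@class='following']"));
        lines.add("bio=" + textAt(markup, "//div[@class='bio']"));
        return lines;
    }

    public static void main(String[] args) {
        snapshotLines(SAMPLE).forEach(System.out::println);
    }
}
```

Keeping the XPath expressions in one place makes them easy to update when the site's markup changes.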
Assuming we continue with our Instagram example, the objective is to take a snapshot of all the information in a public account. This can be split into several steps:
* Start by going to the main page of an Instagram user and get the number of posts, followers, following and bio/description
* Get the full list of followers (Instagram IDs). Note that this requires opening a sub-window and scrolling down.
* Get the full list of following (Instagram IDs). Note that this requires opening a sub-window and scrolling down.
* Then, for each post, get a link to the post, a link to (and maybe a thumbnail of) the image, the number of likes, the number of comments, the list of hashtags, the list of people who commented, the list of people mentioned and possibly the text of the post plus all the comments. Note that since this has to be done for all posts, we will need to scroll down the page all the way to the end.
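The "scroll down until everything is loaded" requirement in the steps above could be sketched as a loop that keeps scrolling until several scrolls in a row reveal nothing new. The `ScrollableList` interface, the `InMemoryList` stand-in and the `maxIdleScrolls` parameter are hypothetical names introduced for this illustration; a real implementation would back the interface with a browser driver and would probably need to wait for content to load after each scroll.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical abstraction over a scrollable page or sub-window.
interface ScrollableList {
    List<String> visibleIds(); // IDs currently present in the page
    void scrollDown();         // ask the page to load more items
}

class ScrollCollector {

    // Collects IDs until maxIdleScrolls consecutive scrolls add nothing new.
    // A LinkedHashSet removes duplicates while keeping the on-page order.
    static Set<String> collectAll(ScrollableList page, int maxIdleScrolls) {
        Set<String> seen = new LinkedHashSet<>();
        int idle = 0;
        while (idle < maxIdleScrolls) {
            boolean grew = seen.addAll(page.visibleIds());
            idle = grew ? 0 : idle + 1;
            page.scrollDown();
        }
        return seen;
    }
}

// In-memory stand-in for a real page: reveals two more IDs per scroll.
class InMemoryList implements ScrollableList {
    private final List<String> all;
    private int shown;

    InMemoryList(List<String> all) {
        this.all = all;
        this.shown = Math.min(2, all.size());
    }

    @Override
    public List<String> visibleIds() {
        return all.subList(0, shown);
    }

    @Override
    public void scrollDown() {
        shown = Math.min(all.size(), shown + 2);
    }
}
```

Stopping after a number of idle scrolls, rather than after the first one, gives slow-loading content a chance to appear; the right threshold is a judgment call.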
This is a side project for me and I will have to maintain the code, so very simple, easy-to-read source code is preferred, with plenty of comments explaining why things are done. Please take that into account when making an offer. Unit tests would be a very nice addition; it would be great to include them (starting from the very obvious cases and moving on to more complicated ones).
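As a rough idea of what "simple, commented and tested" could look like, the sketch below extracts hashtags and mentions from a caption and checks the result with plain Java, going from obvious to slightly trickier cases. The class name and the regular expressions are assumptions made for illustration; in particular, the patterns only accept ASCII letters, digits and underscores (plus inner dots for mentions), which may be narrower than what the site allows. A test framework such as JUnit could replace the hand-written `check` helper.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CaptionParser {

    // Assumption: a hashtag is '#' followed by letters, digits or underscores.
    private static final Pattern HASHTAG = Pattern.compile("#(\\w+)");

    // Assumption: a mention is '@' plus a name that may contain inner dots.
    // A trailing dot is treated as punctuation, not as part of the name.
    private static final Pattern MENTION = Pattern.compile("@(\\w+(?:\\.\\w+)*)");

    static List<String> hashtags(String text) {
        return findAll(HASHTAG, text);
    }

    static List<String> mentions(String text) {
        return findAll(MENTION, text);
    }

    // Returns the first capture group of every match, in order of appearance.
    private static List<String> findAll(Pattern pattern, String text) {
        List<String> found = new ArrayList<>();
        Matcher matcher = pattern.matcher(text);
        while (matcher.find()) {
            found.add(matcher.group(1));
        }
        return found;
    }

    // Tiny stand-in for a unit test assertion.
    private static void check(boolean condition, String what) {
        if (!condition) {
            throw new AssertionError("Failed: " + what);
        }
    }

    public static void main(String[] args) {
        // Very obvious case first: nothing to find.
        check(hashtags("").isEmpty(), "empty caption has no hashtags");
        // Then a single tag.
        check(hashtags("#beach").equals(List.of("beach")), "single hashtag");
        // Then a mixed caption with punctuation after a mention.
        String caption = "Sunset #beach #Summer2020 with @john.doe and @amy.";
        check(hashtags(caption).equals(List.of("beach", "Summer2020")), "two hashtags");
        check(mentions(caption).equals(List.of("john.doe", "amy")), "two mentions");
        System.out.println("All checks passed");
    }
}
```

Because these helpers work on plain strings, they can be tested without starting a browser.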
A payment schedule will be established for each of the steps, starting with the most basic one: getting the number of posts, number of followers, number of following and bio/description.
Please only apply if you are very familiar with this kind of work. I am a software engineer myself, and although I am not familiar with the details of web automation, I do a very decent job of reading Java code.
19 freelancers are bidding an average of €190 for this job
Hi, I have scraped 100+ websites; Amazon, Adidas and many more are among them. Message me so that we can discuss. I can do this using Python, not Java; if the language is not a problem for you, get in touch.