We want to get CSS selectors for a list of 870 websites, mainly news sites. The selectors should point to where the news articles (or things similar to news articles) are on each site, and to where the title, URL and (optionally) content of each article can be found. We need this to build a search index over the sites by following the links of all the articles.
For this, we will give you a list of the sites' URLs as an Excel file. You should send back an Excel file that contains the CSS selectors for them. An example file is attached.
- You can read up on CSS selectors at [login to view URL], for example.
- Basically, each URL in our list points to a page that contains a list of some kind of item: news articles, forum posts, items in a shop etc. You should build CSS selectors that point to the individual items, e.g. each news article separately, each forum post separately etc.
- We plan to use the CSS selectors in software, so all entries in the Excel file you send back have to be valid CSS selectors. If you do not find a CSS selector for a URL, please leave the corresponding field empty. (Do *not* mark it with something like "?", "unclear" etc., because our software will not be able to understand such entries.)
- If you are unsure about how to handle some specific case, please ask.
- We need the results as soon as possible. It would be great if you could send back first results even before you have finished the complete list.
- Probably the easiest way to find CSS selectors for a page is to use the "developer tools" of your browser. I've attached a screenshot of what they look like in Chrome. If you click on the symbol at the very left of the top row shown in the screenshot, you can hover over or click an element on the website, and the corresponding HTML elements are shown in the bottom row.
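To illustrate the "leave the field empty" rule above, here is a minimal Python sketch of a clean-up step that could be run over the cells before the file is sent back. The function name `clean_entry` and the `PLACEHOLDERS` set are hypothetical and not part of our software. The sketch only strips surrounding quotation marks and turns placeholder markers into empty fields; it does not check that an entry is a syntactically valid CSS selector, which would need a real selector parser.

```python
# Hypothetical placeholder markers that must not appear in the result file.
PLACEHOLDERS = {"?", "??", "unclear", "unknown", "n/a", "na", "none", "-", "tbd"}


def clean_entry(value):
    """Return the cell content as a bare selector string, or "" if unusable."""
    if value is None:
        return ""
    text = str(value).strip()
    # Selectors should be entered without surrounding quotation marks.
    if len(text) >= 2 and text[0] == text[-1] == '"':
        text = text[1:-1].strip()
    # Placeholder markers become empty fields.
    if text.lower() in PLACEHOLDERS:
        return ""
    return text
```

For example, `clean_entry('"article"')` returns `article`, and `clean_entry("unclear")` returns an empty string.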
The following steps explain how to fill out the columns of the Excel sheet:
1. Selector Article
Choose a selector that all articles share.
Example 1: On the page [login to view URL] (also see the attached screenshot), all articles have a surrounding tag <article class="article hp">. The full CSS selector for the leftmost article is "body > div.page_container > div.page_content > div > section > [login to view URL] > div:nth-child(1) > [login to view URL] > ul > li.item_32338249.item.hppos0.new.item_id > article", but we want a selector that *all* articles share, so "article" is the CSS selector that you should enter in the Excel file (without the quotation marks).
Example 2: On [login to view URL] there are several columns of articles that have different CSS selectors. Fortunately, selectors can be combined with a comma (,), so the correct entry in the Excel file is "[login to view URL],[login to view URL] li" (without the quotation marks).
2. Selector URL
Choose a selector for the link element that links to the article's own page. This selector has to be relative to the article selector from step 1.
Example 1: For [login to view URL], this is ".article_content a" (without the quotation marks).
3. Selector Title
Choose a selector for something that could be used as a title for the article. If there is nothing better, point to the text of the link element from step 2. Again, the selector has to be relative to the selector from step 1.
Example 1: For [login to view URL], ".article_content h2" or ".article_content a" would both be good. In that case you can choose either of them.
4. Selector Content (optional)
Choose a selector for the actual text or teaser text of the article. If there is no teaser text, or the teaser text is not in an HTML element below the element matched by the selector from step 1, or simply if you are unsure about it, you can leave out this step.
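To show how the four columns fit together, here is a rough Python sketch of how software might apply them: the article selector is evaluated against the whole page, and the URL, title and content selectors are then evaluated relative to each matched article. This is not our actual software. It uses only the Python standard library and a deliberately tiny toy matcher that understands just tag names, `.class` names, the descendant combinator (a space) and comma-separated groups; anything else (such as `>`, `#id` or `:nth-child`) is unsupported here, and a real selector library would be used in practice. All names (`extract`, `select`, `SAMPLE`) and the sample markup, including the `p.teaser` content selector, are hypothetical.

```python
from html.parser import HTMLParser

VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class Node:
    def __init__(self, tag, attrs, parent=None):
        self.tag, self.attrs, self.parent = tag, attrs, parent
        self.children = []  # mix of Node objects and text strings


class TreeBuilder(HTMLParser):
    """Builds a simple element tree; assumes reasonably well-nested HTML."""

    def __init__(self):
        super().__init__()
        self.root = Node("[document]", {})
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, dict(attrs), self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node

    def handle_endtag(self, tag):
        node = self.current  # close the nearest open element with this name
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        self.current.children.append(data)


def descendants(node):
    for child in node.children:
        if isinstance(child, Node):
            yield child
            yield from descendants(child)


def matches_simple(node, simple):
    # "li.item" -> tag "li", classes ["item"]; ".x" -> any tag, class "x"
    tag, *classes = simple.split(".")
    have = (node.attrs.get("class") or "").split()
    return (not tag or tag == node.tag) and all(c in have for c in classes)


def matches(node, parts):
    if not matches_simple(node, parts[-1]):
        return False
    rest, ancestor = parts[:-1], node.parent
    while rest and ancestor is not None:  # descendant combinator
        if matches_simple(ancestor, rest[-1]):
            rest = rest[:-1]
        ancestor = ancestor.parent
    return not rest


def select(scope, selector):
    groups = [g.split() for g in selector.split(",") if g.strip()]
    return [n for n in descendants(scope) if any(matches(n, g) for g in groups)]


def raw_text(node):
    return "".join(c if isinstance(c, str) else raw_text(c) for c in node.children)


def first_text(scope, selector):
    found = select(scope, selector) if selector else []
    return " ".join(raw_text(found[0]).split()) if found else None


def extract(html, sel_article, sel_url, sel_title, sel_content=""):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    records = []
    for article in select(builder.root, sel_article):
        links = select(article, sel_url)  # relative to the article element
        records.append({
            "url": links[0].attrs.get("href") if links else None,
            "title": first_text(article, sel_title),
            "content": first_text(article, sel_content),  # optional column
        })
    return records


SAMPLE = """
<ul>
  <li class="item"><article class="article hp"><div class="article_content">
    <a href="/news/1"><h2>First headline</h2></a><p class="teaser">Teaser one.</p>
  </div></article></li>
  <li class="item"><article class="article hp"><div class="article_content">
    <a href="/news/2"><h2>Second headline</h2></a>
  </div></article></li>
</ul>
"""

records = extract(SAMPLE, "article", ".article_content a",
                  ".article_content h2", "p.teaser")
```

With the sample markup, `records` holds one entry per article; the second article has no teaser element, so its content is `None`, which mirrors an empty "Selector Content" result.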
41 freelancers are bidding an average of €486 for this job
I am sure that I can finish your project perfectly, with high-quality code and in a short time. Please send me a message so that we can discuss it further. Thank you.
Hi. I have read your job post. I have a lot of experience with web scraping using PHP. If you hire me, I can complete your project perfectly. Please send me a message. Thanks for posting and for reading. Regards.
Hello, client. I am very interested in your scraping job. Web scraping is my main skill. [login to view URL] These were my past jobs. Thanks. Silver bead.