Scraping in Python

I have a Python/Django app, and I need the following functionality added to it (in Python):

- I want a script that will search for a given term (that I provide), and for each of the first 100 results in Google, crawl the website and look for an email address. If one is found, record it in my db.

- I have link exchange partners (see [url removed, login to view]). I need a Python script that crawls my partner pages, verifies that they have added a link to my site, and records the page the link is on.

- I have a db of a bunch of local businesses (35,000) and I want to add the following functionality:

- I am missing the website for about half the businesses. I need a script that searches Google for the name of the accountant firm and finds the URL of their site (if it exists). It should apply some simple heuristics to each of the websites on the first page of the Google results, such as whether the business name appears in the H1 or title tag of the home page.

- I am missing email addresses for most of the businesses. I need a script to crawl each business's website and find an email address.
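
To make the three page-level checks above concrete (find an email address, verify a partner link, match a business name against the H1 or title), here is a minimal sketch using only the standard library. The names `PageScanner`, `find_emails`, `links_to`, and `name_in_headings` are hypothetical, and the sketch assumes the page HTML has already been downloaded as a string. The email pattern is deliberately simple and will produce some false positives (for example image names such as `logo@2x.png`), so results should be filtered before they are stored.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urlparse

# Simple pattern, an assumption for illustration; not a full RFC 5322 matcher.
EMAIL_RE = re.compile(
    r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)*\.[A-Za-z]{2,}"
)


class PageScanner(HTMLParser):
    """Collects href values plus the text inside <title> and <h1> tags."""

    def __init__(self):
        super().__init__()
        self.links = []       # raw href values from <a> tags
        self.headings = []    # text found inside <title> and <h1>
        self._capture = None  # tag whose text is currently being captured

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)
        elif tag in ("title", "h1"):
            self._capture = tag
            self.headings.append("")

    def handle_endtag(self, tag):
        if tag == self._capture:
            self._capture = None

    def handle_data(self, data):
        if self._capture:
            self.headings[-1] += data


def scan(html):
    scanner = PageScanner()
    scanner.feed(html)
    scanner.close()
    return scanner


def find_emails(html):
    """Return unique, lower-cased addresses in order of first appearance."""
    return list(dict.fromkeys(e.lower() for e in EMAIL_RE.findall(html)))


def links_to(html, my_domain):
    """True if any <a href> points at my_domain or one of its subdomains."""
    for href in scan(html).links:
        try:
            host = urlparse(href).hostname or ""
        except ValueError:  # malformed URL, ignore it
            continue
        if host == my_domain or host.endswith("." + my_domain):
            return True
    return False


def _normalize(text):
    # Lower-case and collapse punctuation/whitespace so "Smith & Co" ~ "smith co".
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def name_in_headings(html, business_name):
    """Heuristic: does the business name appear in the <title> or an <h1>?"""
    wanted = _normalize(business_name)
    return bool(wanted) and any(wanted in _normalize(h) for h in scan(html).headings)
```

A caller would run `links_to(page_html, "mysite.example")` on each partner page and `name_in_headings(home_html, business.name)` on each candidate from the search results, keeping the first candidate that matches. The substring match is a starting point only; abbreviations such as "Ltd" versus "Limited" would need extra handling.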

It is important that you speak very good English, as communication is important.

If you have any questions, feel free to ask. Keep the following things in mind:

- These scripts will be run periodically (maybe daily) as part of a Django app on a Linux/Apache/MySQL machine.

- I expect the crawler you use to be multithreaded (or else these scripts will take way too long to run) and be polite to the host domain (no flooding them with requests, respect [url removed, login to view], etc).

- I expect high quality code with tests to verify. I'm a developer myself, and I will be reviewing all the code.
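
One way the multithreading and politeness requirements above could fit together is sketched below with the standard library only. It assumes the redacted item in the politeness point is `robots.txt`, which the posting does not confirm. `PoliteGate`, `allowed`, `crawl`, the `fetch` and `robots_for` callables, and the `example-crawler` user agent are all hypothetical names; the actual HTTP download is left to the caller so the sketch stays independent of any particular HTTP library.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser

USER_AGENT = "example-crawler"  # hypothetical user-agent string


class PoliteGate:
    """Spaces out requests to the same host by at least min_delay seconds."""

    def __init__(self, min_delay=1.0):
        self.min_delay = min_delay
        self._next_slot = {}           # host -> earliest time of the next request
        self._lock = threading.Lock()  # protects _next_slot across worker threads

    def wait(self, url):
        host = urlparse(url).netloc
        with self._lock:
            now = time.monotonic()
            slot = max(now, self._next_slot.get(host, now))
            self._next_slot[host] = slot + self.min_delay
        # Sleep outside the lock so other hosts are not held up.
        time.sleep(max(0.0, slot - now))


def allowed(robots_txt, url, agent=USER_AGENT):
    """Check a URL against the text of a robots.txt file."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


def crawl(urls, fetch, robots_for, workers=8, min_delay=1.0):
    """Fetch urls in a thread pool; returns {url: body, or None if disallowed}.

    fetch(url) -> body and robots_for(host) -> robots.txt text are supplied
    by the caller (assumptions of this sketch).
    """
    gate = PoliteGate(min_delay)

    def visit(url):
        if not allowed(robots_for(urlparse(url).netloc), url):
            return url, None
        gate.wait(url)
        return url, fetch(url)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(visit, urls))
```

The gate reserves a time slot per host under a lock and sleeps outside it, so many threads can work on different hosts at once while requests to any single host stay spaced out. A production version would likely cache the parsed `robots.txt` per host instead of re-parsing it for every URL.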



Skills: MySQL, Python, Web Scraping, Windows Desktop

About the Employer:
(121 reviews) Belfast, Ireland

Project ID: #689376

1 freelancer is bidding an average of $250 for this job


Hello, I can deliver the project, although I'm not sure that 100% accuracy can be achieved for the partners script. I think it would be best to integrate the scripts as Django management commands, so you could run them thr...

$250 USD in 10 days
(39 reviews)