Cancelled

Creation of a Blog Crawler

Project Brief: Blog Crawler Tool

To create a tool that can analyse multiple blog URLs (supplied in a .txt document or a .csv/.xls spreadsheet), extract all the outgoing blog links, and return the following information.

All outgoing links pointing to other blog URLs. It is vitally important that these are blogs and not normal websites. The tool must not simply record every URL: we are looking for the outgoing links in the section labelled "Favourite Blogs" or "Related Blogs", not the hyperlinks embedded as deep links within the post text. The tool will need to recognise this section and select hyperlinks accordingly. These links usually appear on the first page and are replicated on other pages within the blog. The tool should also be able to do the following:

Removal of any duplicate URLs

Inclusion of homepage URLs only, as opposed to individual blog posts

Google PageRank of each blog URL found (there are restrictions on the number of requests that can be made per day for Google PageRank scores. Therefore, the tool will have to be able to use different proxy addresses to circumvent this problem)

Technorati Authority of each blog URL (there are restrictions on the number of requests that can be made per day for Technorati Authority scores. Therefore, the tool will have to be able to use different proxy addresses to circumvent this problem)

Blog Title - This is contained in the meta data of almost every blog. As mentioned earlier, you are aware of how to source this data, which needs to be added into the tool. If the title isn’t available in the meta data, the tool should accommodate this so that the title can still be extracted.

Blog Description - This is contained in the source code of almost every website. If it is not available within the source code, the tool should find this information in the “About” section, which is very common in blogs.

Blog Keywords - These are contained in the source code of almost every website, for SEO purposes. From our earlier conversation: for the last point we would realistically need the title and description of the blog, which can normally be found on the blog as text. We also require the keywords so that the blog can be categorised by subject. This could be achieved in either of the following ways: the tool records all the tags from all of the posts and removes the duplicates, or it picks out the meta tags (or both).
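As a rough illustration of the link requirements above (links taken only from a "Favourite Blogs" or "Related Blogs" section, reduced to homepage URLs, with duplicates removed), here is a minimal Python sketch using only the standard library. It assumes the section is introduced by an HTML heading tag and that a homepage is simply the scheme plus host; both are simplifications, and a blog hosted under a sub-path would be collapsed to its domain. The names BLOGROLL_LABELS, BlogrollParser, homepage and blogroll_homepages are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit

# Assumed labels; the brief names "Favourite Blogs" and "Related Blogs".
BLOGROLL_LABELS = ("favourite blogs", "favorite blogs", "related blogs")
HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class BlogrollParser(HTMLParser):
    """Collect hrefs that appear after a heading matching a blogroll label."""

    def __init__(self):
        super().__init__()
        self.in_heading = False
        self.heading_text = []
        self.collecting = False
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_TAGS:
            # Any new heading closes the previous section.
            self.in_heading = True
            self.heading_text = []
            self.collecting = False
        elif tag == "a" and self.collecting:
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_data(self, data):
        if self.in_heading:
            self.heading_text.append(data)

    def handle_endtag(self, tag):
        if tag in HEADING_TAGS and self.in_heading:
            self.in_heading = False
            title = " ".join("".join(self.heading_text).split()).lower()
            self.collecting = any(label in title for label in BLOGROLL_LABELS)


def homepage(url):
    """Reduce an absolute http(s) URL to scheme://host/, else return None."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        return None
    return f"{parts.scheme}://{parts.netloc.lower()}/"


def blogroll_homepages(html):
    """Return de-duplicated homepage URLs from blogroll sections, in order."""
    parser = BlogrollParser()
    parser.feed(html)
    result = []
    for href in parser.links:
        home = homepage(href)
        if home and home not in result:
            result.append(home)
    return result


SAMPLE = """
<h2>Recent Posts</h2>
<ul><li><a href="http://example.com/2009/01/post.html">A post</a></li></ul>
<h3>Favourite Blogs</h3>
<ul>
  <li><a href="http://blog-a.example.org/">Blog A</a></li>
  <li><a href="http://blog-a.example.org/2008/12/hello/">Blog A again</a></li>
  <li><a href="http://Blog-B.example.net/archive">Blog B</a></li>
</ul>
<h3>Archives</h3>
<ul><li><a href="http://example.com/2008/">2008</a></li></ul>
"""

print(blogroll_homepages(SAMPLE))
# prints ['http://blog-a.example.org/', 'http://blog-b.example.net/']
```

This heuristic does not check that a linked site is actually a blog, and it would miss blogrolls that are labelled without a heading tag, so a real tool would likely need additional rules per blog platform.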

Tool Process

Input the proxies into an appropriately titled .txt file

Input multiple URLs into a .txt file

The Blog Tool goes through each URL one by one, finds all of the outgoing blog URLs, and for each URL found populates an Excel spreadsheet in the following way:
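The column layout is not given in the brief, so the following Python sketch only illustrates one possible shape for the output step: it pulls the title, description and keywords from a page's head section and writes one row per blog to a CSV file, which Excel can open. The column names in FIELDS, and the names MetaParser, blog_row and write_rows, are assumptions. The "About" section fallback, PageRank and Technorati Authority are not covered here.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical column layout; the brief does not specify one.
FIELDS = ["url", "title", "description", "keywords"]


class MetaParser(HTMLParser):
    """Collect the <title> text and the description/keywords meta tags."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title_parts = []
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True
        elif tag == "meta":
            attr = dict(attrs)
            name = (attr.get("name") or "").lower()
            if name in ("description", "keywords") and attr.get("content"):
                # Keep the first occurrence of each meta tag.
                self.meta.setdefault(name, attr["content"].strip())

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title_parts.append(data)


def blog_row(url, html):
    """Build one spreadsheet row (a dict keyed by FIELDS) for a blog page."""
    parser = MetaParser()
    parser.feed(html)
    keywords = []
    for word in parser.meta.get("keywords", "").split(","):
        word = word.strip().lower()
        if word and word not in keywords:  # drop duplicate keywords
            keywords.append(word)
    return {
        "url": url,
        "title": " ".join("".join(parser.title_parts).split()),
        "description": parser.meta.get("description", ""),
        "keywords": ", ".join(keywords),
    }


def write_rows(rows, out):
    """Write a header line and the rows as CSV to an open text stream."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)


SAMPLE = """<html><head>
<title> Example  Blog </title>
<meta name="Description" content="Notes on crawling.">
<meta name="keywords" content="SEO, crawling, seo, blogs">
</head><body></body></html>"""

row = blog_row("http://blog.example.org/", SAMPLE)
buffer = io.StringIO()
write_rows([row], buffer)
print(buffer.getvalue())
```

To write a real file, open it with `open("blogs.csv", "w", newline="")` and pass the file object to `write_rows` in place of the in-memory buffer; the file name is only an example.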

Skills:

See more: blog crawler script, blog crawler, website creation google, process website creation, sourcing google, excel blog, could of, blogs blog, blog pages, blog information, blog crawler technorati, crawler code blogs, free blog crawler software, blog crawler code, google xls, blog blogs, website sourcing, website crawler, seo data removal, remover, pointing, picks, it utilise, google blog, find url website

About the Employer:
( 42 reviews ) London, United Kingdom

Project ID: #297926

10 freelancers are bidding on average $588 for this job

excelence

I can help you for a fair budget, thanks

$750 USD in 0 days
(163 Reviews)
6.4
omsoftware

Hello, we understand your project and we are able to do it. We have experience in crawling. Let's discuss more; waiting for your reply. Raj

$750 USD in 15 days
(213 Reviews)
5.4
pgcoding

Crawling experts are here. Please check pmb.

$700 USD in 5 days
(56 Reviews)
5.1
CaaamSoftware

Please check PM.

$500 USD in 10 days
(2 Reviews)
1.3
ddsuresh

I can provide the best solution for you within the given time frame. Thanks, Suresh

$300 USD in 7 days
(0 Reviews)
0.0
mlani101

I have 5 years of experience in web development projects. Trust me, this project will work well. Please send a PM to let me know. Thanks

$400 USD in 15 days
(0 Reviews)
0.0
shuhongzheng

I'm good at this and can show you some script examples that closely match your current requirement.

$480 USD in 7 days
(2 Reviews)
0.0
bmunjal

PreetSoft Infotech is a professional website development, hosting and design company. We can put your business on the World Wide Web, establishing a 24-hour-a-day, 365-days-a-year storefront or advertisement. Website...

$500 USD in 10 days
(0 Reviews)
0.0
dsenthilkumar

I have already built many crawlers, from a basic Site Monitor to Product Crawlers. I can complete your Blog Crawler tool to a high standard. Thanks, D SENTHILKUMAR.

$750 USD in 7 days
(0 Reviews)
0.0
headvances

Hi, we have a crawler that currently tracks a few hundred thousand blogs. Please check PM

$750 USD in 30 days
(0 Reviews)
0.0