In Progress

Spider to extract yellow pages data and put into CSV formatted file

1. I must be able to set the starting URL from which the spider will start on either the [url removed, login to view] or [url removed, login to view] website. The [url removed, login to view] website is powered by [url removed, login to view] and uses the same format, but I find it easier to navigate the various categories on 411.com.

For example, I might paste the following URL into the spider utility: [url removed, login to view];C=jewelers&R=N&STYPE=S&MC=1&OO=1&F=1&CP=Clothing+%26+Accessories%5EJewelry%5EJewelers%5E

2. Once the starting URL has been entered, the spider must parse the HTML and extract the business name, city, state, ZIP code, telephone number, fax number (if applicable), email address (if applicable), and website (if applicable) into a CSV-formatted text file.
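As a rough illustration of the CSV output in item 2, the sketch below turns one extracted record into a CSV line. The class name `CsvLine`, its method names, the column order, and the sample business are all assumptions made for the example; the posting does not specify them. It is written with Java 1.5-era language features in mind.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: names and column order are illustrative only.
class CsvLine {

    // Quote a field only when it contains a comma, a double quote, or a line break.
    // Embedded double quotes are doubled, which is what most CSV readers expect.
    static String escape(String field) {
        if (field == null) {
            return ""; // optional values (fax, email, website) become empty fields
        }
        boolean needsQuotes = field.indexOf(',') >= 0 || field.indexOf('"') >= 0
                || field.indexOf('\n') >= 0 || field.indexOf('\r') >= 0;
        if (!needsQuotes) {
            return field;
        }
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }

    // Join the escaped fields with commas to form one CSV line.
    static String toCsvLine(List<String> fields) {
        StringBuilder line = new StringBuilder();
        for (int i = 0; i < fields.size(); i++) {
            if (i > 0) {
                line.append(',');
            }
            line.append(escape(fields.get(i)));
        }
        return line.toString();
    }

    public static void main(String[] args) {
        // Assumed column order: name, city, state, ZIP, phone, fax, email, website.
        System.out.println(toCsvLine(Arrays.asList(
                "Smith & Sons, Jewelers", "Anchorage", "AK", "99501",
                "907-555-0100", null, null, "www.example.com")));
    }
}
```

Leaving missing optional values as empty fields keeps every row at the same column count, which makes later duplicate checks and merges simpler.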

3. The spider must crawl through each of the pages until the final page for that category is completed. However, at the very beginning of most categories, there are businesses listed under the "Yellow Pages - Advertisers" heading. These are businesses that are not from the area I have chosen (for example, I chose Alaska and they are from California) but are advertising in that area. I do not want these entries included. I only want the ones that start under the "Yellow Pages - Listings" heading. The spider does not necessarily need to know how my list was created; it only needs to avoid entries under the "Advertisers" section.
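One possible way to skip the "Advertisers" section in item 3 is sketched below. It assumes the page has already been reduced to a list of text lines in which the two headings appear as recognizable lines; the real HTML layout of the sites is not described in the posting, so this is only an outline of the idea. The class name `ListingsSectionFilter` and the sample lines are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical filter: assumes each heading shows up as its own line of page text.
class ListingsSectionFilter {
    static final String ADVERTISERS_HEADING = "Yellow Pages - Advertisers";
    static final String LISTINGS_HEADING = "Yellow Pages - Listings";

    // Keeps every line except the headings themselves and anything that sits
    // between the "Advertisers" heading and the "Listings" heading.
    // Pages with no headings at all are kept whole (an assumption about later pages).
    static List<String> keepListings(List<String> pageLines) {
        List<String> kept = new ArrayList<String>();
        boolean inAdvertisers = false;
        for (String line : pageLines) {
            if (line.indexOf(ADVERTISERS_HEADING) >= 0) {
                inAdvertisers = true;
            } else if (line.indexOf(LISTINGS_HEADING) >= 0) {
                inAdvertisers = false;
            } else if (!inAdvertisers) {
                kept.add(line);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> page = Arrays.asList(
                "Yellow Pages - Advertisers",
                "Far Away Gems, Los Angeles, CA",
                "Yellow Pages - Listings",
                "Local Jewelers, Anchorage, AK");
        System.out.println(keepListings(page));
    }
}
```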

4. When the crawl is completed, an update function that lets me name a new file to save the data to, or lets me choose an existing .CSV file to append the new data to.
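The new-file-or-append choice in item 4 could be handled with a single flag, as in the minimal sketch below. The class name `CsvSaver` and the temporary file used in `main` are assumptions for illustration. It relies on the standard `FileWriter(File, boolean)` constructor, whose second argument selects append mode.

```java
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for saving already formatted CSV lines.
class CsvSaver {

    // append = true adds the lines to the end of an existing file;
    // append = false creates the file, or overwrites it if it already exists.
    static void save(File target, List<String> csvLines, boolean append) throws IOException {
        BufferedWriter out = new BufferedWriter(new FileWriter(target, append));
        try {
            for (String line : csvLines) {
                out.write(line);
                out.newLine();
            }
        } finally {
            out.close(); // always release the file handle, even on failure
        }
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("listings", ".csv"); // throwaway demo file
        file.deleteOnExit();
        save(file, Arrays.asList("Acme Jewelers,Anchorage,AK"), false); // new file
        save(file, Arrays.asList("Other Gems,Juneau,AK"), true);        // append
        System.out.println("Saved to " + file.getAbsolutePath());
    }
}
```

In a desktop tool, the `append` flag would presumably be set from whichever option the user picks in the save dialog.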

5. Search and purge function that can be run at any time on any of the .CSV files that have been created, to ensure no two entries in a given .CSV file have the same telephone number. If duplicate telephone numbers are found, the records with the least information are automatically deleted. For example, if 2 records have the same telephone number but one lists a fax number and the other doesn't, delete the one without the fax number.
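The purge rule in item 5 could be sketched as follows: group records by telephone number and, within each group, keep the record with the most filled-in fields. Several details here are assumptions not stated in the posting: the class name `DuplicatePurger`, the column position of the phone number, comparing numbers by digits only (so `(907) 555-0100` matches `907-555-0100`), keeping the first record on a tie, and leaving records without any phone number untouched.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical purge helper; each record is one parsed CSV row.
class DuplicatePurger {
    // Assumed column order: name, city, state, ZIP, phone, fax, email, website.
    static final int PHONE = 4;

    // Number of fields that actually contain something.
    static int filledFields(String[] record) {
        int count = 0;
        for (String field : record) {
            if (field != null && field.trim().length() > 0) {
                count++;
            }
        }
        return count;
    }

    // Compare phone numbers by their digits only (an assumption).
    static String phoneKey(String phone) {
        return phone == null ? "" : phone.replaceAll("[^0-9]", "");
    }

    static List<String[]> purge(List<String[]> records) {
        // LinkedHashMap keeps the position of the first record seen for each number.
        Map<String, String[]> best = new LinkedHashMap<String, String[]>();
        int withoutPhone = 0;
        for (String[] record : records) {
            String key = phoneKey(record[PHONE]);
            if (key.length() == 0) {
                key = "no-phone-" + (withoutPhone++); // unique key, so it is never merged
            }
            String[] current = best.get(key);
            if (current == null || filledFields(record) > filledFields(current)) {
                best.put(key, record);
            }
        }
        return new ArrayList<String[]>(best.values());
    }

    public static void main(String[] args) {
        List<String[]> rows = new ArrayList<String[]>();
        rows.add(new String[] {"Acme Jewelers", "Anchorage", "AK", "99501", "907-555-0100", "", "", ""});
        rows.add(new String[] {"Acme Jewelers", "Anchorage", "AK", "99501", "907-555-0100", "907-555-0101", "", ""});
        for (String[] row : purge(rows)) {
            System.out.println(row[0] + " fax=" + row[5]);
        }
    }
}
```

With the two sample rows, only the second one (the one listing a fax number) survives, matching the example given in item 5.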

6. Merge function that can be run at any time and lets me pick 2 or more of the created .CSV files and merge them into one new file. If more than 2 files is a problem, I can live with 2 and merge a few times to create one file.

7. Finally, I will provide you with 2 URLs, which will represent 2 different yellow page categories on the [url removed, login to view] website, and you will run the completed program and email me (or make available for download) the 2 completed, duplicate-purged .CSV files.
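A merge along the lines of item 6 could be as simple as copying every input file, in order, into a new output file. The sketch below assumes the files have no header row and that no field contains a line break, neither of which the posting states; the class name `CsvMerger` and its command-line form are hypothetical. It does not remove duplicates, so the purge function would be run on the merged file afterwards.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical merge helper: concatenates any number of CSV files into one.
class CsvMerger {

    static void merge(List<File> inputs, File output) throws IOException {
        BufferedWriter out = new BufferedWriter(new FileWriter(output));
        try {
            for (File input : inputs) {
                BufferedReader in = new BufferedReader(new FileReader(input));
                try {
                    String line;
                    while ((line = in.readLine()) != null) {
                        if (line.trim().length() > 0) { // skip blank lines
                            out.write(line);
                            out.newLine();
                        }
                    }
                } finally {
                    in.close();
                }
            }
        } finally {
            out.close();
        }
    }

    // Assumed usage: java CsvMerger merged.csv first.csv second.csv [more.csv ...]
    public static void main(String[] args) throws IOException {
        if (args.length < 3) {
            System.out.println("Usage: java CsvMerger <output> <input1> <input2> [more inputs]");
            return;
        }
        List<File> inputs = new ArrayList<File>();
        for (int i = 1; i < args.length; i++) {
            inputs.add(new File(args[i]));
        }
        merge(inputs, new File(args[0]));
    }
}
```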

My Requirements:

1. You must be easy to contact, either by phone or by answering any e-mail I send you within 10 hours.

2. Must speak and write English well.

3. Code must be well commented in English.

4. All source code must be given to me.

5. I would prefer if this was written in Java.

6. I would like this done by no later than March 10th, 2006.

7. Must be able to run on my Pentium III with Windows XP. I am in a very rural area and only have dial-up. I am running Java 2 Platform Standard Edition Version 1.5.0 (build 1.5.0_04-b05).

8. Delivery of files will be via email for sure and possibly by FTP.

Skills: Java


About the Employer:
( 11 reviews ) Charlottetown, Canada

Project ID: #47387

Awarded to:

pieby2

Java Swing desktop GUI with functions: search (start, stop, pause, resume, progress, save, …), clean (for duplicates) and merge (2 at a time). Will handle both [url removed, login to view] and SuperPages current layouts. E-mail address…

$200 USD in 6 days
(4 Reviews)
4.4

9 freelancers are bidding on average $220 for this job

cliver

Hello, Please look at the PMB. Thanks, Sergey

$300 USD in 2 days
(4 Reviews)
5.1
NishantBamb

Hello, please refer to your PMB. Thanks.

$250 USD in 5 days
(12 Reviews)
4.7
cstl

Chandusoft is a customer-specific, service-oriented company with a professional and creative team that has 6 years of experience in web design and development. We have expertise and experience in ecommerce site development…

$240 USD in 10 days
(1 Review)
3.6
terexa012001

I have worked with this kind of parsing many times. An open-source HTML parser will be used. Everything is negotiable. In case of questions, mail me at tdminh81[at][url removed, login to view]

$200 USD in 10 days
(0 Reviews)
0.0
HeoQue

I can do it using .NET; PM me if you are [url removed, login to view]

$250 USD in 14 days
(1 Review)
0.0
manasm

Hello, we are experienced in Java and crawler development. We have already developed crawlers in Java to crawl [url removed, login to view], [url removed, login to view], [url removed, login to view], [url removed, login to view]. We are interested in doing this for you as it matches our expe…

$250 USD in 6 days
(0 Reviews)
1.9
impetusoft

We are a professional CUSTOM SOFTWARE DEVELOPMENT COMPANY which has successfully operated on the IT market for over THREE YEARS. We provide our clients with the FULL RANGE OF RELATED SERVICES starting from project specificat…

$190 USD in 9 days
(0 Reviews)
0.0
instance

GOOD QUALITY WORK WITH ON-TIME DELIVERY OF THE PRODUCT. 100% GUARANTEE OF HIGH-QUALITY PROFESSIONAL WORK, AS WE HAVE EXPERTISE IN JAVA/J2EE, JSP, EJB, ASP, PHP, AND STRUTS FRAMEWORK RELATED PROJECTS. OUR COMPANY HA…

$100 USD in 1 day
(0 Reviews)
0.0