In Progress

Data collection program

I need to harvest data from a web site every week and need a program to do the work. I currently use a program I wrote myself, but I need something more robust so that I can run it more often.

The project is simple: download some HTML files, extract the data in them, and write it to a standard delimited ASCII file.

The program will query a web site to find the HTML pages that are available. There are about 100 per calendar date, and the program must download the files for every date from the current date to a chosen end date. This means downloading between 35,000 and 100,000 HTML files each time the program runs.
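To make the size of the job concrete, here is a minimal sketch of walking the date range. The URL pattern and the fixed count of 100 pages per day are assumptions for illustration only; the real program would ask the site which pages exist, using the web site information in the attached ZIP.

```python
from datetime import date, timedelta

# Hypothetical URL pattern; the real one comes from the web site information.
URL_TEMPLATE = "http://example.com/list?date={day}&page={page}"


def dates_between(start, end):
    """Yield every calendar date from start to end, inclusive."""
    day = start
    while day <= end:
        yield day
        day += timedelta(days=1)


def page_urls(start, end, pages_per_day=100):
    """Yield (date, page number, URL) for each assumed page in the range."""
    for day in dates_between(start, end):
        for page in range(1, pages_per_day + 1):
            url = URL_TEMPLATE.format(day=day.isoformat(), page=page)
            yield day, page, url
```

Under these assumptions, a run from February 1st, 2009 to February 1st, 2010 covers 366 dates, or 36,600 pages at 100 per day.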

The program should begin by getting the html files and storing them in a temporary folder.
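One possible shape for this download step is sketched below. The function names, the retry count, and the rule of skipping files that already exist are my own assumptions, not requirements; the idea is that an interrupted run can be restarted without downloading everything again.

```python
import os
import urllib.request


def fetch_url(url, timeout=30):
    """Download one page and return its raw bytes."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def download_all(named_urls, folder, fetch=fetch_url, retries=3):
    """Save each (filename, url) pair into folder; return names that failed."""
    os.makedirs(folder, exist_ok=True)
    failed = []
    for filename, url in named_urls:
        path = os.path.join(folder, filename)
        if os.path.exists(path):
            continue  # already saved by an earlier run
        for _attempt in range(retries):
            try:
                data = fetch(url)
            except OSError:
                continue  # network error (URLError is an OSError): try again
            with open(path, "wb") as handle:
                handle.write(data)
            break
        else:
            failed.append(filename)  # every attempt raised an error
    return failed
```

The `fetch` parameter is there so the loop can be exercised with a stand-in function instead of the live site.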

The second step of the process is to parse the html files and extract the data contained in them. The HTML files keep the same format and are easy to extract as they are simple lists. ALL fields must be extracted.

The lists have header information, so the header must be repeated on each data record created to ensure that the information stays together. For example, a page gives a name and then a list of all the clients for that name; each client record must carry that name.
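The header-repetition rule might be sketched as follows. The markup is assumed here (a name in an `<h2>` tag followed by clients in `<li>` tags) purely for illustration; the actual tags are in the sample files in the ZIP and will differ.

```python
from html.parser import HTMLParser


class ListExtractor(HTMLParser):
    """Collect (header, item) records from an assumed <h2> + <li> layout."""

    def __init__(self):
        super().__init__()
        self.records = []
        self._header = ""
        self._tag = None   # tag whose text is currently being collected
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag in ("h2", "li"):
            self._tag = tag
            self._text = []

    def handle_data(self, data):
        if self._tag is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag != self._tag:
            return
        text = "".join(self._text).strip()
        if tag == "h2":
            self._header = text  # remember the header for the items below it
        else:
            self.records.append((self._header, text))  # header on every record
        self._tag = None


def extract_records(html):
    """Return a list of (header, item) tuples found in one HTML page."""
    parser = ListExtractor()
    parser.feed(html)
    parser.close()
    return parser.records
```

With this assumed layout, a page listing two clients under one name produces two records, each starting with that name.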

The extracted data must be saved in a standard ASCII file format where each field is separated by a delimiter character that is configurable in the program (for example, a tab). I will then import this data into a database system for processing.
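A configurable delimiter could be handled roughly like this. The function name, the Windows-style line endings, and the choice to replace non-ASCII characters with `?` are assumptions of this sketch, not stated requirements.

```python
import csv


def write_delimited(rows, path, delimiter="\t"):
    """Write rows (sequences of strings) to an ASCII file, one record per line."""
    # errors="replace" turns any non-ASCII character into "?" (an assumption).
    with open(path, "w", newline="", encoding="ascii", errors="replace") as handle:
        writer = csv.writer(handle, delimiter=delimiter, lineterminator="\r\n")
        writer.writerows(rows)
```

Note that `csv.writer` quotes any field that itself contains the delimiter, so the importing database needs to understand quoted fields or the delimiter should be one that never occurs in the data.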

I do not need a program with a fancy user interface. It needs to be simple and functional. It must work on Windows XP Pro.

Attached is a ZIP file containing samples of the html files as well as the web site information.

The provider must submit the final program in executable format with all necessary files, along with the source code. The provider must also test the program and submit the results of one run covering a one-year period (for example, running the program on February 1st, 2009 with an end date of February 1st, 2010). The collected data must be submitted to show that the program works.

If the provider does a good job on this, there are several other similar projects available for him. In the future when the format of these html files changes, I will ask the provider to modify the program.

This is a simple project but please only bid if you have done this type of work before and are sure you can deliver the work. I do not want to waste your time or mine. If you have questions please message me before you bid.

Thank you

Question asked: Why keep the HTML files? Why not just parse them and create the resulting output file?
Answer: The HTML files are kept in a temporary directory because they must be saved as backups in case the data is damaged in the future and needs to be re-parsed.

Payment for this project will be made via escrow only.

The final output should be in ONE delimited ASCII text file. All fields from all HTML files that are downloaded should be included on each record. The variation between the 3 types of HTML files in this project is minor, so each record line may have about 30 fields, some used and some unused. A field is simply left blank if it is unused.
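The "one layout, blank when unused" rule can be illustrated with a fixed column list. The four column names below are invented for the example; the real list would name all of the roughly 30 fields found across the 3 file types.

```python
import csv

# Hypothetical column names standing in for the real set of about 30 fields.
COLUMNS = ["name", "client", "phone", "region"]


def write_records(records, handle, delimiter="\t"):
    """Write dict records with a fixed column order; missing fields stay blank."""
    writer = csv.DictWriter(
        handle,
        fieldnames=COLUMNS,
        restval="",           # value used for any column a record lacks
        delimiter=delimiter,
        lineterminator="\n",
    )
    for record in records:
        writer.writerow(record)
```

Every line then has the same number of fields regardless of which file type it came from. With the default settings, `DictWriter` raises `ValueError` if a record contains a key that is not in `COLUMNS`, which helps catch a field that was left out of the layout.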

The program is to be run manually. It will not be run by MS Scheduler or any other automated tool. No fancy automated feature is necessary, just the ability to specify a date range, specify the delimiter character, specify an output file path and name, and start and stop buttons. The program should also have a small option to play a sound file when the process is done, and another option to shut down the computer after the process is done.
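The stop button and the two end-of-run options might look like the sketch below, leaving the user interface itself aside. All names are hypothetical. It assumes the work runs in a loop that checks a flag between items, and it uses the Windows `shutdown -s -t 0` command and the Windows-only `winsound` module.

```python
import subprocess
import sys
import threading


def run_job(items, process, stop_event):
    """Process items in order until finished or until stop_event is set."""
    done = 0
    for item in items:
        if stop_event.is_set():
            break  # the Stop button was pressed
        process(item)
        done += 1
    return done


def shutdown_command():
    """Return the Windows command that powers the machine off immediately."""
    return ["shutdown", "-s", "-t", "0"]


def finish(sound_path=None, shutdown=False):
    """Run the optional end-of-job actions; they do nothing outside Windows."""
    if sys.platform != "win32":
        return
    if sound_path:
        import winsound  # standard library, available on Windows only
        winsound.PlaySound(sound_path, winsound.SND_FILENAME)
    if shutdown:
        subprocess.call(shutdown_command())
```

A Stop button handler would simply call `stop_event.set()` on a `threading.Event` shared with the worker, so the run ends cleanly after the current file.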

"We'll charge you $30 per 200 entries"
Bids with comments like this are unprofessional and will be ignored. This defeats the purpose of using GAF.

Skills: .NET, Java, Javascript, PHP, Visual Basic


About the Employer:
(25 reviews) Montreal, Canada

Project ID: #364777

Awarded to:


Hi! Please check PMB for demo.

$30 USD in 1 day
(3 reviews)

43 freelancers are bidding an average of $121 for this job


We can help in your project, please check PMB to see our related experience.

$250 USD in 4 days
(261 reviews)

Hi, More info is in the PM. Best Regards, Yousef

$245 USD in 3 days
(70 reviews)

I am an expert in such tasks. Ready to start right now and finish as soon as possible. My bid is for fast professional service exciting my customers. Please contact in PMB to discuss details. Best Regards, Zeke

$250 USD in 2 days
(171 reviews)

Hello, please refer your PMB. Thank you.

$200 USD in 5 days
(83 reviews)

Hello,Please refer your [url removed, login to view] you.

$100 USD in 3 days
(98 reviews)

i like this kind of job, will do it easily.

$100 USD in 1 day
(34 reviews)
$250 USD in 0 days
(85 reviews)

See private message.

$90 USD in 3 days
(56 reviews)

I can do this job for you. See PM for details.

$80 USD in 2 days
(146 reviews)

We are ready to do this project

$40 USD in 6 days
(65 reviews)

Hi, I am currently working on a scrapper project which is quite similar to this one. I will do this using C#. I am an expert with data processing and text extraction, that is my field of work. I would like t…

$100 USD in 8 days
(74 reviews)

I've done a lot similar projects. I have special modules in Python. PyCurl+MultiThreads (with errors reprocessing).

$50 USD in 1 day
(12 reviews)

Very interested in your data collection project. Please check your PMB. Thanks.

$150 USD in 3 days
(19 reviews)

I worked on many similar scraping projects before. I'm a professional scrapper working in C#, C++, php. I can finish and deliver the program in a fastest possible time.

$100 USD in 2 days
(12 reviews)

Please see PMB for details

$200 USD in 3 days
(19 reviews)

Hi there, I am an expert data extractor, I have been doing it for over 11 years. I have completed many tasks both on GAF and other sites including extracting info from websites and other places. Please see my r…

$220 USD in 2 days
(14 reviews)

Hello, Please Check PMB

$70 USD in 2 days
(10 reviews)

Please refer to PMB

$200 USD in 3 days
(4 reviews)

hi,please check your pm.

$100 USD in 2 days
(7 reviews)

I have ready-to-go software for you which saves data in csv, text (ascii) and a few other formats as well. It will help you to extract data directly from the website without first extracting the html files. Bu…

$150 USD in 2 days
(2 reviews)