Awarded

Find URLs from websites

Populate an Excel sheet with the URLs of staff pages from a list of University websites.

To identify the XPath to the various elements in a page, one tool that can be used is the XPathChecker plugin for Firefox.

The first step in creating a template is to identify the start page for each institute/organization. This start URL is added to the StartURL field in the institutes table. In most cases the list of staff members' names is either a table or a list. The XPath that identifies this table or list is added to the TableXPath field in the corresponding record. The XPath that identifies each staff member's profile page link is added to the URLXPath field. Since most web profiles are linked using a relative URL, the link found by URLXPath needs to be combined with a URL prefix made up of the institute's web server address and path. This prefix is added to the URLPrefix field.
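As a rough illustration of how the four fields fit together, here is a minimal Python sketch. It is not the project's script: the record values, the sample page and the function name `profile_urls` are all invented for the example, and it uses the standard library's ElementTree, which only accepts well-formed markup and a subset of XPath. Real university pages would most likely need a more lenient HTML parser. The prefix is combined with the relative link using `urljoin`, which is an assumption about how the combination should work.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# Hypothetical record from the institutes table. The field names come from
# the description above; the values are made up for illustration.
RECORD = {
    "StartURL": "http://www.example.edu/science/staff/",
    "TableXPath": ".//table[@id='staff']",
    "URLXPath": ".//td/a",
    "URLPrefix": "http://www.example.edu/science/staff/",
}

# Stand-in for the page that would be downloaded from StartURL.
# It is well-formed so the standard-library XML parser can read it.
START_PAGE = """
<html><body>
  <table id="staff">
    <tr><td><a href="profiles/jsmith.html">Dr J. Smith</a></td></tr>
    <tr><td><a href="profiles/alee.html">Prof A. Lee</a></td></tr>
  </table>
</body></html>
"""


def profile_urls(page_source, record):
    """Return absolute profile URLs found via TableXPath and URLXPath."""
    root = ET.fromstring(page_source.strip())
    urls = []
    # TableXPath locates the staff table or list on the start page.
    for table in root.findall(record["TableXPath"]):
        # URLXPath locates each profile link inside that table or list.
        for link in table.findall(record["URLXPath"]):
            href = link.get("href")
            if href:
                # Relative links are combined with URLPrefix.
                urls.append(urljoin(record["URLPrefix"], href))
    return urls


if __name__ == "__main__":
    for url in profile_urls(START_PAGE, RECORD):
        print(url)
```

With this sample data the relative link `profiles/jsmith.html` becomes `http://www.example.edu/science/staff/profiles/jsmith.html`.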

Once the StartURL, TableXPath, URLXPath and URLPrefix fields are populated, the script should be able to read the individual profile pages one by one. This can be verified by running the script and checking the output of the script on the screen to see whether the URLs are actually being retrieved.
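The check described above, together with the goal of filling a spreadsheet, could look something like the following Python sketch. It assumes the profile URLs have already been collected; the function name `write_staff_urls`, the column headings and the sample rows are hypothetical. It writes a CSV file, which Excel can open, rather than a native Excel workbook.

```python
import csv

# Hypothetical (institute, profile URL) pairs that a scraping run might produce.
SAMPLE_ROWS = [
    ("Example University", "http://www.example.edu/science/staff/profiles/jsmith.html"),
    ("Example University", "http://www.example.edu/science/staff/profiles/alee.html"),
]


def write_staff_urls(rows, out_file):
    """Print each URL for a visual check and write it to a CSV file object."""
    writer = csv.writer(out_file)
    writer.writerow(["Institute", "ProfileURL"])  # assumed column headings
    count = 0
    for institute, url in rows:
        print(url)  # on-screen output to confirm URLs are being retrieved
        writer.writerow([institute, url])
        count += 1
    return count


if __name__ == "__main__":
    # newline="" stops the csv module from writing blank lines on Windows.
    with open("staff_urls.csv", "w", newline="", encoding="utf-8") as fh:
        write_staff_urls(SAMPLE_ROWS, fh)
```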

Once the profile pages can be retrieved, the template XPaths for the profile details need to be populated. The details to be captured are:

• Name

• Title

• Email

• Phone

• Fax

• Address

• Biography

• Qualifications

• Research Interests

• Publications

Each of these details will require a separate XPath in the template, with an optional regular expression to eliminate unwanted formatting and HTML tags. Please note that not all organizational units or staff members will have all of these details. A few trial runs will be needed to find the XPath that captures the majority of the details. For each detail, there are two ways of using the XPath. One is to get the value as a list of XPath nodes ('V'); the other is to get the values found by the XPath as a string ('S'). The return type needs to be added to the corresponding type field in the table. If a regular expression is needed, the type would usually be 'S'.
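To make the difference between the 'V' and 'S' return types more concrete, here is a small Python sketch. It only illustrates the idea and is not the project's implementation: the sample page, the XPaths, the function name `extract_detail` and the way the regular expression is applied (keeping the first capture group) are assumptions. A missing detail simply yields an empty result.

```python
import re
import xml.etree.ElementTree as ET

# Made-up, well-formed profile page used only for this example.
PROFILE_PAGE = """
<html><body>
  <h1 class="name">Dr Jane Smith</h1>
  <p class="email">Email: <b>j.smith@example.edu</b></p>
  <ul class="pubs"><li>Paper one</li><li>Paper two</li></ul>
</body></html>
"""


def extract_detail(root, xpath, return_type, regex=None):
    """Return matching nodes ('V') or their text as one string ('S')."""
    nodes = root.findall(xpath)
    if return_type == "V":
        return nodes  # list of nodes; empty if the detail is missing
    # 'S': join the text of every matched node, which drops nested tags.
    text = " ".join("".join(node.itertext()).strip() for node in nodes)
    if regex:
        match = re.search(regex, text)
        if not match:
            return ""
        # Keep the first capture group if the pattern has one (assumed convention).
        text = match.group(1) if match.re.groups else match.group(0)
    return text


if __name__ == "__main__":
    root = ET.fromstring(PROFILE_PAGE.strip())
    print(extract_detail(root, ".//p[@class='email']", "S", r"Email:\s*(\S+)"))
    for item in extract_detail(root, ".//ul[@class='pubs']/li", "V"):
        print(item.text)
```

Here the email is read as a string and cleaned with a regular expression, while the publications are returned as a list of nodes so each entry can be handled separately.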

More details will be posted in the coming weeks.

Skills: Data Entry, Excel, Perl, Web Scraping, Web Search

About the Employer:
(0 reviews) Sydney, Australia

Project ID: #4066847

16 freelancers are bidding an average of $157 for this job

SigmaVisual

I can help in your project, please check PMB and our ratings/reviews to get idea of our experience. Please let me know if you have any queries.

$199 AUD in 7 days
(68 Reviews)
6.8
appwiz

Good day, please see my message

$150 AUD in 7 days
(9 Reviews)
4.5
ebrainindia

Can be done very well. Have done this many times. Please see private message for proposal.

$250 AUD in 3 days
(16 Reviews)
4.2
SoftSandila

I have done this work many times; it's quite an easy task for me. Regards: R1

$130 AUD in 7 days
(9 Reviews)
3.9
hesama110

hi please check your PMB

$110 AUD in 3 days
(1 Review)
1.9
vyastik

Dear sir, I'm an experienced Web researcher and am eager to complete this job properly and in time.

$140 AUD in 9 days
(1 Review)
1.1
threeeyedata

Please see PMB. We have an experienced team to do this job...

$150 AUD in 1 day
(2 Reviews)
0.2
usmanfaisal3

Hello Sir, I'm Faisal. I will do my best for your project and will deliver your completed project within a very short period. I have a great team to get your task done ahead of time. I am proficient in ms-word, ms-excel, dat...

$200 AUD in 3 days
(0 Reviews)
0.0
ksharpvw

Some details?

$150 AUD in 2 days
(0 Reviews)
0.0
toKandarp

Can be done very well. Have done this many times.

$150 AUD in 10 days
(0 Reviews)
0.0
ramjay03

I know this process well. Give it to me and I will do it successfully and with good quality.

$150 AUD in 20 days
(0 Reviews)
0.0
dawnconsultancy

Let's get started.

$250 AUD in 3 days
(0 Reviews)
0.0
ITjobs76

ready for your work.

$30 AUD in 10 days
(0 Reviews)
0.0
Uma829

I am interested in working with you.

$100 AUD in 1 day
(0 Reviews)
0.0
Obxide

Hi, please engage me and give me this chance. Thanks

$200 AUD in 5 days
(0 Reviews)
0.0
tarasprystavskyj

Simple HTML DOM is better than XPath for direct searching of information on an HTML page.

$160 AUD in 3 days
(0 Reviews)
0.0