I have 2 DVDs with approximately 60,000 records of contact information, and I need to be able to extract data from them. I have the 2 DVDs as ISO files, but I could not tell you what kind of database is on them. I will describe the situation below, but I will need to share the ISOs with you (624 MB and 293 MB) before we can get started on the project.
DATA EXTRACTION PROBLEM:
This directory on DVD only allows export of 200 records at a time, of approximately 60,000 records of interest. Also, the email address is not available in every case, and is buried within another field. And, the Postal Code (ZIP code) is shown as a numeric rather than text value, where leading zeros are omitted and we don’t always have a 5-digit (or 5 digit plus 4 digit with hyphen) ZIP code.
I need to extract all 60,000 records of interest into one flat-file database, and move the email address (when available) into its own field within a record.
These DVDs also assigns each contact an ISLN number, which is a unique identifier for that person, which will not change when that person changes law firms or moves. That data is not currently part of the data that is exported with the export function, but can be obtained on a record-by-record labor-intensive operation of looking at the Detailed View of each contact. I would like to have that ISLN number be a part of the flat-file database, if feasible/practical, to compare future versions of the database to this one to see which contacts have changed their address.
HOW I'D DO IT MANUALLY:
Open the software. It starts on the Search Page. Click on the red tab that says “Advanced Search”. Find the field for “practice groups” and click on the folder icon to the right, click on “Family Law” and “OK”. Then Click “Search.”
Click on the tab at the top for Brief View. Look in the bottom right corner and it’ll say “Listing: 0 of 61517” meaning that 61,517 records matched the criterion of Family Law.
At the top of the page click File, then Export. Give the file a name like 200.txt. At the bottom of the box specify that the records to be extracted are 1 through 200. Then click Export and the file is created.
Click File, then Export again, with a name like [url removed, login to view], specifying records 201-400. Export.
Repeat 300 times. Then, cut and paste the contents of each spreadsheet into one master spreadsheet (without duplicating the headers), so you have a flat file with 60,000 records.
Then, write a macro to search for “@”, copy the entire word containing that symbol, and pasting it in a new column to the right of the entries for that record.
(now I don’t know if the rest of this is worth the effort and expense). Go back to each record in the Detailed View. Double click on the ISLN number, paste it into an Excel spreadsheet. Double click on the last name and paste it. Double click the first name and paste it. Double-click the zip code and paste it. Wind up with a spreadsheet with 60,000 entries of ISLN, lastname, firstname, and zipcode. Write a macro to match up this info with the info in the other spreadsheet to associate the ISLN with the appropriate record and make the ISLN part of the record.
In a few months do it all over again, and run a program to compare addresses of each ISLN to identify contact who are new to the database, or who have moved since the previous database.