Closed

Retrieve EDGAR exhibit data

The big idea is that I need to compute the number of subsidiaries reported in Exhibit 21 within Form 10-K, as stored in the EDGAR database (always Exhibit 21, mandated by statute). One approach relies on the fact that EDGAR contains separate HTM files for exhibits. Here is a link to a Microsoft filing that might help you see the idea:

[url removed, login to view]

If we can retrieve the HTM files, then we only need to strip the headers and whitespace and count the subsidiaries. An alternative approach relies on the fact that the complete submission text file contains the exhibits as well. This route entails extracting the Exhibit 21 data from the text file, then stripping and counting. I don't think regex is the best tool for HTML-type data.
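A minimal sketch of the second route (extract Exhibit 21 from the complete submission text file, then strip and count) might look like the following. It is written in Python purely to illustrate the logic; the posting asks for Perl, and the same steps port directly. Several things here are assumptions rather than facts from the posting: that each exhibit sits in a <DOCUMENT> block whose <TYPE> line starts with EX-21, that the exhibit body is HTML, that one table row or paragraph equals one subsidiary, and the hypothetical HEADER_RE list of heading phrases to discard. Regex is used only to find the exhibit envelope; the HTML itself goes through a real parser.

```python
import re
from html.parser import HTMLParser

# Assumption: exhibits are wrapped as <DOCUMENT> ... <TYPE>EX-21 ... <TEXT> ... </TEXT> ... </DOCUMENT>.
DOC_RE = re.compile(r"<DOCUMENT>(.*?)</DOCUMENT>", re.DOTALL | re.IGNORECASE)
TYPE_RE = re.compile(r"<TYPE>\s*EX-21", re.IGNORECASE)
TEXT_RE = re.compile(r"<TEXT>(.*?)</TEXT>", re.DOTALL | re.IGNORECASE)

# Hypothetical list of heading phrases to drop; adjust after inspecting real filings.
HEADER_RE = re.compile(
    r"^(exhibit\s*21|subsidiaries of|list of subsidiaries|name of subsidiary|jurisdiction)",
    re.IGNORECASE,
)


class RowText(HTMLParser):
    """Collect visible text, one entry per table row or block-level element."""

    BREAK_TAGS = {"p", "br", "div", "li"}

    def __init__(self):
        super().__init__()
        self.lines = [""]
        self.in_row = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.in_row = True
            self.lines.append("")
        elif tag in self.BREAK_TAGS and not self.in_row:
            # Inside a table row, keep all cells on one line.
            self.lines.append("")

    def handle_endtag(self, tag):
        if tag == "tr":
            self.in_row = False
            self.lines.append("")

    def handle_data(self, data):
        self.lines[-1] += data + " "


def exhibit21_html(submission):
    """Return the body of the first EX-21 document, or None if absent."""
    for match in DOC_RE.finditer(submission):
        block = match.group(1)
        if TYPE_RE.search(block):
            text = TEXT_RE.search(block)
            return text.group(1) if text else None
    return None


def count_subsidiaries(html):
    """Count non-blank, non-heading lines in the exhibit (a rough heuristic)."""
    parser = RowText()
    parser.feed(html)
    parser.close()
    count = 0
    for raw in parser.lines:
        line = " ".join(raw.split())  # collapses whitespace, including &nbsp;
        if line and not HEADER_RE.match(line):
            count += 1
    return count
```

The count is only as good as the one-row-per-subsidiary assumption; exhibits that list subsidiaries in running prose or in plain text without markup would need a different splitting rule.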

Approach is your call.

The final result should generate variables for:

-CIK (central index key).

-period end (conformed period of report, yyyymmdd format preferred).

-filing date (date received at SEC, yyyymmdd format preferred).

-form type (e.g., 10-K).

-company conformed name.

-the SIC number (standard industrial classification, 4 digits).

-the number of subsidiaries reported in Exhibit 21.
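As a rough illustration of how the first six variables could be pulled from the header of a complete submission text file, here is a short sketch, again in Python only to show the logic (the posting asks for Perl). It assumes the header uses the labels CENTRAL INDEX KEY, CONFORMED PERIOD OF REPORT, FILED AS OF DATE, CONFORMED SUBMISSION TYPE, COMPANY CONFORMED NAME and STANDARD INDUSTRIAL CLASSIFICATION, that the SIC code appears in square brackets, and that the first occurrence of each label belongs to the filer. The field names in the output record are hypothetical.

```python
import re

# Assumed header labels; the output keys (cik, period_end, ...) are illustrative names.
FIELDS = {
    "cik": r"CENTRAL INDEX KEY:[ \t]*(\d+)",
    "period_end": r"CONFORMED PERIOD OF REPORT:[ \t]*(\d{8})",
    "filing_date": r"FILED AS OF DATE:[ \t]*(\d{8})",
    "form_type": r"CONFORMED SUBMISSION TYPE:[ \t]*(\S+)",
    "company_name": r"COMPANY CONFORMED NAME:[ \t]*(.+)",
    "sic": r"STANDARD INDUSTRIAL CLASSIFICATION:.*\[(\d{4})\]",
}


def parse_header(submission):
    """Return a dict of header fields; a missing field maps to None."""
    # Only look at the text before the first document so exhibit bodies cannot match.
    header = submission.split("<DOCUMENT>", 1)[0]
    record = {}
    for name, pattern in FIELDS.items():
        match = re.search(pattern, header)
        record[name] = match.group(1).strip() if match else None
    return record
```

The dates are kept as the eight-digit strings found in the header, which already match the preferred yyyymmdd format; the subsidiary count would be added to the same record by whichever exhibit-counting step is chosen.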

I prefer Perl, with generous documentation so that I can modify the code for future projects.

Please let me know if you have any questions.

Thank you!

If the approach is to access the HTM files, then the project includes the code necessary to generate the source feed (e.g., URLs, etc.). If the approach is to extract from the 10-K text file, then the project excludes the source feed code, as I have the 10-K text files.
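For the HTM route, the source feed could plausibly be built from EDGAR's quarterly index files. The sketch below (Python for illustration; the posting asks for Perl) assumes the pipe-delimited master.idx layout of CIK|Company Name|Form Type|Date Filed|Filename under Archives/edgar/full-index, and that the Filename column is a path relative to https://www.sec.gov/Archives/. The function names are hypothetical, and downloading is left out so the logic stays testable offline.

```python
BASE = "https://www.sec.gov/Archives/"


def index_url(year, quarter):
    """URL of the quarterly master index (assumed layout of the EDGAR full index)."""
    return f"{BASE}edgar/full-index/{year}/QTR{quarter}/master.idx"


def filing_urls(index_text, form_types=("10-K",)):
    """Return one record per index row whose form type is in form_types."""
    rows = []
    for line in index_text.splitlines():
        parts = line.split("|")
        # Skip the preamble, column header and separator lines.
        if len(parts) != 5 or not parts[0].strip().isdigit():
            continue
        cik, name, form, filed, path = (p.strip() for p in parts)
        if form in form_types:
            rows.append(
                {
                    "cik": cik,
                    "company_name": name,
                    "form_type": form,
                    "filing_date": filed,
                    "url": BASE + path,
                }
            )
    return rows
```

Each resulting URL points at a complete submission text file, so the same feed would serve either approach; locating the separate exhibit HTM file would need one more step through the filing's index page.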

Skills: Perl, Software Engineering

See more: regex is, one key data, python, web scraping, data mining

About the Employer:
(0 reviews) United States

Project ID: #10098793

4 freelancers are bidding on average $222 for this job

bob1982

Hi, I have great experience in website data extraction. I have done the extraction of many sites like [login to view URL], [login to view URL], [login to view URL], [login to view URL], [login to view URL], [login to view URL] and many more. I have read th More

$235 USD in 7 days
(291 Reviews)
6.8
idleswell

Thanks for your project. I am the premier Perl scripting expert on these freelancing sites. I will design a Perl script to emulate a browser accessing one of these filing documents and parse the HTML returned for th More

$202 USD in 3 days
(187 Reviews)
6.1
dpune

Hi, I have more than 14 years of data extraction and VBA/Excel experience and I am an expert in this kind of work. Let us discuss more to review requirements. I have completed more than 270 projects. Please look at the fe More

$250 USD in 10 days
(57 Reviews)
5.3
danilogbotelho

Hello, there! I don't do Perl but I can do it in Python. I have one question: You gave Microsoft filing as example. How do you intend to feed the script which company to scrape data for?

$200 USD in 10 days
(5 Reviews)
3.9