The big idea is that I need to compute the number of subsidiaries reported in Exhibit 21 of Form 10-K filings as stored in the EDGAR database (the subsidiary list is always Exhibit 21, as mandated by SEC regulation). One approach uses the fact that EDGAR stores exhibits as separate HTM files. Here is a link to a Microsoft filing that might help you see the idea:
[url removed, login to view]
If we can retrieve the HTM files, then we only need to strip the headers and whitespace and count the subsidiaries. An alternative approach relies on the fact that the complete submission text file contains the exhibits as well. This route entails extracting the Exhibit 21 data from the text file, then stripping and counting as before. I don't think regex is the best tool for HTML-type data.
Approach is your call.
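To make the second route concrete, here is a rough sketch, written in Python purely for illustration (the Perl preference below still stands). It assumes the complete submission text file wraps each document in <DOCUMENT>...</DOCUMENT> with a <TYPE> line such as EX-21 and a <TEXT> body. The function names, the list of line-breaking tags, and the header patterns are all hypothetical and would need tuning against real exhibits. HTML is flattened with a parser rather than regex, and every remaining non-blank, non-header line is counted as one subsidiary.

```python
import re
from html.parser import HTMLParser

# Tags treated as row/line boundaries when flattening HTML (an assumption).
BLOCK_TAGS = {"tr", "p", "br", "div", "li"}

# Hypothetical header patterns; real exhibits vary and will need tuning.
HEADER_RE = re.compile(r"^(exhibit\s+21|subsidiaries of|name\b|jurisdiction)", re.I)

# Crude test for whether the exhibit body is HTML or plain text.
HTML_RE = re.compile(r"<\s*(html|table|tr|p|div|br)\b", re.I)


class TextLines(HTMLParser):
    """Collect visible text, starting a new line at each block-level tag."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # Table cells become spaces so a whole row stays on one line.
        self.parts.append("\n" if tag in BLOCK_TAGS else " ")

    def handle_endtag(self, tag):
        self.parts.append("\n" if tag in BLOCK_TAGS else " ")

    def handle_data(self, data):
        # Source newlines inside HTML are not meaningful; collapse them.
        self.parts.append(re.sub(r"\s+", " ", data))


def extract_exhibit_21(submission_text):
    """Return the body of the first EX-21 document, or None if absent."""
    for doc in re.findall(r"<DOCUMENT>(.*?)</DOCUMENT>", submission_text, re.S):
        doc_type = re.search(r"<TYPE>(\S+)", doc)
        if doc_type and doc_type.group(1).upper().startswith("EX-21"):
            body = re.search(r"<TEXT>(.*?)</TEXT>", doc, re.S)
            return body.group(1) if body else None
    return None


def count_subsidiaries(exhibit):
    """Count non-blank lines that do not look like headers (a heuristic)."""
    if HTML_RE.search(exhibit):
        parser = TextLines()
        parser.feed(exhibit)
        parser.close()
        exhibit = "".join(parser.parts)
    lines = (" ".join(line.split()) for line in exhibit.splitlines())
    return sum(1 for line in lines if line and not HEADER_RE.search(line))
```

The one-line-per-subsidiary rule is the weak point: exhibits that wrap long names across lines, or that add footnotes, would be miscounted, so some spot checks against known filings seem necessary.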
The final result should generate variables for:
-CIK (central index key).
-period end (conformed period of report, yyyymmdd format preferred).
-filing date (date received at SEC, yyyymmdd format preferred).
-form type (e.g., 10-K).
-company conformed name.
-the SIC number (standard industrial classification, 4 digits).
-the number of subsidiaries reported in Exhibit 21.
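Most of these variables appear as labeled lines in the header of the complete submission text file. As a hedged illustration (again in Python only for convenience, with the dictionary keys and the function name being hypothetical), the sketch below pulls them out by label. Regex is used here only because the header is plain labeled text, not HTML. It assumes the usual header labels such as CENTRAL INDEX KEY and CONFORMED PERIOD OF REPORT, and that the SIC code appears in square brackets after its description.

```python
import re

# Header label patterns; the keys on the left are hypothetical variable names.
FIELDS = {
    "cik": r"CENTRAL INDEX KEY:\s*(\d+)",
    "period_end": r"CONFORMED PERIOD OF REPORT:\s*(\d{8})",
    "filing_date": r"FILED AS OF DATE:\s*(\d{8})",
    "form_type": r"CONFORMED SUBMISSION TYPE:\s*(\S+)",
    "company_name": r"COMPANY CONFORMED NAME:\s*(.+)",
    "sic": r"STANDARD INDUSTRIAL CLASSIFICATION:.*?\[(\d{4})\]",
}


def parse_header(submission_text):
    """Return a dict of header variables; missing fields map to None."""
    result = {}
    for name, pattern in FIELDS.items():
        match = re.search(pattern, submission_text)
        # Only the first occurrence is used, i.e. the first filer listed.
        result[name] = match.group(1).strip() if match else None
    return result
```

Dates come back as the eight-digit strings already in the header, so no reformatting is needed for the yyyymmdd preference. A filing with several filers would need extra handling, since this sketch keeps only the first match for each label.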
I prefer Perl, with generous documentation so that I can modify the code for future projects.
Please let me know if you have any questions.
If the approach is to access the HTM files, then the project includes the code necessary to generate the source feed (e.g., URLs, etc.). If the approach is to extract from the 10-K text file, then the project excludes the source feed code, as I have the 10-K text files.
4 freelancers are bidding an average of $222 for this job
Hi, I have extensive experience in website data extraction. I have extracted data from many sites, such as [login to view URL], [login to view URL], [login to view URL], [login to view URL], [login to view URL], [login to view URL], and many more. I have read th
Hello there! I don't do Perl, but I can do it in Python. I have one question: you gave a Microsoft filing as an example. How do you intend to tell the script which companies to scrape data for?