import requests
import pandas as pd
import zipfile
pd.set_option('display.max_columns', None)
Problem: The 13F files does not contain TICKER SYMBOL which acts as standard identifers for the asset.
Solve: SEC contains other (not 13F) datasets that have TICKER SYMBOLS and CUSIPS. CUSIPS are avaiabile in the 13F files so we can join them together.
"Fails-to Deliver" dataset contains SYMBOL AND CUSIPS.
On SEC, the file name and download links have a discerable pattern so lets organize a table with 2 columns: name, url via a dataframe or excel.
Download the files into a folder.
Loop through each file, unzip it (using the package ZipFile), and concatenate it into a single dataframe.
urldf = pd.read_excel(r'C:\Users\User\Downloads\sec1.xlsx')
urldf=urldf[['name','url']]
urldf.head()
| name | url | |
|---|---|---|
| 0 | 202412a | https://www.sec.gov/files/data/fails-deliver-d... |
| 1 | 202412b | https://www.sec.gov/files/data/fails-deliver-d... |
| 2 | 202411a | https://www.sec.gov/files/data/fails-deliver-d... |
| 3 | 202411b | https://www.sec.gov/files/data/fails-deliver-d... |
| 4 | 202410a | https://www.sec.gov/files/data/fails-deliver-d... |
Loop through each item:
Download and unzip the data
Append the data onto a dataframe called symboldf
symboldf=pd.DataFrame()
for x in range(len(urldf)):
with zipfile.ZipFile(f'C:\\Users\\User\\Downloads\\sec_fail\\{urldf["name"][x]}.zip', 'r') as my_zip:
with my_zip.open(my_zip.namelist()[0]) as file:
content = file.read()
data = [row.split('|') for row in content.decode().splitlines()]
df = pd.DataFrame(data[1:], columns=data[0])
symboldf = pd.concat([symboldf, df], ignore_index=True)
symboldf.head()
| SETTLEMENT DATE | CUSIP | SYMBOL | QUANTITY (FAILS) | DESCRIPTION | PRICE | |
|---|---|---|---|---|---|---|
| 0 | 20241202 | B38564108 | CMBT | 165 | CMB.TECH NV (BEL) | 11.22 |
| 1 | 20241202 | B5950S113 | MDXH | 531 | MDXHEALTH SA SHS NEW(BELGIUM) | 1.99 |
| 2 | 20241202 | D1668R123 | MBGAF | 324 | MERCEDES BENZ GROUP AG COMMON | 55.98 |
| 3 | 20241202 | G0085J117 | ADSE | 3333 | ADS TEC ENERGY PLC | 14.10 |
| 4 | 20241202 | G0131Y100 | ANSC | 900 | AGRICULTURE & NAT SOLUTIONS AC | 10.42 |
len(symboldf)
1285112
We have all the Fail to Deliver data. All 1.2m rows of it.
Remove duplicates and only keep the relevant columns, CUSIP and SYMBOL.
symboldf = symboldf[['CUSIP','SYMBOL']].drop_duplicates(subset=['CUSIP'], keep='first')
len(symboldf)
18189
Download the 13F quarterly data (Mar, June, Sept, Dec) from SEC
Each quarterly file contains 8 TSV files
Study SEC data dictionary for info on each file
For this excersice, I use the SUBMISSION, COVERPAGE & INFOTABLE.
Join each using the function pd.merge on the primary identifier 'ACCESSION_NUMBER'
Example:
sub_cover_june = pd.merge(sub_june, cover_june, on='ACCESSION_NUMBER', how='inner')
sub_cover_it_june = pd.merge(sub_cover_june, it_june, on='ACCESSION_NUMBER', how='inner')
x = pd.concat([sub_cover_it_dec,
sub_cover_it_sept,
sub_cover_it_june,
sub_cover_it_march], ignore_index=True)
x.head()
| ACCESSION_NUMBER | FILING_DATE | SUBMISSIONTYPE | CIK | PERIODOFREPORT | REPORTCALENDARORQUARTER | ISAMENDMENT | AMENDMENTNO | AMENDMENTTYPE | CONFDENIEDEXPIRED | DATEDENIEDEXPIRED | DATEREPORTED | REASONFORNONCONFIDENTIALITY | FILINGMANAGER_NAME | FILINGMANAGER_STREET1 | FILINGMANAGER_STREET2 | FILINGMANAGER_CITY | FILINGMANAGER_STATEORCOUNTRY | FILINGMANAGER_ZIPCODE | REPORTTYPE | FORM13FFILENUMBER | CRDNUMBER | SECFILENUMBER | PROVIDEINFOFORINSTRUCTION5 | ADDITIONALINFORMATION | INFOTABLE_SK | NAMEOFISSUER | TITLEOFCLASS | CUSIP | FIGI | VALUE | SSHPRNAMT | SSHPRNAMTTYPE | PUTCALL | INVESTMENTDISCRETION | OTHERMANAGER | VOTING_AUTH_SOLE | VOTING_AUTH_SHARED | VOTING_AUTH_NONE | source | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0001088875-25-000002 | 31-JAN-2025 | 13F-HR | 1088875 | 31-DEC-2024 | 31-DEC-2024 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | BAILLIE GIFFORD & CO | CALTON SQUARE | 1 GREENSIDE ROW | EDINBURGH | X0 | EH13AN | 13F HOLDINGS REPORT | 028-04825 | NaN | NaN | N | NaN | 109392702 | Philip Morris Intl | COM | 718172109 | NaN | 1434813 | 11922 | SH | NaN | SOLE | 0 | 11922 | 0 | 0 | dec2024 |
| 1 | 0001088875-25-000002 | 31-JAN-2025 | 13F-HR | 1088875 | 31-DEC-2024 | 31-DEC-2024 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | BAILLIE GIFFORD & CO | CALTON SQUARE | 1 GREENSIDE ROW | EDINBURGH | X0 | EH13AN | 13F HOLDINGS REPORT | 028-04825 | NaN | NaN | N | NaN | 109392703 | COM | 72352L106 | NaN | 78875360 | 2719840 | SH | NaN | DFND | 01 | 2719840 | 0 | 0 | dec2024 | |
| 2 | 0001088875-25-000002 | 31-JAN-2025 | 13F-HR | 1088875 | 31-DEC-2024 | 31-DEC-2024 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | BAILLIE GIFFORD & CO | CALTON SQUARE | 1 GREENSIDE ROW | EDINBURGH | X0 | EH13AN | 13F HOLDINGS REPORT | 028-04825 | NaN | NaN | N | NaN | 109392704 | COM | 72352L106 | NaN | 155453253 | 5360457 | SH | NaN | DFND | 02 | 5248890 | 0 | 111567 | dec2024 | |
| 3 | 0001088875-25-000002 | 31-JAN-2025 | 13F-HR | 1088875 | 31-DEC-2024 | 31-DEC-2024 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | BAILLIE GIFFORD & CO | CALTON SQUARE | 1 GREENSIDE ROW | EDINBURGH | X0 | EH13AN | 13F HOLDINGS REPORT | 028-04825 | NaN | NaN | N | NaN | 109392705 | COM | 72352L106 | NaN | 4197837 | 144753 | SH | NaN | DFND | 03 | 144753 | 0 | 0 | dec2024 | |
| 4 | 0001088875-25-000002 | 31-JAN-2025 | 13F-HR | 1088875 | 31-DEC-2024 | 31-DEC-2024 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | BAILLIE GIFFORD & CO | CALTON SQUARE | 1 GREENSIDE ROW | EDINBURGH | X0 | EH13AN | 13F HOLDINGS REPORT | 028-04825 | NaN | NaN | N | NaN | 109392706 | COM | 72352L106 | NaN | 20090591 | 692779 | SH | NaN | SOLE | 0 | 692779 | 0 | 0 | dec2024 |
len(x)
12869383
Recap:
We have full year 13F data aggregated in dataframe x
We have CUSIP and SYMBOL info in the dataframe symboldf
The last step is merging the two data together.
Finally, we have a dataset of all holdings filed with the SEC with the SYMBOL column which provides a unified and simple way to query the underlying assets.
x = pd.merge(x, symboldf, on='CUSIP', how='inner')
x[:10]
| ACCESSION_NUMBER | FILING_DATE | SUBMISSIONTYPE | CIK | PERIODOFREPORT | REPORTCALENDARORQUARTER | ISAMENDMENT | AMENDMENTNO | AMENDMENTTYPE | CONFDENIEDEXPIRED | DATEDENIEDEXPIRED | DATEREPORTED | REASONFORNONCONFIDENTIALITY | FILINGMANAGER_NAME | FILINGMANAGER_STREET1 | FILINGMANAGER_STREET2 | FILINGMANAGER_CITY | FILINGMANAGER_STATEORCOUNTRY | FILINGMANAGER_ZIPCODE | REPORTTYPE | FORM13FFILENUMBER | CRDNUMBER | SECFILENUMBER | PROVIDEINFOFORINSTRUCTION5 | ADDITIONALINFORMATION | INFOTABLE_SK | NAMEOFISSUER | TITLEOFCLASS | CUSIP | FIGI | VALUE | SSHPRNAMT | SSHPRNAMTTYPE | PUTCALL | INVESTMENTDISCRETION | OTHERMANAGER | VOTING_AUTH_SOLE | VOTING_AUTH_SHARED | VOTING_AUTH_NONE | source | SYMBOL | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0001088875-25-000002 | 31-JAN-2025 | 13F-HR | 1088875 | 31-DEC-2024 | 31-DEC-2024 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | BAILLIE GIFFORD & CO | CALTON SQUARE | 1 GREENSIDE ROW | EDINBURGH | X0 | EH13AN | 13F HOLDINGS REPORT | 028-04825 | NaN | NaN | N | NaN | 109392702 | Philip Morris Intl | COM | 718172109 | NaN | 1434813 | 11922 | SH | NaN | SOLE | 0 | 11922 | 0 | 0 | dec2024 | PM |
| 1 | 0001957363-25-000001 | 31-JAN-2025 | 13F-HR | 1957363 | 31-DEC-2024 | 31-DEC-2024 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | Creative Capital Management Investments LLC | 6265 GREENWICH DRIVE | SUITE 201 | SAN DIEGO | CA | 92122 | 13F HOLDINGS REPORT | 028-22724 | 286567.0 | 801-286567 | N | NaN | 109393511 | PHILIP MORRIS INTL INC | Stock | 718172109 | NaN | 6619 | 55 | SH | NaN | SOLE | NaN | 0 | 0 | 55 | dec2024 | PM |
| 2 | 0001706836-25-000002 | 31-JAN-2025 | 13F-HR | 1706836 | 31-DEC-2024 | 31-DEC-2024 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | BFSG, LLC | 2040 MAIN STREET | SUITE 720 | IRVINE | CA | 92614 | 13F HOLDINGS REPORT | 028-18081 | 143617.0 | 801-67796 | N | NaN | 109450548 | PHILIP MORRIS INTL INC | COM | 718172109 | NaN | 179563 | 1492 | SH | NaN | SOLE | NaN | 0 | 0 | 1492 | dec2024 | PM |
| 3 | 0002007263-25-000003 | 31-JAN-2025 | 13F-HR | 2007263 | 31-DEC-2024 | 31-DEC-2024 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | BKM Wealth Management, LLC | 20300 WATER TOWER BLVD. STE. 250 | NaN | BROOKFIELD | WI | 53045 | 13F HOLDINGS REPORT | 028-23494 | 323928.0 | 801-127026 | N | NaN | 109451892 | PHILIP MORRIS INTL INC | COM | 718172109 | NaN | 298347 | 2479 | SH | NaN | SOLE | NaN | 0 | 0 | 2479 | dec2024 | PM |
| 4 | 0001172661-25-000454 | 31-JAN-2025 | 13F-HR | 1331074 | 31-DEC-2024 | 31-DEC-2024 | N | NaN | NaN | NaN | NaN | NaN | NaN | MCDONALD PARTNERS LLC | 1301 East 9th Street | Suite 3700 | Cleveland | OH | 44114 | 13F HOLDINGS REPORT | 028-21003 | 135414.0 | 801-64294 | N | NaN | 109454101 | PHILIP MORRIS INTL INC | COM | 718172109 | NaN | 866986 | 7203 | SH | NaN | SOLE | NaN | 0 | 0 | 7203 | dec2024 | PM |
| 5 | 0001580642-25-000642 | 31-JAN-2025 | 13F-HR | 1749744 | 31-DEC-2024 | 31-DEC-2024 | N | NaN | NaN | NaN | NaN | NaN | NaN | Global Retirement Partners, LLC | 4340 Redwood Highway | Suite B-60 | San Rafael | CA | 94903 | 13F HOLDINGS REPORT | 028-18865 | NaN | NaN | N | NaN | 109457225 | PHILIP MORRIS INTL INC | COM | 718172109 | NaN | 4945128 | 41090 | SH | NaN | SOLE | NaN | 474 | 0 | 40616 | dec2024 | PM |
| 6 | 0001420506-25-000102 | 31-JAN-2025 | 13F-HR | 1801720 | 31-DEC-2024 | 31-DEC-2024 | N | NaN | NaN | NaN | NaN | NaN | NaN | Key Financial Inc | 1045 ANDREW DRIVE | SUITE A | WEST CHESTER | PA | 19380 | 13F HOLDINGS REPORT | 028-19846 | NaN | NaN | N | NaN | 109462225 | PHILIP MORRIS INTL INC | COM | 718172109 | NaN | 325787 | 2707 | SH | NaN | SOLE | NaN | 0 | 0 | 2707 | dec2024 | PM |
| 7 | 0001580642-25-000637 | 31-JAN-2025 | 13F-HR | 1456133 | 31-DEC-2024 | 31-DEC-2024 | N | NaN | NaN | NaN | NaN | NaN | NaN | Convergence Investment Partners, LLC | 3801 Pga Blvd. | Suite 1001 | Palm Beach Gardens | FL | 33410 | 13F HOLDINGS REPORT | 028-13404 | NaN | NaN | N | NaN | 109464080 | PHILIP MORRIS INTL INC | COM | 718172109 | NaN | 1015979 | 8442 | SH | NaN | SOLE | NaN | 7016 | 0 | 1426 | dec2024 | PM |
| 8 | 0001140361-25-002693 | 31-JAN-2025 | 13F-HR | 1395055 | 31-DEC-2024 | 31-DEC-2024 | N | NaN | NaN | NaN | NaN | NaN | NaN | Callahan Advisors, LLC | 3555 Timmons Lane, Suite 600 | NaN | Houston | TX | 77027 | 13F HOLDINGS REPORT | 028-12463 | 114058.0 | 801-61981 | N | NaN | 109464273 | Philip Morris Intl Inc | COM | 718172109 | NaN | 1171450 | 9734 | SH | NaN | SOLE | NaN | 7234 | 0 | 2500 | dec2024 | PM |
| 9 | 0001911391-25-000004 | 31-JAN-2025 | 13F-HR | 1940033 | 31-DEC-2024 | 31-DEC-2024 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | Plan Group Financial, LLC | 6 NE 63RD STREET | NaN | OKLAHOMA CITY | OK | 73105 | 13F HOLDINGS REPORT | 028-22609 | NaN | NaN | N | NaN | 109464652 | PHILIP MORRIS INTL INC | COM | 718172109 | NaN | 1528901 | 12704 | SH | NaN | SOLE | 0 | 0 | 0 | 12704 | dec2024 | PM |