Other Data Providers

Note

You are reading Tidy Finance with Python. You can find the equivalent chapter for the sibling Tidy Finance with R here.

In the previous chapters, we introduced many ways to get financial data that researchers regularly use. We showed how to load data into Python from Yahoo!Finance (using the yfinance package) and from Kenneth French’s data library (using the pandas-datareader package). We also presented commonly used file types, such as comma-separated or Excel files. Then, we introduced remotely connecting to WRDS and downloading data from there. However, this is only a subset of the vast amounts of data available online these days.

In this chapter, we provide an overview of common alternative data providers for which direct access via Python packages exists. Such a list requires constant adjustments because both data providers and access methods change. However, we want to emphasize two main insights. First, the number of Python packages that provide access to (financial) data is large. Too large actually to survey here exhaustively. Instead, we can only cover the tip of the iceberg. Second, Python provides the functionalities to access any form of files or data available online. Thus, even if a desired data source does not come with a well-established Python package, chances are high that data can be retrieved by establishing your own API connection (using the Python requests package) or by scrapping the content.

In our non-exhaustive list below, we restrict ourselves to listing data sources accessed through easy-to-use Python packages. For further inspiration on potential data sources, we recommend reading the Awesome Quant curated list of insanely awesome libraries, packages, and resources for Quants (Quantitative Finance). In fact, the pandas-datareader package provides comprehensive access to a lot of databases, including some of those listed below.

Also, the requests library in Python provides a versatile and direct way to interact with APIs (Application Programming Interfaces) offered by various financial data providers. The package simplifies the process of making HTTP requests, handling authentication, and parsing the received data.

Apart from the list below, we want to advertise some amazing data compiled by others. First, there is Open Source Asset Pricing related to Chen and Zimmermann (2022). They provide return data for over 200 trading strategies with different time periods and specifications. The authors also provide signals and explanations of the factor construction. Moreover, in the same spirit, Global factor data provides the data related to Jensen2022b. They provide return data for characteristic-managed portfolios from around the world. The database includes factors for 153 characteristics in 13 themes, using data from 93 countries. Finally, we want to mention the TAQ data providing trades and quotes data for NYSE, Nasdaq, and regional exchanges, which is available via WRDS.

Source Description Packages
FRED The Federal Reserve Bank of St. Louis provides more than 818,000 US and international time series from 109 sources via the FRED API. Data can be browsed online on the FRED homepage. fredapi, pandas-datareader
ECB The European Central Bank’s Statistical Data Warehouse offers data on Euro area monetary policy, financial stability, and more. sdmx
OECD The OECD monitors trends, collects data, and forecasts economic development in various public policy areas. It’s a major source of comparable statistical, economic, and social data. requests, pandas-datareader
World Bank Provides access to global statistics including World Development Indicators, International Debt Statistics, and more, available for over 200 countries. requests, pandas-datareader
Eurostat Eurostat, the EU’s statistical office, provides high-quality data on Europe. requests, pandas-datareader
Econdb Offers access to economic data from over 90 official statistical agencies through a comprehensive database. requests, pandas-datareader
Bloomberg Provides data on balance sheets, income statements, cash flows, and more, with industry-specific data in various sectors. Paid subscription required. blpapi
LSEG Eikon Eikon offers real-time market data, news, analytics, and more. It’s a paid service of LSEG. refinitiv-data
Nasdaq Data Link Quandl publishes alternative data, with some requiring specific subscriptions. nasdaqdatalink, pandas-datareader
Simfin Automates data collection to offer financial data freely to investors, researchers, and students. simfin
PyAnomaly A Python library for asset pricing research, offering various portfolio analytics tools. pyanomaly
IEX Operates the IEX, offering US reference and market data, including intraday pricing data. pyEx, pandas-datareader
CoinMarketCap Offers cryptocurrency information, historical prices, and exchange listings. coinmarketcap
CoinGecko An alternative crypto data provider for current and historical coin and exchange data. pycoingecko
X (Twitter) Offers limited access for academic research on Tweets. tweepy
SEC company filings The EDGAR database provides public access to corporate and mutual fund information filed with the SEC. sec-api
Google Trends Provides public access to global search volumes via Google Trends. pytrends

Exercises

  1. Select one of the data sources in the table above and retrieve some data. Browse the homepage of the data provider or the package documentation to find inspiration on which type of data is available to you and how to download the data into your Python session.
  2. Generate summary statistics of the data you retrieved and provide some useful visualization. The possibilities are endless: Maybe there is some interesting economic event you want to analyze, such as stock market responses to Twitter activity.

References

Chen, Andrew Y., and Tom Zimmermann. 2022. “Open Source Cross-Sectional Asset Pricing.” Critical Finance Review 11 (2): 207–64. http://dx.doi.org/10.1561/104.00000112.