In the previous chapters, we introduced many ways to get financial data that researchers regularly use. We showed how to load data into R from Yahoo!Finance and commonly used file types, such as comma-separated or Excel files. Then, we introduced remotely connecting to WRDS and downloading data from there. However, this is only a subset of the vast amounts of data available these days.
In this short chapter, we aim to provide an overview of common alternative data providers for which direct access via R packages exists. Such a list requires constant adjustments because both data providers and access methods change. However, we want to emphasize two main insights: First, the number of R packages that provide access to (financial) data is large, in fact too large to survey exhaustively here, so we can only cover the tip of the iceberg. Second, R provides the functionality to access essentially any type of file or data available online. Thus, even if a desired data source does not come with a well-established R package, chances are high that the data can be retrieved by establishing your own API connection or by scraping the content.
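To make the second insight more concrete, here is a minimal sketch of a hand-built API connection, assuming the `httr2` and `jsonlite` packages are installed. It targets the FRED observations endpoint. The series ID `GDP`, the placeholder API key, and the small JSON string used to illustrate parsing are assumptions for illustration only; the JSON string is made up and is not output from a real request.

```r
library(httr2)
library(jsonlite)

# Build (but do not send) a request to the FRED observations endpoint.
# The series ID "GDP" and the placeholder key are illustrative assumptions.
fred_request <- request("https://api.stlouisfed.org/fred/series/observations") |>
  req_url_query(
    series_id = "GDP",
    api_key = Sys.getenv("FRED_API_KEY", unset = "YOUR_API_KEY"),
    file_type = "json"
  )

# Inspect the URL that would be called
fred_request$url

# With a valid key, the request could be sent and parsed like this:
# response <- req_perform(fred_request)
# observations <- resp_body_json(response, simplifyVector = TRUE)$observations

# Parsing works the same way on any JSON payload. The string below is a
# made-up payload that only mimics the shape of such a response.
sample_payload <- '{"observations": [
  {"date": "2020-01-01", "value": "1.5"},
  {"date": "2020-04-01", "value": "2.5"}
]}'

observations <- fromJSON(sample_payload)$observations
observations$date <- as.Date(observations$date)
observations$value <- as.numeric(observations$value)
observations
```

Separating the construction of the request from sending it makes it easy to check the URL and query parameters before spending any API quota.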
In our non-exhaustive list below, we restrict ourselves to data sources that can be accessed through easy-to-use R packages. For further inspiration on potential data sources, we recommend reading the CRAN Task View on Empirical Finance. Further inspiration (on more general social sciences) can be found here.
If you feel that we are missing a fantastic financial data source, please get in touch with us via email@example.com. Thank you very much for your support!
|FED||The Federal Reserve Bank of St. Louis provides more than 818,000 US and international time series from 109 sources via the API FRED. The data is freely available and can be browsed online on the FRED homepage.||
|ECB||The European Central Bank’s Statistical Data Warehouse provides data on Euro area monetary policy, financial stability, and other topics relevant to the activities of the ECB and the European System of Central Banks (ESCB).||
|Bloomberg||Bloomberg’s Fundamental coverage includes current and normalized historical data for the balance sheet, income statement, cash flows statement, and financial ratios. Additionally, it provides industry-specific data for communications, consumer, energy, health care, and many more. In order to retrieve Bloomberg data, a paid subscription is needed.||
|Refinitiv Eikon||Eikon provides access to real-time market data, news, fundamental data, analytics, trading, and messaging tools. Refinitiv’s Eikon is a paid service. Apart from the CRAN version, there is also
|Nasdaq Data Link (Quandl)||Quandl is a publisher of alternative data. Quandl publishes free data, scraped from many different sources from the web. However, some of the data requires specific subscriptions on the Quandl platform.||
|Global factor data||The data repository of Jensen, Kelly, and Pedersen (2022). They provide return data for characteristic-managed portfolios from around the world. The database includes factors for 153 characteristics in 13 themes, using data from 93 countries. Download the data here.|
|Open Source Asset Pricing||The data repository of A. Y. Chen and Zimmermann (2022). They provide return data for over 200 trading strategies with different time periods and specifications. The authors also provide signals and explanations of the factor construction. Download the data here.|
|Simfin||Simfin makes fundamental financial data freely available to private investors, researchers, and students. The data provider applies automated data collection processes to collect a large set of publicly available information from firms’ financial statements.||
|IEX||The IEX Group operates the Investors Exchange (IEX), a stock exchange for US equities. IEX offers US reference and market data including end-of-day and intraday pricing data. IEX offers an API which is freely available.||
|TAQ||TAQ data provides subscribed users access to all trades and quotes for all issues traded on NYSE, Nasdaq, and the regional exchanges. TAQ data can be accessed from WRDS via Postgres. The
|Other (free) data|
|CoinMarketCap||The data provider CoinMarketCap provides cryptocurrency information and historical prices, as well as information on the exchanges they are listed on.||
|CoinGecko||CoinGecko is an alternative crypto data provider of current and historical data on a myriad of coins and exchanges.||
|Twitter||Twitter provides (limited) access for academic research to extract and analyze Tweets.||
|SEC company filings||The EDGAR database provides free public access to corporate information, allowing you to research a public company’s financial information and operations by reviewing the filings the company makes with the SEC. You can also research information provided by mutual funds (including money market funds), exchange-traded funds (ETFs), and variable annuities.||
|Google trends||Google offers public access to global search volumes from its search engine through the Google Trends portal.||
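As a rough sketch of how a dedicated package shortens the path from provider to data, the following assumes the `fredr` package for the St. Louis Fed source listed in the table and an API key stored in the environment variable `FRED_API_KEY`, which `fredr` reads by default. The series ID `UNRATE` (the US unemployment rate) and the date range are arbitrary illustrative choices.

```r
library(fredr)

# fredr reads the API key from the environment variable FRED_API_KEY,
# so the API is only queried if a key is available.
if (nzchar(Sys.getenv("FRED_API_KEY"))) {
  unemployment <- fredr(
    series_id = "UNRATE",
    observation_start = as.Date("2000-01-01"),
    observation_end = as.Date("2020-12-31")
  )
  # The result is a tibble with one row per observation date
  print(head(unemployment))
  print(summary(unemployment$value))
} else {
  message("Set FRED_API_KEY to run this example.")
}
```

Most of the packages behind the sources above follow a similar pattern: register credentials once, then call a single function that returns a data frame.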
- Select one of the data sources in the table above and retrieve some data: Browse the homepage of the data provider or the package documentation to find inspiration on which type of data is available to you and how to download the data into your R session.
- Generate summary statistics of the data you retrieved and provide some useful visualization. The possibilities are endless: Maybe there is some interesting economic event you want to analyze, such as stock market responses to Twitter activity.
- Simfin provides excellent data coverage. Use their API to find out if the information Simfin provides overlaps with the CRSP/Compustat dataset in the tidy_finance.sqlite database introduced in Chapters 2-4.