library(tidyverse)
library(tidyfinance)
library(dbplyr)
library(RSQLite)
library(RPostgres)
TRACE and FISD
You are reading Tidy Finance with R. You can find the equivalent chapter for the sibling Tidy Finance with Python here.
In this chapter, we dive into the US corporate bond market. Bond markets are far more diverse than stock markets, as most issuers have multiple bonds outstanding simultaneously with potentially very different indentures. This market segment is exciting due to its size (roughly 10 trillion USD outstanding), heterogeneity of issuers (as opposed to government bonds), market structure (mostly over-the-counter trades), and data availability. We introduce how to use bond characteristics from FISD and trade reports from TRACE and provide code to download and clean TRACE in R.
Many researchers study liquidity in the US corporate bond market O’Hara and Zhou (2021). We do not cover bond returns here, but you can compute them from TRACE data. Instead, we refer to studies on the topic such as Bessembinder et al. (2008), Bai, Bali, and Wen (2019), and Kelly, Palhares, and Pruitt (2021) and a survey by Huang and Shi (2021). Moreover, WRDS includes bond returns computed from TRACE data at a monthly frequency.
The current chapter relies on this set of R packages.
Compared to previous chapters, we load the devtools
package (Wickham et al. 2022) to source code that we provided to the public via gist.
Bond Data from WRDS
Both bond databases we need are available on WRDS to which we establish the RPostgres
connection described in WRDS, CRSP, and Compustat. Additionally, we connect to our local SQLite
-database to store the data we download.
<- dbConnect(
wrds Postgres(),
host = "wrds-pgdata.wharton.upenn.edu",
dbname = "wrds",
port = 9737,
sslmode = "require",
user = Sys.getenv("WRDS_USER"),
password = Sys.getenv("WRDS_PASSWORD")
)
<- dbConnect(
tidy_finance SQLite(),
"data/tidy_finance_r.sqlite",
extended_types = TRUE
)
Mergent FISD
For research on US corporate bonds, the Mergent Fixed Income Securities Database (FISD) is the primary resource for bond characteristics. There is a detailed manual on WRDS, so we only cover the necessary subjects here. FISD data comes in two main variants, namely, centered on issuers or issues. In either case, the most useful identifiers are CUSIPs. 9-digit CUSIPs identify securities issued by issuers. The issuers can be identified from the first six digits of a security CUSIP, which is also called 6-digit CUSIP. Both stocks and bonds have CUSIPs. This connection would, in principle, allow matching them easily, but due to changing issuer details, this approach only yields small coverage.
We use the issue-centered version of FISD to identify the subset of US corporate bonds that meet the standard criteria (Bessembinder, Maxwell, and Venkataraman 2006). The WRDS table fisd_mergedissue
contains most of the information we need on a 9-digit CUSIP level. Due to the diversity of corporate bonds, details in the indenture vary significantly. We focus on common bonds that make up the majority of trading volume in this market without diverging too much in indentures.
The following chunk connects to the data and selects the bond sample to remove certain bond types that are less commonly used (see, e.g., Dick-Nielsen, Feldhütter, and Lando 2012; O’Hara and Zhou 2021, among many others). In particular, we use the filters listed below. Note that we also treat missing values in these flags.
- Keep only senior bonds (
security_level = 'SEN'
). - Exclude bonds which are secured lease obligations (
slob = 'N' OR slob IS NULL
). - Exclude secured bonds (
security_pledge IS NULL
). - Exclude asset-backed bonds (
asset_backed = 'N' OR asset_backed IS NULL
). - Exclude defeased bonds (
(defeased = 'N' OR defeased IS NULL) AND defeased_date IS NULL
). - Keep only the bond types US Corporate Debentures (
'CDEB'
), US Corporate Medium Term Notes ('CMTN'
), US Corporate Zero Coupon Notes and Bonds ('CMTZ'
,'CZ'
), and US Corporate Bank Note ('USBN'
). - Exclude bonds that are payable in kind (
(pay_in_kind != 'Y' OR pay_in_kind IS NULL) AND pay_in_kind_exp_date IS NULL
). - Exclude foreign (
yankee == "N" OR is.na(yankee)
) and Canadian issuers (canadian = 'N' OR canadian IS NULL
). - Exclude bonds denominated in foreign currency (
foreign_currency = 'N'
). - Keep only fixed (
F
) and zero (Z
) coupon bonds with additional requirements offix_frequency IS NULL
,coupon_change_indicator = 'N'
and annual, semi-annual, quarterly, or monthly interest frequencies. - Exclude bonds that were issued under SEC Rule 144A (
rule_144a = 'N'
). - Exlcude privately placed bonds (
private_placement = 'N' OR private_placement IS NULL
). - Exclude defaulted bonds (
defaulted = 'N' AND filing_date IS NULL AND settlement IS NULL
). - Exclude convertible (
convertible = 'N'
), putable (putable = 'N' OR putable IS NULL
), exchangeable (exchangeable = 'N' OR exchangeable IS NULL
), perpetual (perpetual = 'N'
), or preferred bonds (preferred_security = 'N' OR preferred_security IS NULL
). - Exclude unit deal bonds (
(unit_deal = 'N' OR unit_deal IS NULL)
).
<- tbl(wrds, I("fisd.fisd_mergedissue"))
fisd_mergedissue_db
<- fisd_mergedissue_db |>
fisd filter(
== "SEN",
security_level == "N" | is.na(slob),
slob is.na(security_pledge),
== "N" | is.na(asset_backed),
asset_backed == "N" | is.na(defeased),
defeased is.na(defeased_date),
%in% c(
bond_type "CDEB",
"CMTN",
"CMTZ",
"CZ",
"USBN"
), != "Y" | is.na(pay_in_kind),
pay_in_kind is.na(pay_in_kind_exp_date),
== "N" | is.na(yankee),
yankee == "N" | is.na(canadian),
canadian == "N",
foreign_currency %in% c(
coupon_type "F",
"Z"
), is.na(fix_frequency),
== "N",
coupon_change_indicator %in% c(
interest_frequency "0",
"1",
"2",
"4",
"12"
),== "N",
rule_144a == "N" | is.na(private_placement),
private_placement == "N",
defaulted is.na(filing_date),
is.na(settlement),
== "N",
convertible is.na(exchange),
== "N" | is.na(putable),
putable == "N" | is.na(unit_deal),
unit_deal == "N" | is.na(exchangeable),
exchangeable == "N",
perpetual == "N" | is.na(preferred_security)
preferred_security |>
) select(
complete_cusip, maturity,
offering_amt, offering_date,
dated_date,
interest_frequency, coupon,
last_interest_date,
issue_id, issuer_id|>
) collect()
We also pull issuer information from fisd_mergedissuer
regarding the industry and country of the firm that issued a particular bond. Then, we filter to include only US-domiciled firms’ bonds. We match the data by issuer_id
.
<- tbl(wrds, I("fisd.fisd_mergedissuer"))
fisd_mergedissuer_db
<- fisd_mergedissuer_db |>
fisd_issuer select(issuer_id, sic_code, country_domicile) |>
collect()
<- fisd |>
fisd inner_join(fisd_issuer, join_by(issuer_id)) |>
filter(country_domicile == "USA") |>
select(-country_domicile)
To download the FISD data with the above filters and processing steps, you can use the tidyfinance
package. Note that you might have to set the login credentials for WRDS first using set_wrds_credentials()
.
<- download_data("wrds_fisd") fisd
Finally, we save the bond characteristics to our local database. This selection of bonds also constitutes the sample for which we will collect trade reports from TRACE below.
dbWriteTable(
conn = tidy_finance,
name = "fisd",
value = fisd,
overwrite = TRUE
)
The FISD database also contains other data. The issue-based file contains information on covenants, i.e., restrictions included in bond indentures to limit specific actions by firms (e.g., Handler, Jankowitsch, and Weiss 2021). Moreover, FISD also provides information on bond ratings. We do not need either here.
TRACE
The Financial Industry Regulatory Authority (FINRA) provides the Trade Reporting and Compliance Engine (TRACE). In TRACE, dealers that trade corporate bonds must report such trades individually. Hence, we observe trade messages in TRACE that contain information on the bond traded, the trade time, price, and volume. TRACE comes in two variants: standard and enhanced TRACE. We show how to download and clean enhanced TRACE as it contains uncapped volume, a crucial quantity missing in the standard distribution. Moreover, enhanced TRACE also provides information on the respective parties’ roles and the direction of the trade report. These items become essential in cleaning the messages.
Why do we repeatedly talk about cleaning TRACE? Trade messages are submitted within a short time window after a trade is executed (less than 15 minutes). These messages can contain errors, and the reporters subsequently correct them or they cancel a trade altogether. The cleaning needs are described by Dick-Nielsen (2009) in detail, and Dick-Nielsen (2014) shows how to clean the enhanced TRACE data using SAS. We do not go into the cleaning steps here, since the code is lengthy and serves no educational purpose. However, downloading and cleaning enhanced TRACE data is straightforward with our setup.
We use the download_data()
function from the tidyfinance
package to download and clean enhanced TRACE with R. The appendix contains the code that is executed in the backend of the package for reference. The download_dta()
function accepts a vector of CUSIPs, automaticall establishes a connection to WRDS explained in WRDS, CRSP, and Compustat, and a start and end date, respectively.
The TRACE database is considerably large. Therefore, we only download subsets of data at once. Specifying too many CUSIPs over a long time horizon will result in very long download times and a potential failure due to the size of the request to WRDS. The size limit depends on many parameters, and we cannot give you a guideline here. If we were working with the complete TRACE data for all CUSIPs above, splitting the data into 100 parts takes roughly two hours using our setup. For the applications in this book, we need data around the Paris Agreement in December 2015 and download the data in ten sets, which we define below.
<- fisd |>
fisd_cusips pull(complete_cusip)
<- split(
fisd_parts
fisd_cusips,rep(1:10,
length.out = length(fisd_cusips))
)
Finally, we run a loop in the same style as in WRDS, CRSP, and Compustat where we download daily returns from CRSP. For each of the CUSIP sets defined above, we call the cleaning function and save the resulting output. We add new data to the existing table for batch two and all following batches.
<- length(fisd_parts)
batches
for (j in 1:batches) {
<- download_data(
trace_enhanced type = "wrds_trace_enhanced",
cusips = fisd_parts[[j]],
start_date = ymd("2014-01-01"),
end_date = ymd("2016-11-30")
)
dbWriteTable(
conn = tidy_finance,
name = "trace_enhanced",
value = trace_enhanced,
overwrite = ifelse(j == 1, TRUE, FALSE),
append = ifelse(j != 1, TRUE, FALSE)
)
message("Batch ", j, " out of ", batches, " done (",
round(j / batches, 2) * 100, "%)\n")
}
Insights into Corporate Bonds
While many news outlets readily provide information on stocks and the underlying firms, corporate bonds are not covered frequently. Additionally, the TRACE database contains trade-level information, potentially new to students. Therefore, we provide you with some insights by showing some summary statistics.
We start by looking into the number of bonds outstanding over time and compare it to the number of bonds traded in our sample. First, we compute the number of bonds outstanding for each quarter around the Paris Agreement from 2014 to 2016.
<- expand_grid(
bonds_outstanding "date" = seq(ymd("2014-01-01"), ymd("2016-11-30"), by = "quarter"),
"complete_cusip" = fisd$complete_cusip
|>
) left_join(fisd |>
select(complete_cusip, offering_date, maturity),
join_by(complete_cusip)) |>
mutate(offering_date = floor_date(offering_date),
maturity = floor_date(maturity)) |>
filter(date >= offering_date & date <= maturity) |>
count(date) |>
mutate(type = "Outstanding")
Next, we look at the bonds traded each quarter in the same period. Notice that we load the complete trace table from our database, as we only have a single part of it in the environment from the download loop above.
<- tbl(tidy_finance, "trace_enhanced") |>
trace_enhanced collect()
<- trace_enhanced |>
bonds_traded mutate(date = floor_date(trd_exctn_dt, "quarters")) |>
group_by(date) |>
summarize(n = length(unique(cusip_id)),
type = "Traded",
.groups = "drop")
Finally, we plot the two time series in Figure 1.
|>
bonds_outstanding bind_rows(bonds_traded) |>
ggplot(aes(
x = date,
y = n,
color = type,
linetype = type
+
)) geom_line() +
labs(
x = NULL, y = NULL, color = NULL, linetype = NULL,
title = "Number of bonds outstanding and traded each quarter"
)
We see that the number of bonds outstanding increases steadily between 2014 and 2016. During our sample period of trade data, we see that the fraction of bonds trading each quarter is roughly 60 percent. The relatively small number of traded bonds means that many bonds do not trade through an entire quarter. This lack of trading activity illustrates the generally low level of liquidity in the corporate bond market, where it can be hard to trade specific bonds. Does this lack of liquidity mean that corporate bond markets are irrelevant in terms of their size? With over 7,500 traded bonds each quarter, it is hard to say that the market is small. However, let us also investigate the characteristics of issued corporate bonds. In particular, we consider maturity (in years), coupon, and offering amount (in million USD).
|>
fisd mutate(maturity = as.numeric(maturity - offering_date) / 365,
offering_amt = offering_amt / 10^3) |>
pivot_longer(cols = c(maturity, coupon, offering_amt),
names_to = "measure") |>
drop_na() |>
group_by(measure) |>
summarize(
mean = mean(value),
sd = sd(value),
min = min(value),
q05 = quantile(value, 0.05),
q50 = quantile(value, 0.50),
q95 = quantile(value, 0.95),
max = max(value)
)
# A tibble: 3 × 8
measure mean sd min q05 q50 q95 max
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 coupon 5.96 2.66 0 1.95 6 10 39
2 maturity 9.57 9.11 -6.24 1.03 7.05 30.0 101.
3 offering_amt 372. 564. 0.001 0.632 200 1324. 15000
We see that the average bond in our sample period has an offering amount of over 357 million USD with a median of 200 million USD, which both cannot be considered small. The average bond has a maturity of 10 years and pays around 6 percent in coupons.
Finally, let us compute some summary statistics for the trades in this market. To this end, we show a summary based on aggregate information daily. In particular, we consider the trade size (in million USD) and the number of trades.
|>
trace_enhanced group_by(trd_exctn_dt) |>
summarize(trade_size = sum(entrd_vol_qt * rptd_pr / 100) / 10^6,
trade_number = n(),
.groups = "drop") |>
pivot_longer(cols = c(trade_size, trade_number),
names_to = "measure") |>
group_by(measure) |>
summarize(
mean = mean(value),
sd = sd(value),
min = min(value),
q05 = quantile(value, 0.05),
q50 = quantile(value, 0.50),
q95 = quantile(value, 0.95),
max = max(value)
)
# A tibble: 2 × 8
measure mean sd min q05 q50 q95 max
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 trade_number 25914. 5444. 439 17844. 26051 34383. 40839
2 trade_size 12968. 3577. 17.2 6128. 13421. 17850. 21312.
On average, nearly 26 billion USD of corporate bonds are traded daily in nearly 13,000 transactions. We can hence conclude that the corporate bond market is indeed significant in terms of trading volume and activity.
Exercises
- Compute the amount outstanding across all bonds over time. Make sure to subtract all matured bonds. How would you describe the resulting plot?
- Compute the number of days each bond is traded (accounting for the bonds’ maturities and issuances). Start by looking at the number of bonds traded each day in a graph similar to the one above. How many bonds trade on more than 75 percent of trading days?
- WRDS provides more information from Mergent FISD such as ratings in the table
fisd_ratings
. Download the ratings table and plot the distribution of ratings for the different rating providers. How would you map the different providers to a common numeric rating scale?