Title: Access 'Wharton Research Data Services' ('WRDS')
Version: 0.0.1
Description: Provides simple functions for accessing data from 'Wharton Research Data Services' ('WRDS'), a widely used financial database in academic research. Includes credential management via the system keyring, database tools, and functions for downloading generic tables, 'Compustat' fundamentals, and linking tables.
License: MIT + file LICENSE
Encoding: UTF-8
RoxygenNote: 7.3.3
URL: https://github.com/statzhero/wrds
BugReports: https://github.com/statzhero/wrds/issues
Suggests: testthat (≥ 3.0.0), withr
Config/testthat/edition: 3
Imports: cli, DBI, dbplyr, dplyr, keyring, rlang, RPostgres, tidylog
NeedsCompilation: no
Packaged: 2026-01-14 02:23:14 UTC; rico
Author: Ulrich Atz ORCID iD [aut, cre, cph]
Maintainer: Ulrich Atz <ulrich.atz@unibocconi.it>
Depends: R (≥ 4.1.0)
Repository: CRAN
Date/Publication: 2026-01-19 18:20:02 UTC

wrds: Access 'Wharton Research Data Services' ('WRDS')

Description

Provides simple functions for accessing data from 'Wharton Research Data Services' ('WRDS'), a widely used financial database in academic research. Includes credential management via the system keyring, database tools, and functions for downloading generic tables, 'Compustat' fundamentals, and linking tables.

Author(s)

Maintainer: Ulrich Atz ulrich.atz@unibocconi.it (ORCID) [copyright holder]

See Also

Useful links:


Check connection validity

Description

Internal function to validate a database connection object.

Usage

check_connection(wrds)

Arguments

wrds

Object to check.

Value

Invisibly returns TRUE if valid; aborts otherwise.


Get company table configuration

Description

Internal function returning table names and default columns for company queries.

Usage

company_config(region)

Arguments

region

One of "na" or "global".

Value

A list with table and columns.


Get Compustat configuration

Description

Internal function returning table names and filters for Compustat queries.

Usage

compustat_config(frequency, region)

Arguments

frequency

One of "annual" or "quarterly".

region

One of "na" or "global".

Value

A list with table, datafmt, popsrc, and columns.


Describe a table

Description

Displays a glimpse-like summary of a WRDS table showing column names and types, similar to dplyr::glimpse().

Usage

describe_table(wrds, library, table, n = 20, max_cols = 25)

Arguments

wrds

A DBIConnection object returned by wrds_connect().

library

Character. The name of the library (schema).

table

Character. The name of the table.

n

Integer. Number of sample rows to fetch for value preview. Default is 20.

max_cols

Integer. Maximum number of columns to display. Default is 25.

Value

Invisibly returns a list with components:

columns

A data frame with column_name and data_type

nrow

Row count

sample

A data frame with sample rows (if n > 0)

Examples

## Not run: 
wrds <- wrds_connect()
describe_table(wrds, "comp", "funda")
wrds_disconnect(wrds)

## End(Not run)

Fill missing SIC codes from company header

Description

Internal function that joins fundamentals data with company header to fill missing historical SIC codes using coalesce.

Usage

fill_sic_codes(tbl, wrds)

Arguments

tbl

A lazy table from funda/fundq.

wrds

Database connection.

Value

A collected tibble with sic column (character, coalesced from sich and header sic).


Download Compustat company header data

Description

Downloads company-level static data from Compustat including header SIC codes, NAICS codes, state of incorporation, and other identifying information.

Usage

get_company(
  wrds,
  region = c("na", "global"),
  columns = NULL,
  n = Inf,
  lazy = FALSE
)

Arguments

wrds

A DBIConnection object returned by wrds_connect().

region

One of "na" (North America, default) or "global".

columns

Character vector of columns to return. Defaults to key identifiers and classification codes.

n

Maximum number of rows to return. Defaults to Inf (all rows). Use a smaller value (e.g., n = 100) to preview data before downloading the full table.

lazy

If TRUE, returns a lazy tbl instead of collecting. Defaults to FALSE.

Details

The sic column contains the "header" SIC code, which is the company's most recent SIC classification stored as a character. For historical SIC codes that change over time, use get_compustat() with fill_sic = TRUE, which coalesces the historical sich (integer) with the header sic.

Value

A tibble with company header data. Default columns vary by region:

North America (from comp.company):

Global (from comp.g_company):

See Also

get_compustat() for fundamentals data with optional SIC filling

Examples

## Not run: 
wrds <- wrds_connect()

# Get company header data
company <- get_company(wrds)

# Get global companies
g_company <- get_company(wrds, region = "global")

# Lazy query
get_company(wrds, lazy = TRUE) |>
  dplyr::filter(sic == "7370") |>
  dplyr::collect()

wrds_disconnect(wrds)

## End(Not run)

Download Compustat fundamentals

Description

Downloads financial statement data from Compustat with standard filters for clean, analysis-ready data.

Usage

get_compustat(
  wrds,
  frequency = c("annual", "quarterly"),
  region = c("na", "global"),
  start_date = NULL,
  end_date = NULL,
  columns = NULL,
  add_columns = NULL,
  indfmt = "INDL",
  consol = "C",
  fill_sic = FALSE,
  n = Inf,
  lazy = FALSE
)

Arguments

wrds

A DBIConnection object returned by wrds_connect().

frequency

One of "annual" (default) or "quarterly".

region

One of "na" (North America, default) or "global".

start_date

Start date for filtering. Character string in "YYYY-MM-DD" format or a Date object. Defaults to NULL (no filter).

end_date

End date for filtering. Character string in "YYYY-MM-DD" format or a Date object. Defaults to NULL (no filter).

columns

Character vector of columns to return, replacing the defaults. Use describe_table() to see available columns.

add_columns

Character vector of additional columns to include beyond the defaults. Ignored if columns is specified.

indfmt

Industry format filter. Defaults to "INDL" (industrial). Use "FS" for financial services format.

consol

Consolidation level. Defaults to "C" (consolidated). Use "B" for both consolidated and non-consolidated.

fill_sic

If TRUE, fills missing historical SIC codes (sich) with header SIC codes from comp.company. Only supported for North America. When used with lazy = TRUE, returns the table with sich but without the join (requires manual joining with get_company()). Defaults to FALSE.

n

Maximum number of rows to return. Defaults to Inf (all rows). Use a smaller value (e.g., n = 100) to preview data before downloading the full table.

lazy

If TRUE, returns a lazy tbl instead of collecting. Defaults to FALSE.

Details

Default filters follow standard practice for most research applications. Region-specific filters are applied automatically based on region:

North America and Global data have different structures and should not be combined without careful column harmonization.

Value

A tibble with Compustat fundamentals. Default columns vary by region:

North America (from comp.funda / comp.fundq):

Global (from comp.g_funda / comp.g_fundq):

See Also

link_ccm() for CRSP-Compustat linking, get_company() for company header data

Examples

## Not run: 
wrds <- wrds_connect()

# Annual North America fundamentals
funda <- get_compustat(wrds)

# Quarterly with date filter
fundq <- get_compustat(wrds,
  frequency = "quarterly",
  start_date = "2020-01-01",
  end_date = "2023-12-31"
)

# Global annual
g_funda <- get_compustat(wrds, region = "global")

# Lazy query for further filtering
get_compustat(wrds, lazy = TRUE) |>
  dplyr::filter(fyear >= 2020) |>
  dplyr::select(gvkey, datadate, at, lt) |>
  dplyr::collect()

# Fill missing SIC codes with header SIC from comp.company
funda_sic <- get_compustat(wrds, fill_sic = TRUE)

# Preview first 100 rows before full download
preview <- get_compustat(wrds, n = 100)

wrds_disconnect(wrds)

## End(Not run)

Download data from any WRDS table

Description

Generic function to download data from any table in the WRDS database. Returns a lazy table by default, allowing you to build queries with dplyr before collecting.

Usage

get_table(wrds, library, table, columns = NULL, n = Inf, lazy = TRUE)

Arguments

wrds

A DBIConnection object returned by wrds_connect().

library

Character. The name of the library (schema), e.g., "crsp", "comp", "ibes".

table

Character. The name of the table within the library.

columns

Character vector of columns to return. If NULL (default), returns all columns. Use describe_table() to see available columns.

n

Maximum number of rows to return. Defaults to Inf (all rows). Use a smaller value (e.g., n = 100) to preview data.

lazy

If TRUE (default), returns a lazy tbl for further filtering with dplyr. Set to FALSE to collect immediately.

Details

This function provides generic access to any WRDS table. For commonly-used tables with standard research filters, prefer the specialized functions:

The lazy table can be filtered, selected, and mutated using dplyr verbs, which are translated to SQL and executed on the server:

get_table(wrds, "crsp", "msf") |>
  filter(date >= "2025-01-01") |>
  select(permno, date, ret, prc) |>
  collect()

Value

A tbl_lazy object (if lazy = TRUE) or a tibble (if lazy = FALSE).

See Also

describe_table() to explore table structure, list_tables() to list available tables in a library

Examples

## Not run: 
wrds <- wrds_connect()

# Preview table structure first
describe_table(wrds, "crsp", "msf")

# Get a lazy table and build your query
get_table(wrds, "crsp", "msf") |>
  dplyr::filter(date >= "2025-01-01") |>
  dplyr::select(permno, date, ret, prc, vol) |>
  dplyr::collect()

# Collect immediately with specific columns
get_table(wrds, "crsp", "dsf",
  columns = c("permno", "date", "ret", "prc"),
  lazy = FALSE,
  n = 1000
)

# Access any table in any library
get_table(wrds, "ibes", "statsum_epsus") |>
  dplyr::filter(fpedats >= "2025-01-01") |>
  dplyr::collect()

wrds_disconnect(wrds)

## End(Not run)

Description

Downloads the CCM (CRSP-Compustat Merged) linking table that maps CRSP PERMNOs to Compustat GVKEYs with valid date ranges.

Usage

link_ccm(
  wrds,
  linktype = c("LC", "LU", "LS"),
  linkprim = c("P", "C"),
  n = Inf,
  lazy = FALSE
)

Arguments

wrds

A DBIConnection object returned by wrds_connect().

linktype

Character vector. Types of links to include. Defaults to c("LC", "LU", "LS"):

  • "LC": Link confirmed by Compustat

  • "LU": Link unconfirmed (valid but less certain)

  • "LS": Link valid for secondary securities

linkprim

Character vector. Link primacy filters. Defaults to c("P", "C"):

  • "P": Primary link identified by Compustat

  • "C": Primary link identified by CRSP

n

Maximum number of rows to return. Defaults to Inf (all rows). Use a smaller value (e.g., n = 100) to preview data before downloading the full table.

lazy

If TRUE, returns a lazy tbl instead of collecting. Defaults to FALSE.

Details

The linking table comes from crsp.ccmxpf_lnkhist. Missing linkenddt values indicate ongoing links and are replaced with the maximum date in the table for easier date-range joins.

To use the link, join on gvkey and ensure your observation date falls within the linkdt to linkenddt range.

Value

A tibble with columns:

gvkey

Compustat company identifier

permno

CRSP permanent security identifier

linkdt

Start date of the link

linkenddt

End date of the link (missing values replaced with max date)

linktype

Type of link

linkprim

Link primacy

References

Ian Gow, Financial Accounting Research, Chapter on Identifiers: https://iangow.github.io/far_book/identifiers.html

See Also

get_compustat()

Examples

## Not run: 
wrds <- wrds_connect()
ccm <- link_ccm(wrds)

# Join with Compustat data
compustat <- get_compustat(wrds)
compustat |>
  dplyr::inner_join(ccm, by = dplyr::join_by(gvkey)) |>
  dplyr::filter(datadate >= linkdt, datadate <= linkenddt)

wrds_disconnect(wrds)

## End(Not run)

Description

Downloads the WRDS-provided linking table that maps IBES tickers to CRSP PERMNOs with valid date ranges and match quality scores.

Usage

link_ibes_crsp(wrds, max_score = 5L, n = Inf, lazy = FALSE)

Arguments

wrds

A DBIConnection object returned by wrds_connect().

max_score

Maximum match quality score to include. Defaults to 5, which excludes score 6 (the worst matches). Lower scores indicate

better matches:

  • 1: Best match (CUSIP, ticker, and company name all match)

  • 2-5: Progressively weaker matches

  • 6: Worst match (excluded by default)

n

Maximum number of rows to return. Defaults to Inf (all rows). Use a smaller value (e.g., n = 100) to preview data before downloading the full table.

lazy

If TRUE, returns a lazy tbl instead of collecting. Defaults to FALSE.

Details

The linking table comes from wrdsapps_link_crsp_ibes.ibcrsphist.

To use the link, join on ticker and ensure your observation date falls within the sdate to edate range.

Value

A tibble with columns:

ticker

IBES ticker

permno

CRSP permanent security identifier

sdate

Start date of the link

edate

End date of the link

score

Match quality score (1 = best, 6 = worst)

References

WRDS IBES-CRSP Linking Table Documentation: https://wrds-www.wharton.upenn.edu/documents/796/IBES_CRSP_Linking_Table_by_WRDS.pdf

See Also

link_ccm()

Examples

## Not run: 
wrds <- wrds_connect()
ibes_link <- link_ibes_crsp(wrds)

# Join with IBES data on ticker and date range
ibes_data |>
  dplyr::inner_join(ibes_link, by = dplyr::join_by(ticker)) |>
  dplyr::filter(date >= sdate, date <= edate)

wrds_disconnect(wrds)

## End(Not run)

List available libraries

Description

Returns a character vector of available schema names (libraries) on WRDS.

Usage

list_libraries(wrds)

Arguments

wrds

A DBIConnection object returned by wrds_connect().

Value

A character vector of library names.

Examples

## Not run: 
wrds <- wrds_connect()
list_libraries(wrds)
wrds_disconnect(wrds)

## End(Not run)

List tables in a library

Description

Returns a character vector of table names within a WRDS library (schema).

Usage

list_tables(wrds, library)

Arguments

wrds

A DBIConnection object returned by wrds_connect().

library

Character. The name of the library (schema) to query.

Value

A character vector of table names.

Examples

## Not run: 
wrds <- wrds_connect()
list_tables(wrds, "comp")
wrds_disconnect(wrds)

## End(Not run)

Convert SIC codes to 2-digit industry codes

Description

Extracts the first two characters from SIC codes to create broader industry classifications.

Usage

sic_2digit(sic)

Arguments

sic

A numeric or character vector of SIC codes.

Details

SIC codes are hierarchical: the first two digits represent major industry groups (e.g., "54" = Retail-Food Stores), while the full 4-digit code provides more specific classifications (e.g., "5412" = Retail-Convenience Stores).

Value

A character vector of 2-digit SIC codes.

Examples

# Convenience Stores (SIC 5412) -> Retail-Food Stores (54)
sic_2digit(5412)
# [1] "54"

sic_2digit(c(5412, 5400))
# [1] "54" "54"

Smart collect with size awareness

Description

Internal function that collects a lazy table with warnings for large queries.

Usage

smart_collect(tbl, wrds, lazy = FALSE)

Arguments

tbl

A tbl_lazy object from dbplyr.

wrds

Database connection for row counting.

lazy

If TRUE, return the lazy table without collecting.

Value

A tibble if collecting, or the lazy table if lazy = TRUE.


Connect to WRDS

Description

Establishes a connection to the WRDS PostgreSQL server using credentials stored securely in the system keyring.

Usage

wrds_connect(user_key = "wrds_user", password_key = "wrds_pw", keyring = NULL)

Arguments

user_key

Name of the keyring entry storing the WRDS username. Defaults to "wrds_user".

password_key

Name of the keyring entry storing the WRDS password. Defaults to "wrds_pw".

keyring

Optional keyring name. If NULL (default), uses the default keyring.

Details

Credentials must be set up before first use with wrds_set_credentials(). The connection uses bigint = "integer" for compatibility with R's integer type.

Value

A DBIConnection object for the WRDS PostgreSQL database.

See Also

wrds_disconnect(), wrds_set_credentials()

Examples

## Not run: 
wrds <- wrds_connect()
list_libraries(wrds)
wrds_disconnect(wrds)

## End(Not run)

Disconnect from WRDS

Description

Closes a WRDS database connection.

Usage

wrds_disconnect(wrds)

Arguments

wrds

A DBIConnection object returned by wrds_connect().

Value

Invisibly returns TRUE if disconnection was successful.

Examples

## Not run: 
wrds <- wrds_connect()
wrds_disconnect(wrds)

## End(Not run)

Set WRDS credentials

Description

Interactively stores WRDS username and password in the system keyring for secure, persistent storage.

Usage

wrds_set_credentials(
  user_key = "wrds_user",
  password_key = "wrds_pw",
  keyring = NULL
)

Arguments

user_key

Name for the username keyring entry. Defaults to "wrds_user".

password_key

Name for the password keyring entry. Defaults to "wrds_pw".

keyring

Optional keyring name. If NULL (default), uses the default keyring.

Details

This function prompts for username and password interactively. Credentials are stored securely using the operating system's keyring (Keychain on macOS, Credential Manager on Windows, Secret Service on Linux).

Value

Invisibly returns TRUE on success.

Examples

## Not run: 
wrds_set_credentials()

## End(Not run)