ctdR

Identify chemicals associated with your genes using enrichment analysis and data from the Comparative Toxicogenomics Database.

What does ctdR do?

Given a list of genes (e.g. from a differential expression analysis), ctdR identifies the chemicals that are significantly associated with those genes in the Comparative Toxicogenomics Database (CTD). Two complementary methods are available:

Over-Representation Analysis

Tests whether your gene list overlaps significantly with each chemical's known targets. Best when you have a defined gene set with a clear significance cutoff.

Powered by clusterProfiler::enricher
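
To make the overlap test concrete, here is a minimal base-R sketch of the one-sided hypergeometric test that over-representation analysis is conventionally built on. The gene identifiers, the universe size, and all variable names are invented for illustration. This is not ctdR's or clusterProfiler's internal code.

```r
# Toy universe of 1000 background genes (hypothetical IDs)
universe <- paste0("g", 1:1000)

# Hypothetical target set for one chemical: 50 genes
chem_targets <- universe[1:50]

# Hypothetical query list: 20 genes, 8 of which are targets
query <- c(universe[1:8], universe[501:512])

k <- length(intersect(query, chem_targets))  # observed overlap
M <- length(chem_targets)                    # targets in the universe
N <- length(universe)                        # universe size
n <- length(query)                           # query size

# P(X >= k) when drawing n genes from N, of which M are targets
p_value <- phyper(k - 1, M, N - M, n, lower.tail = FALSE)

# One common definition of fold enrichment: observed vs. expected overlap rate
fold_enrichment <- (k / n) / (M / N)
```

In a real analysis the test is repeated for every chemical, so the resulting p-values need a multiple-testing correction (for example `p.adjust(p_values, method = "BH")`).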

Gene Set Enrichment Analysis

Uses your full ranked gene list to find chemicals whose targets cluster at the top or bottom of the ranking. No arbitrary significance cutoff is needed.

Powered by fgsea::fgsea
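
The idea of targets clustering at one end of a ranking can be illustrated with the classic unweighted running-sum statistic. The sketch below uses base R only and made-up gene IDs. fgsea itself uses a weighted variant and estimates significance by permutation, neither of which is reproduced here.

```r
# Hypothetical ranked gene list: best-ranked gene first
ranked_genes <- paste0("g", 1:20)

# Hypothetical target set, concentrated near the top of the ranking
targets <- c("g1", "g2", "g3", "g5", "g18")

is_hit <- ranked_genes %in% targets
n_hit  <- sum(is_hit)
n_miss <- length(ranked_genes) - n_hit

# Step up by 1/n_hit at each target, down by 1/n_miss at every other gene
steps <- ifelse(is_hit, 1 / n_hit, -1 / n_miss)
running_sum <- cumsum(steps)

# Enrichment score: the running-sum value furthest from zero
es <- running_sum[which.max(abs(running_sum))]
```

With this construction a positive score indicates targets concentrated near the top of the ranking and a negative score indicates targets concentrated near the bottom.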

Getting Started

  1. Install ctdR
    devtools::install_github("drake69/ctdR")
  2. Download CTD data
    Download CTD_chem_gene_ixns.csv.gz from ctdbase.org and decompress it.
  3. Import the data (once)
    library(ctdR)
    import_CTD("~/Downloads/CTD_chem_gene_ixns.csv")
  4. Run enrichment analysis
    # my_genes: data frame with an 'entrez_ids' column and a numeric column (see Examples)
    results <- enrichment_CTD(my_genes, method = "ORA")
    head(results)
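
Step 2 asks you to decompress the download. Any tool works (for example `gunzip` on the command line). If you would rather stay in R, the following is a minimal sketch using only base R connections. `decompress_gz` is a hypothetical helper written for this README, not a ctdR function.

```r
# Decompress a .gz file in chunks so the whole file never sits in memory.
# `decompress_gz` is a hypothetical helper, not part of ctdR.
decompress_gz <- function(gz_path,
                          out_path = sub("\\.gz$", "", gz_path),
                          chunk_size = 1e6) {
  input <- gzfile(gz_path, open = "rb")    # reads decompressed bytes
  on.exit(close(input), add = TRUE)
  output <- file(out_path, open = "wb")
  on.exit(close(output), add = TRUE)

  repeat {
    chunk <- readBin(input, what = "raw", n = chunk_size)
    if (length(chunk) == 0) break          # end of file reached
    writeBin(chunk, output)
  }
  invisible(out_path)
}

# Example call (path assumed from step 3 above):
# decompress_gz("~/Downloads/CTD_chem_gene_ixns.csv.gz")
```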

Examples

Example 1: Identifying chemicals linked to inflammatory genes (ORA)

You have a set of genes involved in inflammation and want to find which chemicals are known to interact with them.

library(ctdR)

# A set of inflammatory genes (Entrez IDs)
# TNF, IL6, IL1B, PTGS2, CXCL8, CCL2, NFKB1, STAT3, MMP9, ICAM1
genes <- data.frame(
  entrez_ids = c("7124", "3569", "3553", "5743", "3576",
                 "6347", "4790", "6774", "4318", "3383"),
  pvalue     = c(0.001, 0.002, 0.003, 0.005, 0.007,
                 0.01,  0.015, 0.02,  0.03,  0.04)
)

# Find chemicals enriched for these genes
ora_results <- enrichment_CTD(genes, method = "ORA")

# View top hits
head(ora_results[, c("ChemicalName", "padj",
                      "foldEnrichment", "Count")])
#>       ChemicalName         padj foldEnrichment Count
#> 1  Lipopolysaccharides  1.2e-15          8.3      9
#> 2  Dexamethasone        3.4e-12          6.1      8
#> 3  Acetaminophen        7.8e-10          5.4      7
#> ...
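
Because the result is shown as an ordinary data frame, it can be filtered and sorted with base R. The sketch below rebuilds the three illustrative rows printed above as a stand-in (`ora_demo` is a hypothetical name, and the values are the README's example values, not real analysis output). Both cutoffs are arbitrary examples.

```r
# Stand-in for the Example 1 result, copied from the illustrative rows above
ora_demo <- data.frame(
  ChemicalName   = c("Lipopolysaccharides", "Dexamethasone", "Acetaminophen"),
  padj           = c(1.2e-15, 3.4e-12, 7.8e-10),
  foldEnrichment = c(8.3, 6.1, 5.4),
  Count          = c(9, 8, 7)
)

# Keep chemicals below an adjusted p-value threshold that also overlap
# at least 8 genes of the query list
significant <- subset(ora_demo, padj < 0.05 & Count >= 8)

# Order the remaining rows by fold enrichment, strongest first
significant <- significant[order(significant$foldEnrichment,
                                 decreasing = TRUE), ]
```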

Example 2: Ranked analysis of a full gene expression profile (GSEA)

You have differential expression results and want to use the full ranking (not just significant genes) to discover chemical associations.

library(ctdR)

# Load your DEG results (Entrez IDs + p-values)
deg_results <- read.csv("my_deg_results.csv")

# Prepare input: must have an 'entrez_ids' column plus a numeric column
genes <- data.frame(
  entrez_ids = deg_results$entrez_id,
  pvalue     = deg_results$pvalue
)

# Run GSEA: uses the full ranking, no cutoff needed
gsea_results <- enrichment_CTD(genes, method = "GSEA")

# View top enriched chemicals
head(gsea_results[, c("ChemicalName", "NES",
                       "padj", "size")])
#>        ChemicalName   NES      padj size
#> 1  Benzo(a)pyrene    2.41   0.001   312
#> 2  Valproic Acid     2.18   0.003   287
#> 3  Estradiol        -1.95   0.005   445
#> ...
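
Real differential expression tables often contain missing identifiers or repeated genes. The sketch below shows one way to tidy such a table before building the input data frame. Whether ctdR requires this cleanup is an assumption on my part, and the toy `deg_results` table is invented. Only the `entrez_id`, `pvalue`, and `entrez_ids` column names come from the example above.

```r
# Hypothetical DEG table with a missing ID, a missing p-value, and a duplicate
deg_results <- data.frame(
  entrez_id = c("7124", "3569", NA, "3553", "3569", "5743"),
  pvalue    = c(0.001, 0.020, 0.300, NA, 0.004, 0.050)
)

# Drop rows with a missing ID or a missing p-value
keep  <- !is.na(deg_results$entrez_id) & !is.na(deg_results$pvalue)
clean <- deg_results[keep, ]

# For duplicated IDs, keep the row with the smallest p-value
clean <- clean[order(clean$pvalue), ]
clean <- clean[!duplicated(clean$entrez_id), ]

# Build the input layout used in the example above
genes <- data.frame(
  entrez_ids = as.character(clean$entrez_id),
  pvalue     = clean$pvalue
)
```

Keeping the smallest p-value per gene is only one possible rule. Averaging or keeping the first occurrence may suit other data better.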

Data Licensing Disclaimer

This package does not bundle or redistribute any data from the Comparative Toxicogenomics Database. CTD data are maintained by NC State University. Users must download the data directly from ctdbase.org and comply with the CTD Terms of Service.

Please cite CTD in publications: Davis AP et al. (2023) Nucleic Acids Research, 51(D1), D1257-D1262. doi:10.1093/nar/gkac833