Reads motif files, whether from the HOMER database, from HOMER de novo motif enrichment results, or custom motifs. In all cases, the files must be in HOMER format. See Details below.

read_motif(path)

Arguments

path

location of motif file

Value

at minimum, a tibble with the following columns:

  • consensus the consensus sequence of the motif

  • motif_name name of the motif

  • log_odds_detection threshold used to determine bound vs. unbound sites

  • motif_pwm a list column with PWMs for each motif

The following columns are present when available from complete .motif files or from HOMER results directories:

  • log_p_value_detection log p-value of detection from the original experiment used to identify the motif

  • tgt_num number of times motif appears in target sequences

  • tgt_pct percent of times motif appears in target sequences

  • bgd_num number of times motif appears in background sequences

  • bgd_pct percent of times motif appears in background sequences

  • log_p_value final enrichment from experiment -log10(p-value)

  • tgt_pos average position of motif in target sequences, where 0 = start of sequences

  • tgt_std standard deviation of position in target sequences

  • bgd_pos average position of motif in background sequences, where 0 = start of sequences

  • bgd_std standard deviation of position in background sequences

  • strand_bias log ratio of + strand occurrences to - strand occurrences

  • multiplicity average number of occurrences per sequence in sequences with 1 or more binding sites

Details

To read in a HOMER-formatted motif, at minimum the first three fields of the header line are required to identify the motif:

  • ">" + Consensus sequence The dominant or likeliest sequence

  • Motif name Should be unique

  • Log odds detection threshold determines bound vs. unbound sites

The remaining fields of HOMER-formatted motifs are described at the URL below and are primarily meant for interpreting motifs from HOMER's own database. To read more about the HOMER format, see: http://homer.ucsd.edu/homer/motif/creatingCustomMotifs.html

Note that HOMER also encodes additional information about a motif's origin and identity in the motif name. See the internal function .parse_homer_subfields for details and to split this field into its parts.

Subsequent lines (after the ">" header line) describe the position weight matrix (PWM), one row per position, with columns in the order A, C, G, T giving the probability of each nucleotide at that position.
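As a rough illustration of the layout described above, the following base R sketch parses a single minimal motif by hand. It is not the implementation of read_motif; the motif text is made up for illustration, and the fields are assumed to be tab-delimited, as described in the HOMER documentation linked above.

```r
# Hypothetical minimal motif; name and values are invented for illustration.
# Fields are assumed to be tab-delimited.
motif_lines <- c(
  ">ACGT\texample_motif\t6.0",
  "0.7\t0.1\t0.1\t0.1",
  "0.1\t0.7\t0.1\t0.1",
  "0.1\t0.1\t0.7\t0.1",
  "0.1\t0.1\t0.1\t0.7"
)

# Header line: drop the leading ">" and split into its fields.
header <- strsplit(sub("^>", "", motif_lines[1]), "\t", fixed = TRUE)[[1]]
consensus          <- header[1]
motif_name         <- header[2]
log_odds_detection <- as.numeric(header[3])

# Remaining lines: one PWM row per position, columns in the order A, C, G, T.
pwm_rows <- lapply(strsplit(motif_lines[-1], "\t", fixed = TRUE), as.numeric)
pwm <- do.call(rbind, pwm_rows)
colnames(pwm) <- c("A", "C", "G", "T")

print(pwm)
```

Each row of pwm corresponds to one position of the motif, so the number of rows matches the length of the consensus sequence in this example.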

Note that motifs with complete information (full HOMER-formatted headers) can be combined with minimal motifs. dplyr::bind_rows concatenates them despite the difference in column specifications.
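A minimal sketch of that concatenation follows. The two tibbles are hand-built stand-ins for what read_motif might return for a complete and a minimal motif; their names and values are hypothetical, and the motif_pwm list column is omitted for brevity.

```r
library(dplyr)
library(tibble)

# Stand-in for a motif read from a complete HOMER file (hypothetical values).
complete <- tibble(
  consensus             = "TTGCA",
  motif_name            = "homer_motif",
  log_odds_detection    = 7.5,
  log_p_value_detection = -20,
  tgt_num               = 150
)

# Stand-in for a custom motif carrying only the three required fields.
minimal <- tibble(
  consensus          = "ACGT",
  motif_name         = "custom_motif",
  log_odds_detection = 6
)

# bind_rows matches columns by name and fills the missing ones with NA.
combined <- bind_rows(complete, minimal)
print(combined)
```

Columns that exist only in the complete motif, such as tgt_num here, are NA in the row that came from the minimal motif.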