Read HOMER Motif Files

The core function for reading motif files, whether they come from the HOMER database, from HOMER de novo motif enrichment results, or from custom motifs. In all cases, the files must be in HOMER format. See Details below for more information.

read_motif(path)

Arguments

path	location of motif file

Value

At minimum, a tibble with the following columns:

consensus the consensus sequence of the de novo motif
motif_name name of the motif
log_odds_detection threshold used to determine bound vs. unbound sites
motif_pwm a list column with PWMs for each motif

The following columns are included when available from complete .motif files or from HOMER results directories:

log_p_value_detection log p-value of detection from the original experiment used to identify the motif
tgt_num number of times motif appears in target sequences
tgt_pct percent of times motif appears in target sequences
bgd_num number of times motif appears in background sequences
bgd_pct percent of times motif appears in background sequences
log_p_value final enrichment -log10(p-value) from the experiment
tgt_pos average position of motif in target sequences, where 0 = start of sequences
tgt_std standard deviation of position in target sequences
bgd_pos average position of motif in background sequences, where 0 = start of sequences
bgd_std standard deviation of position in background sequences
strand_bias log ratio of + strand occurrences to - strand occurrences
multiplicity average number of occurrences per sequence in sequences with 1 or more binding sites

Details

To read in a HOMER-formatted motif, at minimum the first three fields of the header line (the line beginning with ">") are required to properly identify the motif:

">" + Consensus sequence The dominant or likeliest sequence
Motif name Should be unique
Log odds detection threshold Used to determine bound vs. unbound sites
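As a rough illustration of the header layout, the sketch below splits a header line into these three fields with base R. It is a hypothetical helper (`parse_header_sketch`), not the code `read_motif` uses, and it assumes the fields are tab-separated as in standard HOMER files; the example header is made up.

```r
# Hypothetical helper, NOT read_motif's implementation.
# Assumes a HOMER header line: ">" + consensus, then tab-separated fields.
parse_header_sketch <- function(line) {
  stopifnot(startsWith(line, ">"))
  # Drop the leading ">" and split the rest on tabs
  fields <- strsplit(sub("^>", "", line), "\t", fixed = TRUE)[[1]]
  list(
    consensus = fields[1],
    motif_name = fields[2],
    log_odds_detection = as.numeric(fields[3])
  )
}

parse_header_sketch(">ACGTGA\tmy_custom_motif\t6.5")
```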

The remaining fields of HOMER-formatted motifs are described at the URL below and are primarily meant for interpreting motifs from HOMER's own database. For more about the HOMER format, see: http://homer.ucsd.edu/homer/motif/creatingCustomMotifs.html

Note that HOMER also encodes additional information about a motif's origin and identity in the motif name. See the internal function .parse_homer_subfields for more information and to split this field into its parts.
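The sketch below shows one way such a name might be split apart. It is a hypothetical helper (`split_homer_name_sketch`), not `.parse_homer_subfields`, and it assumes names follow the `Name(Family)/Experiment/Database` pattern seen in HOMER's known motif library; the example name is a placeholder.

```r
# Hypothetical helper, NOT the package's .parse_homer_subfields.
# Assumes names look like "Name(Family)/Experiment/Database".
split_homer_name_sketch <- function(motif_name) {
  parts <- strsplit(motif_name, "/", fixed = TRUE)[[1]]
  first <- parts[1]
  # Pull the text inside the trailing parentheses of the first part, if any
  family <- if (grepl("\\(.*\\)$", first)) {
    sub("^[^(]*\\((.*)\\)$", "\\1", first)
  } else {
    NA_character_
  }
  list(
    name = sub("\\(.*$", "", first),
    family = family,
    experiment = if (length(parts) >= 2) parts[2] else NA_character_,
    database = if (length(parts) >= 3) parts[3] else NA_character_
  )
}

split_homer_name_sketch("MyTF(bZIP)/MyCell-MyTF-ChIP-Seq(GSE00000)/Homer")
```

Custom motif names without slashes or parentheses come back with `NA` for the missing parts.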

Subsequent lines (after the ">" header line) describe the position weight matrix (PWM), one line per position, with columns in the order A, C, G, T giving the probability of each nucleotide at that position.
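A minimal base R sketch of turning the PWM lines into a matrix follows. The helper `parse_pwm_sketch` is hypothetical, not part of the package, and assumes one whitespace-separated row of four values per position; the probabilities in the example are made up.

```r
# Hypothetical helper, NOT read_motif's implementation.
# Assumes each line holds four whitespace-separated values in the order A, C, G, T.
parse_pwm_sketch <- function(lines) {
  # Split each row on tabs/spaces and convert to numbers
  rows <- lapply(strsplit(trimws(lines), "\\s+"), as.numeric)
  # Stack rows into a matrix: one row per motif position
  pwm <- do.call(rbind, rows)
  colnames(pwm) <- c("A", "C", "G", "T")
  pwm
}

motif_text <- c(
  ">ACG\texample_motif\t5.0",
  "0.970\t0.010\t0.010\t0.010",
  "0.010\t0.970\t0.010\t0.010",
  "0.010\t0.010\t0.970\t0.010"
)
# Drop the ">" header line and parse the remaining PWM rows
pwm <- parse_pwm_sketch(motif_text[-1])
pwm
```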

Note that motifs with complete (HOMER-formatted) information can be combined with minimal motifs: dplyr::bind_rows concatenates them despite differences in their columns, filling any missing columns with NA.
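For example, assuming the dplyr package is installed, a minimal motif table and one carrying an optional column can be stacked as below. Both tibbles are hand-built stand-ins for `read_motif` output with made-up values, not real results.

```r
suppressPackageStartupMessages(library(dplyr))

# Uniform placeholder PWM with columns A, C, G, T
flat_pwm <- function(n) matrix(0.25, nrow = n, ncol = 4,
                               dimnames = list(NULL, c("A", "C", "G", "T")))

# Stand-in for a minimal motif: required columns only
minimal <- tibble(
  consensus = "ACGTGA",
  motif_name = "custom_motif",
  log_odds_detection = 6.5,
  motif_pwm = list(flat_pwm(6))
)

# Stand-in for a complete motif: required columns plus an optional one
complete <- tibble(
  consensus = "TGACTCA",
  motif_name = "another_motif",
  log_odds_detection = 7.1,
  motif_pwm = list(flat_pwm(7)),
  log_p_value = 120
)

# Columns absent from `minimal` (here log_p_value) are filled with NA
combined <- bind_rows(minimal, complete)
combined
```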
