This file contains a brief description of the input and output of the 
Evidence Ranked Motif Identification algorithm (version 1.1).
Basic instructions on how to run the code are also included. 
Example input files are provided with the code:
- Pumilio2 human PAR-CLIP dataset (Hafner et al. 2010)
- Quaking human PAR-CLIP dataset (Hafner et al. 2010)
- IGF2BP1 human PAR-CLIP dataset (Hafner et al. 2010)

Input files
===========
1) Evidence file (evidence_<SUFFIX>)        (e.g. data/evidence_ConversionEventCount_23UTR+5UTR+CDS+intronFilterType_NA)
2) Regulatory sequence files (seq_<SUFFIX>) (e.g. data/seq_ConversionEventCount_23UTR+5UTR+CDS+intronFilterType_NA)
3) Evidence parameters file                 (e.g. evidence_file_list)
4) Regulatory sequence parameters file      (e.g. sequence_file_list)

Output files
============
1) <SUFFIX>_summary.txt           - ranked list of the motif predictions according to the score assigned by cERMIT
2) predicted_targets              - ids of regions with motif matches (from regions with binding evidence >= 50th percentile)
3) predicted_targets_with_offsets - same as 2), but actual offset locations within each sequence region are reported

where 'SUFFIX' is an arbitrary suffix appended to the names of the 
'Evidence file' and 'Regulatory sequence files' (e.g. ConversionEventCount_23UTR+5UTR+CDS+intronFilterType_NA)


The file formats are described below:
*************************************

'Evidence file' (binding evidence for the putative regulatory regions)
===============
<regulatory_region_RBP1_ID><TAB><score>
<regulatory_region_RBP2_ID><TAB><score>
...
<regulatory_region_RBPp_ID><TAB><score>

('score' corresponds to evidence of binding; lines should be listed in descending order of score)
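
The layout above can be produced with a few lines of Python. This is only a minimal sketch: the function name, region ids and scores are made up for illustration and are not part of cERMIT.

```python
def format_evidence(scores):
    """Return evidence-file text: one '<region_id><TAB><score>' line per region,
    sorted by score in descending order."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return "".join(f"{region_id}\t{score}\n" for region_id, score in ranked)


# Hypothetical region ids and scores, for illustration only.
example = {"region_A": 12.0, "region_B": 47.5, "region_C": 3.0}
evidence_text = format_evidence(example)
```

The resulting text can then be written to a file named evidence_<SUFFIX>.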

'Evidence parameters file' (path to the binding evidence file; the PSSM can be omitted if none is available)
=========================
evidence_<SUFFIX>

'Regulatory sequence file' (sequences of putative regulatory regions)
==========================
regulatory_region1$regulatory_region1$regulatory_region2$regulatory_region2$...regulatory_region_N$regulatory_region_N$

*NOTE: All regulatory regions must be the same length; shorter sequences should be padded at the end with 'X' symbols. 
       The input sequences expected at this step are those produced by the PARalyzer peak-calling algorithm.
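
The padding rule in the note can be sketched as follows. The function name and the example sequences are hypothetical, and the sketch covers only the padding step, not the assembly of the '$'-separated sequence file.

```python
def pad_regions(sequences, pad_char="X"):
    """Right-pad every sequence with pad_char up to the length of the longest one."""
    if not sequences:
        return []
    target = max(len(seq) for seq in sequences)
    return [seq.ljust(target, pad_char) for seq in sequences]


# Hypothetical sequences of unequal length, for illustration only.
padded = pad_regions(["ACGT", "ACGTACG", "AC"])
```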

'Regulatory sequence parameters file' (path to regulatory sequence file)
=====================================
seq_<SUFFIX>

'<SUFFIX>_summary.txt' (main output file)
======================
This file contains a ranked list of cERMIT's motif predictions sorted in descending order of enrichment score. 
For each predicted cluster, cERMIT outputs: 
<enrichment rank><TAB><IUPAC consensus><TAB><enrichment score>(number of targets, % of total number of putative targets, % "explained" putative target clusters)


Running Instructions:
*********************

Before running cERMIT with the provided sample input, please make sure that the 'data' subdirectory exists and contains the input files:
1. evidence_<SUFFIX>
2. seq_<SUFFIX>

The following files should be present in the same directory as the executable 'cERMIT':
1. cERMIT              - binary executable
2. run_cERMIT_analysis - executable shell script
3. evidence_file_list
4. sequence_file_list
5. oligo_size_[5-10]
6. generate_logo.R

run command: ./run_cERMIT_analysis
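
Before launching the run, the file checklist above can be verified with a short Python script. This is a sketch only and not part of the cERMIT distribution: the function name is hypothetical, and the oligo_size_[5-10] files are left out of the check because their exact names are not spelled out in the list.

```python
from pathlib import Path

# File names taken from the checklist above (oligo_size files not included).
REQUIRED = [
    "cERMIT",
    "run_cERMIT_analysis",
    "evidence_file_list",
    "sequence_file_list",
    "generate_logo.R",
]


def missing_files(directory, names=REQUIRED):
    """Return the names from `names` that are not regular files in `directory`."""
    base = Path(directory)
    return [name for name in names if not (base / name).is_file()]


if __name__ == "__main__":
    missing = missing_files(".")
    if missing:
        print("Missing files: " + ", ".join(missing))
    else:
        print("All listed files found.")
```

Run the script from the directory that holds the 'cERMIT' executable.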
