494 sh-clusters-IL4.txt2-1kIntrons2.csv3.bed.gt5tc.gt0.25TtoC2.plus.noIGG.f.flt.anno1.bed.genomic2.bed.anno2.exon.3utr.bed
124 sh-clusters-IL4.txt2-1kIntrons2.csv3.bed.gt5tc.gt0.25TtoC2.plus.noIGG.f.flt.anno1.bed.genomic2.bed.anno2.exon.5utr.bed
629 sh-clusters-IL4.txt2-1kIntrons2.csv3.bed.gt5tc.gt0.25TtoC2.plus.noIGG.f.flt.anno1.bed.genomic2.bed.anno2.exon.cds.bed
2807 sh-clusters-IL4.txt2-1kIntrons2.csv3.bed.gt5tc.gt0.25TtoC2.plus.noIGG.f.flt.anno1.bed.genomic2.bed.anno2.exon.ncRNA

#@
Dear Yiota, dear George,

Chronia polla (many happy returns)!

The replicate union, followed by upper-quartile normalization of the ConversionEventCount to calculate the CLI, is ready at:

http://genomics-lab.fleming.gr/fleming/DKlab/mr/parclip/paralyzer/PARalyzer_v1_1b/res/recommended_settings_union/

The format is:
GeneName
CDS_start
CDS_end
EnsemblID
ClusterStart = beginning coordinate on the chromosome of the cluster 
ClusterEnd = ending coordinate on the chromosome of the cluster
*ClusterID = unique ID for the cluster
*ClusterSequence = sequence of the cluster
*ReadCount = number of reads that overlap the cluster by at least 1 nucleotide
*ModeLocation = coordinate of the location with the highest signal / (signal + background) value
*ConversionLocationCount = number of unique locations where at least 1 conversion occurred
*ConversionEventCount = total number of conversions that occurred within the cluster
*NonConversionEventCount = total number of possible conversion events that did not occur
*ModeScore = score of the highest signal / (signal + background) value
*AvgConversionPct = average conversion % of all conversions in the group containing the cluster
*GroupConversionEventCount = number of all conversions in the group containing the cluster
*SDConversionPct = sdev of conversion % of all conversions in the group containing the cluster
*MaxConversionPct = max. conversion % of all conversions in the group containing the cluster
ModeScore = score of the highest signal / (signal + background) value
Strand = orientation in which the cluster resides
CLI = upper quartile normalized crosslinking index

(Fields marked with "*" are merged with "_")
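The CLI column is described only as an upper-quartile-normalized crosslinking index. Here is a minimal sketch, assuming CLI = ConversionEventCount divided by the 75th percentile (Q3) of ConversionEventCount over all clusters of the same sample. The quantile method (`statistics.quantiles` with `method="inclusive"`), the helper names and the toy counts are assumptions. They are not necessarily what produced the files at the link above.

```python
from statistics import quantiles
from typing import List


def upper_quartile(values: List[float]) -> float:
    """75th percentile (Q3); the 'inclusive' interpolation is an assumption."""
    # quantiles(..., n=4) returns the three cut points [Q1, Q2, Q3]
    return quantiles(values, n=4, method="inclusive")[2]


def crosslinking_index(conversion_event_counts: List[int]) -> List[float]:
    """Assumed CLI: each cluster's ConversionEventCount divided by the sample's Q3."""
    q3 = upper_quartile(conversion_event_counts)
    if q3 == 0:
        raise ValueError("upper quartile is 0; cannot normalize this sample")
    return [count / q3 for count in conversion_event_counts]


if __name__ == "__main__":
    # Hypothetical ConversionEventCount values from one replicate-union sample
    counts = [15, 3, 8, 40, 22, 5, 11, 9]
    print("Q3 =", upper_quartile(counts))
    for count, cli in zip(counts, crosslinking_index(counts)):
        print(f"{count}\t{cli:.3f}")
```

Dividing by the upper quartile rather than by the total makes the index less sensitive to a few very strongly crosslinked clusters. Some pipelines compute Q3 over non-zero counts only, which would give different values.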

George received the sampled results last week at:
http://genomics-lab.fleming.gr/fleming/DKlab/mr/parclip/paralyzer/PARalyzer_v1_1c/res/sampled

Best wishes,
Martin


Dear Yiota,


EnsemblID = transcript ID
ClusterID = unique ID for the cluster
ReadCount = number of reads that overlap the cluster by at least 1 nucleotide
ModeLocation = coordinate of the location with the highest signal / (signal + background) value
ConversionLocationCount = number of unique locations where at least 1 conversion occurred
ConversionEventCount = total number of conversions that occurred within the cluster
NonConversionEventCount = total number of possible conversion events that did not occur
ModeScore = score of the highest signal / (signal + background) value
AvgConversionPct = average conversion % of all conversions in the group containing the cluster
GroupConversionLocationCount = number of unique locations where at least 1 conversion occurred in the group containing the cluster
MaxConversionPct = max. conversion % of all conversions in the group containing the cluster


track type=bigBed name="0h clusters" description="0h paralyzer" bigDataUrl=http://genomics-lab.fleming.gr/fleming/DKlab/mr/parclip/tracks/0h-mapped2.bed.bb
track type=bigBed name="2h clusters" description="2h paralyzer" bigDataUrl=http://genomics-lab.fleming.gr/fleming/DKlab/mr/parclip/tracks/2h-mapped2.bed.bb
track type=bigBed name="6h clusters" description="6h paralyzer" bigDataUrl=http://genomics-lab.fleming.gr/fleming/DKlab/mr/parclip/tracks/6h-mapped2.bed.bb
track type=bigBed name="IFN clusters" description="IFN paralyzer" bigDataUrl=http://genomics-lab.fleming.gr/fleming/DKlab/mr/parclip/tracks/IFN-mapped2.bed.bb
track type=bigBed name="IL4 clusters" description="IL4 paralyzer" bigDataUrl=http://genomics-lab.fleming.gr/fleming/DKlab/mr/parclip/tracks/IL4-mapped2.bed.bb


Dear Dimitris,

Genome mapping of the clusters has finished. You can view the clusters either by
clicking "add custom tracks" and pasting
track type=bigBed name="0h clusters" description="0h paralyzer" bigDataUrl=http://genomics-lab.fleming.gr/fleming/DKlab/mr/parclip/tracks/0h-mapped2.bed.bb
track type=bigBed name="2h clusters" description="2h paralyzer" bigDataUrl=http://genomics-lab.fleming.gr/fleming/DKlab/mr/parclip/tracks/2h-mapped2.bed.bb
track type=bigBed name="6h clusters" description="6h paralyzer" bigDataUrl=http://genomics-lab.fleming.gr/fleming/DKlab/mr/parclip/tracks/6h-mapped2.bed.bb
track type=bigBed name="IFN clusters" description="IFN paralyzer" bigDataUrl=http://genomics-lab.fleming.gr/fleming/DKlab/mr/parclip/tracks/IFN-mapped2.bed.bb
track type=bigBed name="IL4 clusters" description="IL4 paralyzer" bigDataUrl=http://genomics-lab.fleming.gr/fleming/DKlab/mr/parclip/tracks/IL4-mapped2.bed.bb
into the "Paste URLs or data:" box,
or by using the track hub:
http://genomics-lab.fleming.gr/cgi-bin/hgTracks?db=mm9&hubUrl=http://genomics-lab.fleming.gr/fleming/DKlab/mr/parclip/tracks/hub.txt

If you click on a cluster, all PARalyzer information is shown:
EnsemblID = transcript ID
ClusterID = unique ID for the cluster
ReadCount = number of reads that overlap the cluster by at least 1 nucleotide
ModeLocation = coordinate of the location with the highest signal / (signal + background) value
ConversionLocationCount = number of unique locations where at least 1 conversion occurred
ConversionEventCount = total number of conversions that occurred within the cluster
NonConversionEventCount = total number of possible conversion events that did not occur
ModeScore = score of the highest signal / (signal + background) value
AvgConversionPct = average conversion % of all conversions in the group containing the cluster
GroupConversionEventCount = number of all conversions in the group containing the cluster
MaxConversionPct = max. conversion % of all conversions in the group containing the cluster

Note that the 0h, 2h and 6h tracks require that at least 2 of the 3 replicates cover a cluster, while the IFN and IL4 tracks
show all clusters identified with the 'recommended' settings.
Where clusters overlap between replicates, the information for the cluster with the highest
AvgConversionPct is shown.
All regions are shown after removing clusters
that overlap IGG regions.
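Here is a sketch of the replicate rule for the 0h/2h/6h tracks as an interval sweep. A cluster is kept if at least 2 of the 3 replicates cover it, and the overlapping cluster with the highest AvgConversionPct is reported. Several things are assumptions:

- Clusters overlap if they share at least 1 nt on the same chromosome and strand.
- Coordinates are inclusive.
- Overlapping clusters are chained into one group.

The `Cluster` class and the `merge_replicates` function are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Cluster:
    chrom: str
    strand: str
    start: int  # inclusive (assumption)
    end: int    # inclusive (assumption)
    avg_conversion_pct: float
    replicate: str


def best_of_group(group: List[Cluster], min_replicates: int) -> List[Cluster]:
    """Return [best cluster] if enough distinct replicates support the group."""
    if not group or len({c.replicate for c in group}) < min_replicates:
        return []
    return [max(group, key=lambda c: c.avg_conversion_pct)]


def merge_replicates(clusters: List[Cluster], min_replicates: int = 2) -> List[Cluster]:
    """Chain overlapping clusters (same chrom/strand) and keep supported groups."""
    ordered = sorted(clusters, key=lambda c: (c.chrom, c.strand, c.start))
    kept: List[Cluster] = []
    group: List[Cluster] = []
    group_end = 0
    for c in ordered:
        overlaps = (
            bool(group)
            and c.chrom == group[0].chrom
            and c.strand == group[0].strand
            and c.start <= group_end  # shares at least 1 nt with the group
        )
        if overlaps:
            group.append(c)
            group_end = max(group_end, c.end)
        else:
            kept.extend(best_of_group(group, min_replicates))
            group = [c]
            group_end = c.end
    kept.extend(best_of_group(group, min_replicates))
    return kept


if __name__ == "__main__":
    toy = [
        Cluster("chr1", "+", 100, 130, 0.40, "rep1"),
        Cluster("chr1", "+", 120, 150, 0.60, "rep2"),
        Cluster("chr1", "+", 500, 520, 0.90, "rep3"),  # seen in one replicate only
        Cluster("chr1", "-", 100, 130, 0.50, "rep1"),  # other strand, one replicate
    ]
    for c in merge_replicates(toy):
        print(c)
```

Chaining means a long cluster that overlaps two otherwise separate clusters joins them into one group. A stricter pairwise-overlap rule would give different groups. Passing `min_replicates=1` mimics the IFN/IL4 tracks, where every cluster is shown.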

BW
Martin


GeneName
CDS_start
CDS_end
EnsemblID
ClusterStart = beginning coordinate on the chromosome of the cluster 
ClusterEnd = ending coordinate on the chromosome of the cluster
*ClusterID = unique ID for the cluster
*ClusterSequence = sequence of the cluster
*ReadCount = number of reads that overlap the cluster by at least 1 nucleotide
*ModeLocation = coordinate of the location with the highest signal / (signal + background) value
*ConversionLocationCount = number of unique locations where at least 1 conversion occurred
*ConversionEventCount = total number of conversions that occurred within the cluster
*NonConversionEventCount = total number of possible conversion events that did not occur
*ModeScore = score of the highest signal / (signal + background) value
*AvgConversionPct = average conversion % of all conversions in the group containing the cluster
*GroupConversionEventCount = number of all conversions in the group containing the cluster
MaxConversionPct = max. conversion % of all conversions in the group containing the cluster
Strand = orientation in which the cluster resides

Fields marked with * are merged with _


GeneName
CDS_start
CDS_end
EnsemblID
ClusterStart = beginning coordinate on the chromosome of the cluster 
ClusterEnd = ending coordinate on the chromosome of the cluster
*ClusterID = unique ID for the cluster
*ClusterSequence = sequence of the cluster
*ReadCount = number of reads that overlap the cluster by at least 1 nucleotide
*ModeLocation = coordinate of the location with the highest signal / (signal + background) value
*ConversionLocationCount = number of unique locations where at least 1 conversion occurred
*ConversionEventCount = total number of conversions that occurred within the cluster
*NonConversionEventCount = total number of possible conversion events that did not occur
ModeScore = score of the highest signal / (signal + background) value
Strand = orientation in which the cluster resides

ConversionEventCount/(ConversionEventCount+NonConversionEventCount)

Fields marked with * are merged with _

Example
0610007P08Rik	156	4767	ENSMUST00000039944	5285	5311	G12910.1_TGGAATTATGTTCTATTACTATTAAAT_28_5312_4_15_278	0.943376255282776	+

GeneName = 0610007P08Rik
CDS_start = 156
CDS_end = 4767
EnsemblID = ENSMUST00000039944
ClusterStart = 5285 (beginning coordinate on the chromosome of the cluster)
ClusterEnd = 5311 (ending coordinate on the chromosome of the cluster)
*ClusterID = G12910.1
*ClusterSequence = TGGAATTATGTTCTATTACTATTAAAT
*ReadCount = 28
*ModeLocation = 5312
*ConversionLocationCount = 4
*ConversionEventCount = 15
*NonConversionEventCount = 278
ModeScore = 0.943376255282776
Strand = +
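The example row can be unpacked by splitting on tabs and then splitting the merged field on "_". The sketch below also computes the ratio ConversionEventCount/(ConversionEventCount+NonConversionEventCount) noted above. The notes do not name this quantity, so `conversion_fraction` is a hypothetical name. The sketch assumes the merged field holds exactly the seven starred fields in the order listed, and that no ClusterID or sequence contains "_".

```python
from typing import Dict, Union

Value = Union[str, int, float]

COLUMNS = ["GeneName", "CDS_start", "CDS_end", "EnsemblID",
           "ClusterStart", "ClusterEnd", "Merged", "ModeScore", "Strand"]
# Fields packed into the "*" column, joined with "_" (order as listed above)
MERGED_FIELDS = ["ClusterID", "ClusterSequence", "ReadCount", "ModeLocation",
                 "ConversionLocationCount", "ConversionEventCount",
                 "NonConversionEventCount"]
INT_FIELDS = {"CDS_start", "CDS_end", "ClusterStart", "ClusterEnd", "ReadCount",
              "ModeLocation", "ConversionLocationCount", "ConversionEventCount",
              "NonConversionEventCount"}


def parse_row(line: str) -> Dict[str, Value]:
    """Split one tab-separated row and unpack the "_"-merged column."""
    values = line.rstrip("\n").split("\t")
    if len(values) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} columns, got {len(values)}")
    raw = dict(zip(COLUMNS, values))
    merged = raw.pop("Merged").split("_")
    if len(merged) != len(MERGED_FIELDS):
        raise ValueError(f"expected {len(MERGED_FIELDS)} merged fields, got {len(merged)}")
    raw.update(zip(MERGED_FIELDS, merged))
    row: Dict[str, Value] = {}
    for key, value in raw.items():
        if key in INT_FIELDS:
            row[key] = int(value)
        elif key == "ModeScore":
            row[key] = float(value)
        else:
            row[key] = value
    return row


def conversion_fraction(row: Dict[str, Value]) -> float:
    """ConversionEventCount / (ConversionEventCount + NonConversionEventCount)."""
    conv = int(row["ConversionEventCount"])
    nonconv = int(row["NonConversionEventCount"])
    return conv / (conv + nonconv)


if __name__ == "__main__":
    example = "\t".join([
        "0610007P08Rik", "156", "4767", "ENSMUST00000039944", "5285", "5311",
        "G12910.1_TGGAATTATGTTCTATTACTATTAAAT_28_5312_4_15_278",
        "0.943376255282776", "+",
    ])
    row = parse_row(example)
    print(row["ClusterID"], row["ReadCount"], row["ModeScore"])
    print(f"{conversion_fraction(row):.4f}")  # 15 / (15 + 278)
```

For the example row this gives 15/293, about 0.051. The 27-nt ClusterSequence also equals ClusterEnd - ClusterStart + 1, which is consistent with inclusive coordinates.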


#@
Dear Yiota,

The results, as discussed, are at
http://genomics-lab.fleming.gr/fleming/DKlab/mr/parclip/paralyzer/PARalyzer_v1_1b/res/recommended_settings_gt5tc_gt0.25TtoC
The requirement for a group is that its max. TtoC conversion pct should be >= 0.25,
i.e. there should be at least one strong T>C conversion in the group.
Using the average over the whole group would be too strict.
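Here is a sketch of the difference between the two group-level filters. It assumes each group is given as a list of per-location TtoC conversion fractions, where 0.25 means 25%, as in the directory name `gt0.25TtoC`. The group names, values and function names are made up for illustration.

```python
from statistics import mean
from typing import Dict, List

THRESHOLD = 0.25  # minimum TtoC conversion fraction


def passes_max_filter(conversion_pcts: List[float], threshold: float = THRESHOLD) -> bool:
    """Rule used: at least one location in the group has a strong T>C conversion."""
    return max(conversion_pcts) >= threshold


def passes_mean_filter(conversion_pcts: List[float], threshold: float = THRESHOLD) -> bool:
    """Stricter alternative: the group average must reach the threshold."""
    return mean(conversion_pcts) >= threshold


if __name__ == "__main__":
    # Hypothetical groups: per-location conversion fractions
    groups: Dict[str, List[float]] = {
        "groupA": [0.10, 0.10, 0.40],  # one strong T>C among weak ones
        "groupB": [0.10, 0.20],        # no strong T>C
        "groupC": [0.30, 0.50],        # uniformly strong
    }
    for name, pcts in groups.items():
        print(name, "max:", passes_max_filter(pcts), "mean:", passes_mean_filter(pcts))
```

groupA shows why the mean is too strict: a single strong T>C site is diluted by weak neighbours, so the group fails the mean filter but passes the max filter.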

BW,
Martin


Dear Yiota,

The results, as discussed, are at
http://genomics-lab.fleming.gr/fleming/DKlab/mr/parclip/paralyzer/PARalyzer_v1_1b/res/recommended_settings_gt5tc
In the *distributions* files, I have included only the lines with ConversionPercent, as each file is already ~500 MB (the complete information is >4 GB per sample).
Locations with a value of -1 are non-T positions.

The constant value of 12 for NonConversionEventCount was my mistake. The three affected earlier files
http://genomics-lab.fleming.gr/fleming/DKlab/mr/parclip/paralyzer/PARalyzer_v1_1b/res/recommended_settings/sh-clusters-IFN.txt2.csv.bed.noIGG.plus.anno.csv
http://genomics-lab.fleming.gr/fleming/DKlab/mr/parclip/paralyzer/PARalyzer_v1_1b/res/recommended_settings/sh-clusters-IL4.txt2.csv.bed.noIGG.plus.anno.csv
http://genomics-lab.fleming.gr/fleming/DKlab/mr/parclip/paralyzer/PARalyzer_v1_1b/res/recommended_settings/sh-clusters-IGG.txt2.csv.bed.plus.anno.csv
have been corrected.

BW
Martin


ENSMUST00000040459 gene=Mapkap1 CDS=373-1926
ENSMUST00000113123 gene=Mapkap1 CDS=522-1512
ENSMUST00000113124 gene=Mapkap1 CDS=282-1740
ENSMUST00000113126 gene=Mapkap1 CDS=48-1614
ENSMUST00000113129 gene=Mapkap1 CDS=447-1905
ENSMUST00000124443 gene=Mapkap1 CDS=711-1630
ENSMUST00000147337 gene=Mapkap1 CDS=142-1708
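The isoform list gives each transcript's CDS in transcript coordinates. This is the information a 5'UTR/CDS/3'UTR split needs, such as the exon.5utr/exon.cds/exon.3utr files counted at the top. Here is a minimal sketch of assigning a transcript position (for example a cluster's ModeLocation) to a region. It assumes 1-based inclusive CDS bounds, and `parse_isoform` and `region_of` are hypothetical helpers. The example shows that the same transcript coordinate falls in different regions on different Mapkap1 isoforms.

```python
import re
from typing import Tuple

# Matches lines like "ENSMUST00000040459 gene=Mapkap1 CDS=373-1926"
ISOFORM_RE = re.compile(r"^(\S+)\s+gene=(\S+)\s+CDS=(\d+)-(\d+)$")


def parse_isoform(line: str) -> Tuple[str, str, int, int]:
    """Parse an isoform line into (transcript, gene, cds_start, cds_end)."""
    m = ISOFORM_RE.match(line.strip())
    if m is None:
        raise ValueError(f"unrecognized isoform line: {line!r}")
    transcript, gene, start, end = m.groups()
    return transcript, gene, int(start), int(end)


def region_of(position: int, cds_start: int, cds_end: int) -> str:
    """Classify a transcript position relative to the CDS (bounds inclusive)."""
    if position < cds_start:
        return "5utr"
    if position > cds_end:
        return "3utr"
    return "cds"


if __name__ == "__main__":
    isoforms = [
        "ENSMUST00000040459 gene=Mapkap1 CDS=373-1926",
        "ENSMUST00000113124 gene=Mapkap1 CDS=282-1740",
        "ENSMUST00000113123 gene=Mapkap1 CDS=522-1512",
    ]
    position = 1800  # hypothetical transcript coordinate, applied to each isoform
    for line in isoforms:
        transcript, gene, start, end = parse_isoform(line)
        print(transcript, gene, region_of(position, start, end))
```

The same rule puts the earlier example cluster (ClusterStart 5285 on a transcript with CDS 156-4767) in the 3'UTR.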

Dear Dimitris,

Please see the attached IGV sessions for all
isoforms of Mapkap1.

The coverage is made of reads that contain at least one
TtoC conversion.

The location and percentage of TtoC conversions are visible
at positions that have a blue bar at the bottom (amount of C)
and a red bar at the top (amount of T).

Note the different scales indicated at the top left of each track.

Best wishes,
Martin


PS: In case you'd like to view these in IGV yourself, the files in the folder at
http://genomics-lab.fleming.gr/fleming/DKlab/mr/parclip/shrimp/mapkap1
are:

0hrep1-15mMm-mapkap1.sam.bam.tc.bam
0hrep1-15mMm-mapkap1.sam.bam.tc.bam.bai
0hrep2-15mMm-mapkap1.sam.bam.tc.bam
0hrep2-15mMm-mapkap1.sam.bam.tc.bam.bai
0hrep3-15mMm-mapkap1.sam.bam.tc.bam
0hrep3-15mMm-mapkap1.sam.bam.tc.bam.bai
2hrep1-15mMm-mapkap1.sam.bam.tc.bam
2hrep1-15mMm-mapkap1.sam.bam.tc.bam.bai
2hrep2-15mMm-mapkap1.sam.bam.tc.bam
2hrep2-15mMm-mapkap1.sam.bam.tc.bam.bai
2hrep3-15mMm-mapkap1.sam.bam.tc.bam
2hrep3-15mMm-mapkap1.sam.bam.tc.bam.bai
6hrep1-15mMm-mapkap1.sam.bam.tc.bam
6hrep1-15mMm-mapkap1.sam.bam.tc.bam.bai
6hrep2-15mMm-mapkap1.sam.bam.tc.bam
6hrep2-15mMm-mapkap1.sam.bam.tc.bam.bai
6hrep3-15mMm-mapkap1.sam.bam.tc.bam
6hrep3-15mMm-mapkap1.sam.bam.tc.bam.bai
IFN-15mMm-mapkap1.sam.bam.tc.bam
IFN-15mMm-mapkap1.sam.bam.tc.bam.bai
IGG-15mMm-mapkap1.sam.bam.tc.bam
IGG-15mMm-mapkap1.sam.bam.tc.bam.bai
IL4-15mMm-mapkap1.sam.bam.tc.bam
IL4-15mMm-mapkap1.sam.bam.tc.bam.bai