494 sh-clusters-IL4.txt2-1kIntrons2.csv3.bed.gt5tc.gt0.25TtoC2.plus.noIGG.f.flt.anno1.bed.genomic2.bed.anno2.exon.3utr.bed
124 sh-clusters-IL4.txt2-1kIntrons2.csv3.bed.gt5tc.gt0.25TtoC2.plus.noIGG.f.flt.anno1.bed.genomic2.bed.anno2.exon.5utr.bed
629 sh-clusters-IL4.txt2-1kIntrons2.csv3.bed.gt5tc.gt0.25TtoC2.plus.noIGG.f.flt.anno1.bed.genomic2.bed.anno2.exon.cds.bed
2807 sh-clusters-IL4.txt2-1kIntrons2.csv3.bed.gt5tc.gt0.25TtoC2.plus.noIGG.f.flt.anno1.bed.genomic2.bed.anno2.exon.ncRNA

#@

Dear Yiota, dear George,

chronia polla!

The replicate union, followed by upper-quartile normalization of the ConversionEventCount to calculate the CLI, is ready at:
http://genomics-lab.fleming.gr/fleming/DKlab/mr/parclip/paralyzer/PARalyzer_v1_1b/res/recommended_settings_union/

The format is:

GeneName
CDS_start
CDS_end
EnsemblID
ClusterStart = beginning coordinate on the chromosome of the cluster
ClusterEnd = ending coordinate on the chromosome of the cluster
*ClusterID = unique ID for the cluster
*ClusterSequence = sequence of the cluster
*ReadCount = number of reads that overlap the cluster by at least 1 nucleotide
*ModeLocation = coordinate of the location with the highest signal / (signal + background) value
*ConversionLocationCount = number of unique locations where at least 1 conversion occurred
*ConversionEventCount = total number of conversions that occurred within the cluster
*NonConversionEventCount = total number of possible conversion events that did not occur
*ModeScore = score of the highest signal / (signal + background) value
*AvgConversionPct = average conversion % of all conversions in the group containing the cluster
*GroupConversionEventCount = number of all conversions in the group containing the cluster
*SDConversionPct = standard deviation of conversion % of all conversions in the group containing the cluster
*MaxConversionPct = max. conversion % of all conversions in the group containing the cluster
ModeScore = score of the highest signal / (signal + background) value
Strand = orientation in which the cluster resides
CLI = upper quartile normalized crosslinking index
(Fields marked with "*" are merged with "_")

George received the sampled results last week at:
http://genomics-lab.fleming.gr/fleming/DKlab/mr/parclip/paralyzer/PARalyzer_v1_1c/res/sampled

Best wishes,
Martin

Dear Yiota,

EnsemblID = transcript ID
ClusterID = unique ID for the cluster
ReadCount = number of reads that overlap the cluster by at least 1 nucleotide
ModeLocation = coordinate of the location with the highest signal / (signal + background) value
ConversionLocationCount = number of unique locations where at least 1 conversion occurred
ConversionEventCount = total number of conversions that occurred within the cluster
NonConversionEventCount = total number of possible conversion events that did not occur
ModeScore = score of the highest signal / (signal + background) value
AvgConversionPct = average conversion % of all conversions in the group containing the cluster
GroupConversionLocationCount = number of unique locations where at least 1 conversion occurred in the group containing the cluster
MaxConversionPct = max. conversion % of all conversions in the group containing the cluster

Dear Dimitris,

genome mapping of the clusters has finished. You can view the clusters either via "add custom tracks", pasting

track type=bigBed name="0h clusters" description="0h paralyzer" bigDataUrl=http://genomics-lab.fleming.gr/fleming/DKlab/mr/parclip/tracks/0h-mapped2.bed.bb
track type=bigBed name="2h clusters" description="2h paralyzer" bigDataUrl=http://genomics-lab.fleming.gr/fleming/DKlab/mr/parclip/tracks/2h-mapped2.bed.bb
track type=bigBed name="6h clusters" description="6h paralyzer" bigDataUrl=http://genomics-lab.fleming.gr/fleming/DKlab/mr/parclip/tracks/6h-mapped2.bed.bb
track type=bigBed name="IFN clusters" description="IFN paralyzer" bigDataUrl=http://genomics-lab.fleming.gr/fleming/DKlab/mr/parclip/tracks/IFN-mapped2.bed.bb
track type=bigBed name="IL4 clusters" description="IL4 paralyzer" bigDataUrl=http://genomics-lab.fleming.gr/fleming/DKlab/mr/parclip/tracks/IL4-mapped2.bed.bb

into "Paste URLs or data:", or by using the hub:
http://genomics-lab.fleming.gr/cgi-bin/hgTracks?db=mm9&hubUrl=http://genomics-lab.fleming.gr/fleming/DKlab/mr/parclip/tracks/hub.txt

If you click on a cluster, all PARalyzer information is shown:

EnsemblID = transcript ID
ClusterID = unique ID for the cluster
ReadCount = number of reads that overlap the cluster by at least 1 nucleotide
ModeLocation = coordinate of the location with the highest signal / (signal + background) value
ConversionLocationCount = number of unique locations where at least 1 conversion occurred
ConversionEventCount = total number of conversions that occurred within the cluster
NonConversionEventCount = total number of possible conversion events that did not occur
ModeScore = score of the highest signal / (signal + background) value
AvgConversionPct = average conversion % of all conversions in the group containing the cluster
GroupConversionEventCount = number of all conversions in the group containing the cluster
MaxConversionPct = max. conversion % of all conversions in the group containing the cluster

Note that the 0h, 2h, and 6h tracks require that 2 of 3 replicates cover the cluster, while for IFN and IL4 all clusters identified with the 'recommended' settings are shown. For clusters that overlap between replicates, the information for the cluster with the best AvgConversionPct is shown. All regions are shown after removal of overlapping IGG regions.

BW
Martin

GeneName
CDS_start
CDS_end
EnsemblID
ClusterStart = beginning coordinate on the chromosome of the cluster
ClusterEnd = ending coordinate on the chromosome of the cluster
*ClusterID = unique ID for the cluster
*ClusterSequence = sequence of the cluster
*ReadCount = number of reads that overlap the cluster by at least 1 nucleotide
*ModeLocation = coordinate of the location with the highest signal / (signal + background) value
*ConversionLocationCount = number of unique locations where at least 1 conversion occurred
*ConversionEventCount = total number of conversions that occurred within the cluster
*NonConversionEventCount = total number of possible conversion events that did not occur
*ModeScore = score of the highest signal / (signal + background) value
*AvgConversionPct = average conversion % of all conversions in the group containing the cluster
*GroupConversionEventCount = number of all conversions in the group containing the cluster
MaxConversionPct = max. conversion % of all conversions in the group containing the cluster
Strand = orientation in which the cluster resides
Fields marked with * are merged with _

GeneName
CDS_start
CDS_end
EnsemblID
ClusterStart = beginning coordinate on the chromosome of the cluster
ClusterEnd = ending coordinate on the chromosome of the cluster
*ClusterID = unique ID for the cluster
*ClusterSequence = sequence of the cluster
*ReadCount = number of reads that overlap the cluster by at least 1 nucleotide
*ModeLocation = coordinate of the location with the highest signal / (signal + background) value
*ConversionLocationCount = number of unique locations where at least 1 conversion occurred
*ConversionEventCount = total number of conversions that occurred within the cluster
*NonConversionEventCount = total number of possible conversion events that did not occur
ModeScore = score of the highest signal / (signal + background) value
Strand = orientation in which the cluster resides
ConversionEventCount/(ConversionEventCount+NonConversionEventCount)
Fields marked with * are merged with _

Example

0610007P08Rik 156 4767 ENSMUST00000039944 5285 5311 G12910.1_TGGAATTATGTTCTATTACTATTAAAT_28_5312_4_15_278 0.943376255282776 +

GeneName = 0610007P08Rik
CDS_start = 156
CDS_end = 4767
EnsemblID = ENSMUST00000039944
ClusterStart = 5285 (beginning coordinate on the chromosome of the cluster)
ClusterEnd = 5311 (ending coordinate on the chromosome of the cluster)
*ClusterID = G12910.1
*ClusterSequence = TGGAATTATGTTCTATTACTATTAAAT
*ReadCount = 28
*ModeLocation = 5312
*ConversionLocationCount = 4
*ConversionEventCount = 15
*NonConversionEventCount = 278
ModeScore = 0.943376255282776
Strand = +

#@

Dear Yiota,

the results as discussed are at:
http://genomics-lab.fleming.gr/fleming/DKlab/mr/parclip/paralyzer/PARalyzer_v1_1b/res/recommended_settings_gt5tc_gt0.25TtoC

The requirement in a group is that the max. TtoC conversion pct should be >= 0.25. Using the average over the whole group is too strict. This means there should be at least one strong T>C conversion.

BW,
Martin

Dear Yiota,

the results as discussed are at:
http://genomics-lab.fleming.gr/fleming/DKlab/mr/parclip/paralyzer/PARalyzer_v1_1b/res/recommended_settings_gt5tc

In the *distributions* files, I include only the lines with ConversionPercent, as each file is already ~500MB (the complete info is >4GB per sample). The locations with -1 are non-T.

The constant 12 for the NonConversionEventCount was my fault; the 3 affected previous files

http://genomics-lab.fleming.gr/fleming/DKlab/mr/parclip/paralyzer/PARalyzer_v1_1b/res/recommended_settings/sh-clusters-IFN.txt2.csv.bed.noIGG.plus.anno.csv
http://genomics-lab.fleming.gr/fleming/DKlab/mr/parclip/paralyzer/PARalyzer_v1_1b/res/recommended_settings/sh-clusters-IL4.txt2.csv.bed.noIGG.plus.anno.csv
http://genomics-lab.fleming.gr/fleming/DKlab/mr/parclip/paralyzer/PARalyzer_v1_1b/res/recommended_settings/sh-clusters-IGG.txt2.csv.bed.plus.anno.csv

have been corrected.

BW
Martin

ENSMUST00000040459 gene=Mapkap1 CDS=373-1926
ENSMUST00000113123 gene=Mapkap1 CDS=522-1512
ENSMUST00000113124 gene=Mapkap1 CDS=282-1740
ENSMUST00000113126 gene=Mapkap1 CDS=48-1614
ENSMUST00000113129 gene=Mapkap1 CDS=447-1905
ENSMUST00000124443 gene=Mapkap1 CDS=711-1630
ENSMUST00000147337 gene=Mapkap1 CDS=142-1708

Dear Dimitris,

please see the attached IGV sessions for all isoforms of Mapkap1. The coverage is made of reads that contain at least one TtoC conversion. The location and percentage of TtoC conversions are visible at positions that have a blue bar at the bottom (amount of C) and a red bar at the top (amount of T). Note the various scales indicated at the top left of each track.

Best wishes,
Martin

PS: In case you'd like to view this in IGV, the files in the folder at
http://genomics-lab.fleming.gr/fleming/DKlab/mr/parclip/shrimp/mapkap1
are:

0hrep1-15mMm-mapkap1.sam.bam.tc.bam
0hrep1-15mMm-mapkap1.sam.bam.tc.bam.bai
0hrep2-15mMm-mapkap1.sam.bam.tc.bam
0hrep2-15mMm-mapkap1.sam.bam.tc.bam.bai
0hrep3-15mMm-mapkap1.sam.bam.tc.bam
0hrep3-15mMm-mapkap1.sam.bam.tc.bam.bai
2hrep1-15mMm-mapkap1.sam.bam.tc.bam
2hrep1-15mMm-mapkap1.sam.bam.tc.bam.bai
2hrep2-15mMm-mapkap1.sam.bam.tc.bam
2hrep2-15mMm-mapkap1.sam.bam.tc.bam.bai
2hrep3-15mMm-mapkap1.sam.bam.tc.bam
2hrep3-15mMm-mapkap1.sam.bam.tc.bam.bai
6hrep1-15mMm-mapkap1.sam.bam.tc.bam
6hrep1-15mMm-mapkap1.sam.bam.tc.bam.bai
6hrep2-15mMm-mapkap1.sam.bam.tc.bam
6hrep2-15mMm-mapkap1.sam.bam.tc.bam.bai
6hrep3-15mMm-mapkap1.sam.bam.tc.bam
6hrep3-15mMm-mapkap1.sam.bam.tc.bam.bai
IFN-15mMm-mapkap1.sam.bam.tc.bam
IFN-15mMm-mapkap1.sam.bam.tc.bam.bai
IGG-15mMm-mapkap1.sam.bam.tc.bam
IGG-15mMm-mapkap1.sam.bam.tc.bam.bai
IL4-15mMm-mapkap1.sam.bam.tc.bam
IL4-15mMm-mapkap1.sam.bam.tc.bam.bai
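
Note on reading the cluster files: the 9-column record shown in the 0610007P08Rik example earlier in these notes (with the "*" fields merged by "_") can be split back into its parts with a few lines of Python. This is only a minimal sketch, not the script used to produce the files. It assumes that columns are whitespace-separated and that none of the merged values contains a "_" itself; the names ClusterRecord, parse_cluster_line and conversion_fraction are made up for illustration. The conversion fraction is the ConversionEventCount/(ConversionEventCount+NonConversionEventCount) column from the format description.

```python
from dataclasses import dataclass


@dataclass
class ClusterRecord:
    """One cluster line in the 9-column layout of the example (hypothetical type)."""
    gene_name: str
    cds_start: int
    cds_end: int
    ensembl_id: str
    cluster_start: int
    cluster_end: int
    cluster_id: str
    cluster_sequence: str
    read_count: int
    mode_location: int
    conversion_location_count: int
    conversion_event_count: int
    non_conversion_event_count: int
    mode_score: float
    strand: str

    @property
    def conversion_fraction(self) -> float:
        # ConversionEventCount / (ConversionEventCount + NonConversionEventCount)
        total = self.conversion_event_count + self.non_conversion_event_count
        return self.conversion_event_count / total if total else 0.0


def parse_cluster_line(line: str) -> ClusterRecord:
    # Assumption: columns are separated by whitespace (tabs or spaces).
    cols = line.split()
    if len(cols) != 9:
        raise ValueError(f"expected 9 columns, got {len(cols)}")
    gene, cds_start, cds_end, ensembl_id, start, end, merged, score, strand = cols

    # Assumption: the "*" fields are joined by "_" and contain no "_" themselves.
    parts = merged.split("_")
    if len(parts) != 7:
        raise ValueError(f"expected 7 merged fields, got {len(parts)}")
    cluster_id, sequence, reads, mode_loc, conv_locs, conv, non_conv = parts

    return ClusterRecord(
        gene_name=gene,
        cds_start=int(cds_start),
        cds_end=int(cds_end),
        ensembl_id=ensembl_id,
        cluster_start=int(start),
        cluster_end=int(end),
        cluster_id=cluster_id,
        cluster_sequence=sequence,
        read_count=int(reads),
        mode_location=int(mode_loc),
        conversion_location_count=int(conv_locs),
        conversion_event_count=int(conv),
        non_conversion_event_count=int(non_conv),
        mode_score=float(score),
        strand=strand,
    )


# The example record from the notes.
EXAMPLE_LINE = (
    "0610007P08Rik 156 4767 ENSMUST00000039944 5285 5311 "
    "G12910.1_TGGAATTATGTTCTATTACTATTAAAT_28_5312_4_15_278 "
    "0.943376255282776 +"
)

record = parse_cluster_line(EXAMPLE_LINE)
print(record.cluster_id, record.read_count, record.conversion_fraction)
```

For the example record this gives 15 / (15 + 278), i.e. roughly 0.051. The layouts with additional merged fields (AvgConversionPct, GroupConversionEventCount, ...) or a trailing CLI column would need the expected column and field counts adjusted accordingly.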