DiBiG
ICBR Bioinformatics
Powered by Actor, v1.0

ChIPseq - Alignment and peak finding

Title: NS3104
Project: (none)
Started on: 5/11/2023 12:47:07
Hostname: login1.ufhpc
Run directory: /blue/licht/runs/Evans-MDS/NS3104/NS3104
Configuration: NS3104.conf
Table of contents:
  1. Input data
  2. Trimming and quality control
  3. Mapping to genome
  4. Genome coverage
  5. Peak detection
  6. Fraction of Reads in Peaks
  7. Differential peak analysis
  8. MultiQC report
  9. UCSC hub
1. Input data
The following table summarizes the samples, conditions, and contrasts in this analysis. A readset is either a single fastq file or a pair of fastq files (for paired-end sequencing).

Category | Data
Summary of input data
Experimental conditions | 32D-PAR-aF-poly, 32D-Dnmt3a-F-aF-poly, 32D-PAR-aF-mono, 32D-Dnmt3a-F-aF-mono, 32D-PAR-aD-mono, 32D-Dnmt3a-F-aD-mono
Contrasts | 32D-Dnmt3a-F-aF-poly vs. 32D-PAR-aF-poly, 32D-Dnmt3a-F-aF-mono vs. 32D-PAR-aF-mono, 32D-Dnmt3a-F-aD-mono vs. 32D-PAR-aD-mono
Number of samples | 8
Sequencing data
Total number of reads | 1,750,090,567
Average reads per sample | 218,761,320
Table 1. Summary of input data



Condition | Sample | Number of reads | % Reads
32D-PAR-aF-poly | sP1 | 297,179,910 | 16.98%
32D-Dnmt3a-F-aF-poly | sF1 | 269,341,257 | 15.39%
32D-PAR-aF-mono | sP2 | 146,262,591 | 8.36%
32D-Dnmt3a-F-aF-mono | sF2 | 439,316,911 | 25.10%
32D-PAR-aD-mono | sP3 | 334,920,800 | 19.14%
32D-Dnmt3a-F-aD-mono | sF3 | 216,933,222 | 12.40%
Table 2. Number of reads in each sample.

2. Trimming and quality control
The input sequences were trimmed using Trimmomatic. Quality control was performed before and after trimming using FastQC. The following table provides links to the quality control reports before and after trimming, as well as the number of reads in the trimmed files.

Sample | Readset | Reads before trim | QC before trim | Reads after trim | QC after trim | % Retained
sP1 | sP1_r1 | 297,179,910 | P1_S1_L004_R1_001, P1_S1_L004_R2_001 | 281,949,386 | P1_S1_L004_R1_001.trim.paired, P1_S1_L004_R2_001.trim.paired | 94.87%
sP4 | sP4_r1 | 26,321,380 | P4_S4_L004_R1_001, P4_S4_L004_R2_001 | 25,037,344 | P4_S4_L004_R1_001.trim.paired, P4_S4_L004_R2_001.trim.paired | 95.12%
sF1 | sF1_r1 | 269,341,257 | F1_S6_L004_R1_001, F1_S6_L004_R2_001 | 256,874,548 | F1_S6_L004_R1_001.trim.paired, F1_S6_L004_R2_001.trim.paired | 95.37%
sF4 | sF4_r1 | 19,814,496 | F4_S9_L004_R1_001, F4_S9_L004_R2_001 | 18,906,926 | F4_S9_L004_R1_001.trim.paired, F4_S9_L004_R2_001.trim.paired | 95.42%
sP2 | sP2_r1 | 146,262,591 | P2_S2_L004_R1_001, P2_S2_L004_R2_001 | 139,329,657 | P2_S2_L004_R1_001.trim.paired, P2_S2_L004_R2_001.trim.paired | 95.26%
sF2 | sF2_r1 | 439,316,911 | F2_S7_L004_R1_001, F2_S7_L004_R2_001 | 416,335,122 | F2_S7_L004_R1_001.trim.paired, F2_S7_L004_R2_001.trim.paired | 94.77%
sP3 | sP3_r1 | 334,920,800 | P3_S3_L004_R1_001, P3_S3_L004_R2_001 | 317,190,988 | P3_S3_L004_R1_001.trim.paired, P3_S3_L004_R2_001.trim.paired | 94.71%
sF3 | sF3_r1 | 216,933,222 | F3_S8_L004_R1_001, F3_S8_L004_R2_001 | 207,193,927 | F3_S8_L004_R1_001.trim.paired, F3_S8_L004_R2_001.trim.paired | 95.51%
Table 3. Number of reads in input files and links to QC reports.

The following two tables report the number of reads before and after QC in each sample and in each condition.

Sample | Reads before QC | Reads after QC | % Retained
sP1 | 297,179,910 | 281,949,386 | 94.87%
sP4 | 26,321,380 | 25,037,344 | 95.12%
sF1 | 269,341,257 | 256,874,548 | 95.37%
sF4 | 19,814,496 | 18,906,926 | 95.42%
sP2 | 146,262,591 | 139,329,657 | 95.26%
sF2 | 439,316,911 | 416,335,122 | 94.77%
sP3 | 334,920,800 | 317,190,988 | 94.71%
sF3 | 216,933,222 | 207,193,927 | 95.51%
Table 4. Number of reads in each sample before and after QC.



Condition | Reads before QC | Reads after QC | % Retained
32D-PAR-aF-poly | 297,179,910 | 281,949,386 | 94.87%
32D-Dnmt3a-F-aF-poly | 269,341,257 | 256,874,548 | 95.37%
32D-PAR-aF-mono | 146,262,591 | 139,329,657 | 95.26%
32D-Dnmt3a-F-aF-mono | 439,316,911 | 416,335,122 | 94.77%
32D-PAR-aD-mono | 334,920,800 | 317,190,988 | 94.71%
32D-Dnmt3a-F-aD-mono | 216,933,222 | 207,193,927 | 95.51%
Table 5. Number of reads in each condition before and after QC.
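
The % Retained column in the tables above appears to be the number of reads after QC divided by the number of reads before QC. The following Python sketch reproduces that arithmetic for sample sP1. The function name percent_retained is hypothetical and is not part of the pipeline.

```python
def percent_retained(reads_before: int, reads_after: int) -> float:
    """Return the percentage of reads that remain after trimming and QC."""
    if reads_before <= 0:
        raise ValueError("reads_before must be a positive read count")
    return 100.0 * reads_after / reads_before


# Read counts for sample sP1, copied from the tables above.
print(f"{percent_retained(297_179_910, 281_949_386):.2f}%")  # prints 94.87%
```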

3. Mapping to genome
The input sequences were aligned to the genome using Bowtie 2.4.5. The following table reports the number of aligned reads for each sample. The WIG files can be uploaded to the UCSC Genome Browser as custom tracks.

Sample | Total reads | Aligned reads | Concordant alignment rate | Bowtie2 report
sP1 | 281,949,386 | 207,042,126 | 73.43% | bam.bowtie/sP1.bt2stats.html
sP4 | 25,037,344 | 18,734,766 | 74.83% | bam.bowtie/sP4.bt2stats.html
sF1 | 256,874,548 | 185,183,422 | 72.09% | bam.bowtie/sF1.bt2stats.html
sF4 | 18,906,926 | 13,797,256 | 72.97% | bam.bowtie/sF4.bt2stats.html
sP2 | 139,329,657 | 99,997,872 | 71.77% | bam.bowtie/sP2.bt2stats.html
sF2 | 416,335,122 | 321,392,356 | 77.20% | bam.bowtie/sF2.bt2stats.html
sP3 | 317,190,988 | 240,576,234 | 75.85% | bam.bowtie/sP3.bt2stats.html
sF3 | 207,193,927 | 154,088,776 | 74.37% | bam.bowtie/sF3.bt2stats.html
Table 6. Number of alignments to genome.

4. Genome coverage
The following table reports the overall and effective genome coverage in each sample. The Total nt column reports the total number of nucleotides sequenced, i.e. the number of aligned reads times the length of each read. Coverage is this number divided by the size of the genome. Effective bp reports the number of bases in the genome having coverage greater than 5, and the Effective Perc column shows what percentage this is of the genome size. Note that, especially in the case of RNA-seq, the effective genome size may be much smaller than the full size. Eff Coverage is the average coverage over the effectively covered fraction of the genome.

Name | Total nt | Coverage | Effective bp | Effective Perc | Eff Coverage
sP1 | 12,889,018,613 | 4.73 | 1,220,810,224 | 44.80% | 10.56
sP4 | 122,926,804 | 0.05 | 6,180,520 | 0.20% | 19.89
sF1 | 45,014,437,944 | 16.50 | 2,342,422,581 | 85.90% | 19.22
sF4 | 111,786,753 | 0.04 | 3,445,059 | 0.10% | 32.45
sP2 | 2,450,053,247 | 0.90 | 249,698,319 | 9.20% | 9.81
sF2 | 12,188,428,838 | 4.47 | 1,022,188,971 | 37.50% | 11.92
sP3 | 8,931,874,528 | 3.27 | 842,626,292 | 30.90% | 10.60
sF3 | 10,150,853,675 | 3.72 | 1,007,442,695 | 36.90% | 10.08
Table 7. Genome coverage by sample.
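
The relationship between the coverage columns can be sketched in a few lines of Python. This is an illustration of the definitions given above, not the pipeline's own code. The report does not state the genome size, so ASSUMED_GENOME_SIZE is a placeholder assumption, and the names CoverageSummary and summarize_coverage are hypothetical. Returning 0.0 when no bases are effectively covered is also an assumption.

```python
from dataclasses import dataclass

# Placeholder assumption: the report does not state the genome size used.
ASSUMED_GENOME_SIZE = 2_725_000_000


@dataclass
class CoverageSummary:
    coverage: float        # Total nt divided by the genome size
    effective_perc: float  # Effective bp as a percentage of the genome size
    eff_coverage: float    # Total nt divided by Effective bp


def summarize_coverage(total_nt: int, effective_bp: int, genome_size: int) -> CoverageSummary:
    """Derive the coverage columns from Total nt and Effective bp."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    coverage = total_nt / genome_size
    effective_perc = 100.0 * effective_bp / genome_size
    # Assumption: report 0.0 instead of failing when nothing is effectively covered.
    eff_coverage = total_nt / effective_bp if effective_bp > 0 else 0.0
    return CoverageSummary(coverage, effective_perc, eff_coverage)


# Total nt and Effective bp for sample sP1, copied from the table above.
summary = summarize_coverage(12_889_018_613, 1_220_810_224, ASSUMED_GENOME_SIZE)
print(f"Eff Coverage: {summary.eff_coverage:.2f}")  # prints Eff Coverage: 10.56
```

Eff Coverage depends only on the two table columns, so it can be checked against the table without knowing the genome size. Coverage and Effective Perc shift with whatever genome size is supplied.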

The following table reports the overall and effective genome coverage in each condition.

Name | Total nt | Coverage | Effective bp | Effective Perc | Eff Coverage
32D-PAR-aF-poly | 12,889,018,613 | 4.73 | 1,220,810,224 | 44.80% | 10.56
32D-Dnmt3a-F-aF-poly | 45,014,437,944 | 16.50 | 2,342,422,581 | 85.90% | 19.22
32D-PAR-aF-mono | 2,450,053,247 | 0.90 | 249,698,319 | 9.20% | 9.81
32D-Dnmt3a-F-aF-mono | 0 | 0.00 | 0 | 0.00% | 0.00
32D-PAR-aD-mono | 8,931,874,528 | 3.27 | 842,626,292 | 30.90% | 10.60
32D-Dnmt3a-F-aD-mono | 10,150,853,675 | 3.72 | 1,007,442,695 | 36.90% | 10.08
Table 8. Genome coverage by condition

File: NS3104.sample.cov.xlsx
Size: 40.50 kB
Description: Per-chromosome coverage data, by sample.

File: NS3104.cond.cov.xlsx
Size: 28.90 kB
Description: Per-chromosome coverage data, by condition.

5. Peak detection
Peak detection was performed using MACS version 2.2.7.1 with the following options: broad=N, model=Y, paired=Y, qvalue=0.05. The following table shows the number of peaks found for each condition, and their classification. Click on the link in the Peaks column to download the list of peaks in tab-delimited format.

Condition | Total Peaks | Peaks
32D-PAR-aF-poly | 1,619 | 32D-PAR-aF-poly.macs/32D-PAR-aF-poly_peaks.csv
32D-Dnmt3a-F-aF-poly | 816 | 32D-Dnmt3a-F-aF-poly.macs/32D-Dnmt3a-F-aF-poly_peaks.csv
32D-PAR-aF-mono | 4,651 | 32D-PAR-aF-mono.macs/32D-PAR-aF-mono_peaks.csv
32D-Dnmt3a-F-aF-mono | 3,790 | 32D-Dnmt3a-F-aF-mono.macs/32D-Dnmt3a-F-aF-mono_peaks.csv
32D-PAR-aD-mono | 7,339 | 32D-PAR-aD-mono.macs/32D-PAR-aD-mono_peaks.csv
32D-Dnmt3a-F-aD-mono | 1,611 | 32D-Dnmt3a-F-aD-mono.macs/32D-Dnmt3a-F-aD-mono_peaks.csv
Table 9. Classification of peaks in genome regions

The following table provides links to the Pileup, narrowPeaks, and Summits files for each condition. All files are in bedGraph format.

Condition | Num peaks | Pileup | Peaks | Summits
32D-PAR-aF-poly | 1,619 | 32D-PAR-aF-poly.macs/32D-PAR-aF-poly.bedGraph | 32D-PAR-aF-poly.macs/32D-PAR-aF-poly.npeaks.bedGraph | 32D-PAR-aF-poly.macs/32D-PAR-aF-poly.summits.bedGraph
32D-Dnmt3a-F-aF-poly | 816 | 32D-Dnmt3a-F-aF-poly.macs/32D-Dnmt3a-F-aF-poly.bedGraph | 32D-Dnmt3a-F-aF-poly.macs/32D-Dnmt3a-F-aF-poly.npeaks.bedGraph | 32D-Dnmt3a-F-aF-poly.macs/32D-Dnmt3a-F-aF-poly.summits.bedGraph
32D-PAR-aF-mono | 4,651 | 32D-PAR-aF-mono.macs/32D-PAR-aF-mono.bedGraph | 32D-PAR-aF-mono.macs/32D-PAR-aF-mono.npeaks.bedGraph | 32D-PAR-aF-mono.macs/32D-PAR-aF-mono.summits.bedGraph
32D-Dnmt3a-F-aF-mono | 3,790 | 32D-Dnmt3a-F-aF-mono.macs/32D-Dnmt3a-F-aF-mono.bedGraph | 32D-Dnmt3a-F-aF-mono.macs/32D-Dnmt3a-F-aF-mono.npeaks.bedGraph | 32D-Dnmt3a-F-aF-mono.macs/32D-Dnmt3a-F-aF-mono.summits.bedGraph
32D-PAR-aD-mono | 7,339 | 32D-PAR-aD-mono.macs/32D-PAR-aD-mono.bedGraph | 32D-PAR-aD-mono.macs/32D-PAR-aD-mono.npeaks.bedGraph | 32D-PAR-aD-mono.macs/32D-PAR-aD-mono.summits.bedGraph
32D-Dnmt3a-F-aD-mono | 1,611 | 32D-Dnmt3a-F-aD-mono.macs/32D-Dnmt3a-F-aD-mono.bedGraph | 32D-Dnmt3a-F-aD-mono.macs/32D-Dnmt3a-F-aD-mono.npeaks.bedGraph | 32D-Dnmt3a-F-aD-mono.macs/32D-Dnmt3a-F-aD-mono.summits.bedGraph
Table 10. Results of peak detection with MACS.

The following histogram shows the distribution of peak locations in the different conditions.
6. Fraction of Reads in Peaks
The Fraction of Reads in Peaks (FRIP) is the fraction of reads that fall in regions called as peaks, out of all aligned reads.

Condition | Reads | Reads in peaks | FRIP
32D-PAR-aF-poly | 552,580,538 | 51,876,486 | 9.39%
32D-Dnmt3a-F-aF-poly | 505,665,182 | 60,729,499 | 12.01%
32D-PAR-aF-mono | 268,046,850 | 26,768,374 | 9.99%
32D-Dnmt3a-F-aF-mono | 823,336,657 | 81,772,839 | 9.93%
32D-PAR-aD-mono | 625,916,800 | 59,821,948 | 9.56%
32D-Dnmt3a-F-aD-mono | 408,764,898 | 42,862,535 | 10.49%
Table 11. Fraction of Reads in Peaks
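
As a quick check of the FRIP definition, the following Python sketch divides the Reads in peaks column by the Reads column for one condition. The function name frip is hypothetical. How the pipeline counts reads inside peak regions is not described in this report, so only the final ratio is shown.

```python
def frip(reads_in_peaks: int, total_reads: int) -> float:
    """Return the Fraction of Reads in Peaks as a value between 0 and 1."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if not 0 <= reads_in_peaks <= total_reads:
        raise ValueError("reads_in_peaks must lie between 0 and total_reads")
    return reads_in_peaks / total_reads


# Counts for condition 32D-PAR-aF-poly, copied from the table above.
print(f"{frip(51_876_486, 552_580_538):.2%}")  # prints 9.39%
```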

7. Differential peak analysis
For each contrast, peaks in the two conditions were compared to identify those appearing in only the test or only the control, and those whose signal differs between the test and the control by more than two-fold in either direction (abs(log2(FC)) > 1). The following table reports the number of peaks identified in each group.

Test | Control | Test up | Control up | Unchanged
32D-Dnmt3a-F-aF-poly | 32D-PAR-aF-poly | 32D-Dnmt3a-F-aF-poly.vs.32D-PAR-aF-poly.testup.bed (138) | 32D-Dnmt3a-F-aF-poly.vs.32D-PAR-aF-poly.ctrlup.bed (49) | 32D-Dnmt3a-F-aF-poly.vs.32D-PAR-aF-poly.commpeaks.bed (816)
32D-Dnmt3a-F-aF-mono | 32D-PAR-aF-mono | 32D-Dnmt3a-F-aF-mono.vs.32D-PAR-aF-mono.testup.bed (1,726) | 32D-Dnmt3a-F-aF-mono.vs.32D-PAR-aF-mono.ctrlup.bed (390) | 32D-Dnmt3a-F-aF-mono.vs.32D-PAR-aF-mono.commpeaks.bed (3,790)
32D-Dnmt3a-F-aD-mono | 32D-PAR-aD-mono | 32D-Dnmt3a-F-aD-mono.vs.32D-PAR-aD-mono.testup.bed (210) | 32D-Dnmt3a-F-aD-mono.vs.32D-PAR-aD-mono.ctrlup.bed (183) | 32D-Dnmt3a-F-aD-mono.vs.32D-PAR-aD-mono.commpeaks.bed (1,611)
Table 12. Number of differential peaks in each contrast.
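
The abs(log2(FC)) > 1 criterion can be illustrated with a short Python sketch. This is only one way such a rule might be applied. The report does not say how peak signal is measured or how peaks found in a single condition are handled, so the function name classify_peak, the pseudocount, and the signal values in the example are all assumptions made for illustration.

```python
import math


def classify_peak(test_signal: float, control_signal: float,
                  threshold: float = 1.0, pseudocount: float = 1.0) -> str:
    """Assign a peak to a group from its log2 fold change (test over control).

    The pseudocount is an assumption: it keeps the ratio defined when a peak
    has no signal in one of the two conditions.
    """
    log2_fc = math.log2((test_signal + pseudocount) / (control_signal + pseudocount))
    if log2_fc > threshold:
        return "Test up"
    if log2_fc < -threshold:
        return "Control up"
    return "Unchanged"


# Hypothetical signal values, not taken from this analysis.
print(classify_peak(30, 10))  # Test up: the test signal is more than two-fold higher
print(classify_peak(10, 30))  # Control up: the control signal is more than two-fold higher
print(classify_peak(12, 10))  # Unchanged: the difference is under two-fold
```

A threshold of 1 on the log2 scale corresponds to a two-fold change, so a peak must at least double or halve its signal to leave the Unchanged group.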

The following table reports the classification of differential peaks by gene region in all contrasts.

Test | Control | Group | Upstream | Exon | CodingExon | Intron | Downstream | Intergenic
32D-Dnmt3a-F-aF-poly | 32D-PAR-aF-poly | Test up | 6.32% | 3.45% | 1.15% | 60.92% | 4.60% | 23.56%
32D-Dnmt3a-F-aF-poly | 32D-PAR-aF-poly | Control up | 1.56% | 1.56% | 0.00% | 56.25% | 7.81% | 32.81%
32D-Dnmt3a-F-aF-poly | 32D-PAR-aF-poly | Common | 7.22% | 2.50% | 0.97% | 40.69% | 8.47% | 40.14%
32D-Dnmt3a-F-aF-mono | 32D-PAR-aF-mono | Test up | 4.46% | 3.24% | 1.60% | 62.73% | 6.72% | 21.24%
32D-Dnmt3a-F-aF-mono | 32D-PAR-aF-mono | Control up | 4.97% | 1.08% | 0.43% | 32.83% | 5.18% | 55.51%
32D-Dnmt3a-F-aF-mono | 32D-PAR-aF-mono | Common | 5.19% | 1.89% | 1.42% | 41.09% | 7.67% | 42.74%
32D-Dnmt3a-F-aD-mono | 32D-PAR-aD-mono | Test up | 4.85% | 2.61% | 1.12% | 45.52% | 6.72% | 39.18%
32D-Dnmt3a-F-aD-mono | 32D-PAR-aD-mono | Control up | 3.33% | 0.48% | 0.48% | 44.76% | 4.29% | 46.67%
32D-Dnmt3a-F-aD-mono | 32D-PAR-aD-mono | Common | 5.59% | 2.87% | 1.47% | 46.16% | 6.75% | 37.16%
Table 13. Location of differential peaks for each contrast.

8. MultiQC report
MultiQC is a general Quality Control tool for a large number of bioinformatics pipelines. The report on this analysis (generated using MultiQC version 1.12) is available here:

MultiQC report
9. UCSC hub

UCSC Genome Browser: use the previous link to display the data tracks automatically, or copy the URL https://bw:bw@lichtlab.cancer.ufl.edu/reports/MDS//NS3104/NS3104/hub.txt and paste it into the "My Hubs" form on this page.

WashU EpiGenome Browser: use the previous link to display the data tracks automatically, or copy the following URL into the "Datahub by URL Link" field: https://bw:bw@lichtlab.cancer.ufl.edu/reports/MDS//NS3104/NS3104/hub.json.



Completed: 5-11-2023@12:47
© 2023, A. Riva, University of Florida.