DiBiG
ICBR Bioinformatics
Powered by Actor, v1.0

Cut&Run - Alignment and peak finding

Title: EVM_CR_2023-09-18
Project: (none)
Started on: 9/29/2023 11:09:13
Hostname: login7.ufhpc
Run directory: /blue/licht/runs/Evans-MDS/NS3371/EVM_CR_2023-09-18
Configuration: NS3371.conf
Table of contents:
  1. Input data
  2. Trimming and quality control
  3. Mapping to genome
  4. Spike-ins normalization
  5. Genome coverage
  6. Peak detection
  7. Fraction of Reads in Peaks
  8. Differential peak detection
  9. MultiQC report
  10. UCSC hub
1. Input data
The following table summarizes the samples, conditions, and contrasts in this analysis. A readset is either a single fastq file or a pair of fastq files (for paired-end sequencing).

| Category | Data |
|---|---|
| Summary of input data | |
| Reference genome | mm10 |
| Experimental conditions | S-32D-Dnmt3a-FLAG-aIgG, S-32D-Parental-aFLAG, S-32D-Dnmt3a-FLAG-aFLAG, S-32D-Dnmt3a-FLAG-aDnmt3a, S-32D-Dnmt3a-FLAG-aH3K4me3 |
| Contrasts | S-32D-Parental-aFLAG vs. S-32D-Dnmt3a-FLAG-aIgG, S-32D-Dnmt3a-FLAG-aDnmt3a vs. S-32D-Dnmt3a-FLAG-aIgG, S-32D-Dnmt3a-FLAG-aH3K4me3 vs. S-32D-Dnmt3a-FLAG-aIgG, S-32D-Dnmt3a-FLAG-aFLAG vs. S-32D-Dnmt3a-FLAG-aIgG |
| Number of samples | 10 |
| Sequencing data | |
| Total number of reads | 432,734,546 |
| Average reads per sample | 43,273,454 |
Table 1. Summary of input data



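| Condition | Sample | Number of reads | % Reads |
|---|---|---|---|
| S-32D-Dnmt3a-FLAG-aIgG | 32D-Dnmt3a-FLAG-aIgG-1 | 43,395,514 | 10.03% |
| S-32D-Dnmt3a-FLAG-aIgG | 32D-Dnmt3a-FLAG-aIgG-2 | 37,701,759 | 8.71% |
| S-32D-Parental-aFLAG | 32D-Parental-aFLAG-1 | 49,069,203 | 11.34% |
| S-32D-Parental-aFLAG | 32D-Parental-aFLAG-2 | 40,958,109 | 9.46% |
| S-32D-Dnmt3a-FLAG-aFLAG | 32D-Dnmt3a-FLAG-aFLAG-1 | 43,221,090 | 9.99% |
| S-32D-Dnmt3a-FLAG-aFLAG | 32D-Dnmt3a-FLAG-aFLAG-2 | 39,213,479 | 9.06% |
| S-32D-Dnmt3a-FLAG-aDnmt3a | 32D-Dnmt3a-FLAG-aDnmt3a-1 | 43,334,196 | 10.01% |
| S-32D-Dnmt3a-FLAG-aDnmt3a | 32D-Dnmt3a-FLAG-aDnmt3a-2 | 40,040,638 | 9.25% |
| S-32D-Dnmt3a-FLAG-aH3K4me3 | 32D-Dnmt3a-FLAG-aH3K4me3-1 | 48,663,517 | 11.25% |
| S-32D-Dnmt3a-FLAG-aH3K4me3 | 32D-Dnmt3a-FLAG-aH3K4me3-2 | 47,137,041 | 10.89% |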
Table 2. Number of reads in each sample.

2. Trimming and quality control
The input sequences were trimmed using Trimmomatic. Quality control was performed before and after trimming using FastQC. The following table provides links to the quality control reports before and after trimming, as well as the number of reads in the trimmed files.

| Sample | Readset | Reads before trim | QC before trim | Reads after trim | QC after trim | % Retained |
|---|---|---|---|---|---|---|
| 32D-Dnmt3a-FLAG-aIgG-1 | 32D-Dnmt3a-FLAG-aIgG-1_r1 | 43,395,514 | 32D-Dnmt3a-FLAG-aIgG-1_S7_L004_R1_001, 32D-Dnmt3a-FLAG-aIgG-1_S7_L004_R2_001 | 41,838,322 | 32D-Dnmt3a-FLAG-aIgG-1_S7_L004_R1_001.trim.paired, 32D-Dnmt3a-FLAG-aIgG-1_S7_L004_R2_001.trim.paired | 96.41% |
| 32D-Dnmt3a-FLAG-aIgG-2 | 32D-Dnmt3a-FLAG-aIgG-2_r1 | 37,701,759 | 32D-Dnmt3a-FLAG-aIgG-2_S8_L004_R1_001, 32D-Dnmt3a-FLAG-aIgG-2_S8_L004_R2_001 | 36,392,189 | 32D-Dnmt3a-FLAG-aIgG-2_S8_L004_R1_001.trim.paired, 32D-Dnmt3a-FLAG-aIgG-2_S8_L004_R2_001.trim.paired | 96.53% |
| 32D-Parental-aFLAG-1 | 32D-Parental-aFLAG-1_r1 | 49,069,203 | 32D-Parental-aFLAG-1_S9_L004_R1_001, 32D-Parental-aFLAG-1_S9_L004_R2_001 | 46,982,555 | 32D-Parental-aFLAG-1_S9_L004_R1_001.trim.paired, 32D-Parental-aFLAG-1_S9_L004_R2_001.trim.paired | 95.75% |
| 32D-Parental-aFLAG-2 | 32D-Parental-aFLAG-2_r1 | 40,958,109 | 32D-Parental-aFLAG-2_S10_L004_R1_001, 32D-Parental-aFLAG-2_S10_L004_R2_001 | 39,299,191 | 32D-Parental-aFLAG-2_S10_L004_R1_001.trim.paired, 32D-Parental-aFLAG-2_S10_L004_R2_001.trim.paired | 95.95% |
| 32D-Dnmt3a-FLAG-aFLAG-1 | 32D-Dnmt3a-FLAG-aFLAG-1_r1 | 43,221,090 | 32D-Dnmt3a-FLAG-aFLAG-1_S1_L004_R1_001, 32D-Dnmt3a-FLAG-aFLAG-1_S1_L004_R2_001 | 41,695,650 | 32D-Dnmt3a-FLAG-aFLAG-1_S1_L004_R1_001.trim.paired, 32D-Dnmt3a-FLAG-aFLAG-1_S1_L004_R2_001.trim.paired | 96.47% |
| 32D-Dnmt3a-FLAG-aFLAG-2 | 32D-Dnmt3a-FLAG-aFLAG-2_r1 | 39,213,479 | 32D-Dnmt3a-FLAG-aFLAG-2_S2_L004_R1_001, 32D-Dnmt3a-FLAG-aFLAG-2_S2_L004_R2_001 | 37,831,048 | 32D-Dnmt3a-FLAG-aFLAG-2_S2_L004_R1_001.trim.paired, 32D-Dnmt3a-FLAG-aFLAG-2_S2_L004_R2_001.trim.paired | 96.47% |
| 32D-Dnmt3a-FLAG-aDnmt3a-1 | 32D-Dnmt3a-FLAG-aDnmt3a-1_r1 | 43,334,196 | 32D-Dnmt3a-FLAG-aDnmt3a-1_S3_L004_R1_001, 32D-Dnmt3a-FLAG-aDnmt3a-1_S3_L004_R2_001 | 41,727,884 | 32D-Dnmt3a-FLAG-aDnmt3a-1_S3_L004_R1_001.trim.paired, 32D-Dnmt3a-FLAG-aDnmt3a-1_S3_L004_R2_001.trim.paired | 96.29% |
| 32D-Dnmt3a-FLAG-aDnmt3a-2 | 32D-Dnmt3a-FLAG-aDnmt3a-2_r1 | 40,040,638 | 32D-Dnmt3a-FLAG-aDnmt3a-2_S4_L004_R1_001, 32D-Dnmt3a-FLAG-aDnmt3a-2_S4_L004_R2_001 | 38,605,744 | 32D-Dnmt3a-FLAG-aDnmt3a-2_S4_L004_R1_001.trim.paired, 32D-Dnmt3a-FLAG-aDnmt3a-2_S4_L004_R2_001.trim.paired | 96.42% |
| 32D-Dnmt3a-FLAG-aH3K4me3-1 | 32D-Dnmt3a-FLAG-aH3K4me3-1_r1 | 48,663,517 | 32D-Dnmt3a-FLAG-aH3K4me3-1_S5_L004_R1_001, 32D-Dnmt3a-FLAG-aH3K4me3-1_S5_L004_R2_001 | 46,662,665 | 32D-Dnmt3a-FLAG-aH3K4me3-1_S5_L004_R1_001.trim.paired, 32D-Dnmt3a-FLAG-aH3K4me3-1_S5_L004_R2_001.trim.paired | 95.89% |
| 32D-Dnmt3a-FLAG-aH3K4me3-2 | 32D-Dnmt3a-FLAG-aH3K4me3-2_r1 | 47,137,041 | 32D-Dnmt3a-FLAG-aH3K4me3-2_S6_L004_R1_001, 32D-Dnmt3a-FLAG-aH3K4me3-2_S6_L004_R2_001 | 45,119,511 | 32D-Dnmt3a-FLAG-aH3K4me3-2_S6_L004_R1_001.trim.paired, 32D-Dnmt3a-FLAG-aH3K4me3-2_S6_L004_R2_001.trim.paired | 95.72% |
Table 3. Number of reads in input files and links to QC reports.

The following two tables report the number of reads before and after QC in each sample and in each condition.

| Sample | Reads before QC | Reads after QC | % Retained |
|---|---|---|---|
| 32D-Dnmt3a-FLAG-aIgG-1 | 43,395,514 | 41,838,322 | 96.41% |
| 32D-Dnmt3a-FLAG-aIgG-2 | 37,701,759 | 36,392,189 | 96.53% |
| 32D-Parental-aFLAG-1 | 49,069,203 | 46,982,555 | 95.75% |
| 32D-Parental-aFLAG-2 | 40,958,109 | 39,299,191 | 95.95% |
| 32D-Dnmt3a-FLAG-aFLAG-1 | 43,221,090 | 41,695,650 | 96.47% |
| 32D-Dnmt3a-FLAG-aFLAG-2 | 39,213,479 | 37,831,048 | 96.47% |
| 32D-Dnmt3a-FLAG-aDnmt3a-1 | 43,334,196 | 41,727,884 | 96.29% |
| 32D-Dnmt3a-FLAG-aDnmt3a-2 | 40,040,638 | 38,605,744 | 96.42% |
| 32D-Dnmt3a-FLAG-aH3K4me3-1 | 48,663,517 | 46,662,665 | 95.89% |
| 32D-Dnmt3a-FLAG-aH3K4me3-2 | 47,137,041 | 45,119,511 | 95.72% |
Table 4. Number of reads in each sample before and after QC.



| Condition | Reads before QC | Reads after QC | % Retained |
|---|---|---|---|
| S-32D-Dnmt3a-FLAG-aIgG | 81,097,273 | 78,230,511 | 96.47% |
| S-32D-Parental-aFLAG | 90,027,312 | 86,281,746 | 95.84% |
| S-32D-Dnmt3a-FLAG-aFLAG | 82,434,569 | 79,526,698 | 96.47% |
| S-32D-Dnmt3a-FLAG-aDnmt3a | 83,374,834 | 80,333,628 | 96.35% |
| S-32D-Dnmt3a-FLAG-aH3K4me3 | 95,800,558 | 91,782,176 | 95.81% |
Table 5. Number of reads in each condition before and after QC.
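
As a rough illustration of how the per-condition figures in Table 5 follow from the per-sample counts in Table 4, the sketch below sums the reads of the replicates in each condition and recomputes % Retained. The function name and the input layout are assumptions made for this example; they are not taken from the pipeline.

```python
from collections import defaultdict

def retained_by_condition(samples):
    """Sum reads per condition and return (before, after, percent retained)."""
    totals = defaultdict(lambda: [0, 0])
    for condition, before, after in samples:
        totals[condition][0] += before
        totals[condition][1] += after
    return {
        cond: (before, after, 100.0 * after / before)
        for cond, (before, after) in totals.items()
    }

# Per-sample counts copied from Table 4 (two replicates per condition).
samples = [
    ("S-32D-Dnmt3a-FLAG-aIgG", 43_395_514, 41_838_322),
    ("S-32D-Dnmt3a-FLAG-aIgG", 37_701_759, 36_392_189),
    ("S-32D-Parental-aFLAG", 49_069_203, 46_982_555),
    ("S-32D-Parental-aFLAG", 40_958_109, 39_299_191),
]

by_condition = retained_by_condition(samples)
for cond, (before, after, pct) in by_condition.items():
    print(f"{cond}\t{before:,}\t{after:,}\t{pct:.2f}%")
```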

3. Mapping to genome
The input sequences were aligned to the mm10 genome using Bowtie2 version 2.4.5. The following table reports the number of aligned reads for each sample. The WIG files can be uploaded to the UCSC Genome Browser as custom tracks.

| Sample | Total reads | Aligned reads | Concordant alignment rate | Bowtie2 report |
|---|---|---|---|---|
| 32D-Dnmt3a-FLAG-aIgG-1 | 41,838,322 | 26,787,439 | 64.03% | bam.bowtie/32D-Dnmt3a-FLAG-aIgG-1.bt2stats.html |
| 32D-Dnmt3a-FLAG-aIgG-2 | 36,392,189 | 22,694,071 | 62.36% | bam.bowtie/32D-Dnmt3a-FLAG-aIgG-2.bt2stats.html |
| 32D-Parental-aFLAG-1 | 46,982,555 | 31,290,742 | 66.60% | bam.bowtie/32D-Parental-aFLAG-1.bt2stats.html |
| 32D-Parental-aFLAG-2 | 39,299,191 | 26,598,871 | 67.68% | bam.bowtie/32D-Parental-aFLAG-2.bt2stats.html |
| 32D-Dnmt3a-FLAG-aFLAG-1 | 41,695,650 | 27,219,010 | 65.28% | bam.bowtie/32D-Dnmt3a-FLAG-aFLAG-1.bt2stats.html |
| 32D-Dnmt3a-FLAG-aFLAG-2 | 37,831,048 | 24,977,318 | 66.02% | bam.bowtie/32D-Dnmt3a-FLAG-aFLAG-2.bt2stats.html |
| 32D-Dnmt3a-FLAG-aDnmt3a-1 | 41,727,884 | 27,750,403 | 66.50% | bam.bowtie/32D-Dnmt3a-FLAG-aDnmt3a-1.bt2stats.html |
| 32D-Dnmt3a-FLAG-aDnmt3a-2 | 38,605,744 | 25,987,927 | 67.32% | bam.bowtie/32D-Dnmt3a-FLAG-aDnmt3a-2.bt2stats.html |
| 32D-Dnmt3a-FLAG-aH3K4me3-1 | 46,662,665 | 37,548,998 | 80.47% | bam.bowtie/32D-Dnmt3a-FLAG-aH3K4me3-1.bt2stats.html |
| 32D-Dnmt3a-FLAG-aH3K4me3-2 | 45,119,511 | 35,724,717 | 79.18% | bam.bowtie/32D-Dnmt3a-FLAG-aH3K4me3-2.bt2stats.html |
Table 6. Number of alignments to genome.

4. Spike-ins normalization
The following table shows the values used to normalize samples on the basis of Spike-ins. The Spike-in reads column contains the number of reads aligning to the genome of the spike-ins, and Spike-in % shows what percentage this represents of the total number of reads. The Scale factor column contains the ratio between the smallest value in the Spike-in reads column and the number of Spike-in reads for each sample, so the sample with the fewest spike-in reads has a scale factor of 1.00. This factor is used to scale the input datasets, in order to obtain the same number of spike-in reads in each sample.

| Sample | Total reads | Spike-in reads | Spike-in % | Scale factor |
|---|---|---|---|---|
| 32D-Dnmt3a-FLAG-aIgG-1 | 41,838,322 | 1,304,530 | 3.12 | 0.39 |
| 32D-Dnmt3a-FLAG-aIgG-2 | 36,392,189 | 1,106,407 | 3.04 | 0.46 |
| 32D-Parental-aFLAG-1 | 46,982,555 | 2,277,333 | 4.85 | 0.22 |
| 32D-Parental-aFLAG-2 | 39,299,191 | 1,927,083 | 4.90 | 0.27 |
| 32D-Dnmt3a-FLAG-aFLAG-1 | 41,695,650 | 939,285 | 2.25 | 0.54 |
| 32D-Dnmt3a-FLAG-aFLAG-2 | 37,831,048 | 1,010,037 | 2.67 | 0.51 |
| 32D-Dnmt3a-FLAG-aDnmt3a-1 | 41,727,884 | 917,556 | 2.20 | 0.56 |
| 32D-Dnmt3a-FLAG-aDnmt3a-2 | 38,605,744 | 862,880 | 2.24 | 0.59 |
| 32D-Dnmt3a-FLAG-aH3K4me3-1 | 46,662,665 | 521,942 | 1.12 | 0.98 |
| 32D-Dnmt3a-FLAG-aH3K4me3-2 | 45,119,511 | 511,762 | 1.13 | 1.00 |
Table 7. Spike-in normalization results
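
The scale factors in Table 7 are consistent with dividing the smallest spike-in read count by each sample's own spike-in read count. The following is a minimal sketch of that calculation, not the pipeline's actual code; the function name `spikein_scale_factors` is hypothetical, and only three of the ten samples are included to keep the example short.

```python
def spikein_scale_factors(spikein_reads):
    """Map each sample to min(spike-in reads) / its own spike-in reads."""
    smallest = min(spikein_reads.values())
    return {sample: smallest / n for sample, n in spikein_reads.items()}

# Spike-in read counts copied from Table 7 (subset of samples).
# The last entry is the smallest value in the full table.
spikein_reads = {
    "32D-Dnmt3a-FLAG-aIgG-1": 1_304_530,
    "32D-Parental-aFLAG-1": 2_277_333,
    "32D-Dnmt3a-FLAG-aH3K4me3-2": 511_762,
}

factors = spikein_scale_factors(spikein_reads)
for sample, factor in factors.items():
    print(f"{sample}\t{factor:.2f}")
```

Multiplying a sample's signal by its factor brings every sample down to the spike-in level of the sample with the fewest spike-in reads.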

5. Genome coverage
The following table reports the overall and effective genome coverage in each sample. The Total nt column reports the total number of nucleotides sequenced, i.e. the number of aligned reads times the length of each read. Coverage is this number divided by the size of the genome. Effective bp reports the number of bases in the genome having coverage greater than 5, and the Effective Perc column shows what percentage this is of the genome size. Note that, especially in the case of RNA-seq, the effective genome size may be much smaller than the full size. Eff Coverage is the average coverage over the effectively covered fraction of the genome.

| Name | Total nt | Coverage | Effective bp | Effective Perc | Eff Coverage |
|---|---|---|---|---|---|
| 32D-Dnmt3a-FLAG-aIgG-1 | 2,170,820,221 | 0.80 | 257,493,821 | 9.40% | 8.43 |
| 32D-Dnmt3a-FLAG-aIgG-2 | 2,116,262,695 | 0.78 | 251,242,821 | 9.20% | 8.42 |
| 32D-Parental-aFLAG-1 | 1,188,692,875 | 0.44 | 138,711,408 | 5.10% | 8.57 |
| 32D-Parental-aFLAG-2 | 1,260,180,416 | 0.46 | 150,965,526 | 5.50% | 8.35 |
| 32D-Dnmt3a-FLAG-aFLAG-1 | 3,521,709,125 | 1.29 | 364,515,622 | 13.40% | 9.66 |
| 32D-Dnmt3a-FLAG-aFLAG-2 | 3,138,187,343 | 1.15 | 334,042,260 | 12.20% | 9.39 |
| 32D-Dnmt3a-FLAG-aDnmt3a-1 | 3,911,520,254 | 1.43 | 411,195,942 | 15.10% | 9.51 |
| 32D-Dnmt3a-FLAG-aDnmt3a-2 | 3,145,046,078 | 1.15 | 345,646,497 | 12.70% | 9.10 |
| 32D-Dnmt3a-FLAG-aH3K4me3-1 | 5,053,988,722 | 1.85 | 74,196,824 | 2.70% | 68.12 |
| 32D-Dnmt3a-FLAG-aH3K4me3-2 | 4,519,845,227 | 1.66 | 81,200,848 | 3.00% | 55.66 |
Table 8. Genome coverage by sample.
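
The relationships between the columns of Table 8 can be sketched as follows. The Eff Coverage values in the table are consistent with Total nt divided by Effective bp. The report does not state the genome size it uses, so `ASSUMED_GENOME_SIZE` below is a placeholder chosen for illustration only, and the Coverage and Effective Perc values it produces should not be expected to match the table exactly. The function name is also hypothetical.

```python
def coverage_stats(total_nt, effective_bp, genome_size):
    """Return (coverage, effective percent, effective coverage)."""
    coverage = total_nt / genome_size                 # average depth over the genome
    effective_perc = 100.0 * effective_bp / genome_size
    eff_coverage = total_nt / effective_bp            # depth over covered bases only
    return coverage, effective_perc, eff_coverage

# ASSUMPTION: placeholder genome size; the report does not give the value used.
ASSUMED_GENOME_SIZE = 2_700_000_000

# Total nt and Effective bp for 32D-Dnmt3a-FLAG-aIgG-1, copied from Table 8.
coverage, effective_perc, eff_coverage = coverage_stats(
    2_170_820_221, 257_493_821, ASSUMED_GENOME_SIZE
)
print(f"Eff Coverage: {eff_coverage:.2f}")
```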

The following table reports the overall and effective genome coverage in each condition.

| Name | Total nt | Coverage | Effective bp | Effective Perc | Eff Coverage |
|---|---|---|---|---|---|
| S-32D-Dnmt3a-FLAG-aIgG | 8,255,232,840 | 3.03 | 879,173,079 | 32.20% | 9.39 |
| S-32D-Parental-aFLAG | 5,369,184,479 | 1.97 | 607,774,513 | 22.30% | 8.83 |
| S-32D-Dnmt3a-FLAG-aFLAG | 10,114,977,801 | 3.71 | 885,318,260 | 32.50% | 11.43 |
| S-32D-Dnmt3a-FLAG-aDnmt3a | 10,873,033,171 | 3.99 | 967,752,457 | 35.50% | 11.24 |
| S-32D-Dnmt3a-FLAG-aH3K4me3 | 10,575,118,543 | 3.88 | 244,072,698 | 9.00% | 43.33 |
Table 9. Genome coverage by condition

File: EVM_CR_2023-09-18.sample.cov.xlsx
Size: 49.09 kB
Description: Per-chromosome coverage data, by sample.

File: EVM_CR_2023-09-18.cond.cov.xlsx
Size: 27.72 kB
Description: Per-chromosome coverage data, by condition.

6. Peak detection
Peak detection was performed using SEACR with the following options: mode=stringent, normalization=False. The following table shows the number of peaks found for each condition.

| Condition | Total Peaks | Peaks |
|---|---|---|
| S-32D-Dnmt3a-FLAG-aIgG | 100,441 | S-32D-Dnmt3a-FLAG-aIgG.peaks.bed |
| S-32D-Parental-aFLAG | 88,416 | S-32D-Parental-aFLAG.peaks.bed |
| S-32D-Dnmt3a-FLAG-aFLAG | 91,041 | S-32D-Dnmt3a-FLAG-aFLAG.peaks.bed |
| S-32D-Dnmt3a-FLAG-aDnmt3a | 91,722 | S-32D-Dnmt3a-FLAG-aDnmt3a.peaks.bed |
| S-32D-Dnmt3a-FLAG-aH3K4me3 | 85,208 | S-32D-Dnmt3a-FLAG-aH3K4me3.peaks.bed |
Table 10. Peaks detected by SEACR and their classification in genomic regions.

The following histogram shows the distribution of peak locations in the different conditions.
File: peak-classifications.xlsx
Size: 5.73 kB
Description: Table containing number of peaks in each region for each condition.

7. Fraction of Reads in Peaks
The Fraction of Reads in Peaks (FRIP) is the fraction of reads that fall in regions called as peaks, out of all aligned reads.

| Condition | Reads | Reads in peaks | FRIP |
|---|---|---|---|
| S-32D-Dnmt3a-FLAG-aIgG | 137,161,317 | 36,657,495 | 26.73% |
| S-32D-Parental-aFLAG | 152,262,828 | 39,887,649 | 26.20% |
| S-32D-Dnmt3a-FLAG-aFLAG | 137,817,791 | 61,510,365 | 44.63% |
| S-32D-Dnmt3a-FLAG-aDnmt3a | 141,457,892 | 60,759,820 | 42.95% |
| S-32D-Dnmt3a-FLAG-aH3K4me3 | 168,680,517 | 123,980,423 | 73.50% |
Table 11. Fraction of Reads in Peaks
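
The FRIP column in Table 11 is the Reads in peaks column divided by the Reads column. A minimal sketch of that arithmetic, using a hypothetical helper named `frip`:

```python
def frip(reads_in_peaks, total_reads):
    """Fraction of Reads in Peaks: reads inside called peaks / all reads."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return reads_in_peaks / total_reads

# Values for S-32D-Dnmt3a-FLAG-aIgG, copied from Table 11.
aigg_frip = frip(36_657_495, 137_161_317)
print(f"FRIP: {100 * aigg_frip:.2f}%")
```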

8. Differential peak detection
The following table reports peaks that are increased, decreased, or unchanged in each contrast. Note that these results are simply based on the fold change of the peak sizes in the two conditions, with no statistical significance information, using a Log2(FC) threshold of 1.0. More accurate differential analysis can be performed with DASA.

| Test | Control | Total | Increased | Decreased | Unchanged | Table |
|---|---|---|---|---|---|---|
| S-32D-Parental-aFLAG | S-32D-Dnmt3a-FLAG-aIgG | 62,851 | 33% | 34% | 32% | S-32D-Parental-aFLAG.vs.S-32D-Dnmt3a-FLAG-aIgG.xlsx |
| S-32D-Dnmt3a-FLAG-aDnmt3a | S-32D-Dnmt3a-FLAG-aIgG | 66,006 | 67% | 13% | 19% | S-32D-Dnmt3a-FLAG-aDnmt3a.vs.S-32D-Dnmt3a-FLAG-aIgG.xlsx |
| S-32D-Dnmt3a-FLAG-aH3K4me3 | S-32D-Dnmt3a-FLAG-aIgG | 57,247 | 28% | 58% | 13% | S-32D-Dnmt3a-FLAG-aH3K4me3.vs.S-32D-Dnmt3a-FLAG-aIgG.xlsx |
| S-32D-Dnmt3a-FLAG-aFLAG | S-32D-Dnmt3a-FLAG-aIgG | 63,785 | 59% | 20% | 19% | S-32D-Dnmt3a-FLAG-aFLAG.vs.S-32D-Dnmt3a-FLAG-aIgG.xlsx |
Table 12. Differential peak analysis results.
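
A fold-change classification of the kind described above can be sketched as follows. The report does not specify how peak size is measured, whether a pseudocount is applied, or whether the threshold is inclusive, so those choices are assumptions here, as is the function name. The peak sizes in the example are invented for illustration and are not taken from this analysis.

```python
import math

def classify_peak(test_size, control_size, log2fc_threshold=1.0, pseudocount=1.0):
    """Label a peak by the log2 fold change of its size, test vs. control.

    ASSUMPTIONS: a pseudocount avoids division by zero, and a peak exactly
    at the threshold counts as changed. Neither is confirmed by the report.
    """
    log2fc = math.log2((test_size + pseudocount) / (control_size + pseudocount))
    if log2fc >= log2fc_threshold:
        return "increased"
    if log2fc <= -log2fc_threshold:
        return "decreased"
    return "unchanged"

# Invented peak sizes as (test, control) pairs, for illustration only.
examples = [(30, 10), (10, 30), (12, 10)]
labels = [classify_peak(test, control) for test, control in examples]
print(labels)
```

With a threshold of 1.0, a peak is called increased or decreased only when its size at least doubles or halves between the two conditions.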

The following table reports the number of increased and decreased peaks that belong to enhancers.

| Test | Control | Inc - Peaks | Inc - Enhancers | Dec - Peaks | Dec - Enhancers | Table |
|---|---|---|---|---|---|---|
| S-32D-Parental-aFLAG | S-32D-Dnmt3a-FLAG-aIgG | 2,185 | 2,120 | 2,284 | 2,346 | S-32D-Parental-aFLAG.vs.S-32D-Dnmt3a-FLAG-aIgG.enhancers.xlsx |
| S-32D-Dnmt3a-FLAG-aDnmt3a | S-32D-Dnmt3a-FLAG-aIgG | 5,231 | 5,121 | 760 | 748 | S-32D-Dnmt3a-FLAG-aDnmt3a.vs.S-32D-Dnmt3a-FLAG-aIgG.enhancers.xlsx |
| S-32D-Dnmt3a-FLAG-aH3K4me3 | S-32D-Dnmt3a-FLAG-aIgG | 1,962 | 2,249 | 3,138 | 2,485 | S-32D-Dnmt3a-FLAG-aH3K4me3.vs.S-32D-Dnmt3a-FLAG-aIgG.enhancers.xlsx |
| S-32D-Dnmt3a-FLAG-aFLAG | S-32D-Dnmt3a-FLAG-aIgG | 4,499 | 4,451 | 1,323 | 1,279 | S-32D-Dnmt3a-FLAG-aFLAG.vs.S-32D-Dnmt3a-FLAG-aIgG.enhancers.xlsx |
Table 13. Differential peaks in enhancers.

9. MultiQC report
MultiQC is a general Quality Control tool for a large number of bioinformatics pipelines. The report on this analysis (generated using MultiQC version 1.12) is available here:

MultiQC report
10. UCSC hub

UCSC Genome Browser: use the previous link to display the data tracks automatically, or copy the URL https://bw:bw@lichtlab.cancer.ufl.edu/reports/MDS//EVM_CR_2023-09-18/NS3371/hub.txt and paste it into the "My Hubs" form on this page.

WashU EpiGenome Browser: use the previous link to display the data tracks automatically, or copy the following URL into the "Datahub by URL Link" field: https://bw:bw@lichtlab.cancer.ufl.edu/reports/MDS//EVM_CR_2023-09-18/NS3371/hub.json.



Completed: 9-29-2023@11:10
© 2023, A. Riva, University of Florida.