Title: PHD_CR_20230612
Project: (none)
Started on: 6/14/2023 12:48:36
Hostname: login1.ufhpc
Run directory: /blue/licht/runs/MuscoLabCollab/PHD_CR_20230612/PHD_CR_20230612
Configuration: PHD_CR_20230612.conf
|
Table of contents:
- Input data
- Trimming and quality control
- Mapping to genome
- Genome coverage
- Peak detection
- Fraction of Reads in Peaks
- Differential peak detection
- MultiQC report
- UCSC hub
|
1. Input data
The following table summarizes the samples, conditions, and contrasts in this analysis. A readset is either a single fastq file or a pair of fastq files (for paired-end sequencing).
Category | Data |
Summary of input data |
Experimental conditions: | WT-H3K27me3, WT-H3K4me3, D1240A-H3K27me3, D1240A-H3K4me3, F1295A-H3K27me3, F1295A-H3K4me3, TKO-H3K27me3, TKO-H3K4me3 |
Contrasts: | WT-H3K27me3 vs. TKO-H3K27me3, D1240A-H3K27me3 vs. TKO-H3K27me3, F1295A-H3K27me3 vs. TKO-H3K27me3, WT-H3K4me3 vs. TKO-H3K4me3, D1240A-H3K4me3 vs. TKO-H3K4me3, F1295A-H3K4me3 vs. TKO-H3K4me3 |
Number of samples | 16 |
Sequencing data |
Total number of reads: | 463,831,170 |
Average reads per sample: | 28,989,448 |
Table 1. Summary of input data
Condition | Sample | Number of reads | % Reads |
WT-H3K27me3 | WT-H3K27me3-Rep1 | 28,948,570 | 6.24% |
WT-H3K27me3 | WT-H3K27me3-Rep2 | 28,508,992 | 6.15% |
WT-H3K4me3 | WT-H3K4me3-1 | 30,371,791 | 6.55% |
D1240A-H3K27me3 | D1240A-H3K27me3-Rep1 | 34,395,801 | 7.42% |
D1240A-H3K27me3 | D1240A-H3K27me3-Rep2 | 23,925,283 | 5.16% |
D1240A-H3K4me3 | D1240A-H3K4me3-1 | 34,366,992 | 7.41% |
F1295A-H3K27me3 | F1295A-H3K27me3-Rep1 | 37,263,842 | 8.03% |
F1295A-H3K27me3 | F1295A-H3K27me3-Rep2 | 25,093,312 | 5.41% |
F1295A-H3K4me3 | F1295A-H3K4me3-1 | 34,248,564 | 7.38% |
TKO-H3K27me3 | TKO-H3K27me3-Rep1 | 30,595,380 | 6.60% |
TKO-H3K27me3 | TKO-H3K27me3-Rep2 | 27,448,019 | 5.92% |
TKO-H3K4me3 | TKO-H3K4me3-1 | 15,375,986 | 3.31% |
Table 2. Number of reads in each sample.
|
2. Trimming and quality control
The input sequences were trimmed using Trimmomatic. Quality control was performed before and after trimming using FastQC. The following table provides links to the
quality control reports before and after trimming, as well as the number of reads before and after trimming.
Sample | Readset | Reads before trim | QC before trim | Reads after trim | QC after trim | % Retained |
WT-H3K27me3-Rep1 | WT-H3K27me3-Rep1_r1 | 28,948,570 | WT-H3K27me3-Rep1_S7_L004_R1_001 WT-H3K27me3-Rep1_S7_L004_R2_001 | 27,657,729 | WT-H3K27me3-Rep1_S7_L004_R1_001.trim.paired WT-H3K27me3-Rep1_S7_L004_R2_001.trim.paired | 95.54% |
WT-H3K27me3-Rep2 | WT-H3K27me3-Rep2_r1 | 28,508,992 | WT-H3K27me3-Rep2_S8_L004_R1_001 WT-H3K27me3-Rep2_S8_L004_R2_001 | 27,351,179 | WT-H3K27me3-Rep2_S8_L004_R1_001.trim.paired WT-H3K27me3-Rep2_S8_L004_R2_001.trim.paired | 95.94% |
WT-IgG | WT-IgG_r1 | 22,336,225 | WT-IgG_S5_L004_R1_001 WT-IgG_S5_L004_R2_001 | 20,663,896 | WT-IgG_S5_L004_R1_001.trim.paired WT-IgG_S5_L004_R2_001.trim.paired | 92.51% |
WT-H3K4me3-1 | WT-H3K4me3-1_r1 | 30,371,791 | WT-H3K4me3_S6_L004_R1_001 WT-H3K4me3_S6_L004_R2_001 | 28,257,095 | WT-H3K4me3_S6_L004_R1_001.trim.paired WT-H3K4me3_S6_L004_R2_001.trim.paired | 93.04% |
D1240A-H3K27me3-Rep1 | D1240A-H3K27me3-Rep1_r1 | 34,395,801 | D1240A-H3K27me3-Rep1_S11_L004_R1_001 D1240A-H3K27me3-Rep1_S11_L004_R2_001 | 32,979,741 | D1240A-H3K27me3-Rep1_S11_L004_R1_001.trim.paired D1240A-H3K27me3-Rep1_S11_L004_R2_001.trim.paired | 95.88% |
D1240A-H3K27me3-Rep2 | D1240A-H3K27me3-Rep2_r1 | 23,925,283 | D1240A-H3K27me3-Rep2_S12_L004_R1_001 D1240A-H3K27me3-Rep2_S12_L004_R2_001 | 22,937,682 | D1240A-H3K27me3-Rep2_S12_L004_R1_001.trim.paired D1240A-H3K27me3-Rep2_S12_L004_R2_001.trim.paired | 95.87% |
D1240A-IgG | D1240A-IgG_r1 | 34,113,426 | D1240A-IgG_S9_L004_R1_001 D1240A-IgG_S9_L004_R2_001 | 32,117,807 | D1240A-IgG_S9_L004_R1_001.trim.paired D1240A-IgG_S9_L004_R2_001.trim.paired | 94.15% |
D1240A-H3K4me3-1 | D1240A-H3K4me3-1_r1 | 34,366,992 | D1240A-H3K4me3_S10_L004_R1_001 D1240A-H3K4me3_S10_L004_R2_001 | 32,259,322 | D1240A-H3K4me3_S10_L004_R1_001.trim.paired D1240A-H3K4me3_S10_L004_R2_001.trim.paired | 93.87% |
F1295A-H3K27me3-Rep1 | F1295A-H3K27me3-Rep1_r1 | 37,263,842 | F1295A-H3K27me3-Rep1_S15_L004_R1_001 F1295A-H3K27me3-Rep1_S15_L004_R2_001 | 35,859,404 | F1295A-H3K27me3-Rep1_S15_L004_R1_001.trim.paired F1295A-H3K27me3-Rep1_S15_L004_R2_001.trim.paired | 96.23% |
F1295A-H3K27me3-Rep2 | F1295A-H3K27me3-Rep2_r1 | 25,093,312 | F1295A-H3K27me3-Rep2_S16_L004_R1_001 F1295A-H3K27me3-Rep2_S16_L004_R2_001 | 24,112,495 | F1295A-H3K27me3-Rep2_S16_L004_R1_001.trim.paired F1295A-H3K27me3-Rep2_S16_L004_R2_001.trim.paired | 96.09% |
F1295A-IgG | F1295A-IgG_r1 | 33,533,122 | F1295A-IgG_S13_L004_R1_001 F1295A-IgG_S13_L004_R2_001 | 32,048,165 | F1295A-IgG_S13_L004_R1_001.trim.paired F1295A-IgG_S13_L004_R2_001.trim.paired | 95.57% |
F1295A-H3K4me3-1 | F1295A-H3K4me3-1_r1 | 34,248,564 | F1295A-H3K4me3_S14_L004_R1_001 F1295A-H3K4me3_S14_L004_R2_001 | 32,386,835 | F1295A-H3K4me3_S14_L004_R1_001.trim.paired F1295A-H3K4me3_S14_L004_R2_001.trim.paired | 94.56% |
TKO-H3K27me3-Rep1 | TKO-H3K27me3-Rep1_r1 | 30,595,380 | TKO-H3K27me3-Rep1_S3_L004_R1_001 TKO-H3K27me3-Rep1_S3_L004_R2_001 | 29,187,050 | TKO-H3K27me3-Rep1_S3_L004_R1_001.trim.paired TKO-H3K27me3-Rep1_S3_L004_R2_001.trim.paired | 95.40% |
TKO-H3K27me3-Rep2 | TKO-H3K27me3-Rep2_r1 | 27,448,019 | TKO-H3K27me3-Rep2_S4_L004_R1_001 TKO-H3K27me3-Rep2_S4_L004_R2_001 | 26,271,898 | TKO-H3K27me3-Rep2_S4_L004_R1_001.trim.paired TKO-H3K27me3-Rep2_S4_L004_R2_001.trim.paired | 95.72% |
TKO-IgG | TKO-IgG_r1 | 23,305,865 | TKO-IgG_S1_L004_R1_001 TKO-IgG_S1_L004_R2_001 | 21,718,413 | TKO-IgG_S1_L004_R1_001.trim.paired TKO-IgG_S1_L004_R2_001.trim.paired | 93.19% |
TKO-H3K4me3-1 | TKO-H3K4me3-1_r1 | 15,375,986 | TKO-H3K4me3_S2_L004_R1_001 TKO-H3K4me3_S2_L004_R2_001 | 14,646,303 | TKO-H3K4me3_S2_L004_R1_001.trim.paired TKO-H3K4me3_S2_L004_R2_001.trim.paired | 95.25% |
Table 3. Number of reads in input files and links to QC reports.
The following two tables report the number of reads before and after QC in each sample and in each condition.
Sample | Reads before QC | Reads after QC | % Retained |
WT-H3K27me3-Rep1 | 28,948,570 | 27,657,729 | 95.54% |
WT-H3K27me3-Rep2 | 28,508,992 | 27,351,179 | 95.94% |
WT-IgG | 22,336,225 | 20,663,896 | 92.51% |
WT-H3K4me3-1 | 30,371,791 | 28,257,095 | 93.04% |
D1240A-H3K27me3-Rep1 | 34,395,801 | 32,979,741 | 95.88% |
D1240A-H3K27me3-Rep2 | 23,925,283 | 22,937,682 | 95.87% |
D1240A-IgG | 34,113,426 | 32,117,807 | 94.15% |
D1240A-H3K4me3-1 | 34,366,992 | 32,259,322 | 93.87% |
F1295A-H3K27me3-Rep1 | 37,263,842 | 35,859,404 | 96.23% |
F1295A-H3K27me3-Rep2 | 25,093,312 | 24,112,495 | 96.09% |
F1295A-IgG | 33,533,122 | 32,048,165 | 95.57% |
F1295A-H3K4me3-1 | 34,248,564 | 32,386,835 | 94.56% |
TKO-H3K27me3-Rep1 | 30,595,380 | 29,187,050 | 95.40% |
TKO-H3K27me3-Rep2 | 27,448,019 | 26,271,898 | 95.72% |
TKO-IgG | 23,305,865 | 21,718,413 | 93.19% |
TKO-H3K4me3-1 | 15,375,986 | 14,646,303 | 95.25% |
Table 4. Number of reads in each sample before and after QC.
Condition | Reads before QC | Reads after QC | % Retained |
WT-H3K27me3 | 57,457,562 | 55,008,908 | 95.74% |
WT-H3K4me3 | 30,371,791 | 28,257,095 | 93.04% |
D1240A-H3K27me3 | 58,321,084 | 55,917,423 | 95.88% |
D1240A-H3K4me3 | 34,366,992 | 32,259,322 | 93.87% |
F1295A-H3K27me3 | 62,357,154 | 59,971,899 | 96.17% |
F1295A-H3K4me3 | 34,248,564 | 32,386,835 | 94.56% |
TKO-H3K27me3 | 58,043,399 | 55,458,948 | 95.55% |
TKO-H3K4me3 | 15,375,986 | 14,646,303 | 95.25% |
Table 5. Number of reads in each condition before and after QC.
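The per-condition figures in Table 5 are consistent with summing the per-sample read counts from Table 4 and computing the percentage on the sums, rather than averaging the per-sample percentages. The following Python sketch illustrates that arithmetic on three rows copied from Table 4; the function name and data layout are illustrative assumptions, not part of the pipeline.

```python
from collections import defaultdict

# (condition, reads before QC, reads after QC), copied from Table 4.
SAMPLES = [
    ("WT-H3K27me3", 28_948_570, 27_657_729),  # WT-H3K27me3-Rep1
    ("WT-H3K27me3", 28_508_992, 27_351_179),  # WT-H3K27me3-Rep2
    ("TKO-H3K4me3", 15_375_986, 14_646_303),  # TKO-H3K4me3-1
]


def retained_by_condition(samples):
    """Sum read counts per condition, then compute % retained on the sums."""
    totals = defaultdict(lambda: [0, 0])
    for condition, before, after in samples:
        totals[condition][0] += before
        totals[condition][1] += after
    return {
        condition: (before, after, round(100.0 * after / before, 2))
        for condition, (before, after) in totals.items()
    }


if __name__ == "__main__":
    for condition, (before, after, pct) in retained_by_condition(SAMPLES).items():
        print(f"{condition} | {before:,} | {after:,} | {pct:.2f}% |")
```

For WT-H3K27me3 this gives 57,457,562 reads before QC, 55,008,908 after QC, and 95.74% retained, matching the corresponding row of Table 5.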
|
3. Mapping to genome
The input sequences were aligned to the genome using Bowtie2 (version 2.4.5). The WIG files can be uploaded to the UCSC
Genome Browser as custom tracks. The following table reports the number of aligned reads for each
sample.
Sample | Total reads | Aligned reads | Concordant alignment rate | Bowtie2 report |
WT-H3K27me3-Rep1 | 27,657,729 | 19,826,525 | 71.69% | bam.bowtie/WT-H3K27me3-Rep1.bt2stats.html |
WT-H3K27me3-Rep2 | 27,351,179 | 19,898,837 | 72.75% | bam.bowtie/WT-H3K27me3-Rep2.bt2stats.html |
WT-IgG | 20,663,896 | 9,528,980 | 46.11% | bam.bowtie/WT-IgG.bt2stats.html |
WT-H3K4me3-1 | 28,257,095 | 21,207,032 | 75.05% | bam.bowtie/WT-H3K4me3-1.bt2stats.html |
D1240A-H3K27me3-Rep1 | 32,979,741 | 24,089,897 | 73.04% | bam.bowtie/D1240A-H3K27me3-Rep1.bt2stats.html |
D1240A-H3K27me3-Rep2 | 22,937,682 | 16,851,350 | 73.47% | bam.bowtie/D1240A-H3K27me3-Rep2.bt2stats.html |
D1240A-IgG | 32,117,807 | 17,942,991 | 55.87% | bam.bowtie/D1240A-IgG.bt2stats.html |
D1240A-H3K4me3-1 | 32,259,322 | 22,881,499 | 70.93% | bam.bowtie/D1240A-H3K4me3-1.bt2stats.html |
F1295A-H3K27me3-Rep1 | 35,859,404 | 27,977,235 | 78.02% | bam.bowtie/F1295A-H3K27me3-Rep1.bt2stats.html |
F1295A-H3K27me3-Rep2 | 24,112,495 | 17,727,506 | 73.52% | bam.bowtie/F1295A-H3K27me3-Rep2.bt2stats.html |
F1295A-IgG | 32,048,165 | 20,165,942 | 62.92% | bam.bowtie/F1295A-IgG.bt2stats.html |
F1295A-H3K4me3-1 | 32,386,835 | 25,350,562 | 78.27% | bam.bowtie/F1295A-H3K4me3-1.bt2stats.html |
TKO-H3K27me3-Rep1 | 29,187,050 | 21,311,437 | 73.02% | bam.bowtie/TKO-H3K27me3-Rep1.bt2stats.html |
TKO-H3K27me3-Rep2 | 26,271,898 | 19,191,464 | 73.05% | bam.bowtie/TKO-H3K27me3-Rep2.bt2stats.html |
TKO-IgG | 21,718,413 | 9,788,834 | 45.07% | bam.bowtie/TKO-IgG.bt2stats.html |
TKO-H3K4me3-1 | 14,646,303 | 9,491,694 | 64.81% | bam.bowtie/TKO-H3K4me3-1.bt2stats.html |
Table 6. Number of alignments to genome.
|
4. Genome coverage
The following table reports the overall and effective genome coverage in each sample. The Total nt column reports
the total number of nucleotides sequenced, i.e. the number of aligned reads times the length of each read. Coverage is this number
divided by the size of the genome. Effective bp reports the number of bases in the genome having coverage greater than 5, and the
Effective Perc column shows what percentage this is of the genome size. Note that, especially in the case of RNA-seq, the effective
genome size may be much smaller than the full size. Eff Coverage is the average coverage over the effectively covered fraction of
the genome.
Name | Total nt | Coverage | Effective bp | Effective Perc | Eff Coverage |
WT-H3K27me3-Rep1 | 4,163,719,552 | 1.35 | 180,837,235 | 5.90% | 23.02 |
WT-H3K27me3-Rep2 | 4,118,097,446 | 1.33 | 192,419,670 | 6.20% | 21.40 |
WT-IgG | 459,194,788 | 0.15 | 44,167,769 | 1.40% | 10.40 |
WT-H3K4me3-1 | 3,056,258,731 | 0.99 | 42,228,688 | 1.40% | 72.37 |
D1240A-H3K27me3-Rep1 | 4,955,173,927 | 1.61 | 258,672,056 | 8.40% | 19.16 |
D1240A-H3K27me3-Rep2 | 3,305,134,621 | 1.07 | 193,159,495 | 6.30% | 17.11 |
D1240A-IgG | 818,929,244 | 0.27 | 89,191,993 | 2.90% | 9.18 |
D1240A-H3K4me3-1 | 2,874,183,511 | 0.93 | 49,623,311 | 1.60% | 57.92 |
F1295A-H3K27me3-Rep1 | 5,282,552,793 | 1.71 | 289,695,843 | 9.40% | 18.23 |
F1295A-H3K27me3-Rep2 | 3,088,047,820 | 1.00 | 201,242,869 | 6.50% | 15.34 |
F1295A-IgG | 898,129,191 | 0.29 | 96,408,689 | 3.10% | 9.32 |
F1295A-H3K4me3-1 | 3,756,192,513 | 1.22 | 51,793,863 | 1.70% | 72.52 |
TKO-H3K27me3-Rep1 | 4,253,547,208 | 1.38 | 242,032,205 | 7.80% | 17.57 |
TKO-H3K27me3-Rep2 | 3,661,733,903 | 1.19 | 223,610,350 | 7.20% | 16.38 |
TKO-IgG | 400,476,845 | 0.13 | 40,576,192 | 1.30% | 9.87 |
TKO-H3K4me3-1 | 1,627,896,672 | 0.53 | 28,544,793 | 0.90% | 57.03 |
Table 7. Genome coverage by sample.
The following table reports the overall and effective genome coverage in each condition.
Name | Total nt | Coverage | Effective bp | Effective Perc | Eff Coverage |
WT-H3K27me3 | 9,041,109,112 | 2.93 | 305,965,930 | 9.90% | 29.55 |
WT-H3K4me3 | 3,056,258,731 | 0.99 | 42,228,688 | 1.40% | 72.37 |
D1240A-H3K27me3 | 9,268,999,366 | 3.00 | 387,459,458 | 12.60% | 23.92 |
D1240A-H3K4me3 | 2,874,183,511 | 0.93 | 49,623,311 | 1.60% | 57.92 |
F1295A-H3K27me3 | 9,563,155,396 | 3.10 | 436,935,261 | 14.20% | 21.89 |
F1295A-H3K4me3 | 3,756,192,513 | 1.22 | 51,793,863 | 1.70% | 72.52 |
TKO-H3K27me3 | 9,007,093,616 | 2.92 | 402,937,010 | 13.10% | 22.35 |
TKO-H3K4me3 | 1,627,896,672 | 0.53 | 28,544,793 | 0.90% | 57.03 |
Table 8. Genome coverage by condition
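The relationships between the columns of Tables 7 and 8 can be sketched in Python as follows. This is a hedged illustration, not the pipeline's code: the report does not state the genome size, so GENOME_SIZE below is a placeholder assumption, and the values that depend on it (Coverage, Effective Perc) will not exactly reproduce the tables. The Eff Coverage values in the tables are consistent with Total nt divided by Effective bp, which is what the sketch computes.

```python
# Placeholder genome size in bp (an assumption; the report does not state it).
GENOME_SIZE = 3_100_000_000


def coverage_metrics(total_nt, effective_bp, genome_size=GENOME_SIZE):
    """Return (coverage, effective_perc, eff_coverage) for one sample.

    coverage       : total sequenced nucleotides divided by genome size
    effective_perc : percentage of the genome counted as effectively covered
    eff_coverage   : total nucleotides divided by effectively covered bases
    """
    coverage = total_nt / genome_size
    effective_perc = 100.0 * effective_bp / genome_size
    eff_coverage = total_nt / effective_bp
    return coverage, effective_perc, eff_coverage


if __name__ == "__main__":
    # WT-H3K27me3-Rep1 row from Table 7.
    cov, perc, eff = coverage_metrics(4_163_719_552, 180_837_235)
    print(f"Eff Coverage: {eff:.2f}")  # 23.02, as in Table 7
```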
|
5. Peak detection
Peak detection was performed using SEACR with the following options: mode=relaxed, normalization=True.
The following table shows the number of peaks found for each condition.
Table 9. Peaks detected by SEACR and their classification in genomic regions.
The following histogram shows the distribution of peak locations in the different conditions.
|
6. Fraction of Reads in Peaks
The Fraction of Reads in Peaks (FRIP) is the fraction of reads that fall in regions called as peaks, out of all aligned reads.
Condition | Reads | Reads in peaks | FRIP |
WT-H3K27me3 | 98,508,895 | 45,808,755 | 46.50% |
WT-H3K4me3 | 49,504,921 | 33,021,110 | 66.70% |
D1240A-H3K27me3 | 102,137,183 | 28,922,745 | 28.32% |
D1240A-H3K4me3 | 54,674,630 | 32,223,153 | 58.94% |
F1295A-H3K27me3 | 111,485,180 | 29,479,390 | 26.44% |
F1295A-H3K4me3 | 58,245,844 | 37,630,536 | 64.61% |
TKO-H3K27me3 | 100,081,195 | 38,398,511 | 38.37% |
TKO-H3K4me3 | 23,100,943 | 15,707,557 | 68.00% |
Table 10. Fraction of Reads in Peaks
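As a worked example of the FRIP definition, the following minimal Python sketch recomputes the FRIP column from the Reads and Reads in peaks columns of Table 10. The function name is an illustrative assumption; how the pipeline counts reads in peaks is not shown here.

```python
def frip(reads_in_peaks, total_reads):
    """Fraction of reads in peaks: reads inside called peaks / all reads."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return reads_in_peaks / total_reads


if __name__ == "__main__":
    # WT-H3K27me3 row from Table 10.
    print(f"{100.0 * frip(45_808_755, 98_508_895):.2f}%")  # 46.50%
```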
|
7. Differential peak detection
The following table reports peaks that are increased, decreased, or unchanged in each contrast. Note that these results are based solely on the fold change of the peak sizes between the two conditions,
with no statistical significance information, using a log2(FC) threshold of 1.0 (a two-fold change). More accurate differential analysis can be performed with DASA.
Table 11. Differential peak analysis results.
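A fold-change classification of this kind can be sketched in Python as shown below. This is an illustration under stated assumptions, not the pipeline's implementation: the pseudocount, the inclusive comparison at the threshold, and the direction of the ratio (first condition over second) are all assumptions, and the function name is hypothetical.

```python
import math


def classify_peak(size_a, size_b, threshold=1.0, pseudocount=1.0):
    """Classify a peak by log2 fold change of its size in condition A vs. B.

    The pseudocount (an assumption) avoids division by zero when a peak
    is absent in one of the two conditions.
    """
    log2fc = math.log2((size_a + pseudocount) / (size_b + pseudocount))
    if log2fc >= threshold:
        label = "increased"
    elif log2fc <= -threshold:
        label = "decreased"
    else:
        label = "unchanged"
    return label, log2fc


if __name__ == "__main__":
    # Hypothetical peak sizes, for illustration only.
    for a, b in [(40, 10), (10, 40), (12, 10)]:
        label, log2fc = classify_peak(a, b)
        print(f"{a} vs. {b}: log2(FC) = {log2fc:.2f} ({label})")
```

With a threshold of 1.0, a peak is reported as changed only when its size differs by at least a factor of two between the two conditions.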
|
8. MultiQC report
MultiQC is a general Quality Control tool for a large number of bioinformatics pipelines. The report
on this analysis (generated using MultiQC version 1.12) is available here:
MultiQC report
|
9. UCSC hub
UCSC Genome Browser: use the previous link to display the data tracks automatically, or copy the URL https://lichtlab.cancer.ufl.edu/reports/MLC//PHD_CR_20230612/PHD_CR_20230612/hub.txt and paste it into the "My Hubs" form on this page.
WashU EpiGenome Browser: use the previous link to display the data tracks automatically, or copy the following URL into the "Datahub by URL Link" field: https://lichtlab.cancer.ufl.edu/reports/MLC//PHD_CR_20230612/PHD_CR_20230612/hub.json.
Completed: 6-14-2023@12:48 |
© 2023, A. Riva, University of Florida. |