DiBiG
ICBR Bioinformatics	Powered by Actor, v1.0

ChIPseq - Alignment and peak finding

Title: Jianping-CS-NS2161-me2-nodex
Project: (none)
Started on: 10/27/2023 12:22:05
Hostname: login7.ufhpc
Run directory: /orange/licht/runs/NSD2-E1099K-Project/NS2161/Jianping-CS-NS2161-me2-nodex
Configuration: Jianping-CS-NS2161-me2-nodex.conf

Table of contents:

Input data
Trimming and quality control
Mapping to genome
Genome coverage
Peak detection (MACS2)
Fraction of Reads in Peaks
MultiQC report
UCSC hub

1. Input data
The following table summarizes the samples, conditions, and contrasts in this analysis. A readset is either a single fastq file or a pair of fastq files (for paired-end sequencing).

Category	Data
*Summary of input data*
Reference genome:	hg38
Experimental conditions:	RC-W-C-K36m2-2, RC-M-C-K36m2-2
Contrasts:	RC-M-C-K36m2-2 vs. RC-W-C-K36m2-2
Number of samples	2
*Sequencing data*
Total number of reads:	252,368,799
Average reads per sample:	126,184,399

Table 1. Summary of input data

Condition	Sample	Number of reads	% Reads
RC-W-C-K36m2-2	RC-W-C-K36m2-2_S	126,721,500	50.21%
RC-M-C-K36m2-2	RC-M-C-K36m2-2_S	125,647,299	49.79%

Table 2. Number of reads in each sample.
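
The totals in Table 1 and the percentages in Table 2 can be re-derived from the per-sample read counts. The following is a minimal sketch, not the pipeline's own code; the variable names are illustrative, and the use of integer division for the average is an assumption made because it reproduces the value shown above.

```python
# Read counts copied from Table 2.
reads = {
    "RC-W-C-K36m2-2_S": 126_721_500,
    "RC-M-C-K36m2-2_S": 125_647_299,
}

total = sum(reads.values())
# Assumption: the average is truncated to an integer (this reproduces Table 1).
average = total // len(reads)
# Share of all reads contributed by each sample, in percent.
shares = {name: 100 * n / total for name, n in reads.items()}

print(f"Total: {total:,}  Average per sample: {average:,}")
for name, pct in shares.items():
    print(f"{name}\t{reads[name]:,}\t{pct:.2f}%")
```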

2. Trimming and quality control
The input sequences were trimmed using Trimmomatic. Quality control was performed before and after trimming using FastQC. The following table provides links to the quality control reports before and after trimming, as well as the number of reads in each readset before and after trimming.

Sample	Readset	Reads before trim	QC before trim	Reads after trim	QC after trim	% Retained
RC-W-C-K36m2-2_S	RC-W-C-K36m2-2_S_r1	126,721,500	RC-W-C-K36m2-2_S125_L004_R1_001 RC-W-C-K36m2-2_S125_L004_R2_001	122,826,803	RC-W-C-K36m2-2_S125_L004_R1_001.trim.paired RC-W-C-K36m2-2_S125_L004_R2_001.trim.paired	96.93%
RC-M-C-K36m2-2_S	RC-M-C-K36m2-2_S_r1	125,647,299	RC-M-C-K36m2-2_S123_L004_R1_001 RC-M-C-K36m2-2_S123_L004_R2_001	121,863,599	RC-M-C-K36m2-2_S123_L004_R1_001.trim.paired RC-M-C-K36m2-2_S123_L004_R2_001.trim.paired	96.99%

Table 3. Number of reads in input files and links to QC reports.

The following two tables report the number of reads before and after QC in each sample and in each condition.

Sample	Reads before QC	Reads after QC	% Retained
RC-W-C-K36m2-2_S	126,721,500	122,826,803	96.93%
RC-M-C-K36m2-2_S	125,647,299	121,863,599	96.99%

Table 4. Number of reads in each sample before and after QC.

Condition	Reads before QC	Reads after QC	% Retained
RC-W-C-K36m2-2	126,721,500	122,826,803	96.93%
RC-M-C-K36m2-2	125,647,299	121,863,599	96.99%

Table 5. Number of reads in each condition before and after QC.
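
Tables 4 and 5 aggregate the readset counts of Table 3 by sample and by condition. The sketch below shows one way such a roll-up could be computed; the function name `retention_by` and the tuple layout are hypothetical, and with one readset per sample and one sample per condition (as in this run) both groupings give the same numbers.

```python
from collections import defaultdict

# (condition, sample, reads_before, reads_after) copied from Tables 3-5.
readsets = [
    ("RC-W-C-K36m2-2", "RC-W-C-K36m2-2_S", 126_721_500, 122_826_803),
    ("RC-M-C-K36m2-2", "RC-M-C-K36m2-2_S", 125_647_299, 121_863_599),
]

def retention_by(key_index, rows):
    """Sum reads per group and return {group: (before, after, pct_retained)}."""
    totals = defaultdict(lambda: [0, 0])
    for row in rows:
        totals[row[key_index]][0] += row[2]  # reads before trimming
        totals[row[key_index]][1] += row[3]  # reads after trimming
    return {k: (b, a, 100 * a / b) for k, (b, a) in totals.items()}

by_sample = retention_by(1, readsets)     # group on the sample column
by_condition = retention_by(0, readsets)  # group on the condition column

for name, (before, after, pct) in by_condition.items():
    print(f"{name}\t{before:,}\t{after:,}\t{pct:.2f}%")
```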

3. Mapping to genome
The input sequences were aligned to the hg38 genome using Bowtie 2.4.5. The following table reports the number of aligned reads for each sample. The WIG files can be uploaded to the UCSC Genome Browser as custom tracks.

Sample	Total reads	Aligned reads	Concordant alignment rate	Bowtie2 report
RC-W-C-K36m2-2_S	122,826,803	101,744,691	82.84%	bam.bowtie/RC-W-C-K36m2-2_S.bt2stats.html
RC-M-C-K36m2-2_S	121,863,599	100,348,501	82.34%	bam.bowtie/RC-M-C-K36m2-2_S.bt2stats.html

Table 6. Number of alignments to genome.

4. Genome coverage
The following table reports the overall and effective genome coverage in each sample. The Total nt column reports the total number of nucleotides sequenced, i.e. the number of aligned reads times the length of each read. Coverage is this number divided by the size of the genome. Effective bp reports the number of bases in the genome having coverage greater than 5, and the Effective Perc column shows what percentage this is of the genome size. Note that, especially in the case of RNA-seq, the effective genome size may be much smaller than the full size. Eff Coverage is the average coverage over the effectively covered fraction of the genome.

Name	Total nt	Coverage	Effective bp	Effective Perc	Eff Coverage
RC-W-C-K36m2-2_S	3,067,105,562	0.99	357,186,936	11.60%	8.59
RC-M-C-K36m2-2_S	3,487,325,441	1.13	422,366,656	13.70%	8.26

Table 7. Genome coverage by sample.
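
To make the column definitions concrete, the sketch below computes the same five quantities from a toy per-base depth vector. It is an illustration under stated assumptions, not the pipeline's implementation: the function name `coverage_summary` is hypothetical, the genome size is taken to be the length of the depth vector, and "coverage greater than 5" is read as a strict inequality.

```python
def coverage_summary(depths, min_depth=5):
    """Summarize per-base read depths using the column definitions above."""
    genome_size = len(depths)          # assumption: one entry per genome base
    total_nt = sum(depths)             # every aligned base counted once
    effective = [d for d in depths if d > min_depth]
    effective_bp = len(effective)
    return {
        "total_nt": total_nt,
        "coverage": total_nt / genome_size,
        "effective_bp": effective_bp,
        "effective_perc": 100 * effective_bp / genome_size,
        # Mean depth over the effectively covered bases only.
        "eff_coverage": sum(effective) / effective_bp if effective_bp else 0.0,
    }

# Toy data, not taken from this run.
toy_depths = [0, 0, 6, 10, 8, 1, 0, 7]
summary = coverage_summary(toy_depths)
print(summary)
```

On the toy vector, four of the eight positions exceed the threshold, so the effective percentage is 50% while the effective coverage (7.75) is well above the overall coverage (4.0), the same pattern seen in Table 7.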

The following table reports the overall and effective genome coverage in each condition.

Name	Total nt	Coverage	Effective bp	Effective Perc	Eff Coverage
RC-W-C-K36m2-2	0	0.00	0	0.00%	0.00
RC-M-C-K36m2-2	3,487,325,441	1.13	422,366,656	13.70%	8.26

Table 8. Genome coverage by condition

File: Jianping-CS-NS2161-me2-nodex.sample.cov.xlsx
Size: 9.37 kB
Description: Per-chromosome coverage data, by sample.

File: Jianping-CS-NS2161-me2-nodex.cond.cov.xlsx
Size: 7.66 kB
Description: Per-chromosome coverage data, by condition.

5. Peak detection (MACS2)
Peak detection was performed using MACS version 2.2.7.1 with the following options: broad=N, model=Y, paired=Y, qvalue=0.05. The following table shows the number of peaks found for each condition, and their classification. Click on the link in the Peaks column to download the list of peaks in tab-delimited format.

Condition	Total Peaks	Peaks	Actions
RC-W-C-K36m2-2	63,541	RC-W-C-K36m2-2.macs/RC-W-C-K36m2-2_peaks.csv	Create RegionSet
RC-M-C-K36m2-2	37,060	RC-M-C-K36m2-2.macs/RC-M-C-K36m2-2_peaks.csv	Create RegionSet

Table 9. Classification of peaks in genome regions
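
The options listed above could correspond to a MACS2 command line along the following lines. This is a sketch only: the mapping from the report's option names (broad, model, paired, qvalue) to `macs2 callpeak` flags is an assumption, the actual command used by the pipeline is not shown in this report, and the BAM file name, the `-g hs` genome-size preset, and the function name `build_macs2_command` are hypothetical.

```python
import shlex

def build_macs2_command(treatment_bam, name, broad=False, model=True,
                        paired=True, qvalue=0.05, genome="hs"):
    """Build an argument list for `macs2 callpeak` (assumed option mapping)."""
    cmd = [
        "macs2", "callpeak",
        "-t", treatment_bam,                  # treatment alignments
        "-n", name,                           # prefix for output files
        "-g", genome,                         # genome-size preset (assumed: human)
        "-f", "BAMPE" if paired else "BAM",   # paired=Y -> paired-end BAM
        "-q", str(qvalue),                    # q-value cutoff
    ]
    if broad:
        cmd.append("--broad")                 # broad=Y -> broad peak calling
    if not model:
        cmd.append("--nomodel")               # model=N -> skip shift model
    return cmd

# Hypothetical input file name; defaults mirror broad=N, model=Y, paired=Y, qvalue=0.05.
cmd = build_macs2_command("RC-W-C-K36m2-2.bam", "RC-W-C-K36m2-2")
print(shlex.join(cmd))
```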

The following table provides links to the Pileup, narrowPeaks, and Summits files for each condition. All files are in bedGraph format.

Condition	Num peaks	Pileup	Peaks	Summits
RC-W-C-K36m2-2	63,541	RC-W-C-K36m2-2.macs/RC-W-C-K36m2-2.bedGraph	RC-W-C-K36m2-2.macs/RC-W-C-K36m2-2.npeaks.bedGraph	RC-W-C-K36m2-2.macs/RC-W-C-K36m2-2.summits.bedGraph
RC-M-C-K36m2-2	37,060	RC-M-C-K36m2-2.macs/RC-M-C-K36m2-2.bedGraph	RC-M-C-K36m2-2.macs/RC-M-C-K36m2-2.npeaks.bedGraph	RC-M-C-K36m2-2.macs/RC-M-C-K36m2-2.summits.bedGraph

Table 10. Results of peak detection with MACS.

The following histogram shows the distribution of peak locations in the different conditions.

6. Fraction of Reads in Peaks
The Fraction of Reads in Peaks (FRIP) is the fraction of reads that fall in regions called as peaks, out of all aligned reads.

Condition	Reads	Reads in peaks	FRIP
RC-W-C-K36m2-2	243,416,733	11,861,976	4.87%
RC-M-C-K36m2-2	241,565,445	7,535,506	3.12%

Table 11. Fraction of Reads in Peaks
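
The FRIP definition can be illustrated with a small sketch that counts reads falling inside peak intervals and divides by the total. This is not the pipeline's code: the function names are hypothetical, reads are reduced to a single position on one chromosome, and peaks are assumed to be sorted, non-overlapping, half-open intervals. The last lines recompute the FRIP column of Table 11 from its two count columns.

```python
from bisect import bisect_right

def count_reads_in_peaks(read_positions, peaks):
    """Count positions that fall inside any (start, end) peak interval.

    Assumes peaks are sorted, non-overlapping, and half-open: start <= pos < end.
    """
    starts = [start for start, _ in peaks]
    hits = 0
    for pos in read_positions:
        i = bisect_right(starts, pos) - 1   # last peak starting at or before pos
        if i >= 0 and pos < peaks[i][1]:
            hits += 1
    return hits

def frip(reads_in_peaks, total_reads):
    """Fraction of reads in peaks."""
    return reads_in_peaks / total_reads

# Toy data, not taken from this run.
toy_peaks = [(100, 200), (500, 650)]
toy_reads = [50, 120, 199, 200, 510, 700]
toy_hits = count_reads_in_peaks(toy_reads, toy_peaks)
print(toy_hits, frip(toy_hits, len(toy_reads)))

# Counts copied from Table 11.
table = {
    "RC-W-C-K36m2-2": (243_416_733, 11_861_976),
    "RC-M-C-K36m2-2": (241_565_445, 7_535_506),
}
for condition, (total, in_peaks) in table.items():
    print(f"{condition}\t{100 * frip(in_peaks, total):.2f}%")
```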

7. MultiQC report
MultiQC is a general Quality Control tool for a large number of bioinformatics pipelines. The report on this analysis (generated using MultiQC version 1.12) is available here:

MultiQC report

8. UCSC hub

UCSC Genome Browser: use the previous link to display the data tracks automatically, or copy the URL https://lichtlab.cancer.ufl.edu/reports/NSD2//Jianping-CS-NS2161-me2-nodex/NS2161-me2-nodex/hub.txt and paste it into the "My Hubs" form on this page.

WashU EpiGenome Browser: use the previous link to display the data tracks automatically, or copy the following URL into the "Datahub by URL Link" field: https://lichtlab.cancer.ufl.edu/reports/NSD2//Jianping-CS-NS2161-me2-nodex/NS2161-me2-nodex/hub.json.
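
As an alternative to pasting the hub URL by hand, a direct UCSC Genome Browser link can be assembled from it. The sketch below assumes the browser's `hgTracks` entry point with its `db` and `hubUrl` query parameters; the function name `ucsc_hub_link` is hypothetical, and the resulting link is not necessarily identical to the one originally embedded in this report.

```python
from urllib.parse import urlencode

# Hub URL copied verbatim from the UCSC hub section above.
HUB_URL = ("https://lichtlab.cancer.ufl.edu/reports/NSD2//"
           "Jianping-CS-NS2161-me2-nodex/NS2161-me2-nodex/hub.txt")

def ucsc_hub_link(hub_url, db="hg38"):
    """Return a UCSC Genome Browser URL that loads the given track hub."""
    query = urlencode({"db": db, "hubUrl": hub_url})  # percent-encodes the hub URL
    return f"https://genome.ucsc.edu/cgi-bin/hgTracks?{query}"

link = ucsc_hub_link(HUB_URL)
print(link)
```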

Completed: 10-27-2023@12:22