| Title: GE7176
 Project: (none)
 Started on: 1/4/2024 15:02:26
 Hostname: login7.ufhpc
 Run directory: /blue/licht/runs/Evans-MDS/GE7176/GE7176
 Configuration: GE7176.conf
 | 
        
        | Table of contents: 
 1. Input data
 2. Trimming and quality control
 3. Alignment to transcriptome
 4. Genome coverage
 5. Expression analysis - quantification
 6. Differential expression - protein-coding genes
 7. Differential expression - all genes
 8. Differential expression - isoform level
 9. Differential expression - combined files
 10. MultiQC report
 11. UCSC hub
 12. Methods summary | 
| 1. Input data
 The following table summarizes the samples, conditions, and contrasts in this analysis. A readset is either a single fastq file or a pair of fastq files (for paired-end sequencing).
 
 
 
  
    Table 1. Summary of input data.
| Category | Data |
| Summary of input data | |
| Reference genome: | mm10 |
| Experimental conditions: | JX-A, JX-B |
| Contrasts: | JX-A vs. JX-B |
| Number of samples: | 4 |
| Sequencing data | |
| Total number of reads: | 483,574,308 |
| Average reads per sample: | 120,893,577 |
 
 
 
 
  
    Table 2. Number of reads in each sample.
| Condition | Sample | Number of reads | % Reads |
| JX-A | SJX-1 | 115,017,526 | 23.78% |
| JX-A | SJX-2 | 129,296,388 | 26.74% |
| JX-B | SJX-3 | 125,986,759 | 26.05% |
| JX-B | SJX-4 | 113,273,635 | 23.42% |
 
 | 
| 2. Trimming and quality control
 The input sequences were trimmed using Trimmomatic. Quality control was performed before and after trimming using FastQC. The following table provides links to the
quality control reports before and after trimming, as well as the number of reads in the trimmed files.
 
 
 
  
    Table 3. Number of reads in input files and links to QC reports.
| Sample | Readset | Reads before trim | QC before trim | Reads after trim | QC after trim | % Retained |
| SJX-1 | SJX-1_r1 | 9,263,928 | JX-1_S1_L003_R1_001, JX-1_S1_L003_R2_001 | 8,823,034 | JX-1_S1_L003_R1_001.trim.paired, JX-1_S1_L003_R2_001.trim.paired | 95.24% |
| SJX-1 | SJX-1_r2 | 105,753,598 | JX-1_S1_L004_R1_001, JX-1_S1_L004_R2_001 | 103,915,934 | JX-1_S1_L004_R1_001.trim.paired, JX-1_S1_L004_R2_001.trim.paired | 98.26% |
| SJX-2 | SJX-2_r1 | 11,790,601 | JX-2_S2_L003_R1_001, JX-2_S2_L003_R2_001 | 11,096,417 | JX-2_S2_L003_R1_001.trim.paired, JX-2_S2_L003_R2_001.trim.paired | 94.11% |
| SJX-2 | SJX-2_r2 | 117,505,787 | JX-2_S2_L004_R1_001, JX-2_S2_L004_R2_001 | 115,385,395 | JX-2_S2_L004_R1_001.trim.paired, JX-2_S2_L004_R2_001.trim.paired | 98.20% |
| SJX-3 | SJX-3_r1 | 12,808,900 | JX-3_S3_L003_R1_001, JX-3_S3_L003_R2_001 | 12,120,179 | JX-3_S3_L003_R1_001.trim.paired, JX-3_S3_L003_R2_001.trim.paired | 94.62% |
| SJX-3 | SJX-3_r2 | 113,177,859 | JX-3_S3_L004_R1_001, JX-3_S3_L004_R2_001 | 111,351,891 | JX-3_S3_L004_R1_001.trim.paired, JX-3_S3_L004_R2_001.trim.paired | 98.39% |
| SJX-4 | SJX-4_r1 | 10,057,238 | JX-4_S4_L003_R1_001, JX-4_S4_L003_R2_001 | 9,573,639 | JX-4_S4_L003_R1_001.trim.paired, JX-4_S4_L003_R2_001.trim.paired | 95.19% |
| SJX-4 | SJX-4_r2 | 103,216,397 | JX-4_S4_L004_R1_001, JX-4_S4_L004_R2_001 | 101,564,605 | JX-4_S4_L004_R1_001.trim.paired, JX-4_S4_L004_R2_001.trim.paired | 98.40% |
 The following two tables report the number of reads before and after QC in each sample and in each condition.
 
 
 
  
    Table 4. Number of reads in each sample before and after QC.
| Sample | Reads before QC | Reads after QC | % Retained |
| SJX-1 | 115,017,526 | 112,738,968 | 98.02% |
| SJX-2 | 129,296,388 | 126,481,812 | 97.82% |
| SJX-3 | 125,986,759 | 123,472,070 | 98.00% |
| SJX-4 | 113,273,635 | 111,138,244 | 98.11% |
 
 
 
 
  
    Table 5. Number of reads in each condition before and after QC.
| Condition | Reads before QC | Reads after QC | % Retained |
| JX-A | 244,313,914 | 239,220,780 | 97.92% |
| JX-B | 239,260,394 | 234,610,314 | 98.06% |
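 The per-sample and per-condition totals in Tables 4 and 5 are consistent with summing the readset counts in Table 3 and computing % Retained as reads after trimming divided by reads before trimming. The following Python sketch shows that arithmetic using the counts from Table 3. The function name `retained_by_group` and the tuple layout are illustrative assumptions, not part of the pipeline that produced this report.

```python
from collections import defaultdict

# (condition, sample, reads before trim, reads after trim), copied from Table 3.
READSETS = [
    ("JX-A", "SJX-1", 9_263_928, 8_823_034),
    ("JX-A", "SJX-1", 105_753_598, 103_915_934),
    ("JX-A", "SJX-2", 11_790_601, 11_096_417),
    ("JX-A", "SJX-2", 117_505_787, 115_385_395),
    ("JX-B", "SJX-3", 12_808_900, 12_120_179),
    ("JX-B", "SJX-3", 113_177_859, 111_351_891),
    ("JX-B", "SJX-4", 10_057_238, 9_573_639),
    ("JX-B", "SJX-4", 103_216_397, 101_564_605),
]


def retained_by_group(readsets, key_index):
    """Sum reads per group; return {group: (before, after, percent retained)}."""
    totals = defaultdict(lambda: [0, 0])
    for row in readsets:
        totals[row[key_index]][0] += row[2]  # reads before trimming
        totals[row[key_index]][1] += row[3]  # reads after trimming
    return {
        group: (before, after, round(100.0 * after / before, 2))
        for group, (before, after) in totals.items()
    }


by_sample = retained_by_group(READSETS, key_index=1)     # compare with Table 4
by_condition = retained_by_group(READSETS, key_index=0)  # compare with Table 5

for name, (before, after, pct) in {**by_sample, **by_condition}.items():
    print(f"{name}\t{before:,}\t{after:,}\t{pct:.2f}%")
```

 Grouping by a column index keeps the sketch short; with more metadata per readset, a small record type would be easier to read.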
 
 | 
| 3. Alignment to transcriptome
 The input sequences were aligned to the mm10 transcriptome using STAR version 2.7.9a. The following table reports the number of
alignments to the genome and the transcriptome for each sample. Note that the number of alignments is generally higher than the number of
reads because the same read may align to multiple isoforms of the same gene. The WIG files can be uploaded to the UCSC
Genome Browser as custom tracks.
 
 
 
  
    Table 6. Number of alignments to genome and transcriptome.
| Sample | Input reads | Genome alignments | Genome alignment rate (alignments per input read) | Transcriptome alignments | Transcriptome alignment rate | Alignment report |
| SJX-1 | 112,738,968 | 223,114,845 | 1.98 | 91,253,633 | 80.94% | SJX-1.star/Log.final.out |
| SJX-2 | 126,481,812 | 263,012,014 | 2.08 | 105,978,371 | 83.79% | SJX-2.star/Log.final.out |
| SJX-3 | 123,472,070 | 256,632,993 | 2.08 | 104,771,098 | 84.85% | SJX-3.star/Log.final.out |
| SJX-4 | 111,138,244 | 230,897,532 | 2.08 | 95,440,754 | 85.88% | SJX-4.star/Log.final.out |
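 The two rate columns in Table 6 use different units. Judging from the numbers (an inference, since the report does not state the formulas), the genome alignment rate is the number of genome alignments per input read, while the transcriptome alignment rate is transcriptome alignments as a percentage of input reads. A minimal Python sketch of that reading, with a hypothetical function name:

```python
def alignment_rates(input_reads, genome_alignments, transcriptome_alignments):
    """Return (genome alignments per input read, transcriptome alignments as % of input reads).

    ASSUMPTION: these formulas are inferred from the values in Table 6;
    the report itself does not spell them out.
    """
    genome_rate = genome_alignments / input_reads
    transcriptome_pct = 100.0 * transcriptome_alignments / input_reads
    return round(genome_rate, 2), round(transcriptome_pct, 2)


# SJX-1 row of Table 6.
genome_rate, transcriptome_pct = alignment_rates(112_738_968, 223_114_845, 91_253_633)
print(genome_rate, transcriptome_pct)
```

 For the SJX-1 row this gives values that agree with the table (1.98 and 80.94%).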
 
 | 
| 4. Genome coverage
 The following table reports the overall and effective genome coverage in each sample. The Total nt column reports
the total number of nucleotides sequenced, i.e. the number of aligned reads times the length of each read. Coverage is this number 
divided by the size of the genome. Effective bp reports the number of bases in the genome having coverage greater than 5, and the
Effective Perc column shows what percentage this is of the genome size. Note that, especially in the case of RNA-seq, the effective
genome size may be much smaller than the full size. Eff Coverage is the average coverage over the effectively covered fraction of
the genome.
 
 
 
  
    Table 7. Genome coverage by sample.
| Name | Total nt | Coverage | Effective bp | Effective Perc | Eff Coverage |
| SJX-1 | 186,052,225,364 | 68.30 | 602,834,400 | 22.10% | 308.63 |
| SJX-2 | 233,881,816,666 | 85.85 | 616,194,396 | 22.60% | 379.56 |
| SJX-3 | 233,802,702,465 | 85.83 | 588,099,180 | 21.60% | 397.56 |
| SJX-4 | 219,936,020,087 | 80.73 | 608,625,506 | 22.30% | 361.37 |
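 The coverage columns follow from the definitions given above: Coverage is Total nt divided by the genome size, Effective Perc is Effective bp as a percentage of the genome size, and Eff Coverage is Total nt divided by Effective bp. The Python sketch below applies these definitions to the SJX-1 row. The genome size is an assumption: the report does not state the value it used, so an approximate figure is back-calculated from Total nt / Coverage, and the function and constant names are hypothetical.

```python
def coverage_stats(total_nt, effective_bp, genome_size):
    """Return (coverage, effective percent, effective coverage)."""
    coverage = total_nt / genome_size                  # average depth over the whole genome
    effective_perc = 100.0 * effective_bp / genome_size  # share of genome with coverage > 5
    eff_coverage = total_nt / effective_bp             # average depth over the covered fraction
    return coverage, effective_perc, eff_coverage


# ASSUMPTION: approximate genome size, back-calculated as Total nt / Coverage
# from the SJX-1 row of Table 7; the report does not state the value it used.
ASSUMED_GENOME_SIZE = 2_724_000_000

# SJX-1 row of Table 7.
coverage, effective_perc, eff_coverage = coverage_stats(
    total_nt=186_052_225_364,
    effective_bp=602_834_400,
    genome_size=ASSUMED_GENOME_SIZE,
)
print(f"{coverage:.2f} {effective_perc:.1f}% {eff_coverage:.2f}")
```

 Eff Coverage does not depend on the assumed genome size, so it is the most direct check against the table (308.63 for SJX-1).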
 The following table reports the overall and effective genome coverage in each condition.
 
 
 
  
    Table 8. Genome coverage by condition.
| Name | Total nt | Coverage | Effective bp | Effective Perc | Eff Coverage |
| JX-A | 404,179,690,828 | 148.33 | 730,426,973 | 26.80% | 553.35 |
| JX-B | 434,878,602,904 | 159.60 | 710,871,098 | 26.10% | 611.75 |
 
 
 
 | 
| 5. Expression analysis - quantification
 Gene and transcript expression values were quantified using RSEM v1.3.1. The following files contain the raw FPKM values for all
genes/transcripts in all samples. NOTE: these values are not yet normalized; apply the appropriate normalization before using them in
analysis.
 
 
 The following scatterplots show the level of similarity between replicates of the same condition.
 
 
 Principal Component Analysis on raw (un-normalized) expression data.
 
 The following image displays the Multi-Dimensional Scaling (MDS) plot for the raw (un-normalized) expression data.
 | 
| 6. Differential expression - protein-coding genes
 Differential gene expression was analyzed using DESeq2.
The following table reports the number of differentially expressed 
genes in each contrast with abs(log2(FC)) >= 1.0 and FDR-corrected P-value <= 0.05.
The files under the Table heading contain the log2(FC) and P-value of all significant genes, while the files under the Expressions heading
contain normalized expression values for the significant genes in all replicates of the two conditions being compared.
The lists of differentially expressed genes for all contrasts can also be downloaded as a single Excel file using the link below.
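 The significance filter described above can be sketched in a few lines of Python. A threshold of abs(log2(FC)) >= 1.0 corresponds to at least a 2-fold change in either direction. The function name, the example rows, and the handling of missing adjusted P-values are illustrative assumptions; they are not taken from the pipeline or from this analysis. DESeq2 can report NA for the adjusted P-value of some genes, and this sketch treats those as not significant.

```python
import math


def is_significant(log2_fc, padj, min_abs_log2_fc=1.0, max_padj=0.05):
    """Apply the thresholds abs(log2(FC)) >= 1.0 and FDR-corrected P-value <= 0.05."""
    if padj is None or math.isnan(padj):
        return False  # assumption: a missing adjusted P-value is never significant
    return abs(log2_fc) >= min_abs_log2_fc and padj <= max_padj


# Hypothetical rows (gene, log2 fold change, adjusted P-value); not real results.
rows = [
    ("geneA", 1.7, 0.001),         # passes both thresholds
    ("geneB", -1.0, 0.05),         # exactly on both thresholds, so it passes
    ("geneC", 0.9, 0.0001),        # fold change too small
    ("geneD", -2.3, 0.20),         # adjusted P-value too large
    ("geneE", 3.1, float("nan")),  # no adjusted P-value
]

significant = [gene for gene, log2_fc, padj in rows if is_significant(log2_fc, padj)]
print(significant)
```

 Both comparisons are inclusive, matching the >= and <= in the thresholds stated above.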
 
 Table 9. Results of gene-level differential expression analysis.
 
 
 
 
 
 Principal Component Analysis on normalized expression data.
 
 The following image displays the Multi-Dimensional Scaling (MDS) plot for the normalized expression data. 
In this plot, relative distances between samples reflect the similarity of their gene expression profiles. Ideally, replicates of the same condition should be close together, and well separated from other conditions.
 Volcano plots for all contrasts.
 
 
 | 
| 7. Differential expression - all genes
 The following table reports results from the same differential analysis as above, but includes all biotypes instead of coding genes only.
 
 Table 10. Results of gene-level differential expression analysis (all biotypes).
 
 
 
 
 
 | 
| 8. Differential expression - isoform level
 The following table reports the number of differentially expressed 
isoforms in each contrast with abs(log2(FC)) >= 1.0 and FDR-corrected P-value <= 0.05.
The lists of differentially expressed isoforms for all contrasts can also be downloaded as a single Excel file using the link below.
 
 Table 11. Results of isoform-level differential expression analysis.
 
 
 
 
 | 
| 9. Differential expression - combined files
 The following file contains merged differential expression data. The first sheet contains fold changes for all genes that were
found to be differentially expressed in at least one contrast. The second and third sheets contain the same information for coding genes only, and all
transcripts.
 
 | 
| 10. MultiQC report
 MultiQC is a general Quality Control tool for a large number of bioinformatics pipelines. The report 
on this analysis (generated using MultiQC version 1.12) is available here:
 
 MultiQC report
 | 
| 11. UCSC hub 
 UCSC Genome Browser: use the previous link to display the data tracks automatically, or copy the URL https://lichtlab.cancer.ufl.edu/reports/MDS//GE7176/hub/hub.txt and paste it into the "My Hubs" form on this page.
 WashU EpiGenome Browser: use the previous link to display the data tracks automatically, or copy the following URL into the "Datahub by URL Link" field: https://lichtlab.cancer.ufl.edu/reports/MDS//GE7176/hub/hub.json. | 
| 12. Methods summary 
 Short reads were trimmed using Trimmomatic (v 0.36) [1], and QC on the original and trimmed reads was performed using FastQC (v 0.11.4) [2] and MultiQC [3]. The reads were aligned to the transcriptome using STAR version 2.7.9a [4]. Transcript abundance was quantified using RSEM (v1.3.1) [5]. Differential expression analysis was performed using DESeq2 [6], with an FDR-corrected P-value threshold of 0.05. The output files were further filtered to extract transcripts showing at least a 2.0-fold change in either direction. Results were reported for protein-coding genes only, and for all transcript types.
 
 References
 
 [1] Bolger, A. M., Lohse, M., and Usadel, B. (2014). Trimmomatic: A flexible trimmer for Illumina Sequence Data. Bioinformatics, btu170.
 [2] FastQC: https://www.bioinformatics.babraham.ac.uk/projects/fastqc/
 [3] Philip Ewels, Mans Magnusson, Sverker Lundin and Max Kaller (2016). MultiQC: Summarize analysis results for multiple tools and samples in a single report. Bioinformatics. doi: 10.1093/bioinformatics/btw354. PubMed: 27312411
 [4] Dobin A, Davis CA, Schlesinger F, Drenkow J, Zaleski C, Jha S, Batut P, Chaisson M, Gingeras TR (2013). STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29(1):15-21. doi: 10.1093/bioinformatics/bts635
 [5] Li B and Dewey CN (2011). RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome. BMC Bioinformatics 12:323. doi: 10.1186/1471-2105-12-323
 [6] Love MI, Huber W, Anders S (2014). Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biology 15:550. doi: 10.1186/s13059-014-0550-8
 Completed: 1-4-2024@15:02
 | 
| © 2024, A. Riva, University of Florida. |