DiBiG
ICBR Bioinformatics | Powered by Actor, v1.0
Title: GE7218
Project: (none)
Started on: 1/5/2024 11:56:54
Hostname: login7.ufhpc
Run directory: /blue/licht/runs/H2B-E77K-Project/GE7218/GE7218
Configuration: GE7218.conf
1. Input data

The following table summarizes the samples, conditions, and contrasts in this analysis. A readset is either a single FASTQ file or a pair of FASTQ files (for paired-end sequencing).
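To make the readset concept concrete, the sketch below groups FASTQ file names into single-end or paired-end readsets. It assumes a hypothetical `<name>_R1` / `<name>_R2` naming convention; the `Readset` class and `group_readsets` function are illustrative and are not the pipeline's own code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Readset:
    """One readset: a single FASTQ file, or an R1/R2 pair for paired-end data."""
    name: str
    left: str
    right: Optional[str] = None

    @property
    def paired(self) -> bool:
        return self.right is not None


def group_readsets(filenames):
    """Group FASTQ file names into readsets.

    ASSUMPTION: paired files are named '<name>_R1.<ext>' and '<name>_R2.<ext>';
    any other file is treated as a single-end readset.
    """
    mates = {}
    for fn in sorted(filenames):
        base = fn
        for ext in (".fastq.gz", ".fq.gz", ".fastq", ".fq"):
            if base.endswith(ext):
                base = base[: -len(ext)]
                break
        if base.endswith("_R1") or base.endswith("_R2"):
            name, mate = base[:-3], base[-2:]
        else:
            name, mate = base, "R1"  # single-end file: its only mate
        mates.setdefault(name, {})[mate] = fn

    readsets = []
    for name, m in sorted(mates.items()):
        if "R1" not in m:
            raise ValueError(f"readset {name!r} has an R2 file but no R1 file")
        readsets.append(Readset(name, m["R1"], m.get("R2")))
    return readsets
```

For example, `["ctrl1_R1.fastq.gz", "ctrl1_R2.fastq.gz", "mut1.fastq.gz"]` yields one paired readset (`ctrl1`) and one single-end readset (`mut1`).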
2. Trimming and quality control

The input sequences were trimmed using fastp (version 0.22.0). The following table provides links to the quality control reports after trimming, as well as the number of reads in the trimmed files.
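The read counts in the table correspond to FASTQ records. As a rough sketch (not the pipeline's implementation), reads can be counted from the four-line record structure of a FASTQ file. The function names are hypothetical, and the code assumes unwrapped records (one sequence line and one quality line per read).

```python
import gzip


def count_fastq_records(lines):
    """Count FASTQ records in an iterable of lines.

    Assumes standard 4-line records: header, sequence, '+', qualities.
    """
    n_lines = sum(1 for _ in lines)
    if n_lines % 4 != 0:
        raise ValueError(f"{n_lines} lines is not a multiple of 4; truncated FASTQ?")
    return n_lines // 4


def count_fastq_reads(path):
    """Count reads in a FASTQ file, handling .gz compression transparently."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return count_fastq_records(handle)
```

For paired-end readsets, counting either mate file gives the number of read pairs.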
3. Alignment to transcriptome

The input sequences were aligned to the hg38 transcriptome using STAR version 2.7.9a. The following table reports the number of alignments to the genome and to the transcriptome for each sample. Note that the number of alignments is generally higher than the number of reads, because the same read may align to multiple isoforms of the same gene. The WIG files can be uploaded to the UCSC Genome Browser as custom tracks.
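The gap between alignments and reads can be illustrated with a toy set of alignment records. This is a sketch only: it assumes alignments have already been reduced to hypothetical `(read_id, transcript_id)` pairs, and the IDs in the example are invented.

```python
from collections import defaultdict


def alignment_summary(alignments):
    """Summarize (read_id, transcript_id) alignment records.

    Returns (number of alignments, number of distinct reads,
    number of reads aligned to more than one transcript).
    """
    per_read = defaultdict(set)
    n_alignments = 0
    for read_id, transcript_id in alignments:
        n_alignments += 1
        per_read[read_id].add(transcript_id)
    multi = sum(1 for targets in per_read.values() if len(targets) > 1)
    return n_alignments, len(per_read), multi
```

A read that aligns to two isoforms of one gene contributes two alignments but only one read, so `[("read1", "GENEA-201"), ("read1", "GENEA-202"), ("read2", "GENEB-201")]` gives 3 alignments from 2 reads.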
4. Genome coverage

The following table reports the overall and effective genome coverage in each sample. The Total nt column reports the total number of nucleotides in aligned reads, i.e. the number of aligned reads multiplied by the read length. Coverage is Total nt divided by the genome size. Effective bp is the number of genome positions with coverage greater than 5, and Effective Perc expresses this as a percentage of the genome size. Especially for RNA-seq, the effective genome size may be much smaller than the full genome size. Eff Coverage is the average coverage over the effectively covered fraction of the genome.
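The column definitions above can be written out as arithmetic. The sketch below works on a toy per-base depth list rather than a real genome, and the function name is hypothetical. It also assumes that Eff Coverage means the mean depth over positions with depth above 5; the report does not spell out the exact formula.

```python
def coverage_stats(aligned_reads, read_length, depths, min_depth=5):
    """Compute the coverage columns for one sample.

    depths: per-base coverage at every genome position (len(depths) = genome size).
    ASSUMPTION: Eff Coverage = mean depth over positions with depth > min_depth.
    """
    genome_size = len(depths)
    total_nt = aligned_reads * read_length
    effective = [d for d in depths if d > min_depth]
    effective_bp = len(effective)
    return {
        "Total nt": total_nt,
        "Coverage": total_nt / genome_size,
        "Effective bp": effective_bp,
        "Effective Perc": 100.0 * effective_bp / genome_size,
        "Eff Coverage": sum(effective) / effective_bp if effective_bp else 0.0,
    }
```

With 4 reads of length 9 on a 10 bp "genome" with depths `[0, 0, 10, 20, 6, 5, 0, 0, 0, 0]`, Coverage is 3.6, but only 3 positions exceed depth 5 (30%), and their mean depth is 12. This is why Eff Coverage can be far higher than Coverage for RNA-seq data.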
5. Expression analysis - quantification

Gene and transcript expression values were quantified using RSEM v1.3.1. The following files contain the raw FPKM values for all genes and transcripts in all samples. NOTE: these values are not yet normalized; apply an appropriate normalization before using them in downstream analysis.
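For reference, FPKM (fragments per kilobase of feature per million mapped fragments) scales a feature's fragment count by its length and by the sample's total mapped fragments. The sketch below uses plain counts and lengths. RSEM itself derives FPKM from expected counts and effective lengths, so this simplified version will not reproduce RSEM's numbers exactly.

```python
def fpkm(fragment_counts, lengths_bp):
    """FPKM = count * 1e9 / (length_bp * total_mapped_fragments).

    Simplified sketch: uses raw counts and annotated lengths, not RSEM's
    expected counts and effective lengths.
    """
    total = sum(fragment_counts.values())
    return {
        feature: count * 1e9 / (lengths_bp[feature] * total)
        for feature, count in fragment_counts.items()
    }
```

Two genes with the same count but different lengths get different FPKM values. For example, 500 fragments on a 1 kb gene and 500 on a 2 kb gene, out of 1,000 total, give 500,000 and 250,000.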
The following scatterplots show the level of similarity between replicates of the same condition.
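Replicate similarity in such scatterplots is often summarized by a correlation coefficient. The report does not state that one is computed. The sketch below assumes a Pearson correlation on log2(value + 1) transformed expression; both the pseudocount and the function name are illustrative choices.

```python
import math


def replicate_correlation(rep_a, rep_b, pseudocount=1.0):
    """Pearson correlation of log2(value + pseudocount) between two replicates."""
    if len(rep_a) != len(rep_b):
        raise ValueError("replicates must cover the same features")
    xs = [math.log2(v + pseudocount) for v in rep_a]
    ys = [math.log2(v + pseudocount) for v in rep_b]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```

Values close to 1 correspond to points lying along the diagonal of the replicate scatterplot.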
Principal Component Analysis on raw (un-normalized) expression data.
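The idea behind the PCA plot can be sketched without external libraries. The sketch centers each gene across samples and finds the leading eigenvector of the sample-by-sample Gram matrix by power iteration. It then reports each sample's score on the first principal component. This is a toy illustration, not the report's plotting code, and it computes only PC1.

```python
import math


def pc1_scores(samples, iterations=500):
    """Scores of each sample on the first principal component.

    samples: one list of expression values per sample (same genes, same order).
    Works on the n x n Gram matrix X X^T of gene-centered data.
    """
    n, g = len(samples), len(samples[0])
    means = [sum(row[j] for row in samples) / n for j in range(g)]
    centered = [[row[j] - means[j] for j in range(g)] for row in samples]
    gram = [[sum(a * b for a, b in zip(r1, r2)) for r2 in centered] for r1 in centered]

    # Power iteration. A constant start vector lies in the null space of the
    # centered Gram matrix, so start from a non-constant one.
    vec = [float(i + 1) for i in range(n)]
    eigval = 0.0
    for _ in range(iterations):
        nxt = [sum(gram[i][k] * vec[k] for k in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0.0:  # no variance across samples
            return [0.0] * n
        vec = [x / norm for x in nxt]
        eigval = norm  # converges to the leading eigenvalue
    # PC scores = u * sqrt(lambda) for the leading eigenvector u of X X^T.
    return [u * math.sqrt(eigval) for u in vec]
```

The sign of a principal component is arbitrary, so only the relative positions of samples are meaningful. Replicates of the same condition should share a sign on PC1 when condition is the dominant source of variation.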
The following image displays the Multi-Dimensional Scaling (MDS) plot for the raw (un-normalized) expression data.
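6. Differential expression - protein-coding genes

Differential gene expression was analyzed using DESeq2. The following table reports the number of differentially expressed genes in each contrast, using the fold-change and FDR thresholds given in the Methods summary (a 2.0-fold change in either direction and an FDR-corrected P-value threshold of 0.05).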
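The thresholds translate to log2 space as follows: a 2.0-fold change in either direction is abs(log2FC) ≥ log2(2.0) = 1. The sketch below counts up- and down-regulated genes from rows shaped like DESeq2's `log2FoldChange` and `padj` columns. It assumes `>=` for the fold change and `<` for the FDR, since the report does not state the exact comparison operators. The function names are hypothetical.

```python
import math


def is_differentially_expressed(log2_fold_change, padj, min_fold=2.0, max_fdr=0.05):
    """Apply a fold-change and FDR filter to one DESeq2-style result row.

    ASSUMPTION: abs(log2FC) >= log2(min_fold) and padj < max_fdr.
    """
    if padj is None or math.isnan(padj):  # DESeq2 reports NA padj for some genes
        return False
    return abs(log2_fold_change) >= math.log2(min_fold) and padj < max_fdr


def count_de(rows):
    """rows: iterable of (gene, log2FoldChange, padj). Returns (n_up, n_down)."""
    up = down = 0
    for _gene, lfc, padj in rows:
        if is_differentially_expressed(lfc, padj):
            if lfc > 0:
                up += 1
            else:
                down += 1
    return up, down
```

A gene with a small adjusted P-value but a log2 fold change of 0.5 (about 1.4-fold) is not counted, and neither is a large fold change with padj above 0.05.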
Principal Component Analysis on normalized expression data.
The following image displays the Multi-Dimensional Scaling (MDS) plot for the normalized expression data. In this plot, relative distances between samples reflect the similarity of their gene expression profiles: ideally, replicates of the same condition should cluster together and be well separated from other conditions.

Volcano plots are provided for all contrasts.
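A volcano plot places each gene at x = log2 fold change and y = -log10(adjusted P-value). The sketch below computes those coordinates and an up/down/not-significant label for each gene. The cutoffs mirror the thresholds in the Methods summary, but the labeling scheme and function name are assumptions, not the report's plotting code.

```python
import math


def volcano_points(rows, lfc_cutoff=1.0, padj_cutoff=0.05):
    """Turn (gene, log2FC, padj) rows into (gene, x, y, label) volcano points.

    x = log2 fold change, y = -log10(padj); rows with missing padj are skipped.
    """
    points = []
    for gene, lfc, padj in rows:
        if padj is None or math.isnan(padj):
            continue
        y = -math.log10(max(padj, 1e-300))  # guard against padj == 0
        if padj < padj_cutoff and lfc >= lfc_cutoff:
            label = "up"
        elif padj < padj_cutoff and lfc <= -lfc_cutoff:
            label = "down"
        else:
            label = "ns"
        points.append((gene, lfc, y, label))
    return points
```

Genes in the upper-left and upper-right regions of the plot are the ones counted as differentially expressed. Genes high on the y-axis but near x = 0 are significant but below the fold-change cutoff.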
7. Differential expression - all genes

The following table reports results from the same differential analysis as above, but includes genes of all biotypes rather than protein-coding genes only.
8. Differential expression - isoform level

The following table reports the number of differentially expressed isoforms in each contrast, using the fold-change and FDR thresholds given in the Methods summary.
9. Differential expression - combined files

The following file contains merged differential expression data. The first sheet contains fold changes for all genes found to be differentially expressed in at least one contrast. The second sheet contains the same information for coding genes only, and the third for all transcripts.
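The merged sheet can be described as a union-then-lookup: first collect every gene that passes the filter in any contrast, then report that gene's fold change in every contrast. The sketch below builds such a table in memory. The contrast names in the example and the `merge_contrasts` function are hypothetical, and the Excel output itself is not reproduced.

```python
def merge_contrasts(contrasts, is_de):
    """Build a merged fold-change table.

    contrasts: {contrast_name: {gene: (log2FC, padj)}}
    is_de: predicate (log2FC, padj) -> bool deciding significance.
    Returns {gene: {contrast_name: log2FC or None}} for genes that are
    differentially expressed in at least one contrast.
    """
    selected = {
        gene
        for table in contrasts.values()
        for gene, (lfc, padj) in table.items()
        if is_de(lfc, padj)
    }
    merged = {}
    for gene in sorted(selected):
        merged[gene] = {
            name: (table[gene][0] if gene in table else None)
            for name, table in contrasts.items()
        }
    return merged
```

A gene significant only in `mut_vs_wt` still appears with its (non-significant) fold change in every other contrast. This makes the sheet convenient for comparing direction and magnitude across contrasts.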
10. Alternative splicing analysis

Alternative splicing analysis was performed using rMATS v4.1.0. The following table reports the number of events in each class for each contrast. The link in the last column allows you to download an Excel file containing the full results.
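rMATS reports five event classes: skipped exon (SE), mutually exclusive exons (MXE), alternative 5' splice site (A5SS), alternative 3' splice site (A3SS), and retained intron (RI). The sketch below tallies significant events per class from `(event_type, fdr)` pairs, such as values read from the per-class `*.MATS.JC.txt` output files. It assumes "significant" means FDR below 0.05, because the report does not state its criterion.

```python
RMATS_EVENT_TYPES = ("SE", "MXE", "A5SS", "A3SS", "RI")


def count_events(events, fdr_cutoff=0.05):
    """Count significant events per rMATS event class.

    events: iterable of (event_type, fdr) pairs.
    ASSUMPTION: an event is counted when fdr < fdr_cutoff.
    """
    counts = dict.fromkeys(RMATS_EVENT_TYPES, 0)
    for event_type, fdr in events:
        if event_type not in counts:
            raise ValueError(f"unknown rMATS event type: {event_type}")
        if fdr < fdr_cutoff:
            counts[event_type] += 1
    return counts
```

Classes with no significant events are still reported with a count of 0, so every contrast has a row covering all five classes.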
11. MultiQC report

MultiQC aggregates quality control results from a large number of bioinformatics tools into a single report. The report for this analysis (generated using MultiQC version 1.12) is available here:
12. UCSC hub

UCSC Genome Browser: use the previous link to display the data tracks automatically, or copy the URL https://lichtlab.cancer.ufl.edu/reports/H2B//GE7218/hub/hub.txt and paste it into the "My Hubs" form on this page.

WashU EpiGenome Browser: use the previous link to display the data tracks automatically, or copy the following URL into the "Datahub by URL Link" field: https://lichtlab.cancer.ufl.edu/reports/H2B//GE7218/hub/hub.json.
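A link that opens the UCSC Genome Browser with a track hub already loaded can be built from the browser's `hgTracks` page using its `db` and `hubUrl` query parameters. The sketch below assumes the hg38 assembly used in this analysis. The helper name is hypothetical, and it is not necessarily how the report's own link was generated.

```python
from urllib.parse import urlencode

UCSC_HGTRACKS = "https://genome.ucsc.edu/cgi-bin/hgTracks"


def ucsc_hub_link(hub_url, db="hg38"):
    """Build a UCSC Genome Browser link that attaches a track hub on open."""
    return f"{UCSC_HGTRACKS}?{urlencode({'db': db, 'hubUrl': hub_url})}"
```

Passing the hub.txt URL listed above returns a link that can be bookmarked or shared. `urlencode` percent-encodes the hub URL so it survives as a single query parameter.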
13. Methods summary

Trimming and QC on short reads were performed by fastp (v0.22.0) [1]. The reads were aligned to the transcriptome using STAR version 2.7.9a [2]. Transcript abundance was quantified using RSEM v1.3.1 [3]. Differential expression analysis was performed using DESeq2 [4] with an FDR-corrected P-value threshold of 0.05. The output files were further filtered to extract transcripts showing a 2.0-fold change in either direction. Results were reported both for protein-coding genes only and for all transcript types. Alternative splicing analysis was performed using rMATS v4.1.0 [5].

References
Completed: 1/5/2024 12:03
© 2024, A. Riva, University of Florida.