Initialization

Seurat

We analyzed sample E76-KP from directory E76-KP/outs/filtered_feature_bc_matrix/. Input dataset:

data = rs_load10Xrun(params$data_dir, project=params$sample)
## Warning: Feature names cannot have underscores ('_'), replacing with dashes
## ('-')
data
## An object of class Seurat 
## 19555 features across 12096 samples within 1 assay 
## Active assay: RNA (19555 features, 0 variable features)
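
The warning above refers to a substitution applied to feature names on import. As a minimal base-R sketch of that rule (the feature names below are hypothetical, and this is not the code of the loader itself):

```r
# Hypothetical feature names; only the "_" -> "-" rule comes from the warning above.
features <- c("GENE_A", "MT-ND1", "LINC_00_X")

# fixed = TRUE treats "_" as a literal character rather than a regular expression.
cleaned <- gsub("_", "-", features, fixed = TRUE)
print(cleaned)
```

Names that contain no underscore pass through unchanged.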

Pre-processing

The following violin plot shows the distribution of the number of genes, counts, and percentage of mitochondrial reads in all cells.

We filtered the dataset to remove cells with fewer than 500 or more than 20000 detected genes, and cells in which more than 10 percent of reads were mitochondrial. The resulting dataset contains 11831 cells (97.8% of the initial 12096).
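
The thresholds above can be illustrated with a small base-R sketch. The column names `nFeature_RNA` and `percent.mt` follow common Seurat conventions but are assumptions here, and the toy values are hypothetical; only the thresholds and the cell counts come from this report.

```r
# Toy per-cell QC metrics (hypothetical values, not taken from the E76-KP data).
qc <- data.frame(
  nFeature_RNA = c(350, 1200, 4800, 25000, 3000),
  percent.mt   = c(2.0, 4.5, 12.0, 3.0, 9.9)
)

# Keep cells with 500 to 20000 genes and at most 10 percent mitochondrial reads.
keep <- qc$nFeature_RNA >= 500 & qc$nFeature_RNA <= 20000 & qc$percent.mt <= 10
filtered <- qc[keep, ]
print(nrow(filtered))

# Retained fraction reported in the text: 11831 of 12096 cells.
retained_pct <- round(100 * 11831 / 12096, 1)
print(retained_pct)
```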

The following plots display the relationship between counts and the percentage of mitochondrial reads, and between counts and the number of genes.

## Warning: CombinePlots is being deprecated. Plots should now be combined using
## the patchwork system.

Normalization and feature selection

After normalization, we identified the 12 most variable genes. They are:

##  [1] "TP53I3" "TAGLN"  "MDM2"   "DHRS2"  "CCNB1"  "AURKA"  "NR0B1"  "LCE1C" 
##  [9] "PVRL4"  "UBE2C"  "IFI27"  "HMMR"

The following chart plots the standardized variance against the average expression. Outliers represent features with high variability (the top 2000 are shown in red). The 12 most variable features are labeled.

## When using repel, set xnudge and ynudge to 0 for optimal results
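
This report does not state which feature-selection method was used. The "standardized variance" axis is consistent with a variance-stabilizing approach, in which the variance expected at a given mean is estimated by a local fit and each gene is scored by its variance after standardization. The following is a minimal base-R sketch of that idea on simulated counts, not the pipeline's actual code; all sizes and parameters are assumptions.

```r
set.seed(1)
n_genes <- 200
n_cells <- 300

# Simulated Poisson counts with gene-specific means (hypothetical data).
mu <- rexp(n_genes, rate = 1 / 5) + 0.1
counts <- matrix(rpois(n_genes * n_cells, lambda = rep(mu, n_cells)), nrow = n_genes)

# Make the first five genes overdispersed so they stand out.
counts[1:5, ] <- rnbinom(5 * n_cells, mu = 5, size = 0.2)

# Fit the mean-variance trend on a log10 scale with a local regression.
d <- data.frame(log_mean = log10(rowMeans(counts)),
                log_var  = log10(apply(counts, 1, var)))
fit <- loess(log_var ~ log_mean, data = d, span = 0.3)
expected_sd <- sqrt(10^fit$fitted)

# Standardize each gene by its expected spread, clip extreme values,
# and score each gene by the variance of the standardized values.
z <- (counts - rowMeans(counts)) / expected_sd
z <- pmin(z, sqrt(n_cells))
std_var <- rowSums(z^2) / (n_cells - 1)

top <- order(std_var, decreasing = TRUE)[1:5]
print(sort(top))
```

Genes that follow the overall trend score close to 1, while overdispersed genes score well above it, which is why they appear as outliers in the chart.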

PCA and clustering

The following plot displays the first two dimensions of the Principal Component Analysis of this dataset.

## Centering and scaling data matrix
## PC_ 1 
## Positive:  RPL10, MT1X, EEF1A1, RPL7, S100A6, DDIT4, RPL7A, RPL3, RPL23, RPS4X 
##     MT2A, MT1E, GNB2L1, RPS3, KRT18, RPL4, DUSP23, RPS19, RPSA, KRT7 
##     KRT8, IGFBP2, TKT, DUT, JUNB, ZFP36L2, GADD45B, TPM2, MCM6, CDCA7 
## Negative:  MKI67, UBE2C, TUBA1B, TOP2A, TPX2, UBE2S, HIST1H4C, PRC1, TUBB4B, HIST1H1B 
##     NUSAP1, HMGB2, CDKN3, CENPF, KIF23, CENPE, HMMR, CDCA3, NDC80, PTTG1 
##     DYNLL1, ANLN, ASPM, CKS2, MAP1B, TUBA1C, HMGN2, HIST1H1A, DEPDC1, KIF11 
## PC_ 2 
## Positive:  CLSPN, KIAA0101, DUT, MYBL2, HELLS, TK1, CENPM, FAM111B, TYMS, DHFR 
##     RRM2, E2F1, DNMT1, RNASEH2A, ATAD2, ACAT2, TMEM106C, PSMC3IP, CCNE2, CENPU 
##     ZWINT, CDC45, CDC6, MCM4, FAM111A, TCF19, RRM1, MCM3, RBBP8, DTL 
## Negative:  CCNB1, AURKA, PLK1, CDC20, HMMR, PRR11, ARL6IP1, CKS2, KIF20A, CENPA 
##     PTTG1, CENPE, PIF1, CCNB2, DEPDC1, KPNA2, KIF18A, DLGAP5, LGALS1, FAM83D 
##     CENPF, PSRC1, KIF14, CDCA3, RPL7A, HSP90B1, KNSTRN, NEK2, RPL23, LDHA 
## PC_ 3 
## Positive:  COL8A1, KRT7, ANXA2, THBS1, OGFRL1, PPP1R14B, TMSB10, IGFBP3, EFEMP1, SERPINE1 
##     TMSB4X, FHL2, ENAH, CTGF, FLNA, S100A10, PAWR, IL18, TPM1, KRT17 
##     ACTB, ANKRD1, ALCAM, LGALS1, CAV1, ANXA3, TRAM2, TAGLN2, TAGLN, ACTN1 
## Negative:  FOS, GDF15, DDIT3, HMGN2, IER2, FDXR, RPS27L, HIST1H2AG, HIST1H1A, AREG 
##     HIST2H2AC, BTG1, BTG2, RP3-510D11.2, BAX, CD82, HIST1H2AL, PVRL4, HLA-B, UBE2T 
##     GCHFR, MDM2, GAMT, BTG3, HIST1H3G, HIST1H2AH, ARG2, HIST2H2BF, CDKN2C, HSPA1A 
## PC_ 4 
## Positive:  EIF1, GAPDH, TRIB3, TMSB10, RNH1, PLP2, MYL6, MRPL54, TUBA1B, PFDN2 
##     SLC3A2, TNFRSF12A, HN1, KRT10, YWHAB, RPL22L1, ANXA3, RAB32, H2AFZ, DYNLL1 
##     UBE2S, RNASEH2A, H3F3B, HMGN2, UBB, HIST1H4C, NIFK, CDKN3, HMGB2, YWHAH 
## Negative:  PVRL4, MDM2, TP53I3, CDKN1A, BTG2, SULF2, CMBL, FDXR, DRAXIN, UNC5B-AS1 
##     NEAT1, KIAA1324, SLC52A1, ZMAT3, PHLDA3, TCEA3, RP3-510D11.2, CYSRT1, HES2, APOBEC3C 
##     CD82, INPP5D, GLS2, MAST4, PIDD1, ITIH5, CES2, WNT4, SERPINB5, CYFIP2 
## PC_ 5 
## Positive:  MALAT1, MT-ND6, MT-ND2, HNRNPU, NCL, NEAT1, TAF15, EGR1, AP000769.1, SYNE2 
##     PHF3, RIF1, RAD21, MT-ND4, PEG10, HNRNPA3, RPL23, FOS, SPTBN1, TGFBR2 
##     PIK3R1, ATRX, BRCA2, BCLAF1, MACF1, LINC00657, MSH6, CLTC, PCM1, CHD4 
## Negative:  CLIC1, PHPT1, TNFRSF12A, RHOC, IGFBP7, PVRL4, GAPDH, TP53I3, RPS19, FHL2 
##     S100A11, S100A10, RAB32, GADD45A, KRT17, PFDN2, FADS3, TUBB4B, FDXR, ZFAS1 
##     TAGLN2, RPS3, RPS27L, EIF1, CD151, HRAS, CSRP1, RNH1, S100A16, RHOD
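
The "Positive" and "Negative" lists above are the genes with the largest positive and negative loadings on each component. A minimal sketch with base R's `prcomp()` on simulated data shows how such lists can be derived. The gene names and the two-module structure are hypothetical, and the sign of a component is arbitrary, so which module ends up "positive" may differ between runs or implementations.

```r
set.seed(42)
n_cells <- 100

# One latent factor drives two opposing gene modules (hypothetical genes A1..A5, B1..B5).
f <- rnorm(n_cells)
expr <- cbind(
  sapply(1:5, function(i)  f + rnorm(n_cells, sd = 0.3)),
  sapply(1:5, function(i) -f + rnorm(n_cells, sd = 0.3))
)
colnames(expr) <- c(paste0("A", 1:5), paste0("B", 1:5))

# Cells are rows, genes are columns; center and scale each gene before PCA.
pca <- prcomp(expr, center = TRUE, scale. = TRUE)
loadings <- pca$rotation[, "PC1"]

# Genes with the largest positive and negative loadings on PC1.
positive <- names(sort(loadings, decreasing = TRUE))[1:5]
negative <- names(sort(loadings))[1:5]
print(positive)
print(negative)
```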

The following plot displays the clustering results using the UMAP method, computed with 10 dimensions and a resolution of 0.5.

## Computing nearest neighbor graph
## Computing SNN
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
## 
## Number of nodes: 11831
## Number of edges: 370888
## 
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.8355
## Number of communities: 8
## Elapsed time: 4 seconds
## 
## 
## Table: Number of cells in each cluster
## 
## |Var1 | Freq|
## |:----|----:|
## |0    | 2674|
## |1    | 2193|
## |2    | 1956|
## |3    | 1768|
## |4    | 1413|
## |5    | 1283|
## |6    |  301|
## |7    |  243|
## 21:41:13 UMAP embedding parameters a = 0.9922 b = 1.112
## 21:41:13 Read 11831 rows and found 10 numeric columns
## 21:41:13 Using Annoy for neighbor search, n_neighbors = 30
## 21:41:13 Building Annoy index with metric = cosine, n_trees = 50
## 21:41:14 Writing NN index file to temp file /scratch/local/46218579/Rtmpi9Ds84/file5932cf59c09
## 21:41:14 Searching Annoy index using 1 thread, search_k = 3000
## 21:41:20 Annoy recall = 100%
## 21:41:20 Commencing smooth kNN distance calibration using 1 thread
## 21:41:21 Initializing from normalized Laplacian + noise
## 21:41:21 Commencing optimization for 200 epochs, with 469964 positive edges
## 21:41:27 Optimization finished
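
As a quick consistency check, the cluster sizes in the table above should add up to the 11831 cells that passed filtering. The short sketch below uses the counts from that table and also computes each cluster's share of the dataset.

```r
# Cluster sizes copied from the "Number of cells in each cluster" table.
cluster_sizes <- c("0" = 2674, "1" = 2193, "2" = 1956, "3" = 1768,
                   "4" = 1413, "5" = 1283, "6" = 301, "7" = 243)

total <- sum(cluster_sizes)
print(total)

# Percentage of cells assigned to each cluster.
cluster_pct <- round(100 * cluster_sizes / total, 1)
print(cluster_pct)
```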

Clustering results displayed using the t-SNE method:

These results were saved to E76-KP.rds. This file can be imported into R using the readRDS() function.
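
The following sketch shows the `saveRDS()` / `readRDS()` round trip on a small stand-in object written to a temporary file. The object and path are hypothetical; for the real file, the call would simply be `readRDS("E76-KP.rds")`, and the Seurat package would presumably need to be loaded to work with the restored object.

```r
# Stand-in object (hypothetical); the real file holds the analysis results.
obj <- list(sample = "E76-KP", n_cells = 11831)

# Write to a temporary location and read it back.
path <- file.path(tempdir(), "example.rds")
saveRDS(obj, path)
restored <- readRDS(path)

print(identical(obj, restored))
```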

Identification of marker genes

## # A tibble: 16 × 7
## # Groups:   cluster [8]
##        p_val avg_log2FC pct.1 pct.2 p_val_adj cluster gene      
##        <dbl>      <dbl> <dbl> <dbl>     <dbl> <fct>   <chr>     
##  1 0              0.781 0.997 0.952 0         0       DUT       
##  2 0              0.733 0.997 0.931 0         0       TK1       
##  3 0              1.70  1     0.971 0         1       HIST1H4C  
##  4 0              1.51  0.999 0.905 0         1       H1F0      
##  5 0              1.18  0.979 0.889 0         2       MT1X      
##  6 0              1.05  0.957 0.807 0         2       DDIT4     
##  7 5.59e-202      0.706 0.988 0.921 1.09e-197 3       COL8A1    
##  8 1.80e-153      0.647 0.761 0.499 3.52e-149 3       THBS1     
##  9 0              2.93  0.984 0.378 0         4       CCNB1     
## 10 0              2.43  0.996 0.457 0         4       HMMR      
## 11 0              1.09  1     0.973 0         5       HIST1H4C  
## 12 2.51e-306      0.820 0.954 0.466 4.92e-302 5       HIST1H1B  
## 13 4.61e- 14      1.93  0.382 0.265 9.01e- 10 6       AP000769.1
## 14 4.06e-  9      1.58  0.296 0.591 7.95e-  5 6       SAA1      
## 15 1.55e-155      2.96  0.979 0.477 3.02e-151 7       MDM2      
## 16 3.73e-112      3.15  0.922 0.505 7.29e-108 7       TP53I3
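
The `p_val_adj` values in the table above are consistent with a Bonferroni correction over all 19555 features in the dataset, that is, `p_val` multiplied by 19555 and capped at 1. The sketch below checks this for four rows of the table. Small discrepancies are expected because the printed p-values are rounded to three significant digits.

```r
n_features <- 19555

# p_val and p_val_adj as printed in the table above.
p_val <- c(COL8A1 = 5.59e-202, THBS1 = 1.80e-153, MDM2 = 1.55e-155, TP53I3 = 3.73e-112)
shown <- c(COL8A1 = 1.09e-197, THBS1 = 3.52e-149, MDM2 = 3.02e-151, TP53I3 = 7.29e-108)

# Bonferroni adjustment: multiply by the number of tests, cap at 1.
p_val_adj <- pmin(p_val * n_features, 1)

# Relative difference between the recomputed and the printed values.
rel_diff <- abs(p_val_adj / shown - 1)
print(signif(p_val_adj, 3))
print(rel_diff < 0.01)
```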

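Marker genes for this sample were saved to E76-KP.markers.csv. Despite its .csv extension, this is a tab-delimited file that can be opened with Excel. The following tables list the top 12 marker genes for each cluster. In each table, the first column is the R row name, which carries a numeric suffix when the same gene is also a marker for an earlier cluster (for example, RPL101 for RPL10); the gene column gives the actual gene symbol.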

Top 12 marker genes for cluster 0

|         | p_val| avg_log2FC| pct.1| pct.2| p_val_adj|cluster |gene     |
|:--------|-----:|----------:|-----:|-----:|---------:|:-------|:--------|
|DUT      |     0|  0.7808906| 0.997| 0.952|         0|0       |DUT      |
|TK1      |     0|  0.7333225| 0.997| 0.931|         0|0       |TK1      |
|RPS3A    |     0|  0.2719043| 1.000| 0.999|         0|0       |RPS3A    |
|GINS2    |     0|  0.5280579| 0.941| 0.788|         0|0       |GINS2    |
|KIAA0101 |     0|  0.4970896| 0.996| 0.945|         0|0       |KIAA0101 |
|E2F1     |     0|  0.5005704| 0.926| 0.741|         0|0       |E2F1     |
|RPS4X    |     0|  0.3045369| 0.999| 0.997|         0|0       |RPS4X    |
|PCNA     |     0|  0.4790178| 0.979| 0.897|         0|0       |PCNA     |
|TYMS     |     0|  0.5206032| 0.949| 0.795|         0|0       |TYMS     |
|IFITM3   |     0|  0.4224553| 0.999| 0.990|         0|0       |IFITM3   |
|EEF1B2   |     0|  0.2574588| 1.000| 0.996|         0|0       |EEF1B2   |
|RPL10    |     0|  0.2910058| 1.000| 1.000|         0|0       |RPL10    |

Top 12 marker genes for cluster 1

|          | p_val| avg_log2FC| pct.1| pct.2| p_val_adj|cluster |gene      |
|:---------|-----:|----------:|-----:|-----:|---------:|:-------|:---------|
|HIST1H4C  |     0|  1.6967600| 1.000| 0.971|         0|1       |HIST1H4C  |
|H1F0      |     0|  1.5054951| 0.999| 0.905|         0|1       |H1F0      |
|TNFRSF12A |     0|  1.0434993| 0.998| 0.975|         0|1       |TNFRSF12A |
|UBC       |     0|  1.0281570| 1.000| 0.988|         0|1       |UBC       |
|HIST1H1A  |     0|  0.9874826| 0.912| 0.368|         0|1       |HIST1H1A  |
|HIST1H1B  |     0|  0.9577795| 0.960| 0.419|         0|1       |HIST1H1B  |
|HIST2H2AC |     0|  0.9471416| 0.947| 0.519|         0|1       |HIST2H2AC |
|TUBA1B    |     0|  0.9326811| 1.000| 0.985|         0|1       |TUBA1B    |
|HIST1H1E  |     0|  0.9232609| 0.995| 0.775|         0|1       |HIST1H1E  |
|RAB32     |     0|  0.9157641| 0.996| 0.910|         0|1       |RAB32     |
|GDF15     |     0|  0.8938917| 0.990| 0.855|         0|1       |GDF15     |
|H1FX      |     0|  0.8555449| 0.990| 0.891|         0|1       |H1FX      |

Top 12 marker genes for cluster 2

|       | p_val| avg_log2FC| pct.1| pct.2| p_val_adj|cluster |gene   |
|:------|-----:|----------:|-----:|-----:|---------:|:-------|:------|
|MT1X   |     0|  1.1839377| 0.979| 0.889|         0|2       |MT1X   |
|DDIT4  |     0|  1.0521316| 0.957| 0.807|         0|2       |DDIT4  |
|RPL23  |     0|  1.0472712| 1.000| 0.993|         0|2       |RPL23  |
|RPL7   |     0|  0.7942620| 0.999| 0.994|         0|2       |RPL7   |
|RPL7A  |     0|  0.7219394| 1.000| 0.998|         0|2       |RPL7A  |
|MARCKS |     0|  0.7038028| 0.993| 0.959|         0|2       |MARCKS |
|RPS25  |     0|  0.6704229| 1.000| 0.996|         0|2       |RPS25  |
|RPL27A |     0|  0.6521518| 0.999| 0.998|         0|2       |RPL27A |
|RPL34  |     0|  0.6393907| 0.999| 0.993|         0|2       |RPL34  |
|RPS14  |     0|  0.5520879| 1.000| 0.997|         0|2       |RPS14  |
|RPL101 |     0|  0.5418760| 1.000| 1.000|         0|2       |RPL10  |
|EEF1A1 |     0|  0.5091745| 1.000| 0.999|         0|2       |EEF1A1 |

Top 12 marker genes for cluster 3

|        | p_val| avg_log2FC| pct.1| pct.2| p_val_adj|cluster |gene   |
|:-------|-----:|----------:|-----:|-----:|---------:|:-------|:------|
|DUT1    |     0|  0.5335540| 0.998| 0.956|         0|3       |DUT    |
|COL8A11 |     0|  0.7062640| 0.988| 0.921|         0|3       |COL8A1 |
|PRKDC   |     0|  0.4356001| 0.996| 0.970|         0|3       |PRKDC  |
|TPM3    |     0|  0.4058355| 0.997| 0.977|         0|3       |TPM3   |
|TK11    |     0|  0.4665570| 0.998| 0.936|         0|3       |TK1    |
|NCL     |     0|  0.3683779| 0.999| 0.990|         0|3       |NCL    |
|DHFR1   |     0|  0.4567680| 0.977| 0.828|         0|3       |DHFR   |
|MSH6    |     0|  0.4614454| 0.913| 0.712|         0|3       |MSH6   |
|ALCAM   |     0|  0.5731826| 0.980| 0.909|         0|3       |ALCAM  |
|MCM31   |     0|  0.4436140| 0.955| 0.795|         0|3       |MCM3   |
|MCM4    |     0|  0.4386087| 0.932| 0.738|         0|3       |MCM4   |
|OGFRL1  |     0|  0.4382112| 0.994| 0.959|         0|3       |OGFRL1 |

Top 12 marker genes for cluster 4

|       | p_val| avg_log2FC| pct.1| pct.2| p_val_adj|cluster |gene  |
|:------|-----:|----------:|-----:|-----:|---------:|:-------|:-----|
|CCNB1  |     0|   2.931391| 0.984| 0.378|         0|4       |CCNB1 |
|HMMR   |     0|   2.427292| 0.996| 0.457|         0|4       |HMMR  |
|CKS2   |     0|   2.340469| 1.000| 0.828|         0|4       |CKS2  |
|UBE2C  |     0|   2.294555| 0.973| 0.781|         0|4       |UBE2C |
|AURKA  |     0|   2.261870| 0.961| 0.209|         0|4       |AURKA |
|PTTG1  |     0|   2.224650| 0.999| 0.719|         0|4       |PTTG1 |
|CENPF  |     0|   1.971110| 0.999| 0.799|         0|4       |CENPF |
|UBE2S1 |     0|   1.959481| 1.000| 0.926|         0|4       |UBE2S |
|TOP2A  |     0|   1.945135| 0.990| 0.792|         0|4       |TOP2A |
|CDC20  |     0|   1.939483| 0.965| 0.368|         0|4       |CDC20 |
|CENPE  |     0|   1.871429| 0.977| 0.411|         0|4       |CENPE |
|TPX2   |     0|   1.858649| 0.999| 0.703|         0|4       |TPX2  |

Top 12 marker genes for cluster 5

|           | p_val| avg_log2FC| pct.1| pct.2| p_val_adj|cluster |gene      |
|:----------|-----:|----------:|-----:|-----:|---------:|:-------|:---------|
|HIST1H4C1  |     0|  1.0890944| 1.000| 0.973|         0|5       |HIST1H4C  |
|HIST1H1B1  |     0|  0.8197855| 0.954| 0.466|         0|5       |HIST1H1B  |
|HIST1H2AL1 |     0|  0.5277395| 0.672| 0.216|         0|5       |HIST1H2AL |
|HIST1H2AH1 |     0|  0.4375728| 0.625| 0.206|         0|5       |HIST1H2AH |
|HIST1H1A1  |     0|  0.7334039| 0.871| 0.420|         0|5       |HIST1H1A  |
|HIST1H2AG1 |     0|  0.4842047| 0.702| 0.267|         0|5       |HIST1H2AG |
|HIST1H1C1  |     0|  0.7587779| 0.987| 0.787|         0|5       |HIST1H1C  |
|HIST1H3B1  |     0|  0.4791797| 0.760| 0.312|         0|5       |HIST1H3B  |
|HIST2H2AC1 |     0|  0.6947366| 0.931| 0.558|         0|5       |HIST2H2AC |
|CLSPN1     |     0|  0.6110546| 0.999| 0.898|         0|5       |CLSPN     |
|HIST1H1E1  |     0|  0.6600555| 0.991| 0.795|         0|5       |HIST1H1E  |
|TMEM106C1  |     0|  0.5836801| 0.995| 0.889|         0|5       |TMEM106C  |

Top 12 marker genes for cluster 6

|         | p_val| avg_log2FC| pct.1| pct.2| p_val_adj|cluster |gene     |
|:--------|-----:|----------:|-----:|-----:|---------:|:-------|:--------|
|MT-ND51  |     0|  0.8182990| 0.930| 0.999|         0|6       |MT-ND5   |
|COL4A4   |     0|  0.2771294| 0.209| 0.621|         0|6       |COL4A4   |
|NEAT11   |     0|  1.0127409| 0.814| 0.959|         0|6       |NEAT1    |
|SLC7A11  |     0|  0.3071473| 0.209| 0.585|         0|6       |SLC7A11  |
|CD241    |     0|  0.2935826| 0.213| 0.585|         0|6       |CD24     |
|ARHGEF28 |     0|  0.3095013| 0.269| 0.672|         0|6       |ARHGEF28 |
|LAMB1    |     0|  0.2593780| 0.243| 0.636|         0|6       |LAMB1    |
|MALAT1   |     0|  0.8292176| 0.977| 0.999|         0|6       |MALAT1   |
|ABI2     |     0|  0.3025271| 0.389| 0.822|         0|6       |ABI2     |
|SLC25A24 |     0|  0.2626855| 0.306| 0.681|         0|6       |SLC25A24 |
|MACF11   |     0|  0.3125926| 0.336| 0.725|         0|6       |MACF1    |
|NSD1     |     0|  0.2632734| 0.382| 0.799|         0|6       |NSD1     |

Top 12 marker genes for cluster 7

|             | p_val| avg_log2FC| pct.1| pct.2| p_val_adj|cluster |gene         |
|:------------|-----:|----------:|-----:|-----:|---------:|:-------|:------------|
|PVRL4        |     0|  1.5252676| 0.794| 0.010|         0|7       |PVRL4        |
|UNC5B-AS1    |     0|  0.6440566| 0.502| 0.025|         0|7       |UNC5B-AS1    |
|RRAD         |     0|  0.5410382| 0.284| 0.007|         0|7       |RRAD         |
|KIAA1324     |     0|  0.5026825| 0.354| 0.005|         0|7       |KIAA1324     |
|DRAXIN       |     0|  0.4787210| 0.346| 0.001|         0|7       |DRAXIN       |
|SLC52A1      |     0|  0.3110880| 0.317| 0.003|         0|7       |SLC52A1      |
|INPP5D       |     0|  0.2815712| 0.317| 0.004|         0|7       |INPP5D       |
|RP3-510D11.2 |     0|  0.6390548| 0.658| 0.065|         0|7       |RP3-510D11.2 |
|HES2         |     0|  0.4662985| 0.465| 0.030|         0|7       |HES2         |
|ITIH5        |     0|  0.3833306| 0.391| 0.022|         0|7       |ITIH5        |
|SULF2        |     0|  0.8225731| 0.634| 0.068|         0|7       |SULF2        |
|TCEA3        |     0|  0.6415703| 0.626| 0.075|         0|7       |TCEA3        |
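
Because the marker file is tab-delimited, it can also be loaded in R with `read.delim()`, which uses a tab separator by default regardless of the file extension. The sketch below writes a small stand-in file (a few rows copied from the tables above, in an assumed column layout that may not match the real file exactly) and then selects the two genes with the highest avg_log2FC in each cluster.

```r
# Stand-in for E76-KP.markers.csv with a reduced, assumed column layout.
markers <- data.frame(
  avg_log2FC = c(0.7808906, 0.7333225, 0.2719043, 2.931391, 2.427292, 2.340469),
  cluster    = c(0, 0, 0, 4, 4, 4),
  gene       = c("DUT", "TK1", "RPS3A", "CCNB1", "HMMR", "CKS2")
)
path <- file.path(tempdir(), "example.markers.csv")
write.table(markers, path, sep = "\t", quote = FALSE, row.names = FALSE)

# read.delim() splits on tabs by default.
loaded <- read.delim(path, stringsAsFactors = FALSE)

# Top 2 genes per cluster, ranked by avg_log2FC.
top2 <- do.call(rbind, lapply(split(loaded, loaded$cluster), function(d) {
  head(d[order(d$avg_log2FC, decreasing = TRUE), ], 2)
}))
print(top2$gene)
```

For the real file, replacing `path` with `"E76-KP.markers.csv"` should be enough, provided the file has a header row.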