【原】RNAseq|WGCNA-組學(xué)數(shù)據(jù)黏合劑，代碼實戰(zhàn)-一（盡）文（力）解決文獻(xiàn)中常見的可視化圖

生信補給站 2023-03-07 發(fā)布于北京

展開全文

WGCNA （weighted gene co-expression network analysis）權(quán)重基因共表達(dá)網(wǎng)絡(luò)分析（流程模塊見下圖），可將表達(dá)模式相似的基因進(jìn)行聚類，并分析模塊與特定性狀或表型之間的關(guān)聯(lián)，常用于篩選關(guān)鍵表型的hub基因，是RNAseq分析中的一塊很重要的拼圖。而之所以叫組學(xué)數(shù)據(jù)黏合劑是因為表型可以是患者的臨床信息（生存信息，分期信息，基線信息等），可以是重測序信息腫瘤（驅(qū)動基因的變異與否，signature ，CNV信息等），可以是轉(zhuǎn)錄組結(jié)果（免疫浸潤，risk score ，GSVA ，分子分型結(jié)果），可以是單細(xì)胞數(shù)據(jù)（celltype ，AUCell 打分）等等。注：這些在公眾號之前的文章中大多都有涉及，文末有部分鏈接。

本文將著重介紹WGCNA中常見圖形的代碼實現(xiàn) 以及圖形的意義，如果有其他常見的WGCNA圖形而本文未介紹，歡迎留言交流。

一載入R包，數(shù)據(jù)

仍然使用之前處理過的TCGA的SKCM數(shù)據(jù)，此外將之前得到的cibersort免疫浸潤的結(jié)果作為臨床表型數(shù)據(jù)進(jìn)行關(guān)聯(lián) ，文章末尾有測試數(shù)據(jù)獲取方式。

library(tidyverse)library(openxlsx)library(data.table)#BiocManager::install("impute")library(WGCNA)
load("RNAseq.SKCM_WGCNA.RData")head(cibersort_in,2)expr2[1:4,1:4]

二 WGCNA分析

2.1 獲取軟閾值

WGCNA針對的是基因進(jìn)行聚類，注意下自己的數(shù)據(jù)是否需要轉(zhuǎn)置。

## 選取top 5000 mad genesWGCNA_matrix = t(expr2[order(apply(expr2,1,mad), decreasing = T)[1:5000],]) datExpr <- WGCNA_matrix  
### beta value#########powers = c(c(1:10), seq(from = 12, to=30, by=2))sft = pickSoftThreshold(datExpr, powerVector = powers, verbose = 5)
par(mfrow = c(1,2))cex1 = 0.9; #可自行調(diào)整# Scale-free topology fit index as a function of the soft-thresholding powerplot(sft$fitIndices[,1], -sign(sft$fitIndices[,3])*sft$fitIndices[,2],     xlab="Soft Threshold (power)",ylab="Scale Free Topology Model Fit,signed R^2",type="n",     main = paste("Scale independence"));text(sft$fitIndices[,1], -sign(sft$fitIndices[,3])*sft$fitIndices[,2],     labels=powers,cex=cex1,col="red");# this line corresponds to using an R^2 cut-off of habline(h=0.90,col="red")# Mean connectivity as a function of the soft-thresholding powerplot(sft$fitIndices[,1], sft$fitIndices[,5],     xlab="Soft Threshold (power)",ylab="Mean Connectivity", type="n",     main = paste("Mean connectivity"))text(sft$fitIndices[,1], sft$fitIndices[,5], labels=powers, cex=cex1,col="red")
sft$powerEstimate

左圖為在不同軟閾值（x軸）情況下的無標(biāo)度擬合指數(shù)（scale-free fit index，y軸）。本示例中閾值選擇的0.9，也可以根據(jù)數(shù)據(jù)的真實情況進(jìn)行微調(diào)。可見當(dāng)閾值選擇0.9時最小軟閾值（sft$powerEstimate）為3，因此選擇3作為最優(yōu)軟閾值用于后續(xù)分析。

2.2 構(gòu)建共表達(dá)矩陣

得到表達(dá)矩陣以及最優(yōu)beta值，就可以通過一步法構(gòu)建共表達(dá)矩陣，然后使用plotDendroAndColors函數(shù)進(jìn)行可視化

net = blockwiseModules(  datExpr,  power = sft$powerEstimate,  #power = sft$powerEstimate,  maxBlockSize = 6000,  TOMType = "unsigned", minModuleSize = 30,  reassignThreshold = 0, mergeCutHeight = 0.25,  numericLabels = TRUE, pamRespectsDendro = FALSE,  saveTOMs = TRUE,  saveTOMFileBase = "AS-green-FPKM-TOM",  verbose = 3)table(net$colors) ####模塊可視化####mergedColors = labels2colors(net$colors)table(mergedColors)# 可視化plotDendroAndColors(net$dendrograms[[1]], mergedColors[net$blockGenes[[1]]],                    "Module colors",                    dendroLabels = FALSE, hang = 0.03,                    addGuide = TRUE, guideHang = 0.05)

不同的顏色代表不同的模塊，相同顏色的模塊中為

表達(dá)模式相似的基因。grey模塊中包含了所有未參與聚類的基因，一般不用于后續(xù)分析。

可以通過table(net$colors) 或者 mergedColors = labels2colors(net$colors) 來查看每種顏色中有多少基因。

2.3 繪制網(wǎng)絡(luò)熱圖

該步驟非常消耗計算資源和時間，此處示例選取其中部分基因進(jìn)行展示

#隨機(jī)選取部分基因作圖nGenes = ncol(datExpr)nSamples = nrow(datExpr)nSelect = 500# 設(shè)置隨機(jī)種子，方便復(fù)現(xiàn)結(jié)果set.seed(1234);select = sample(nGenes, size = nSelect);selectTOM = dissTOM[select, select];# There’s no simple way of restricting a clustering tree to a subset of genes, so we must re-cluster.selectTree = hclust(as.dist(selectTOM), method = "average")selectColors = moduleColors[select];# Open a graphical windowsizeGrWindow(9,9)
plotDiss = selectTOM^7;diag(plotDiss) = NA;
TOMplot(plotDiss, selectTree, selectColors,terrainColors = T,         main = "Network heatmap plot, selected genes")

使用col函數(shù)修改顏色

library(gplots)TOMplot(plotDiss, selectTree, selectColors, main = "Network heatmap plot, selected genes",         col=gplots::colorpanel(250,'red',"orange",'lemonchiffon'))

基因之間的相關(guān)性熱圖，其中顏色越深，說明基因之間的相互作用越強。

2.4 模塊和性狀的關(guān)系

本示例使用cibersort的結(jié)果，當(dāng)然可以替換為任何你關(guān)注的，患者的臨床信息（生存信息，分期信息，基線信息等），可以是重測序信息腫瘤（驅(qū)動基因的變異與否，signature ，CNV信息等），可以是轉(zhuǎn)錄組結(jié)果（免疫浸潤，risk score ，GSVA ，分子分型結(jié)果），可以是單細(xì)胞數(shù)據(jù)（celltype ，AUCell 打分）等等。

datTraits <- cibersort_in %>%   select(sample:Neutrophils ) %>%  column_to_rownames("sample")nSamples = nrow(datExpr)sampleNames = rownames(datExpr)traitRows = match(datTraits$sample, rownames(datTraits) )  
moduleColors <- labels2colors(net$colors)# Recalculate MEs with color labelsMEs0 = moduleEigengenes(datExpr, moduleColors)$eigengenesMEs = orderMEs(MEs0);##計算相關(guān)系數(shù)和P值moduleTraitCor = cor(MEs, datTraits , use = "p");moduleTraitPvalue = corPvalueStudent(moduleTraitCor, nSamples)
sizeGrWindow(10,6)# 圖中展示P值textMatrix = paste(signif(moduleTraitCor, 2), "\n(",                   signif(moduleTraitPvalue, 1), ")", sep = "");dim(textMatrix) = dim(moduleTraitCor)#通過par設(shè)置圖形邊界par(mar = c(10, 10, 3, 3));# 熱圖的形式展示結(jié)果labeledHeatmap(Matrix = moduleTraitCor,               xLabels = colnames(datTraits),               yLabels = names(MEs),               ySymbols = names(MEs),               colorLabels = FALSE,               colors = blueWhiteRed(50),               textMatrix = textMatrix,               setStdMargins = FALSE,               cex.text = 0.5,               zlim = c(-1,1),               main = paste("Module-trait relationships"))

一般會根據(jù)

相關(guān)性的絕對值篩選最相關(guān)模塊

，負(fù)相關(guān)模塊也要考慮。

假如重點關(guān)注的表型是CD8 Tcell 或者 Treg免疫浸潤程度，可以看到這2個表型和MEblue模塊的相關(guān)系數(shù)（顏色）最高且P值（括號內(nèi)數(shù)值）很顯著。

然后就可以提取MEblue模塊中的所有基因或者 hub基因進(jìn)行后續(xù)分析

2.5 顯著模塊柱形圖

還可以通過柱形圖的形式查看與目標(biāo)性狀最顯著的模塊

########## 顯著模塊柱形圖#######moduleColors = labels2colors(net$colors)y <- datTraits[, "T cells CD8"]GS <- as.numeric(cor(y ,datExpr, use="p"))GeneSignificance <-  abs(GS)ModuleSignificance <- tapply(  GeneSignificance,  moduleColors, mean, na.rm=T)plotModuleSignificance(GeneSignificance, moduleColors,ylim = c(0,0.4))

也可以很明顯的看到blue模塊最顯著。

2.6 模塊之間聚類以及熱圖

通過聚類樹或者熱圖的方式查看模塊之間的相關(guān)程度，可以把感興趣的

“表型“

添加進(jìn)去

# Recalculate module eigengenesMEs = moduleEigengenes(datExpr, moduleColors)$eigengenesCD8_T = as.data.frame(datTraits[,4]);names(CD8_T) = "CD8_T"modNames = substring(names(MEs), 3)# Add the weight to existing module eigengenesMET = orderMEs(cbind(MEs, CD8_T))# Plot the relationships among the eigengenes and the traitsizeGrWindow(5,7.5);par(cex = 0.9)plotEigengeneNetworks(MET, "", marDendro = c(0,4,1,2), marHeatmap = c(3,4,1,2), cex.lab = 0.8, xLabelsAngle                      = 90)

仍然可以看到CD8_T 和 blue模塊最相關(guān)。

注：可以通過plotHeatmaps = FALSE 或者 plotDendrograms = FALSE 參數(shù)，只繪制上半部分的樹形圖或者下面的熱圖。

2.7 MM vs GS 相關(guān)性

查看基因與關(guān)鍵模塊（blue）的相關(guān)性（Module Membership, MM）和基因與性狀（CD8_T）的相關(guān)性（Gene Significance, GS）

pheno = "CD8_T"# 獲取關(guān)注的列module_column = match(module, modNames)pheno_column = match(pheno,colnames(datTraits))# 獲取模塊內(nèi)的基因moduleGenes = moduleColors == module
sizeGrWindow(7, 7);par(mfrow = c(1,1));verboseScatterplot(abs(geneModuleMembership[moduleGenes,                                             module_column]),                   abs(geneTraitSignificance[moduleGenes, 1]),                    xlab = paste("Module Membership in", module, "module"),                    ylab = paste("Gene significance for", pheno),                    main = paste("Module membership  vs gene significance\n"),                    cex.main = 1.2, cex.lab = 1.2, cex.axis =                      1.2, col = module)

通過散點圖可以發(fā)現(xiàn)MM和GS呈正相關(guān)，說明與CD8_T高度相關(guān)的基因，在blue模塊中也很重要。

2.8 提取模塊基因

前面一些可視化結(jié)果很重要，但是WGCNA分析更需要的是獲取目標(biāo)模塊的基因，可以是blue模塊的所有基因，也可以提取blue模塊的hub基因。

1) 提取blue的所有基因

module = "blue";# Select module probesgeneID= colnames(datExpr) ## 我們例子里面的probe就是基因名inModule = (moduleColors==module);module_gene = geneID[inModule];write.csv(module_gene,"module_gene.csv")

2）提取hub基因

一般使用較多的方式是

根據(jù) |MM| > 0.8 且 |GS| > 0.2

的閾值進(jìn)行篩選hub基因，也可以根據(jù)數(shù)據(jù)表現(xiàn)自行更改閾值

hub<- abs(geneModuleMembership$MMblue)>0.8 & abs(geneTraitSignificance)>0.2hub<-as.data.frame(dimnames(data.frame(datExpr))[[2]][hub])head(hub,2)#  dimnames(data.frame(datExpr))[[2]][hub]#1                                     IGJ#2                                   CXCL9

得到這些基因，就可以進(jìn)行后續(xù)分析和挖掘了。常見的是單因素生存分析，lasso回歸篩選，構(gòu)建risk score等分析。

2.9 保存結(jié)果以備后用

save(module_gene,hub ,MEs,geneModuleMembership,geneTraitSignificance,     file = "SKCM.WGCNA_MM_GS.RData")

更多原理和參數(shù)設(shè)置相關(guān)的請查閱WGCNA的文獻(xiàn)以及官網(wǎng)。

參考資料：https://horvath.genetics./html/CoexpressionNetwork/Rpackages/WGCNA/Tutorials/index.html

轉(zhuǎn)藏 分享

QQ空間 QQ好友新浪微博微信

獻(xiàn)花（0） +1

來自：生信補給站 > 《待分類》

舉報/認(rèn)領(lǐng)

0條評論

發(fā)表

請遵守用戶評論公約

類似文章 更多

生信補給站

關(guān)注對話

TA的最新館藏

RNAseq｜Mime代碼版-終極101 種機(jī)器學(xué)習(xí)算法組合構(gòu)建最優(yōu)預(yù)后模型
scRNA | scTCR中 T細(xì)胞動態(tài)變化（Startrac）vs scRNA指數(shù)評分
scRNA | 和頂刊學(xué)分析，OR值展示不同分組的細(xì)胞類型差異
scRNA|R版CytoTRACE v2從0開始完成單細(xì)胞分化潛能預(yù)測
scRNA|使用scMetabolism完成單細(xì)胞代謝激活分?jǐn)?shù)估計
RNAseq | ComplexHeatmap繪制臨床數(shù)據(jù)熱圖（所見即所得）

喜歡該文的人也喜歡更多

熱門閱讀換一換

午夜视频在线网站,日韩视频精品在线,中文字幕精品一区二区三区在线,在线播放精品,1024你懂我懂的旧版人,欧美日韩一级黄色片,一区二区三区在线观看视频

【原】RNAseq|WGCNA-組學(xué)數(shù)據(jù)黏合劑，代碼實戰(zhàn)-一（盡）文（力）解決文獻(xiàn)中常見的可視化圖

【原】RNAseq|WGCNA-組學(xué)數(shù)據(jù)黏合劑，代碼實戰(zhàn)-一（盡）文（力）解決文獻(xiàn)中常見的可視化圖