【原】RNA-seq入門實戰(zhàn)（十一）：WGCNA加權(quán)基因共表達(dá)網(wǎng)絡(luò)分析——關(guān)聯(lián)基因模塊與表型

健明 2022-06-11 發(fā)布于廣東

展開全文

連續(xù)兩次求賢令：曾經(jīng)我給你帶來了十萬用戶，但現(xiàn)在祝你倒閉，以及生信技能樹知識整理實習(xí)生招募，讓我走大運結(jié)識了幾位優(yōu)秀小伙伴！大家開始根據(jù)我的ngs組學(xué)視頻進(jìn)行一系列公共數(shù)據(jù)集分析實戰(zhàn)，其中幾個小伙伴讓我非常驚喜，不需要怎么溝通和指導(dǎo)，就默默的完成了一個實戰(zhàn)！

他前面的分享是：

Counts FPKM RPKM TPM CPM 的轉(zhuǎn)化
獲取基因有效長度的N種方

下面是他對我們b站轉(zhuǎn)錄組視頻課程的詳細(xì)筆記

承接上節(jié)：RNA-seq入門實戰(zhàn)（四）：差異分析前的準(zhǔn)備——數(shù)據(jù)檢查，以及 RNA-seq入門實戰(zhàn)（五）：差異分析——DESeq2 edgeR limma的使用與比較

我們《生信技能樹》這些年有很多關(guān)于WGCNA的實戰(zhàn)細(xì)節(jié)建議分享，見：

一文學(xué)會WGCNA分析
一文看懂WGCNA 分析(2019更新版) （點擊閱讀原文即可拿到測序數(shù)據(jù)）
明碼標(biāo)價之公共數(shù)據(jù)集的WGCNA
通過WGCNA作者的測試數(shù)據(jù)來學(xué)習(xí)
重復(fù)一篇WGCNA分析的文章（代碼版）
重復(fù)一篇WGCNA分析的文章（解讀版）（逆向收費讀文獻(xiàn)2019-19）

其實WGCNA本身是對基因進(jìn)行合理（加權(quán)共表達(dá)）的分組。

如果一個表達(dá)量矩陣，里面的樣品是兩個分組，比如正常和對照，那么簡單的差異分析就可以拿到上下調(diào)基因，各自可以去富集生物學(xué)通路，就是基因分組了，并沒有太多的進(jìn)行WGCNA分析的必要性，而且絕大部分的兩個分組的表達(dá)量矩陣?yán)锩娴臉悠窋?shù)量通常是小于15個的，官方也并不推薦WGCNA分析。
如果一個表達(dá)量矩陣，里面的樣品是時間序列這樣的多分組，比如處理前后以及處理過程的不同時間梯度或者不同濃度，那么我們就需要每個分組都去跟對照組進(jìn)行差異分析，上下調(diào)的組合非常多，結(jié)果也很難精煉出生物學(xué)結(jié)論，這個時候就可以選擇WGCNA或者mfuzz這樣的時間序列分析。

也就是說，只要是多分組，就涉及到多次差異分析，而且多分組意味著樣品數(shù)量肯定不少，這樣的話，在這個表達(dá)量矩陣?yán)锩?，不同基因之間可以計算合理的相關(guān)性，就可以根據(jù)基因之間的相似性進(jìn)行基因劃分為不同的模塊了。

本節(jié)概覽

1. WGCNA基本概念：定義、關(guān)鍵術(shù)語、基本流程、一些注意事項

2. WGCNA運行：

?輸入數(shù)據(jù)準(zhǔn)備

①判斷數(shù)據(jù)質(zhì)量，繪制樣品的系統(tǒng)聚類樹

②挑選最佳閾值power

③ 構(gòu)建加權(quán)共表達(dá)網(wǎng)絡(luò)（一步法和分步法），識別基因模塊

④ 關(guān)聯(lián)基因模塊與表型：模塊與表型相關(guān)性熱圖、模塊與表型相關(guān)性boxplot圖、基因與模塊、表型相關(guān)性散點圖

⑤ WGCNA的標(biāo)配熱圖，模塊相關(guān)性展示

⑥ 對感興趣模塊的基因進(jìn)行批量GO分析

⑦ 感興趣模塊繪制熱圖

⑧ 提取感興趣模塊的基因名, 導(dǎo)出基因至 VisANT 或 cytoscape作圖

簡單來說，WGCNA其實相當(dāng)于是對多個復(fù)雜分組進(jìn)行的差異分析，用于找尋不同分組/表型的特征基因模塊，從而進(jìn)行下一步分析(如可以對模塊內(nèi)的基因進(jìn)行GO富集、PPI分析等等).其最最核心之處就在于能將基因模塊與樣本表型進(jìn)行關(guān)聯(lián)。

1. WGCNA基本概念

1.1 定義

WGCNA（Weighted Gene Co-Expression Network Analysis ），即加權(quán)基因共表達(dá)網(wǎng)絡(luò)分析，用于尋找高度相關(guān)的基因構(gòu)成的基因模塊module，利用模塊特征基因eigengene（模塊內(nèi)第一主成分）或模塊內(nèi)的關(guān)鍵基因Hub gene來總結(jié)這些模塊，將模塊與樣本性狀進(jìn)行關(guān)聯(lián)。

1.2 其他關(guān)鍵術(shù)語

關(guān)鍵術(shù)語（Connectivity、Adjacency matrix、TOM等）見：
Simulated-00-Background.pdf ()一文看懂WGCNA 分析WGCNA分析，簡單全面的最新教程

1.3 基本流程

構(gòu)建基因共表達(dá)網(wǎng)絡(luò) >> 識別基因模塊 >> 關(guān)聯(lián)基因模塊與表型 >> 研究基因模塊間關(guān)系 >> 從感興趣的基因模塊中尋找關(guān)鍵驅(qū)動基因

1.4 一些注意事項

WGCNA package: Frequently Asked Questions ()

樣本數(shù) >= 15
基因過濾方法采用均值、方差、中位數(shù)、絕對中位差MAD等方法，過濾低表達(dá)或樣本間變化小的基因，但不建議用差異分析的方法進(jìn)行過濾
輸入數(shù)據(jù)形式如果有批次效應(yīng),需要先進(jìn)行去除；
處理RNAseq數(shù)據(jù)，需要采用DESeq2的varianceStabilizingTransformation方法，或?qū)⒒驑?biāo)準(zhǔn)化后的數(shù)據(jù)（如FPKM、CPM等）進(jìn)行l(wèi)og2(x+1)轉(zhuǎn)化
經(jīng)驗軟閾值power當(dāng)無向網(wǎng)絡(luò)在power小于15或有向網(wǎng)絡(luò)power小于30內(nèi)，計算出的power無法達(dá)到要求時（即沒有一個power值可以使無標(biāo)度網(wǎng)絡(luò)圖譜結(jié)構(gòu)R^2達(dá)到0.8且平均連接度降到100以下），可采用經(jīng)驗power值：

2. WGCNA運行

主要參考修改自以下代碼：
GEO/task4-NPC at master · jmzeng1314/GEO · GitHub手把手10分文章WGCNA復(fù)現(xiàn)：小膠質(zhì)細(xì)胞亞群在腦發(fā)育時髓鞘形成的作用

以下實戰(zhàn)數(shù)據(jù)來自于https://www.ncbi.nlm./geo/query/acc.cgi?acc=GSE154290，Supplementary file中的GSE154290_RNA.FPKM.txt.gz文件

? 輸入數(shù)據(jù)準(zhǔn)備

讀取fpkm表達(dá)矩陣，進(jìn)行l(wèi)og2(x+1)轉(zhuǎn)化；最后對矩陣進(jìn)行轉(zhuǎn)置，變成行為樣品、列為基因的形式

rm(list = ls())  options(stringsAsFactors = F)Sys.setenv(LANGUAGE = "en")library(WGCNA)library(FactoMineR)library(factoextra)  library(tidyverse) # ggplot2 stringer dplyr tidyr readr purrr  tibble forcatslibrary(data.table) #多核讀取文件dir.create('8.WGCNA')setwd('8.WGCNA')### 啟用WGCNA多核計算enableWGCNAThreads(nThreads = 0.75*parallel::detectCores()) 
################################# 0.輸入數(shù)據(jù)準(zhǔn)備 ################################### 讀取表達(dá)矩陣if (T) {  fpkm00 <- fread("../GSE154290_RNA.FPKM.txt.gz",data.table = F)  ### 合并具有相同基因名的行  table(duplicated(fpkm00$gene)) #統(tǒng)計重復(fù)基因名的基因  gene <- fpkm00$gene ; fpkm00 <- fpkm00[,-1]  fpkm0 <- aggregate(fpkm00, by=list(gene), FUN=sum)  fpkm <- column_to_rownames(fpkm0,"Group.1")}data <- log2(fpkm+1)
### 篩選MAD前5000的基因keep_data <- data[order(apply(data,1,mad), decreasing = T)[1:5000],]
### 創(chuàng)建datTraits，包含分組、表型等信息datTraits <- data.frame(row.names = colnames(data),group=colnames(data))fix(datTraits)
### 給分組加上編號grouptype <- data.frame(group=sort(unique(datTraits$group)),                       groupNo=1:length(unique(datTraits$group)))# fix(grouptype)datTraits$groupNo = "NA"for(i in 1:nrow(grouptype)){  datTraits[which(datTraits$group == grouptype$group[i]),'groupNo'] <- grouptype$groupNo[i]}datTraits
### 轉(zhuǎn)置datExpr0 <- as.data.frame(t(keep_data))

在這里根據(jù)細(xì)胞的多能性狀態(tài)，將這些細(xì)胞粗略分為3組，并按其發(fā)育順序編號為123，添加到了其表型數(shù)據(jù)中：

datTraits

① 判斷數(shù)據(jù)質(zhì)量，繪制樣品的系統(tǒng)聚類樹

查看樣本和基因的數(shù)據(jù)質(zhì)量，去除低質(zhì)量數(shù)據(jù)；繪制樣品的系統(tǒng)聚類樹，若存在一個顯著離群點則需要剔除掉；還可以做PCA圖查看樣品分布情況

############################## 1.判斷數(shù)據(jù)質(zhì)量 ################################
### 判斷數(shù)據(jù)質(zhì)量--缺失值gsg <- goodSamplesGenes(datExpr0,verbose = 3)gsg$allOKif (!gsg$allOK){  # Optionally, print the gene and sample names that were removed:  if (sum(!gsg$goodGenes)>0)    printFlush(paste("Removing genes:", paste(names(datExpr0)[!gsg$goodGenes],                                              collapse = ", ")));  if (sum(!gsg$goodSamples)>0)    printFlush(paste("Removing samples:",                     paste(rownames(datExpr0)[!gsg$goodSamples], collapse = ", ")));  # Remove the offending genes and samples from the data:  datExpr0 = datExpr0[gsg$goodSamples, gsg$goodGenes]}gsg <- goodSamplesGenes(datExpr0,verbose = 3)gsg$allOK
### 繪制樣品的系統(tǒng)聚類樹if(T){  #針對樣本做聚類樹  sampleTree <- hclust(dist(datExpr0), method = "average")  par(mar = c(0,5,2,0))  plot(sampleTree, main = "Sample clustering", sub="", xlab="", cex.lab = 2,        cex.axis = 1, cex.main = 1,cex.lab=1)  ## 若樣本有性狀、表型，可以添加對應(yīng)顏色，查看是否聚類合理  sample_colors <- numbers2colors(as.numeric(factor(datTraits$group)),                                   colors = rainbow(length(table(datTraits$group))),                                                    signed = FALSE)  ## 繪制樣品的系統(tǒng)聚類樹及對應(yīng)性狀  par(mar = c(1,4,3,1),cex=0.8)  pdf("step1_Sample dendrogram and trait.pdf",width = 8,height = 6)  plotDendroAndColors(sampleTree, sample_colors,                      groupLabels = "trait",                      cex.dendroLabels = 0.8,                      marAll = c(1, 4, 3, 1),                      cex.rowText = 0.01,                      main = "Sample dendrogram and trait" )  ## Plot a line to show the cut  # abline(h = 23500, col = "red") #根據(jù)實際情況而定  dev.off()}
##若存在顯著離群點；剔除掉if(F){  clust <- cutreeStatic(sampleTree, cutHeight = 23500, minSize = 10) # cutHeight根據(jù)實際情況而定  table(clust)  keepSamples <- (clust==1)  datExpr0 <- datExpr0[keepSamples, ]  datTraits <- datTraits[keepSamples,]  dim(datExpr0) }
### 判斷數(shù)據(jù)質(zhì)量 : PCA進(jìn)行分組查看rm(list = ls())  load("step1_input.Rdata")group_list <- datTraits$groupdat.pca <- PCA(datExpr, graph = F) pca <- fviz_pca_ind(dat.pca,                    title = "Principal Component Analysis",                    legend.title = "Groups",                    geom.ind = c("point","text"), #"point","text"                    pointsize = 2,                    labelsize = 4,                    repel = TRUE, #標(biāo)簽不重疊                    col.ind = group_list, # 分組上色                    axes.linetype=NA,  # remove axeslines                    mean.point=F#去除分組中心點                    ) +  theme(legend.position = "none")+  # "none" REMOVE legend  coord_fixed(ratio = 1) #坐標(biāo)軸的縱橫比pcaggsave(pca,filename= "step1_Sample PCA analysis.pdf", width = 8, height = 8)
##保存數(shù)據(jù)datExpr <- datExpr0nGenes <- ncol(datExpr)nSamples <- nrow(datExpr)save(nGenes,nSamples,datExpr,datTraits,file="step1_input.Rdata")

下圖可看出此次聚類效果還是比較好的，沒有離群值，因此不用進(jìn)行剔除樣本

step1_Sample dendrogram and trait.pdf

step1_Sample PCA analysis.pdf

② 挑選最佳閾值power

挑選標(biāo)準(zhǔn)：R^2 > 0.8 , slope ≈ -1，選擇R^2 與 power關(guān)系圖的拐點處的power值

https://horvath.genetics./html/GeneralFramework/YeastTutorialHorvath.pdf

sft$powerEstimate可以給出所設(shè)定R^2 cut-off（默認(rèn)為0.85）處的power值

############################### 2.挑選最佳閾值power ###################################rm(list = ls())  load("step1_input.Rdata")R.sq_cutoff = 0.8  #設(shè)置R^2 cut-off，默認(rèn)為0.85if(T){  # Call the network topology analysis function  #設(shè)置power參數(shù)選擇范圍  powers <- c(seq(1,20,by = 1), seq(22,30,by = 2))   sft <- pickSoftThreshold(datExpr,                            networkType = "unsigned",                           powerVector = powers,                            RsquaredCut = R.sq_cutoff,                             verbose = 5)  #SFT.R.sq > 0.8 , slope ≈ -1  pdf("step2_power-value.pdf",width = 16,height = 12)  # Plot the results: 尋找拐點，確認(rèn)最終power取值  par(mfrow = c(1,2));  cex1 = 0.9;  # Scale-free topology fit index as a function of the soft-thresholding power  plot(sft$fitIndices[,1], -sign(sft$fitIndices[,3])*sft$fitIndices[,2],       xlab="Soft Threshold (power)",ylab="Scale Free Topology Model Fit,signed R^2",type="n")  text(sft$fitIndices[,1], -sign(sft$fitIndices[,3])*sft$fitIndices[,2],       labels=powers,cex=cex1,col="red")  # this line corresponds to using an R^2 cut-off of h  abline(h=R.sq_cutoff ,col="red")  # Mean connectivity as a function of the soft-thresholding power  plot(sft$fitIndices[,1], sft$fitIndices[,5],       xlab="Soft Threshold (power)",ylab="Mean Connectivity", type="n")  text(sft$fitIndices[,1], sft$fitIndices[,5], labels=powers, cex=cex1,col="red")  abline(h=100,col="red")  dev.off()}
sft$powerEstimate  #查看估計的最佳power# power = sft$powerEstimatepower = 15
# 若無向網(wǎng)絡(luò)在power小于15或有向網(wǎng)絡(luò)power小于30內(nèi)，沒有一個power值使# 無標(biāo)度網(wǎng)絡(luò)圖譜結(jié)構(gòu)R^2達(dá)到0.8且平均連接度在100以下，可能是由于# 部分樣品與其他樣品差別太大。這可能由批次效應(yīng)、樣品異質(zhì)性或?qū)嶒灄l件對# 表達(dá)影響太大等造成?？梢酝ㄟ^繪制樣品聚類查看分組信息和有無異常樣品。# 如果這確實是由有意義的生物變化引起的，也可以使用下面的經(jīng)驗power值。if(is.na(power)){  # 官方推薦 "signed" 或 "signed hybrid"  # 為與原文檔一致，故未修改  type = "unsigned"  nSamples=nrow(datExpr)  power = ifelse(nSamples<20, ifelse(type == "unsigned", 9, 18),                 ifelse(nSamples<30, ifelse(type == "unsigned", 8, 16),                        ifelse(nSamples<40, ifelse(type == "unsigned", 7, 14),                               ifelse(type == "unsigned", 6, 12))                       )  )}
save(sft, power, file = "step2_power_value.Rdata")

從下圖可看出拐點大致在power為16時出現(xiàn)，且各項數(shù)值基本滿足挑選標(biāo)準(zhǔn)，因此設(shè)定power=16

step2_power-value.pdf

③ 構(gòu)建加權(quán)共表達(dá)網(wǎng)絡(luò)，識別基因模塊

構(gòu)建加權(quán)共表達(dá)網(wǎng)絡(luò)可選擇一步法（one step）或 分步法（step by step）進(jìn)行。
一般情況下優(yōu)先選擇采用簡便的一步法，當(dāng)想要調(diào)整得到的基因模塊數(shù)目時，采用分步法更靈活

一步法構(gòu)建加權(quán)共表達(dá)網(wǎng)絡(luò)，識別基因模塊
主要調(diào)整參數(shù)為minModuleSize和mergeCutHeight , 數(shù)值越大模塊越少

##################### 3.一步法構(gòu)建加權(quán)共表達(dá)網(wǎng)絡(luò)，識別基因模塊 ####################load(file = "step1_input.Rdata")load(file = "step2_power_value.Rdata")if(T){  net <- blockwiseModules(    datExpr,    power = power,    maxBlockSize = ncol(datExpr),    corType = "pearson", #默認(rèn)為"pearson","bicor"則更能考慮離群點的影響    networkType = "unsigned",    TOMType = "unsigned",     minModuleSize = 30,    ##越大模塊越少    mergeCutHeight = 0.25, ##越大模塊越少    numericLabels = TRUE,     saveTOMs = F,    verbose = 3  )  table(net$colors)   # power: 上一步計算的軟閾值  # maxBlockSize:計算機能處理的最大模塊的基因數(shù)量(默認(rèn)5000),16G內(nèi)存可以處理2萬個，  # 計算資源允許的情況下最好放在一個block里面。  # corType：計算相關(guān)性的方法；可選pearson(默認(rèn))，bicor。后者更能考慮離群點的影響。  # networkType:計算鄰接矩陣時，是否考慮正負(fù)相關(guān)性；默認(rèn)為"unsigned",可選"signed", "signed hybrid"  # TOMType：計算TOM矩陣時，是否考慮正負(fù)相關(guān)性；默認(rèn)為"signed",可選"unsigned"。但是根據(jù)冪律轉(zhuǎn)換的鄰接矩陣(權(quán)重)的非負(fù)性，所以認(rèn)為這里選擇"signed"也沒有太多的意義。  # numericLabels: 返回數(shù)字而不是顏色作為模塊的名字，后面可以再轉(zhuǎn)換為顏色  # saveTOMs：最耗費時間的計算，可存儲起來供后續(xù)使用，  # mergeCutHeight: 合并模塊的閾值，越大模塊越少,一般為0.25  # minModuleSize: 每個模塊里最少放多少個基因，設(shè)定越大模塊越少  # 輸出結(jié)果根據(jù)模塊中基因數(shù)目的多少，降序排列，依次編號為 `1-最大模塊數(shù)`。  # **0 (grey)**表示**未**分入任何模塊的基因。}
## 模塊可視化，層級聚類樹展示各個模塊if(T){  # Convert labels to colors for plotting  moduleColors <- labels2colors(net$colors)  table(moduleColors)  # Plot the dendrogram and the module colors underneath  pdf("step3_genes-modules_ClusterDendrogram.pdf",width = 16,height = 12)  plotDendroAndColors(net$dendrograms[[1]], moduleColors[net$blockGenes[[1]]],                      "Module colors",                      dendroLabels = FALSE, hang = 0.03,                      addGuide = TRUE, guideHang = 0.05)  dev.off()}save(net, moduleColors, file = "step3_genes_modules.Rdata")

step3_genes-modules_ClusterDendrogram.pdf

分步法構(gòu)建加權(quán)共表達(dá)網(wǎng)絡(luò)，識別基因模塊
與一步法類似，主要調(diào)整參數(shù)為minModuleSize和MEDissThres , 數(shù)值越大模塊越少

#####################  分布法完成網(wǎng)絡(luò)構(gòu)建，一般不用 #################################if(F){  ## 構(gòu)建加權(quán)共表達(dá)網(wǎng)絡(luò)分為兩步：  ## 1. 計算鄰近值，也是就是兩個基因在不同樣品中表達(dá)量的表達(dá)相關(guān)系數(shù)(pearson correlation rho)，  ## 2. 計算topology overlap similarity (TOM)。用TOM表示兩個基因在網(wǎng)絡(luò)結(jié)構(gòu)上的相似性，即兩個基因如果具有相似的鄰近基因，這兩個基因更傾向于有相互作用。    ###(1)網(wǎng)絡(luò)構(gòu)建 Co-expression similarity and adjacency   adjacency = adjacency(datExpr, power = power)     ###(2) 鄰近矩陣到拓?fù)渚仃嚨霓D(zhuǎn)換，Turn adjacency into topological overlap  TOM = TOMsimilarity(adjacency)  dissTOM = 1-TOM    ###(3) 聚類拓?fù)渚仃?Clustering using TOM  # Call the hierarchical clustering function  geneTree = hclust(as.dist(dissTOM), method = "average");  # Plot the resulting clustering tree (dendrogram)  ## 這個時候的geneTree與一步法的 net$dendrograms[[1]] 性質(zhì)類似，但是還需要進(jìn)行進(jìn)一步處理  plot(geneTree, xlab="", sub="", main = "Gene clustering on TOM-based dissimilarity",       labels = FALSE, hang = 0.04)    ###(4) 聚類分支的修整 dynamicTreeCut   ################# set the minimum module size ##############################  minModuleSize = 30  ####  # Module identification using dynamic tree cut:  dynamicMods = cutreeDynamic(dendro = geneTree, distM = dissTOM,                              deepSplit = 2, pamRespectsDendro = FALSE,                              minClusterSize = minModuleSize)  table(dynamicMods)  # Convert numeric lables into colors  dynamicColors = labels2colors(dynamicMods)  table(dynamicColors)  # Plot the dendrogram and colors underneath  plotDendroAndColors(geneTree, dynamicColors, "Dynamic Tree Cut",                      dendroLabels = FALSE, hang = 0.03,                      addGuide = TRUE, guideHang = 0.05,                      main = "Gene dendrogram and module colors")    ###(5) 聚類結(jié)果相似模塊的融合 Merging of modules whose expression profiles are very similar  # Calculate eigengenes  MEList = moduleEigengenes(datExpr, colors = dynamicColors)  MEs = MEList$eigengenes  # Calculate dissimilarity of module eigengenes  MEDiss = 1-cor(MEs)  # Cluster module eigengenes  METree = hclust(as.dist(MEDiss), method = "average")  #一般選擇 height cut 為0.25,對應(yīng)于有75%相關(guān)性，進(jìn)行融合  ###################### set  Merging height cut  ################################  MEDissThres = 0.5  ####  # Plot the result  plot(METree, main = "Clustering of module eigengenes",       xlab = "", sub = "")  # Plot the cut line into the dendrogram  abline(h=MEDissThres, col = "red")  # Call an automatic merging function  merge = mergeCloseModules(datExpr, dynamicColors, cutHeight = MEDissThres, verbose = 3)  # The merged module colors  mergedColors = merge$colors  # Eigengenes of the new merged modules:  mergedMEs = merge$newMEs  # 統(tǒng)計mergedmodule  table(mergedColors)    ### (6) plot the gene dendrogram   pdf(file = "step3_stepbystep_DynamicTreeCut_genes-modules.pdf", width = 16,height = 12)  plotDendroAndColors(geneTree, cbind(dynamicColors, mergedColors),                      c("Dynamic Tree Cut", "Merged dynamic"),                      dendroLabels = FALSE, hang = 0.03,                      addGuide = TRUE, guideHang = 0.05)  dev.off()    ### 保存數(shù)據(jù)  # Rename to moduleColors  moduleColors = mergedColors  # Construct numerical labels corresponding to the colors  colorOrder = c("grey", standardColors(50))  moduleLabels = match(moduleColors, colorOrder)-1  MEs = mergedMEs  # Save module colors and labels for use in subsequent parts  save(MEs, moduleLabels, moduleColors, geneTree,        file = "step3_stepByStep_genes_modules.Rdata")  }

step3_stepbystep_DynamicTreeCut_genes-modules.pdf

④ 關(guān)聯(lián)基因模塊與表型

繪制模塊與表型的相關(guān)性熱圖、模塊與表型的相關(guān)性boxplot圖、基因與模塊、表型的相關(guān)性散點圖；
結(jié)合所得各類相關(guān)性結(jié)果，判斷出哪些模塊與表型是高度相關(guān)的，從而關(guān)聯(lián)基因模塊與表型

####################### 4.關(guān)聯(lián)基因模塊與表型 #####################################rm(list = ls())  load(file = "step1_input.Rdata")load(file = "step2_power_value.Rdata")load(file = "step3_genes_modules.Rdata")
## 模塊與表型的相關(guān)性熱圖if(T){  datTraits$group <- as.factor(datTraits$group)  design <- model.matrix(~0+datTraits$group)  colnames(design) <- levels(datTraits$group) #get the group  MES0 <- moduleEigengenes(datExpr,moduleColors)$eigengenes  #Calculate module eigengenes.  MEs <- orderMEs(MES0)  #Put close eigenvectors next to each other  moduleTraitCor <- cor(MEs,design,use = "p")  moduleTraitPvalue <- corPvalueStudent(moduleTraitCor,nSamples)  textMatrix <- paste0(signif(moduleTraitCor,2),"\n(",                      signif(moduleTraitPvalue,1),")")  dim(textMatrix) <- dim(moduleTraitCor)    pdf("step4_Module-trait-relationship_heatmap.pdf",      width = 2*length(colnames(design)),       height = 0.6*length(names(MEs)) )  par(mar=c(5, 9, 3, 3)) #留白：下、左、上、右  labeledHeatmap(Matrix = moduleTraitCor,                 xLabels = colnames(design),                 yLabels = names(MEs),                 ySymbols = names(MEs),                 colorLabels = F,                 colors = blueWhiteRed(50),                 textMatrix = textMatrix,                 setStdMargins = F,                 cex.text = 0.5,                 zlim = c(-1,1),                  main = "Module-trait relationships")  dev.off()  save(design, file = "step4_design.Rdata")}

### 模塊與表型的相關(guān)性boxplot圖 if(T){mes_group <- merge(MEs,datTraits,by="row.names") library(gplots)library(ggpubr)library(grid)library(gridExtra) draw_ggboxplot <- function(data,Module="Module",group="group"){  ggboxplot(data,x=group, y=Module,            ylab = paste0(Module),            xlab = group,            fill = group,            palette = "jco",            #add="jitter",            legend = "") +stat_compare_means()}# 批量畫boxplotcolorNames <- names(MEs)pdf("step4_Module-trait-relationship_boxplot.pdf", width = 7.5,height = 1.6*ncol(MEs))p <- lapply(colorNames,function(x) {  draw_ggboxplot(mes_group, Module = x, group = "group")})do.call(grid.arrange,c(p,ncol=2)) #排布為每行2個dev.off()}

### 基因與模塊、表型的相關(guān)性散點圖#所有的模塊都可以跟基因算出相關(guān)系數(shù)，所有的連續(xù)型性狀也可以跟基因算出相關(guān)系數(shù)， #如果跟性狀顯著相關(guān)的基因也跟某個模塊顯著相關(guān)，那么這些基因可能就非常重要。
# 選擇離散性狀的表型levels(datTraits$group)choose_group <- "primed"  
if(T){  modNames <- substring(names(MEs), 3)    ### 計算模塊與基因的相關(guān)性矩陣   ## Module Membership: 模塊內(nèi)基因表達(dá)與模塊特征值的相關(guān)性  geneModuleMembership <- as.data.frame(cor(datExpr, MEs, use = "p"))  MMPvalue <- as.data.frame(corPvalueStudent(as.matrix(geneModuleMembership), nSamples))  names(geneModuleMembership) <- paste0("MM", modNames)  names(MMPvalue) <- paste0("p.MM", modNames)    ###  計算性狀與基因的相關(guān)性矩陣   ## Gene significance，GS：比較樣本某個基因與對應(yīng)表型的相關(guān)性  ## 連續(xù)型性狀  # trait <- datTraits$groupNo    ## 非連續(xù)型性狀，需轉(zhuǎn)為0-1矩陣, 已存于design中  trait <- as.data.frame(design[,choose_group])    geneTraitSignificance <- as.data.frame(cor(datExpr,trait,use = "p"))  GSPvalue <- as.data.frame(corPvalueStudent(as.matrix(geneTraitSignificance),nSamples))  names(geneTraitSignificance) <- paste0("GS")  names(GSPvalue) <- paste0("GS")    ### 可視化基因與模塊、表型的相關(guān)性.  #selectModule<-c("blue","green","purple","grey")  ##可以選擇自己想要的模塊  selectModule <- modNames  ## 全部模塊批量作圖  pdf("step4_gene-Module-trait-significance.pdf",width=7, height=1.5*ncol(MEs))  par(mfrow=c(ceiling(length(selectModule)/2),2)) #批量作圖開始  for(module in selectModule){    column <- match(module,selectModule)    print(module)    moduleGenes <- moduleColors==module    verboseScatterplot(abs(geneModuleMembership[moduleGenes, column]),                       abs(geneTraitSignificance[moduleGenes, 1]),                       xlab = paste("Module Membership in", module, "module"),                       ylab = "Gene significance for trait",                       main = paste("Module membership vs. gene significance\n"),                       cex.main = 1.2, cex.lab = 1.2, cex.axis = 1.2, col = module)  }  dev.off()}

step4_Module-trait-relationship_heatmap.pdf

step4_Module-trait-relationship_boxplot.pdf

step4_gene-Module-trait-significance.pdf

⑤ WGCNA可視化：TOMplot 、Eigengene-adjacency-heatmap

WGCNA的標(biāo)配熱圖TOMplot / Network heapmap plot，描繪了分析中所有基因之間的拓?fù)渲丿B矩陣（TOM），顏色越深表示基因之間的相關(guān)性越大；
Eigengene-adjacency-heatmap 展示基因模塊之間的相關(guān)性，還可以加入表型信息，查看表型與哪些模塊相關(guān)性高

#########################  5. WGCNA可視化：TOMplot  Eigengene-adjacency-heatmap ##################################rm(list = ls())  load(file = 'step1_input.Rdata')load(file = "step2_power_value.Rdata")load(file = "step3_genes_modules.Rdata")load(file = "step4_design.Rdata")
if(T){  TOM=TOMsimilarityFromExpr(datExpr,power=power)  dissTOM=1-TOM  ## draw all genes   if(T){    geneTree = net$dendrograms[[1]]    plotTOM = dissTOM^7    diag(plotTOM)=NA    png("step5_TOMplot_Network-heatmap.png",width = 800, height=600)     TOMplot(plotTOM,geneTree,moduleColors,            col=gplots::colorpanel(250,'red',"orange",'lemonchiffon'),            main="Network heapmap plot")    dev.off()  }  ### draw selected genes to save time...just for test...  if(F){    nSelect =0.1*nGenes    set.seed(123)    select=sample(nGenes,size = nSelect)    selectTOM = dissTOM[select,select]    selectTree = hclust(as.dist(selectTOM),method = "average")    selectColors = moduleColors[select]    plotDiss=selectTOM^7    diag(plotDiss)=NA    pdf("step5_select_TOMplot_Network-heatmap.pdf",width=8, height=6)    TOMplot(plotDiss,selectTree,selectColors,            col=gplots::colorpanel(250,'red',"orange",'lemonchiffon'),            main="Network heapmap plot of selected gene")    dev.off()  }}

### 模塊相關(guān)性展示 Eigengene-adjacency-heatmapif(T){  MEs = moduleEigengenes(datExpr,moduleColors)$eigengenes  MET = orderMEs(MEs)  # 若添加表型數(shù)據(jù)  if(T){    ## 連續(xù)型性狀    # MET = orderMEs(cbind(MEs,datTraits$groupNo))    ## 非連續(xù)型性狀，需將是否屬于這個表型進(jìn)行0,1數(shù)值化，已存于design中    design    primed = as.data.frame(design[,3])    names(primed) = "primed"    # Add the weight to existing module eigengenes    MET = orderMEs(cbind(MEs, primed))  }  pdf("step5_module_cor_Eigengene-dendrogram.pdf",width = 8,height = 10)  plotEigengeneNetworks(MET, setLabels="",                         marDendro = c(0,4,1,4),  # 留白：下右上左                        marHeatmap = c(5,5,1,2), # 留白：下右上左                        cex.lab = 0.8,                        xLabelsAngle = 90)  dev.off()}

step5_select_TOMplot_Network-heatmap.pdf

step5_module_cor_Eigengene-dendrogram.pdf

⑥ 對感興趣模塊的基因進(jìn)行GO分析

這里選取了與表型相關(guān)度較高的"black","pink","royalblue"模塊進(jìn)行GO分析

#################### 6. 選擇感興趣基因模塊進(jìn)行GO分析 ####################rm(list = ls())  load(file = 'step1_input.Rdata')load(file = "step2_power_value.Rdata")load(file = "step3_genes_modules.Rdata")load(file = "step4_design.Rdata")
### 條件設(shè)置OrgDb = "org.Mm.eg.db"  # "org.Mm.eg.db"  "org.Hs.eg.db"genetype = "SYMBOL"    # "SYMBOL"   "ENSEMBL"table(moduleColors)choose_module <- c("black","pink","royalblue")
if(T){  library(clusterProfiler)  library(org.Mm.eg.db)  library(org.Hs.eg.db)    gene_module <- data.frame(gene=colnames(datExpr),                            module=moduleColors)  write.csv(gene_module,file = "step6_gene_moduleColors.csv",row.names = F, quote = F)   tmp <- bitr(gene_module$gene,fromType = genetype,  # "SYMBOL"   "ENSEMBL"              toType = "ENTREZID",              OrgDb = OrgDb )  gene_module_entrz <- merge(tmp,gene_module, by.x=genetype, by.y="gene")    choose_gene_module_entrz <- gene_module_entrz[gene_module_entrz$module %in% choose_module,]    ###run go analysis  formula_res <- compareCluster(    ENTREZID~module,    data = choose_gene_module_entrz,    fun = "enrichGO",    OrgDb = OrgDb,    ont = "BP",  #One of "BP", "MF", and "CC"  or "ALL"    pAdjustMethod = "BH",    pvalueCutoff = 0.25,    qvalueCutoff = 0.25  )    ###精簡GO富集的結(jié)果,去冗余  lineage1_ego <- simplify(     formula_res,    cutoff=0.5,    by="p.adjust",    select_fun=min  )  save(gene_module, formula_res, lineage1_ego, file="step6_module_GO_term.Rdata")  write.csv(lineage1_ego@compareClusterResult,            file="step6_module_GO_term.csv")  ### 繪制dotplot圖  dotp <- dotplot(lineage1_ego,                  showCategory=10,                  includeAll = TRUE, #將有overlap的結(jié)果也展示出來                  label_format=90)  ggsave(dotp,filename= "step6_module_GO_term.pdf", #device = cairo_pdf,         width = 12,          height = 15)}

step6_module_GO_term.pdf

⑦ 感興趣基因模塊繪制熱圖

這里選取了與primed表型相關(guān)度很高的royalblue模塊繪制熱圖

############################### 7.感興趣基因模塊繪制熱圖 ######################################rm(list = ls())  load(file = 'step1_input.Rdata')load(file = "step3_genes_modules.Rdata")table(moduleColors)
module = "royalblue"### 感興趣模塊畫熱圖 if(T){  dat=datExpr[,moduleColors==module]  library(pheatmap)  n=t(scale(dat)) #對基因做scale，并轉(zhuǎn)置表達(dá)矩陣為行為基因、列為樣本形式  # n[n>2]=2   # n[n< -2]= -2  # n[1:4,1:4]    group_list=datTraits$group  ac=data.frame(g=group_list)  rownames(ac)=colnames(n)    pheatmap::pheatmap(n,                     fontsize = 8,                     show_colnames =T,                     show_rownames = F,                     cluster_cols = T,                     annotation_col =ac,                     width = 8,                     height = 6,                     angle_col=45,                     main = paste0("module_",module,"-gene heatmap"),                     filename = paste0("step7_module_",module,"_Gene-heatmap.pdf"))           }

step7_module_royalblue_Gene-heatmap.pdf

⑧ 導(dǎo)出基因至 VisANT 或 cytoscape

將感興趣模塊 "royalblue"的基因?qū)С?VisANT或cytoscape，cytoscape作圖請參閱RNA-seq入門實戰(zhàn)（十）：PPI蛋白互作網(wǎng)絡(luò)構(gòu)建（下）——Cytoscape軟件的使用

################### 8.感興趣模塊基因?qū)С?VisANT or cytoscape ######################rm(list = ls())  load(file = 'step1_input.Rdata')load(file = "step2_power_value.Rdata")load(file = "step3_genes_modules.Rdata")module = "royalblue"if(T){  ### 提取感興趣模塊基因名  gene <- colnames(datExpr)   inModule <- moduleColors==module  modgene <- gene[inModule]    ### 模塊對應(yīng)的基因關(guān)系矩陣  TOM <- TOMsimilarityFromExpr(datExpr,power=power)  modTOM <- TOM[inModule,inModule]  dimnames(modTOM) <- list(modgene,modgene)
   ### 篩選連接度最大的top100基因  nTop = 100  IMConn = softConnectivity(datExpr[, modgene]) #計算連接度  top = (rank(-IMConn) <= nTop) #選取連接度最大的top100  filter_modTOM <- modTOM[top, top]    # for visANT  vis <- exportNetworkToVisANT(filter_modTOM,                               file = paste("step8_visANTinput-",module,".txt",sep = ""),                               weighted = T,threshold = 0)  # for cytoscape  cyt <- exportNetworkToCytoscape(filter_modTOM,                                 edgeFile = paste("step8_CytoscapeInput-edges-", paste(module, collapse="-"), ".txt", sep=""),                                 nodeFile = paste("step8_CytoscapeInput-nodes-", paste(module, collapse="-"), ".txt", sep=""),                                 weighted = TRUE,                                 threshold = 0.15,  #weighted權(quán)重篩選閾值，可調(diào)整                                 nodeNames = modgene[top],                                  nodeAttr = moduleColors[inModule][top])}