下面是他對我們b站轉(zhuǎn)錄組視頻課程的詳細(xì)筆記 承接上節(jié):RNA-seq入門實戰(zhàn)(四):差異分析前的準(zhǔn)備——數(shù)據(jù)檢查,以及 RNA-seq入門實戰(zhàn)(五):差異分析——DESeq2 edgeR limma的使用與比較 我們《生信技能樹》這些年有很多關(guān)于WGCNA的實戰(zhàn)細(xì)節(jié)建議分享,見: 其實WGCNA本身是對基因進(jìn)行合理(加權(quán)共表達(dá))的分組。 - 如果一個表達(dá)量矩陣, 里面的樣品是兩個分組,比如正常和對照,那么簡單的差異分析就可以拿到上下調(diào)基因,各自可以去富集生物學(xué)通路,就是基因分組了,并沒有太多的進(jìn)行WGCNA分析的必要性,而且絕大部分的兩個分組的表達(dá)量矩陣?yán)锩娴臉悠窋?shù)量通常是小于15個的,官方也并不推薦WGCNA分析。
- 如果一個表達(dá)量矩陣,里面的樣品是時間序列這樣的多分組,比如處理前后以及處理過程的不同時間梯度或者不同濃度,那么我們就需要每個分組都去跟對照組進(jìn)行差異分析,上下調(diào)的組合非常多,結(jié)果也很難精煉出生物學(xué)結(jié)論,這個時候就可以選擇WGCNA或者mfuzz這樣的時間序列分析。
也就是說,只要是多分組,就涉及到多次差異分析,而且多分組意味著樣品數(shù)量肯定不少,這樣的話,在這個表達(dá)量矩陣?yán)锩?,不同基因之間可以計算合理的相關(guān)性, 就可以根據(jù)基因之間的相似性進(jìn)行基因劃分為不同的模塊了。 本節(jié)概覽 1. WGCNA基本概念:定義、關(guān)鍵術(shù)語、基本流程、一些注意事項 2. WGCNA運行: ?輸入數(shù)據(jù)準(zhǔn)備 ①判斷數(shù)據(jù)質(zhì)量,繪制樣品的系統(tǒng)聚類樹 ②挑選最佳閾值power ③ 構(gòu)建加權(quán)共表達(dá)網(wǎng)絡(luò)( 一步法和分步法),識別基因模塊 ④ 關(guān)聯(lián)基因模塊與表型:模塊與表型相關(guān)性熱圖、模塊與表型相關(guān)性boxplot圖、基因與模塊、表型相關(guān)性散點圖 ⑤ WGCNA的標(biāo)配熱圖 ,模塊相關(guān)性展示 ⑥ 對感興趣模塊的基因進(jìn)行批量GO分析 ⑦ 感興趣模塊繪制熱圖 ⑧ 提取感興趣模塊的基因名, 導(dǎo)出基因至 VisANT 或 cytoscape作圖 簡單來說,WGCNA其實相當(dāng)于是對多個復(fù)雜分組進(jìn)行的差異分析,用于找尋不同分組/表型的特征基因模塊,從而進(jìn)行下一步分析(如可以對模塊內(nèi)的基因進(jìn)行GO富集、PPI分析等等).其最最核心之處就在于能將基因模塊與樣本表型進(jìn)行關(guān)聯(lián)。 1. WGCNA基本概念1.1 定義WGCNA(Weighted Gene Co-Expression Network Analysis ),即加權(quán)基因共表達(dá)網(wǎng)絡(luò)分析,用于尋找高度相關(guān)的基因構(gòu)成的基因模塊module,利用模塊特征基因eigengene(模塊內(nèi)第一主成分)或模塊內(nèi)的關(guān)鍵基因Hub gene來總結(jié)這些模塊,將模塊與樣本性狀進(jìn)行關(guān)聯(lián)。
1.2 其他關(guān)鍵術(shù)語關(guān)鍵術(shù)語(Connectivity、Adjacency matrix、TOM等)見: Simulated-00-Background.pdf ()一文看懂WGCNA 分析WGCNA分析,簡單全面的最新教程 1.3 基本流程構(gòu)建基因共表達(dá)網(wǎng)絡(luò) >> 識別基因模塊 >> 關(guān)聯(lián)基因模塊與表型 >> 研究基因模塊間關(guān)系 >> 從感興趣的基因模塊中尋找關(guān)鍵驅(qū)動基因 1.4 一些注意事項WGCNA package: Frequently Asked Questions () 樣本數(shù) >= 15 基因過濾方法采用均值、方差、中位數(shù)、絕對中位差MAD等方法,過濾低表達(dá)或樣本間變化小的基因,但不建議用差異分析的方法進(jìn)行過濾 輸入數(shù)據(jù)形式如果有批次效應(yīng),需要先進(jìn)行去除; 處理RNAseq數(shù)據(jù),需要采用DESeq2的varianceStabilizingTransformation方法,或?qū)⒒驑?biāo)準(zhǔn)化后的數(shù)據(jù)(如FPKM、CPM等)進(jìn)行l(wèi)og2(x+1)轉(zhuǎn)化 經(jīng)驗軟閾值power當(dāng)無向網(wǎng)絡(luò)在power小于15或有向網(wǎng)絡(luò)power小于30內(nèi),計算出的power無法達(dá)到要求時(即沒有一個power值可以使無標(biāo)度網(wǎng)絡(luò)圖譜結(jié)構(gòu)R^2達(dá)到0.8且平均連接度降到100以下),可采用經(jīng)驗power值:
2. WGCNA運行主要參考修改自以下代碼: GEO/task4-NPC at master · jmzeng1314/GEO · GitHub手把手10分文章WGCNA復(fù)現(xiàn):小膠質(zhì)細(xì)胞亞群在腦發(fā)育時髓鞘形成的作用 以下實戰(zhàn)數(shù)據(jù)來自于https://www.ncbi.nlm./geo/query/acc.cgi?acc=GSE154290,Supplementary file中的GSE154290_RNA.FPKM.txt.gz文件
? 輸入數(shù)據(jù)準(zhǔn)備讀取fpkm表達(dá)矩陣,進(jìn)行l(wèi)og2(x+1)轉(zhuǎn)化;最后對矩陣進(jìn)行轉(zhuǎn)置,變成行為樣品、列為基因的形式 rm(list = ls()) options(stringsAsFactors = F) Sys.setenv(LANGUAGE = "en") library(WGCNA) library(FactoMineR) library(factoextra) library(tidyverse) # ggplot2 stringer dplyr tidyr readr purrr tibble forcats library(data.table) #多核讀取文件 dir.create('8.WGCNA') setwd('8.WGCNA') ### 啟用WGCNA多核計算 enableWGCNAThreads(nThreads = 0.75*parallel::detectCores())
################################# 0.輸入數(shù)據(jù)準(zhǔn)備 ################################ ### 讀取表達(dá)矩陣 if (T) { fpkm00 <- fread("../GSE154290_RNA.FPKM.txt.gz",data.table = F) ### 合并具有相同基因名的行 table(duplicated(fpkm00$gene)) #統(tǒng)計重復(fù)基因名的基因 gene <- fpkm00$gene ; fpkm00 <- fpkm00[,-1] fpkm0 <- aggregate(fpkm00, by=list(gene), FUN=sum) fpkm <- column_to_rownames(fpkm0,"Group.1") } data <- log2(fpkm+1)
### 篩選MAD前5000的基因 keep_data <- data[order(apply(data,1,mad), decreasing = T)[1:5000],]
### 創(chuàng)建datTraits,包含分組、表型等信息 datTraits <- data.frame(row.names = colnames(data),group=colnames(data)) fix(datTraits)
### 給分組加上編號 grouptype <- data.frame(group=sort(unique(datTraits$group)), groupNo=1:length(unique(datTraits$group))) # fix(grouptype) datTraits$groupNo = "NA" for(i in 1:nrow(grouptype)){ datTraits[which(datTraits$group == grouptype$group[i]),'groupNo'] <- grouptype$groupNo[i]} datTraits
### 轉(zhuǎn)置 datExpr0 <- as.data.frame(t(keep_data))
在這里根據(jù)細(xì)胞的多能性狀態(tài),將這些細(xì)胞粗略分為3組,并按其發(fā)育順序編號為123,添加到了其表型數(shù)據(jù)中: datTraits
① 判斷數(shù)據(jù)質(zhì)量,繪制樣品的系統(tǒng)聚類樹查看樣本和基因的數(shù)據(jù)質(zhì)量,去除低質(zhì)量數(shù)據(jù);繪制樣品的系統(tǒng)聚類樹,若存在一個顯著離群點則需要剔除掉;還可以做PCA圖查看樣品分布情況 ############################## 1.判斷數(shù)據(jù)質(zhì)量 ################################
### 判斷數(shù)據(jù)質(zhì)量--缺失值 gsg <- goodSamplesGenes(datExpr0,verbose = 3) gsg$allOK if (!gsg$allOK){ # Optionally, print the gene and sample names that were removed: if (sum(!gsg$goodGenes)>0) printFlush(paste("Removing genes:", paste(names(datExpr0)[!gsg$goodGenes], collapse = ", "))); if (sum(!gsg$goodSamples)>0) printFlush(paste("Removing samples:", paste(rownames(datExpr0)[!gsg$goodSamples], collapse = ", "))); # Remove the offending genes and samples from the data: datExpr0 = datExpr0[gsg$goodSamples, gsg$goodGenes] } gsg <- goodSamplesGenes(datExpr0,verbose = 3) gsg$allOK
### 繪制樣品的系統(tǒng)聚類樹 if(T){ #針對樣本做聚類樹 sampleTree <- hclust(dist(datExpr0), method = "average") par(mar = c(0,5,2,0)) plot(sampleTree, main = "Sample clustering", sub="", xlab="", cex.lab = 2, cex.axis = 1, cex.main = 1,cex.lab=1) ## 若樣本有性狀、表型,可以添加對應(yīng)顏色,查看是否聚類合理 sample_colors <- numbers2colors(as.numeric(factor(datTraits$group)), colors = rainbow(length(table(datTraits$group))), signed = FALSE) ## 繪制樣品的系統(tǒng)聚類樹及對應(yīng)性狀 par(mar = c(1,4,3,1),cex=0.8) pdf("step1_Sample dendrogram and trait.pdf",width = 8,height = 6) plotDendroAndColors(sampleTree, sample_colors, groupLabels = "trait", cex.dendroLabels = 0.8, marAll = c(1, 4, 3, 1), cex.rowText = 0.01, main = "Sample dendrogram and trait" ) ## Plot a line to show the cut # abline(h = 23500, col = "red") #根據(jù)實際情況而定 dev.off() }
##若存在顯著離群點;剔除掉 if(F){ clust <- cutreeStatic(sampleTree, cutHeight = 23500, minSize = 10) # cutHeight根據(jù)實際情況而定 table(clust) keepSamples <- (clust==1) datExpr0 <- datExpr0[keepSamples, ] datTraits <- datTraits[keepSamples,] dim(datExpr0) }
### 判斷數(shù)據(jù)質(zhì)量 : PCA進(jìn)行分組查看 rm(list = ls()) load("step1_input.Rdata") group_list <- datTraits$group dat.pca <- PCA(datExpr, graph = F) pca <- fviz_pca_ind(dat.pca, title = "Principal Component Analysis", legend.title = "Groups", geom.ind = c("point","text"), #"point","text" pointsize = 2, labelsize = 4, repel = TRUE, #標(biāo)簽不重疊 col.ind = group_list, # 分組上色 axes.linetype=NA, # remove axeslines mean.point=F#去除分組中心點 ) + theme(legend.position = "none")+ # "none" REMOVE legend coord_fixed(ratio = 1) #坐標(biāo)軸的縱橫比 pca ggsave(pca,filename= "step1_Sample PCA analysis.pdf", width = 8, height = 8)
##保存數(shù)據(jù) datExpr <- datExpr0 nGenes <- ncol(datExpr) nSamples <- nrow(datExpr) save(nGenes,nSamples,datExpr,datTraits,file="step1_input.Rdata")
下圖可看出此次聚類效果還是比較好的,沒有離群值,因此不用進(jìn)行剔除樣本 step1_Sample dendrogram and trait.pdf step1_Sample PCA analysis.pdf
② 挑選最佳閾值power挑選標(biāo)準(zhǔn):R^2 > 0.8 , slope ≈ -1,選擇R^2 與 power關(guān)系圖的拐點處的power值 https://horvath.genetics./html/GeneralFramework/YeastTutorialHorvath.pdf sft$powerEstimate可以給出所設(shè)定R^2 cut-off(默認(rèn)為0.85)處的power值 ############################### 2.挑選最佳閾值power ################################### rm(list = ls()) load("step1_input.Rdata") R.sq_cutoff = 0.8 #設(shè)置R^2 cut-off,默認(rèn)為0.85 if(T){ # Call the network topology analysis function #設(shè)置power參數(shù)選擇范圍 powers <- c(seq(1,20,by = 1), seq(22,30,by = 2)) sft <- pickSoftThreshold(datExpr, networkType = "unsigned", powerVector = powers, RsquaredCut = R.sq_cutoff, verbose = 5) #SFT.R.sq > 0.8 , slope ≈ -1 pdf("step2_power-value.pdf",width = 16,height = 12) # Plot the results: 尋找拐點,確認(rèn)最終power取值 par(mfrow = c(1,2)); cex1 = 0.9; # Scale-free topology fit index as a function of the soft-thresholding power plot(sft$fitIndices[,1], -sign(sft$fitIndices[,3])*sft$fitIndices[,2], xlab="Soft Threshold (power)",ylab="Scale Free Topology Model Fit,signed R^2",type="n") text(sft$fitIndices[,1], -sign(sft$fitIndices[,3])*sft$fitIndices[,2], labels=powers,cex=cex1,col="red") # this line corresponds to using an R^2 cut-off of h abline(h=R.sq_cutoff ,col="red") # Mean connectivity as a function of the soft-thresholding power plot(sft$fitIndices[,1], sft$fitIndices[,5], xlab="Soft Threshold (power)",ylab="Mean Connectivity", type="n") text(sft$fitIndices[,1], sft$fitIndices[,5], labels=powers, cex=cex1,col="red") abline(h=100,col="red") dev.off() }
sft$powerEstimate #查看估計的最佳power # power = sft$powerEstimate power = 15
# 若無向網(wǎng)絡(luò)在power小于15或有向網(wǎng)絡(luò)power小于30內(nèi),沒有一個power值使 # 無標(biāo)度網(wǎng)絡(luò)圖譜結(jié)構(gòu)R^2達(dá)到0.8且平均連接度在100以下,可能是由于 # 部分樣品與其他樣品差別太大。這可能由批次效應(yīng)、樣品異質(zhì)性或?qū)嶒灄l件對 # 表達(dá)影響太大等造成??梢酝ㄟ^繪制樣品聚類查看分組信息和有無異常樣品。 # 如果這確實是由有意義的生物變化引起的,也可以使用下面的經(jīng)驗power值。 if(is.na(power)){ # 官方推薦 "signed" 或 "signed hybrid" # 為與原文檔一致,故未修改 type = "unsigned" nSamples=nrow(datExpr) power = ifelse(nSamples<20, ifelse(type == "unsigned", 9, 18), ifelse(nSamples<30, ifelse(type == "unsigned", 8, 16), ifelse(nSamples<40, ifelse(type == "unsigned", 7, 14), ifelse(type == "unsigned", 6, 12)) ) ) }
save(sft, power, file = "step2_power_value.Rdata")
從下圖可看出拐點大致在power為16時出現(xiàn),且各項數(shù)值基本滿足挑選標(biāo)準(zhǔn),因此設(shè)定power=16 step2_power-value.pdf
③ 構(gòu)建加權(quán)共表達(dá)網(wǎng)絡(luò),識別基因模塊構(gòu)建加權(quán)共表達(dá)網(wǎng)絡(luò)可選擇一步法(one step)或 分步法(step by step)進(jìn)行。 一般情況下優(yōu)先選擇采用簡便的一步法,當(dāng)想要調(diào)整得到的基因模塊數(shù)目時,采用分步法更靈活 ##################### 3.一步法構(gòu)建加權(quán)共表達(dá)網(wǎng)絡(luò),識別基因模塊 #################### load(file = "step1_input.Rdata") load(file = "step2_power_value.Rdata") if(T){ net <- blockwiseModules( datExpr, power = power, maxBlockSize = ncol(datExpr), corType = "pearson", #默認(rèn)為"pearson","bicor"則更能考慮離群點的影響 networkType = "unsigned", TOMType = "unsigned", minModuleSize = 30, ##越大模塊越少 mergeCutHeight = 0.25, ##越大模塊越少 numericLabels = TRUE, saveTOMs = F, verbose = 3 ) table(net$colors) # power: 上一步計算的軟閾值 # maxBlockSize:計算機能處理的最大模塊的基因數(shù)量(默認(rèn)5000),16G內(nèi)存可以處理2萬個, # 計算資源允許的情況下最好放在一個block里面。 # corType:計算相關(guān)性的方法;可選pearson(默認(rèn)),bicor。后者更能考慮離群點的影響。 # networkType:計算鄰接矩陣時,是否考慮正負(fù)相關(guān)性;默認(rèn)為"unsigned",可選"signed", "signed hybrid" # TOMType:計算TOM矩陣時,是否考慮正負(fù)相關(guān)性;默認(rèn)為"signed",可選"unsigned"。但是根據(jù)冪律轉(zhuǎn)換的鄰接矩陣(權(quán)重)的非負(fù)性,所以認(rèn)為這里選擇"signed"也沒有太多的意義。 # numericLabels: 返回數(shù)字而不是顏色作為模塊的名字,后面可以再轉(zhuǎn)換為顏色 # saveTOMs:最耗費時間的計算,可存儲起來供后續(xù)使用, # mergeCutHeight: 合并模塊的閾值,越大模塊越少,一般為0.25 # minModuleSize: 每個模塊里最少放多少個基因,設(shè)定越大模塊越少 # 輸出結(jié)果根據(jù)模塊中基因數(shù)目的多少,降序排列,依次編號為 `1-最大模塊數(shù)`。 # **0 (grey)**表示**未**分入任何模塊的基因。 }
## 模塊可視化,層級聚類樹展示各個模塊 if(T){ # Convert labels to colors for plotting moduleColors <- labels2colors(net$colors) table(moduleColors) # Plot the dendrogram and the module colors underneath pdf("step3_genes-modules_ClusterDendrogram.pdf",width = 16,height = 12) plotDendroAndColors(net$dendrograms[[1]], moduleColors[net$blockGenes[[1]]], "Module colors", dendroLabels = FALSE, hang = 0.03, addGuide = TRUE, guideHang = 0.05) dev.off() } save(net, moduleColors, file = "step3_genes_modules.Rdata")
step3_genes-modules_ClusterDendrogram.pdf ##################### 分布法完成網(wǎng)絡(luò)構(gòu)建,一般不用 ################################# if(F){ ## 構(gòu)建加權(quán)共表達(dá)網(wǎng)絡(luò)分為兩步: ## 1. 計算鄰近值,也是就是兩個基因在不同樣品中表達(dá)量的表達(dá)相關(guān)系數(shù)(pearson correlation rho), ## 2. 計算topology overlap similarity (TOM)。用TOM表示兩個基因在網(wǎng)絡(luò)結(jié)構(gòu)上的相似性,即兩個基因如果具有相似的鄰近基因,這兩個基因更傾向于有相互作用。 ###(1)網(wǎng)絡(luò)構(gòu)建 Co-expression similarity and adjacency adjacency = adjacency(datExpr, power = power) ###(2) 鄰近矩陣到拓?fù)渚仃嚨霓D(zhuǎn)換,Turn adjacency into topological overlap TOM = TOMsimilarity(adjacency) dissTOM = 1-TOM ###(3) 聚類拓?fù)渚仃?Clustering using TOM # Call the hierarchical clustering function geneTree = hclust(as.dist(dissTOM), method = "average"); # Plot the resulting clustering tree (dendrogram) ## 這個時候的geneTree與一步法的 net$dendrograms[[1]] 性質(zhì)類似,但是還需要進(jìn)行進(jìn)一步處理 plot(geneTree, xlab="", sub="", main = "Gene clustering on TOM-based dissimilarity", labels = FALSE, hang = 0.04) ###(4) 聚類分支的修整 dynamicTreeCut ################# set the minimum module size ############################## minModuleSize = 30 #### # Module identification using dynamic tree cut: dynamicMods = cutreeDynamic(dendro = geneTree, distM = dissTOM, deepSplit = 2, pamRespectsDendro = FALSE, minClusterSize = minModuleSize) table(dynamicMods) # Convert numeric lables into colors dynamicColors = labels2colors(dynamicMods) table(dynamicColors) # Plot the dendrogram and colors underneath plotDendroAndColors(geneTree, dynamicColors, "Dynamic Tree Cut", dendroLabels = FALSE, hang = 0.03, addGuide = TRUE, guideHang = 0.05, main = "Gene dendrogram and module colors") ###(5) 聚類結(jié)果相似模塊的融合 Merging of modules whose expression profiles are very similar # Calculate eigengenes MEList = moduleEigengenes(datExpr, colors = dynamicColors) MEs = MEList$eigengenes # Calculate dissimilarity of module eigengenes MEDiss = 1-cor(MEs) # Cluster module eigengenes METree = hclust(as.dist(MEDiss), method = "average") #一般選擇 height cut 為0.25,對應(yīng)于有75%相關(guān)性,進(jìn)行融合 ###################### set Merging height cut ################################ MEDissThres = 0.5 #### # Plot the result plot(METree, main = "Clustering of module eigengenes", xlab = "", sub = "") # Plot the cut line into the dendrogram abline(h=MEDissThres, col = "red") # Call an automatic merging function merge = mergeCloseModules(datExpr, dynamicColors, cutHeight = MEDissThres, verbose = 3) # The merged module colors mergedColors = merge$colors # Eigengenes of the new merged modules: mergedMEs = merge$newMEs # 統(tǒng)計mergedmodule table(mergedColors) ### (6) plot the gene dendrogram pdf(file = "step3_stepbystep_DynamicTreeCut_genes-modules.pdf", width = 16,height = 12) plotDendroAndColors(geneTree, cbind(dynamicColors, mergedColors), c("Dynamic Tree Cut", "Merged dynamic"), dendroLabels = FALSE, hang = 0.03, addGuide = TRUE, guideHang = 0.05) dev.off() ### 保存數(shù)據(jù) # Rename to moduleColors moduleColors = mergedColors # Construct numerical labels corresponding to the colors colorOrder = c("grey", standardColors(50)) moduleLabels = match(moduleColors, colorOrder)-1 MEs = mergedMEs # Save module colors and labels for use in subsequent parts save(MEs, moduleLabels, moduleColors, geneTree, file = "step3_stepByStep_genes_modules.Rdata") }
step3_stepbystep_DynamicTreeCut_genes-modules.pdf
④ 關(guān)聯(lián)基因模塊與表型繪制模塊與表型的相關(guān)性熱圖、模塊與表型的相關(guān)性boxplot圖、基因與模塊、表型的相關(guān)性散點圖; 結(jié)合所得各類相關(guān)性結(jié)果,判斷出哪些模塊與表型是高度相關(guān)的,從而關(guān)聯(lián)基因模塊與表型 ####################### 4.關(guān)聯(lián)基因模塊與表型 ##################################### rm(list = ls()) load(file = "step1_input.Rdata") load(file = "step2_power_value.Rdata") load(file = "step3_genes_modules.Rdata")
## 模塊與表型的相關(guān)性熱圖 if(T){ datTraits$group <- as.factor(datTraits$group) design <- model.matrix(~0+datTraits$group) colnames(design) <- levels(datTraits$group) #get the group MES0 <- moduleEigengenes(datExpr,moduleColors)$eigengenes #Calculate module eigengenes. MEs <- orderMEs(MES0) #Put close eigenvectors next to each other moduleTraitCor <- cor(MEs,design,use = "p") moduleTraitPvalue <- corPvalueStudent(moduleTraitCor,nSamples) textMatrix <- paste0(signif(moduleTraitCor,2),"\n(", signif(moduleTraitPvalue,1),")") dim(textMatrix) <- dim(moduleTraitCor) pdf("step4_Module-trait-relationship_heatmap.pdf", width = 2*length(colnames(design)), height = 0.6*length(names(MEs)) ) par(mar=c(5, 9, 3, 3)) #留白:下、左、上、右 labeledHeatmap(Matrix = moduleTraitCor, xLabels = colnames(design), yLabels = names(MEs), ySymbols = names(MEs), colorLabels = F, colors = blueWhiteRed(50), textMatrix = textMatrix, setStdMargins = F, cex.text = 0.5, zlim = c(-1,1), main = "Module-trait relationships") dev.off() save(design, file = "step4_design.Rdata") }
### 模塊與表型的相關(guān)性boxplot圖 if(T){ mes_group <- merge(MEs,datTraits,by="row.names") library(gplots) library(ggpubr) library(grid) library(gridExtra) draw_ggboxplot <- function(data,Module="Module",group="group"){ ggboxplot(data,x=group, y=Module, ylab = paste0(Module), xlab = group, fill = group, palette = "jco", #add="jitter", legend = "") +stat_compare_means() } # 批量畫boxplot colorNames <- names(MEs) pdf("step4_Module-trait-relationship_boxplot.pdf", width = 7.5,height = 1.6*ncol(MEs)) p <- lapply(colorNames,function(x) { draw_ggboxplot(mes_group, Module = x, group = "group") }) do.call(grid.arrange,c(p,ncol=2)) #排布為每行2個 dev.off() }
### 基因與模塊、表型的相關(guān)性散點圖 #所有的模塊都可以跟基因算出相關(guān)系數(shù),所有的連續(xù)型性狀也可以跟基因算出相關(guān)系數(shù), #如果跟性狀顯著相關(guān)的基因也跟某個模塊顯著相關(guān),那么這些基因可能就非常重要。
# 選擇離散性狀的表型 levels(datTraits$group) choose_group <- "primed"
if(T){ modNames <- substring(names(MEs), 3) ### 計算模塊與基因的相關(guān)性矩陣 ## Module Membership: 模塊內(nèi)基因表達(dá)與模塊特征值的相關(guān)性 geneModuleMembership <- as.data.frame(cor(datExpr, MEs, use = "p")) MMPvalue <- as.data.frame(corPvalueStudent(as.matrix(geneModuleMembership), nSamples)) names(geneModuleMembership) <- paste0("MM", modNames) names(MMPvalue) <- paste0("p.MM", modNames) ### 計算性狀與基因的相關(guān)性矩陣 ## Gene significance,GS:比較樣本某個基因與對應(yīng)表型的相關(guān)性 ## 連續(xù)型性狀 # trait <- datTraits$groupNo ## 非連續(xù)型性狀,需轉(zhuǎn)為0-1矩陣, 已存于design中 trait <- as.data.frame(design[,choose_group]) geneTraitSignificance <- as.data.frame(cor(datExpr,trait,use = "p")) GSPvalue <- as.data.frame(corPvalueStudent(as.matrix(geneTraitSignificance),nSamples)) names(geneTraitSignificance) <- paste0("GS") names(GSPvalue) <- paste0("GS") ### 可視化基因與模塊、表型的相關(guān)性. #selectModule<-c("blue","green","purple","grey") ##可以選擇自己想要的模塊 selectModule <- modNames ## 全部模塊批量作圖 pdf("step4_gene-Module-trait-significance.pdf",width=7, height=1.5*ncol(MEs)) par(mfrow=c(ceiling(length(selectModule)/2),2)) #批量作圖開始 for(module in selectModule){ column <- match(module,selectModule) print(module) moduleGenes <- moduleColors==module verboseScatterplot(abs(geneModuleMembership[moduleGenes, column]), abs(geneTraitSignificance[moduleGenes, 1]), xlab = paste("Module Membership in", module, "module"), ylab = "Gene significance for trait", main = paste("Module membership vs. gene significance\n"), cex.main = 1.2, cex.lab = 1.2, cex.axis = 1.2, col = module) } dev.off() }
step4_Module-trait-relationship_heatmap.pdf step4_Module-trait-relationship_boxplot.pdf step4_gene-Module-trait-significance.pdf
⑤ WGCNA可視化:TOMplot 、Eigengene-adjacency-heatmapWGCNA的標(biāo)配熱圖TOMplot / Network heapmap plot,描繪了分析中所有基因之間的拓?fù)渲丿B矩陣(TOM),顏色越深表示基因之間的相關(guān)性越大; Eigengene-adjacency-heatmap 展示基因模塊之間的相關(guān)性,還可以加入表型信息,查看表型與哪些模塊相關(guān)性高 ######################### 5. WGCNA可視化:TOMplot Eigengene-adjacency-heatmap ################################## rm(list = ls()) load(file = 'step1_input.Rdata') load(file = "step2_power_value.Rdata") load(file = "step3_genes_modules.Rdata") load(file = "step4_design.Rdata")
if(T){ TOM=TOMsimilarityFromExpr(datExpr,power=power) dissTOM=1-TOM ## draw all genes if(T){ geneTree = net$dendrograms[[1]] plotTOM = dissTOM^7 diag(plotTOM)=NA png("step5_TOMplot_Network-heatmap.png",width = 800, height=600) TOMplot(plotTOM,geneTree,moduleColors, col=gplots::colorpanel(250,'red',"orange",'lemonchiffon'), main="Network heapmap plot") dev.off() } ### draw selected genes to save time...just for test... if(F){ nSelect =0.1*nGenes set.seed(123) select=sample(nGenes,size = nSelect) selectTOM = dissTOM[select,select] selectTree = hclust(as.dist(selectTOM),method = "average") selectColors = moduleColors[select] plotDiss=selectTOM^7 diag(plotDiss)=NA pdf("step5_select_TOMplot_Network-heatmap.pdf",width=8, height=6) TOMplot(plotDiss,selectTree,selectColors, col=gplots::colorpanel(250,'red',"orange",'lemonchiffon'), main="Network heapmap plot of selected gene") dev.off() } }
### 模塊相關(guān)性展示 Eigengene-adjacency-heatmap if(T){ MEs = moduleEigengenes(datExpr,moduleColors)$eigengenes MET = orderMEs(MEs) # 若添加表型數(shù)據(jù) if(T){ ## 連續(xù)型性狀 # MET = orderMEs(cbind(MEs,datTraits$groupNo)) ## 非連續(xù)型性狀,需將是否屬于這個表型進(jìn)行0,1數(shù)值化,已存于design中 design primed = as.data.frame(design[,3]) names(primed) = "primed" # Add the weight to existing module eigengenes MET = orderMEs(cbind(MEs, primed)) } pdf("step5_module_cor_Eigengene-dendrogram.pdf",width = 8,height = 10) plotEigengeneNetworks(MET, setLabels="", marDendro = c(0,4,1,4), # 留白:下右上左 marHeatmap = c(5,5,1,2), # 留白:下右上左 cex.lab = 0.8, xLabelsAngle = 90) dev.off() }
step5_select_TOMplot_Network-heatmap.pdf step5_module_cor_Eigengene-dendrogram.pdf
⑥ 對感興趣模塊的基因進(jìn)行GO分析這里選取了與表型相關(guān)度較高的"black","pink","royalblue"模塊進(jìn)行GO分析 #################### 6. 選擇感興趣基因模塊進(jìn)行GO分析 #################### rm(list = ls()) load(file = 'step1_input.Rdata') load(file = "step2_power_value.Rdata") load(file = "step3_genes_modules.Rdata") load(file = "step4_design.Rdata")
### 條件設(shè)置 OrgDb = "org.Mm.eg.db" # "org.Mm.eg.db" "org.Hs.eg.db" genetype = "SYMBOL" # "SYMBOL" "ENSEMBL" table(moduleColors) choose_module <- c("black","pink","royalblue")
if(T){ library(clusterProfiler) library(org.Mm.eg.db) library(org.Hs.eg.db) gene_module <- data.frame(gene=colnames(datExpr), module=moduleColors) write.csv(gene_module,file = "step6_gene_moduleColors.csv",row.names = F, quote = F) tmp <- bitr(gene_module$gene,fromType = genetype, # "SYMBOL" "ENSEMBL" toType = "ENTREZID", OrgDb = OrgDb ) gene_module_entrz <- merge(tmp,gene_module, by.x=genetype, by.y="gene") choose_gene_module_entrz <- gene_module_entrz[gene_module_entrz$module %in% choose_module,] ###run go analysis formula_res <- compareCluster( ENTREZID~module, data = choose_gene_module_entrz, fun = "enrichGO", OrgDb = OrgDb, ont = "BP", #One of "BP", "MF", and "CC" or "ALL" pAdjustMethod = "BH", pvalueCutoff = 0.25, qvalueCutoff = 0.25 ) ###精簡GO富集的結(jié)果,去冗余 lineage1_ego <- simplify( formula_res, cutoff=0.5, by="p.adjust", select_fun=min ) save(gene_module, formula_res, lineage1_ego, file="step6_module_GO_term.Rdata") write.csv(lineage1_ego@compareClusterResult, file="step6_module_GO_term.csv") ### 繪制dotplot圖 dotp <- dotplot(lineage1_ego, showCategory=10, includeAll = TRUE, #將有overlap的結(jié)果也展示出來 label_format=90) ggsave(dotp,filename= "step6_module_GO_term.pdf", #device = cairo_pdf, width = 12, height = 15) }
step6_module_GO_term.pdf
⑦ 感興趣基因模塊繪制熱圖這里選取了與primed表型相關(guān)度很高的royalblue模塊繪制熱圖 ############################### 7.感興趣基因模塊繪制熱圖 ###################################### rm(list = ls()) load(file = 'step1_input.Rdata') load(file = "step3_genes_modules.Rdata") table(moduleColors)
module = "royalblue" ### 感興趣模塊畫熱圖 if(T){ dat=datExpr[,moduleColors==module] library(pheatmap) n=t(scale(dat)) #對基因做scale,并轉(zhuǎn)置表達(dá)矩陣為行為基因、列為樣本形式 # n[n>2]=2 # n[n< -2]= -2 # n[1:4,1:4] group_list=datTraits$group ac=data.frame(g=group_list) rownames(ac)=colnames(n) pheatmap::pheatmap(n, fontsize = 8, show_colnames =T, show_rownames = F, cluster_cols = T, annotation_col =ac, width = 8, height = 6, angle_col=45, main = paste0("module_",module,"-gene heatmap"), filename = paste0("step7_module_",module,"_Gene-heatmap.pdf")) }
step7_module_royalblue_Gene-heatmap.pdf
⑧ 導(dǎo)出基因至 VisANT 或 cytoscape將感興趣模塊 "royalblue"的基因?qū)С?VisANT或cytoscape,cytoscape作圖請參閱RNA-seq入門實戰(zhàn)(十):PPI蛋白互作網(wǎng)絡(luò)構(gòu)建(下)——Cytoscape軟件的使用 ################### 8.感興趣模塊基因?qū)С?VisANT or cytoscape ###################### rm(list = ls()) load(file = 'step1_input.Rdata') load(file = "step2_power_value.Rdata") load(file = "step3_genes_modules.Rdata") module = "royalblue" if(T){ ### 提取感興趣模塊基因名 gene <- colnames(datExpr) inModule <- moduleColors==module modgene <- gene[inModule] ### 模塊對應(yīng)的基因關(guān)系矩陣 TOM <- TOMsimilarityFromExpr(datExpr,power=power) modTOM <- TOM[inModule,inModule] dimnames(modTOM) <- list(modgene,modgene)
### 篩選連接度最大的top100基因 nTop = 100 IMConn = softConnectivity(datExpr[, modgene]) #計算連接度 top = (rank(-IMConn) <= nTop) #選取連接度最大的top100 filter_modTOM <- modTOM[top, top] # for visANT vis <- exportNetworkToVisANT(filter_modTOM, file = paste("step8_visANTinput-",module,".txt",sep = ""), weighted = T,threshold = 0) # for cytoscape cyt <- exportNetworkToCytoscape(filter_modTOM, edgeFile = paste("step8_CytoscapeInput-edges-", paste(module, collapse="-"), ".txt", sep=""), nodeFile = paste("step8_CytoscapeInput-nodes-", paste(module, collapse="-"), ".txt", sep=""), weighted = TRUE, threshold = 0.15, #weighted權(quán)重篩選閾值,可調(diào)整 nodeNames = modgene[top], nodeAttr = moduleColors[inModule][top]) }
參考資料 WGCNA官方說明文檔:WGCNA: R package for performing Weighted Gene Co-expression Network Analysis () WGCNA文章鏈接: 算法—— A general framework for weighted gene co-expression network analysis - PubMed () R包——WGCNA: an R package for weighted correlation network analysis - PMC () 官方英文教程: GBMTutorialHorvath.doc () YeastTutorialHorvath.doc () FemaleLiver-02-networkConstr-man.pdf ()(含分步法構(gòu)建加權(quán)共表達(dá)網(wǎng)絡(luò)) 注意事項: WGCNA package: Frequently Asked Questions () 優(yōu)秀的中文教程:GEO/task4-NPC at master · jmzeng1314/GEO · GitHub 一文看懂WGCNA 分析(2019更新版)手把手10分文章WGCNA復(fù)現(xiàn):小膠質(zhì)細(xì)胞亞群在腦發(fā)育時髓鞘形成的作用 GitHub - jmzeng1314/my_WGCNA WGCNA分析,簡單全面的最新教程
|