午夜视频在线网站,日韩视频精品在线,中文字幕精品一区二区三区在线,在线播放精品,1024你懂我懂的旧版人,欧美日韩一级黄色片,一区二区三区在线观看视频

分享

RNA-seq入門實戰(zhàn)(十一):WGCNA加權(quán)基因共表達(dá)網(wǎng)絡(luò)分析——關(guān)聯(lián)基因模塊與表型

 健明 2022-06-11 發(fā)布于廣東

連續(xù)兩次求賢令:曾經(jīng)我給你帶來了十萬用戶,但現(xiàn)在祝你倒閉,以及 生信技能樹知識整理實習(xí)生招募,讓我走大運結(jié)識了幾位優(yōu)秀小伙伴!大家開始根據(jù)我的ngs組學(xué)視頻進(jìn)行一系列公共數(shù)據(jù)集分析實戰(zhàn),其中幾個小伙伴讓我非常驚喜,不需要怎么溝通和指導(dǎo),就默默的完成了一個實戰(zhàn)!

他前面的分享是:

下面是他對我們b站轉(zhuǎn)錄組視頻課程的詳細(xì)筆記

承接上節(jié):RNA-seq入門實戰(zhàn)(四):差異分析前的準(zhǔn)備——數(shù)據(jù)檢查,以及 RNA-seq入門實戰(zhàn)(五):差異分析——DESeq2 edgeR limma的使用與比較 

我們《生信技能樹》這些年有很多關(guān)于WGCNA的實戰(zhàn)細(xì)節(jié)建議分享,見:

其實WGCNA本身是對基因進(jìn)行合理(加權(quán)共表達(dá))的分組。

  • 如果一個表達(dá)量矩陣, 里面的樣品是兩個分組,比如正常和對照,那么簡單的差異分析就可以拿到上下調(diào)基因,各自可以去富集生物學(xué)通路,就是基因分組了,并沒有太多的進(jìn)行WGCNA分析的必要性,而且絕大部分的兩個分組的表達(dá)量矩陣?yán)锩娴臉悠窋?shù)量通常是小于15個的,官方也并不推薦WGCNA分析。
  • 如果一個表達(dá)量矩陣,里面的樣品是時間序列這樣的多分組,比如處理前后以及處理過程的不同時間梯度或者不同濃度,那么我們就需要每個分組都去跟對照組進(jìn)行差異分析,上下調(diào)的組合非常多,結(jié)果也很難精煉出生物學(xué)結(jié)論,這個時候就可以選擇WGCNA或者mfuzz這樣的時間序列分析。

也就是說,只要是多分組,就涉及到多次差異分析,而且多分組意味著樣品數(shù)量肯定不少,這樣的話,在這個表達(dá)量矩陣?yán)锩?,不同基因之間可以計算合理的相關(guān)性, 就可以根據(jù)基因之間的相似性進(jìn)行基因劃分為不同的模塊了。

本節(jié)概覽

1. WGCNA基本概念定義、關(guān)鍵術(shù)語、基本流程、一些注意事項

2. WGCNA運行

?輸入數(shù)據(jù)準(zhǔn)備

①判斷數(shù)據(jù)質(zhì)量,繪制樣品的系統(tǒng)聚類樹

②挑選最佳閾值power

③ 構(gòu)建加權(quán)共表達(dá)網(wǎng)絡(luò)( 一步法和分步法),識別基因模塊

④ 關(guān)聯(lián)基因模塊與表型:模塊與表型相關(guān)性熱圖、模塊與表型相關(guān)性boxplot圖、基因與模塊、表型相關(guān)性散點圖

⑤ WGCNA的標(biāo)配熱圖 ,模塊相關(guān)性展示

⑥ 對感興趣模塊的基因進(jìn)行批量GO分析

⑦ 感興趣模塊繪制熱圖

⑧ 提取感興趣模塊的基因名, 導(dǎo)出基因至 VisANT 或 cytoscape作圖

簡單來說,WGCNA其實相當(dāng)于是對多個復(fù)雜分組進(jìn)行的差異分析,用于找尋不同分組/表型的特征基因模塊,從而進(jìn)行下一步分析(如可以對模塊內(nèi)的基因進(jìn)行GO富集、PPI分析等等).其最最核心之處就在于能將基因模塊與樣本表型進(jìn)行關(guān)聯(lián)。

1. WGCNA基本概念

1.1 定義

WGCNA(Weighted Gene Co-Expression Network Analysis ),即加權(quán)基因共表達(dá)網(wǎng)絡(luò)分析,用于尋找高度相關(guān)的基因構(gòu)成的基因模塊module,利用模塊特征基因eigengene(模塊內(nèi)第一主成分)或模塊內(nèi)的關(guān)鍵基因Hub gene來總結(jié)這些模塊,將模塊與樣本性狀進(jìn)行關(guān)聯(lián)。

1.2 其他關(guān)鍵術(shù)語

關(guān)鍵術(shù)語(Connectivity、Adjacency matrix、TOM等)見:
Simulated-00-Background.pdf ()一文看懂WGCNA 分析WGCNA分析,簡單全面的最新教程

1.3 基本流程

構(gòu)建基因共表達(dá)網(wǎng)絡(luò) >> 識別基因模塊 >> 關(guān)聯(lián)基因模塊與表型 >> 研究基因模塊間關(guān)系 >> 從感興趣的基因模塊中尋找關(guān)鍵驅(qū)動基因

1.4 一些注意事項

WGCNA package: Frequently Asked Questions ()

  • 樣本數(shù) >= 15

  • 基因過濾方法采用均值、方差、中位數(shù)、絕對中位差MAD等方法,過濾低表達(dá)或樣本間變化小的基因,但不建議用差異分析的方法進(jìn)行過濾

  • 輸入數(shù)據(jù)形式如果有批次效應(yīng),需要先進(jìn)行去除;
    處理RNAseq數(shù)據(jù),需要采用DESeq2的varianceStabilizingTransformation方法,或?qū)⒒驑?biāo)準(zhǔn)化后的數(shù)據(jù)(如FPKM、CPM等)進(jìn)行l(wèi)og2(x+1)轉(zhuǎn)化

  • 經(jīng)驗軟閾值power當(dāng)無向網(wǎng)絡(luò)在power小于15或有向網(wǎng)絡(luò)power小于30內(nèi),計算出的power無法達(dá)到要求時(即沒有一個power值可以使無標(biāo)度網(wǎng)絡(luò)圖譜結(jié)構(gòu)R^2達(dá)到0.8且平均連接度降到100以下),可采用經(jīng)驗power值:


2. WGCNA運行

主要參考修改自以下代碼:
GEO/task4-NPC at master · jmzeng1314/GEO · GitHub手把手10分文章WGCNA復(fù)現(xiàn):小膠質(zhì)細(xì)胞亞群在腦發(fā)育時髓鞘形成的作用

以下實戰(zhàn)數(shù)據(jù)來自于https://www.ncbi.nlm./geo/query/acc.cgi?acc=GSE154290,Supplementary file中的GSE154290_RNA.FPKM.txt.gz文件


? 輸入數(shù)據(jù)準(zhǔn)備

讀取fpkm表達(dá)矩陣,進(jìn)行l(wèi)og2(x+1)轉(zhuǎn)化;最后對矩陣進(jìn)行轉(zhuǎn)置,變成行為樣品、列為基因的形式

rm(list = ls())  options(stringsAsFactors = F)Sys.setenv(LANGUAGE = "en")library(WGCNA)library(FactoMineR)library(factoextra)  library(tidyverse) # ggplot2 stringer dplyr tidyr readr purrr  tibble forcatslibrary(data.table) #多核讀取文件dir.create('8.WGCNA')setwd('8.WGCNA')### 啟用WGCNA多核計算enableWGCNAThreads(nThreads = 0.75*parallel::detectCores()) 
################################# 0.輸入數(shù)據(jù)準(zhǔn)備 ################################### 讀取表達(dá)矩陣if (T) { fpkm00 <- fread("../GSE154290_RNA.FPKM.txt.gz",data.table = F) ### 合并具有相同基因名的行 table(duplicated(fpkm00$gene)) #統(tǒng)計重復(fù)基因名的基因 gene <- fpkm00$gene ; fpkm00 <- fpkm00[,-1] fpkm0 <- aggregate(fpkm00, by=list(gene), FUN=sum) fpkm <- column_to_rownames(fpkm0,"Group.1")}data <- log2(fpkm+1)
### 篩選MAD前5000的基因keep_data <- data[order(apply(data,1,mad), decreasing = T)[1:5000],]
### 創(chuàng)建datTraits,包含分組、表型等信息datTraits <- data.frame(row.names = colnames(data),group=colnames(data))fix(datTraits)
### 給分組加上編號grouptype <- data.frame(group=sort(unique(datTraits$group)), groupNo=1:length(unique(datTraits$group)))# fix(grouptype)datTraits$groupNo = "NA"for(i in 1:nrow(grouptype)){ datTraits[which(datTraits$group == grouptype$group[i]),'groupNo'] <- grouptype$groupNo[i]}datTraits
### 轉(zhuǎn)置datExpr0 <- as.data.frame(t(keep_data))

在這里根據(jù)細(xì)胞的多能性狀態(tài),將這些細(xì)胞粗略分為3組,并按其發(fā)育順序編號為123,添加到了其表型數(shù)據(jù)中:

datTraits


① 判斷數(shù)據(jù)質(zhì)量,繪制樣品的系統(tǒng)聚類樹

查看樣本和基因的數(shù)據(jù)質(zhì)量,去除低質(zhì)量數(shù)據(jù);繪制樣品的系統(tǒng)聚類樹,若存在一個顯著離群點則需要剔除掉;還可以做PCA圖查看樣品分布情況

############################## 1.判斷數(shù)據(jù)質(zhì)量 ################################
### 判斷數(shù)據(jù)質(zhì)量--缺失值gsg <- goodSamplesGenes(datExpr0,verbose = 3)gsg$allOKif (!gsg$allOK){ # Optionally, print the gene and sample names that were removed: if (sum(!gsg$goodGenes)>0) printFlush(paste("Removing genes:", paste(names(datExpr0)[!gsg$goodGenes], collapse = ", "))); if (sum(!gsg$goodSamples)>0) printFlush(paste("Removing samples:", paste(rownames(datExpr0)[!gsg$goodSamples], collapse = ", "))); # Remove the offending genes and samples from the data: datExpr0 = datExpr0[gsg$goodSamples, gsg$goodGenes]}gsg <- goodSamplesGenes(datExpr0,verbose = 3)gsg$allOK
### 繪制樣品的系統(tǒng)聚類樹if(T){ #針對樣本做聚類樹 sampleTree <- hclust(dist(datExpr0), method = "average") par(mar = c(0,5,2,0)) plot(sampleTree, main = "Sample clustering", sub="", xlab="", cex.lab = 2, cex.axis = 1, cex.main = 1,cex.lab=1) ## 若樣本有性狀、表型,可以添加對應(yīng)顏色,查看是否聚類合理 sample_colors <- numbers2colors(as.numeric(factor(datTraits$group)), colors = rainbow(length(table(datTraits$group))), signed = FALSE) ## 繪制樣品的系統(tǒng)聚類樹及對應(yīng)性狀 par(mar = c(1,4,3,1),cex=0.8) pdf("step1_Sample dendrogram and trait.pdf",width = 8,height = 6) plotDendroAndColors(sampleTree, sample_colors, groupLabels = "trait", cex.dendroLabels = 0.8, marAll = c(1, 4, 3, 1), cex.rowText = 0.01, main = "Sample dendrogram and trait" ) ## Plot a line to show the cut # abline(h = 23500, col = "red") #根據(jù)實際情況而定 dev.off()}
##若存在顯著離群點;剔除掉if(F){ clust <- cutreeStatic(sampleTree, cutHeight = 23500, minSize = 10) # cutHeight根據(jù)實際情況而定 table(clust) keepSamples <- (clust==1) datExpr0 <- datExpr0[keepSamples, ] datTraits <- datTraits[keepSamples,] dim(datExpr0) }
### 判斷數(shù)據(jù)質(zhì)量 : PCA進(jìn)行分組查看rm(list = ls()) load("step1_input.Rdata")group_list <- datTraits$groupdat.pca <- PCA(datExpr, graph = F) pca <- fviz_pca_ind(dat.pca, title = "Principal Component Analysis", legend.title = "Groups", geom.ind = c("point","text"), #"point","text" pointsize = 2, labelsize = 4, repel = TRUE, #標(biāo)簽不重疊 col.ind = group_list, # 分組上色 axes.linetype=NA, # remove axeslines mean.point=F#去除分組中心點 ) + theme(legend.position = "none")+ # "none" REMOVE legend coord_fixed(ratio = 1) #坐標(biāo)軸的縱橫比pcaggsave(pca,filename= "step1_Sample PCA analysis.pdf", width = 8, height = 8)
##保存數(shù)據(jù)datExpr <- datExpr0nGenes <- ncol(datExpr)nSamples <- nrow(datExpr)save(nGenes,nSamples,datExpr,datTraits,file="step1_input.Rdata")

下圖可看出此次聚類效果還是比較好的,沒有離群值,因此不用進(jìn)行剔除樣本

step1_Sample dendrogram and trait.pdf

step1_Sample PCA analysis.pdf


② 挑選最佳閾值power

挑選標(biāo)準(zhǔn):R^2 > 0.8 , slope ≈ -1,選擇R^2 與 power關(guān)系圖的拐點處的power值

https://horvath.genetics./html/GeneralFramework/YeastTutorialHorvath.pdf

sft$powerEstimate可以給出所設(shè)定R^2 cut-off(默認(rèn)為0.85)處的power值

############################### 2.挑選最佳閾值power ###################################rm(list = ls())  load("step1_input.Rdata")R.sq_cutoff = 0.8  #設(shè)置R^2 cut-off,默認(rèn)為0.85if(T){  # Call the network topology analysis function  #設(shè)置power參數(shù)選擇范圍  powers <- c(seq(1,20,by = 1), seq(22,30,by = 2))   sft <- pickSoftThreshold(datExpr,                            networkType = "unsigned",                           powerVector = powers,                            RsquaredCut = R.sq_cutoff,                             verbose = 5)  #SFT.R.sq > 0.8 , slope ≈ -1  pdf("step2_power-value.pdf",width = 16,height = 12)  # Plot the results: 尋找拐點,確認(rèn)最終power取值  par(mfrow = c(1,2));  cex1 = 0.9;  # Scale-free topology fit index as a function of the soft-thresholding power  plot(sft$fitIndices[,1], -sign(sft$fitIndices[,3])*sft$fitIndices[,2],       xlab="Soft Threshold (power)",ylab="Scale Free Topology Model Fit,signed R^2",type="n")  text(sft$fitIndices[,1], -sign(sft$fitIndices[,3])*sft$fitIndices[,2],       labels=powers,cex=cex1,col="red")  # this line corresponds to using an R^2 cut-off of h  abline(h=R.sq_cutoff ,col="red")  # Mean connectivity as a function of the soft-thresholding power  plot(sft$fitIndices[,1], sft$fitIndices[,5],       xlab="Soft Threshold (power)",ylab="Mean Connectivity", type="n")  text(sft$fitIndices[,1], sft$fitIndices[,5], labels=powers, cex=cex1,col="red")  abline(h=100,col="red")  dev.off()}
sft$powerEstimate #查看估計的最佳power# power = sft$powerEstimatepower = 15
# 若無向網(wǎng)絡(luò)在power小于15或有向網(wǎng)絡(luò)power小于30內(nèi),沒有一個power值使# 無標(biāo)度網(wǎng)絡(luò)圖譜結(jié)構(gòu)R^2達(dá)到0.8且平均連接度在100以下,可能是由于# 部分樣品與其他樣品差別太大。這可能由批次效應(yīng)、樣品異質(zhì)性或?qū)嶒灄l件對# 表達(dá)影響太大等造成??梢酝ㄟ^繪制樣品聚類查看分組信息和有無異常樣品。# 如果這確實是由有意義的生物變化引起的,也可以使用下面的經(jīng)驗power值。if(is.na(power)){ # 官方推薦 "signed" 或 "signed hybrid" # 為與原文檔一致,故未修改 type = "unsigned" nSamples=nrow(datExpr) power = ifelse(nSamples<20, ifelse(type == "unsigned", 9, 18), ifelse(nSamples<30, ifelse(type == "unsigned", 8, 16), ifelse(nSamples<40, ifelse(type == "unsigned", 7, 14), ifelse(type == "unsigned", 6, 12)) ) )}
save(sft, power, file = "step2_power_value.Rdata")

從下圖可看出拐點大致在power為16時出現(xiàn),且各項數(shù)值基本滿足挑選標(biāo)準(zhǔn),因此設(shè)定power=16

step2_power-value.pdf


③ 構(gòu)建加權(quán)共表達(dá)網(wǎng)絡(luò),識別基因模塊

構(gòu)建加權(quán)共表達(dá)網(wǎng)絡(luò)可選擇一步法(one step) 分步法(step by step)進(jìn)行。
一般情況下優(yōu)先選擇采用簡便的一步法,當(dāng)想要調(diào)整得到的基因模塊數(shù)目時,采用分步法更靈活

  • 一步法構(gòu)建加權(quán)共表達(dá)網(wǎng)絡(luò),識別基因模塊
    主要調(diào)整參數(shù)為minModuleSizemergeCutHeight , 數(shù)值越大模塊越少

##################### 3.一步法構(gòu)建加權(quán)共表達(dá)網(wǎng)絡(luò),識別基因模塊 ####################load(file = "step1_input.Rdata")load(file = "step2_power_value.Rdata")if(T){  net <- blockwiseModules(    datExpr,    power = power,    maxBlockSize = ncol(datExpr),    corType = "pearson", #默認(rèn)為"pearson","bicor"則更能考慮離群點的影響    networkType = "unsigned",    TOMType = "unsigned",     minModuleSize = 30,    ##越大模塊越少    mergeCutHeight = 0.25, ##越大模塊越少    numericLabels = TRUE,     saveTOMs = F,    verbose = 3  )  table(net$colors)   # power: 上一步計算的軟閾值  # maxBlockSize:計算機能處理的最大模塊的基因數(shù)量(默認(rèn)5000),16G內(nèi)存可以處理2萬個,  # 計算資源允許的情況下最好放在一個block里面。  # corType:計算相關(guān)性的方法;可選pearson(默認(rèn)),bicor。后者更能考慮離群點的影響。  # networkType:計算鄰接矩陣時,是否考慮正負(fù)相關(guān)性;默認(rèn)為"unsigned",可選"signed", "signed hybrid"  # TOMType:計算TOM矩陣時,是否考慮正負(fù)相關(guān)性;默認(rèn)為"signed",可選"unsigned"。但是根據(jù)冪律轉(zhuǎn)換的鄰接矩陣(權(quán)重)的非負(fù)性,所以認(rèn)為這里選擇"signed"也沒有太多的意義。  # numericLabels: 返回數(shù)字而不是顏色作為模塊的名字,后面可以再轉(zhuǎn)換為顏色  # saveTOMs:最耗費時間的計算,可存儲起來供后續(xù)使用,  # mergeCutHeight: 合并模塊的閾值,越大模塊越少,一般為0.25  # minModuleSize: 每個模塊里最少放多少個基因,設(shè)定越大模塊越少  # 輸出結(jié)果根據(jù)模塊中基因數(shù)目的多少,降序排列,依次編號為 `1-最大模塊數(shù)`。  # **0 (grey)**表示**未**分入任何模塊的基因。}
## 模塊可視化,層級聚類樹展示各個模塊if(T){ # Convert labels to colors for plotting moduleColors <- labels2colors(net$colors) table(moduleColors) # Plot the dendrogram and the module colors underneath pdf("step3_genes-modules_ClusterDendrogram.pdf",width = 16,height = 12) plotDendroAndColors(net$dendrograms[[1]], moduleColors[net$blockGenes[[1]]], "Module colors", dendroLabels = FALSE, hang = 0.03, addGuide = TRUE, guideHang = 0.05) dev.off()}save(net, moduleColors, file = "step3_genes_modules.Rdata")

step3_genes-modules_ClusterDendrogram.pdf

  • 分步法構(gòu)建加權(quán)共表達(dá)網(wǎng)絡(luò),識別基因模塊
    與一步法類似,主要調(diào)整參數(shù)為minModuleSizeMEDissThres , 數(shù)值越大模塊越少

#####################  分布法完成網(wǎng)絡(luò)構(gòu)建,一般不用 #################################if(F){  ## 構(gòu)建加權(quán)共表達(dá)網(wǎng)絡(luò)分為兩步:  ## 1. 計算鄰近值,也是就是兩個基因在不同樣品中表達(dá)量的表達(dá)相關(guān)系數(shù)(pearson correlation rho),  ## 2. 計算topology overlap similarity (TOM)。用TOM表示兩個基因在網(wǎng)絡(luò)結(jié)構(gòu)上的相似性,即兩個基因如果具有相似的鄰近基因,這兩個基因更傾向于有相互作用。    ###(1)網(wǎng)絡(luò)構(gòu)建 Co-expression similarity and adjacency   adjacency = adjacency(datExpr, power = power)     ###(2) 鄰近矩陣到拓?fù)渚仃嚨霓D(zhuǎn)換,Turn adjacency into topological overlap  TOM = TOMsimilarity(adjacency)  dissTOM = 1-TOM    ###(3) 聚類拓?fù)渚仃?Clustering using TOM  # Call the hierarchical clustering function  geneTree = hclust(as.dist(dissTOM), method = "average");  # Plot the resulting clustering tree (dendrogram)  ## 這個時候的geneTree與一步法的 net$dendrograms[[1]] 性質(zhì)類似,但是還需要進(jìn)行進(jìn)一步處理  plot(geneTree, xlab="", sub="", main = "Gene clustering on TOM-based dissimilarity",       labels = FALSE, hang = 0.04)    ###(4) 聚類分支的修整 dynamicTreeCut   ################# set the minimum module size ##############################  minModuleSize = 30  ####  # Module identification using dynamic tree cut:  dynamicMods = cutreeDynamic(dendro = geneTree, distM = dissTOM,                              deepSplit = 2, pamRespectsDendro = FALSE,                              minClusterSize = minModuleSize)  table(dynamicMods)  # Convert numeric lables into colors  dynamicColors = labels2colors(dynamicMods)  table(dynamicColors)  # Plot the dendrogram and colors underneath  plotDendroAndColors(geneTree, dynamicColors, "Dynamic Tree Cut",                      dendroLabels = FALSE, hang = 0.03,                      addGuide = TRUE, guideHang = 0.05,                      main = "Gene dendrogram and module colors")    ###(5) 聚類結(jié)果相似模塊的融合 Merging of modules whose expression profiles are very similar  # Calculate eigengenes  MEList = moduleEigengenes(datExpr, colors = dynamicColors)  MEs = MEList$eigengenes  # Calculate dissimilarity of module eigengenes  MEDiss = 1-cor(MEs)  # Cluster module eigengenes  METree = hclust(as.dist(MEDiss), method = "average")  #一般選擇 height cut 為0.25,對應(yīng)于有75%相關(guān)性,進(jìn)行融合  ###################### set  Merging height cut  ################################  MEDissThres = 0.5  ####  # Plot the result  plot(METree, main = "Clustering of module eigengenes",       xlab = "", sub = "")  # Plot the cut line into the dendrogram  abline(h=MEDissThres, col = "red")  # Call an automatic merging function  merge = mergeCloseModules(datExpr, dynamicColors, cutHeight = MEDissThres, verbose = 3)  # The merged module colors  mergedColors = merge$colors  # Eigengenes of the new merged modules:  mergedMEs = merge$newMEs  # 統(tǒng)計mergedmodule  table(mergedColors)    ### (6) plot the gene dendrogram   pdf(file = "step3_stepbystep_DynamicTreeCut_genes-modules.pdf", width = 16,height = 12)  plotDendroAndColors(geneTree, cbind(dynamicColors, mergedColors),                      c("Dynamic Tree Cut", "Merged dynamic"),                      dendroLabels = FALSE, hang = 0.03,                      addGuide = TRUE, guideHang = 0.05)  dev.off()    ### 保存數(shù)據(jù)  # Rename to moduleColors  moduleColors = mergedColors  # Construct numerical labels corresponding to the colors  colorOrder = c("grey", standardColors(50))  moduleLabels = match(moduleColors, colorOrder)-1  MEs = mergedMEs  # Save module colors and labels for use in subsequent parts  save(MEs, moduleLabels, moduleColors, geneTree,        file = "step3_stepByStep_genes_modules.Rdata")  }

step3_stepbystep_DynamicTreeCut_genes-modules.pdf


④ 關(guān)聯(lián)基因模塊與表型

繪制模塊與表型的相關(guān)性熱圖、模塊與表型的相關(guān)性boxplot圖、基因與模塊、表型的相關(guān)性散點圖;
結(jié)合所得各類相關(guān)性結(jié)果,判斷出哪些模塊與表型是高度相關(guān)的,從而關(guān)聯(lián)基因模塊與表型

####################### 4.關(guān)聯(lián)基因模塊與表型 #####################################rm(list = ls())  load(file = "step1_input.Rdata")load(file = "step2_power_value.Rdata")load(file = "step3_genes_modules.Rdata")
## 模塊與表型的相關(guān)性熱圖if(T){ datTraits$group <- as.factor(datTraits$group) design <- model.matrix(~0+datTraits$group) colnames(design) <- levels(datTraits$group) #get the group MES0 <- moduleEigengenes(datExpr,moduleColors)$eigengenes #Calculate module eigengenes. MEs <- orderMEs(MES0) #Put close eigenvectors next to each other moduleTraitCor <- cor(MEs,design,use = "p") moduleTraitPvalue <- corPvalueStudent(moduleTraitCor,nSamples) textMatrix <- paste0(signif(moduleTraitCor,2),"\n(", signif(moduleTraitPvalue,1),")") dim(textMatrix) <- dim(moduleTraitCor) pdf("step4_Module-trait-relationship_heatmap.pdf", width = 2*length(colnames(design)), height = 0.6*length(names(MEs)) ) par(mar=c(5, 9, 3, 3)) #留白:下、左、上、右 labeledHeatmap(Matrix = moduleTraitCor, xLabels = colnames(design), yLabels = names(MEs), ySymbols = names(MEs), colorLabels = F, colors = blueWhiteRed(50), textMatrix = textMatrix, setStdMargins = F, cex.text = 0.5, zlim = c(-1,1), main = "Module-trait relationships") dev.off() save(design, file = "step4_design.Rdata")}

### 模塊與表型的相關(guān)性boxplot圖 if(T){mes_group <- merge(MEs,datTraits,by="row.names") library(gplots)library(ggpubr)library(grid)library(gridExtra) draw_ggboxplot <- function(data,Module="Module",group="group"){ ggboxplot(data,x=group, y=Module, ylab = paste0(Module), xlab = group, fill = group, palette = "jco", #add="jitter", legend = "") +stat_compare_means()}# 批量畫boxplotcolorNames <- names(MEs)pdf("step4_Module-trait-relationship_boxplot.pdf", width = 7.5,height = 1.6*ncol(MEs))p <- lapply(colorNames,function(x) { draw_ggboxplot(mes_group, Module = x, group = "group")})do.call(grid.arrange,c(p,ncol=2)) #排布為每行2個dev.off()}

### 基因與模塊、表型的相關(guān)性散點圖#所有的模塊都可以跟基因算出相關(guān)系數(shù),所有的連續(xù)型性狀也可以跟基因算出相關(guān)系數(shù), #如果跟性狀顯著相關(guān)的基因也跟某個模塊顯著相關(guān),那么這些基因可能就非常重要。
# 選擇離散性狀的表型levels(datTraits$group)choose_group <- "primed"
if(T){ modNames <- substring(names(MEs), 3) ### 計算模塊與基因的相關(guān)性矩陣 ## Module Membership: 模塊內(nèi)基因表達(dá)與模塊特征值的相關(guān)性 geneModuleMembership <- as.data.frame(cor(datExpr, MEs, use = "p")) MMPvalue <- as.data.frame(corPvalueStudent(as.matrix(geneModuleMembership), nSamples)) names(geneModuleMembership) <- paste0("MM", modNames) names(MMPvalue) <- paste0("p.MM", modNames) ### 計算性狀與基因的相關(guān)性矩陣 ## Gene significance,GS:比較樣本某個基因與對應(yīng)表型的相關(guān)性 ## 連續(xù)型性狀 # trait <- datTraits$groupNo ## 非連續(xù)型性狀,需轉(zhuǎn)為0-1矩陣, 已存于design中 trait <- as.data.frame(design[,choose_group]) geneTraitSignificance <- as.data.frame(cor(datExpr,trait,use = "p")) GSPvalue <- as.data.frame(corPvalueStudent(as.matrix(geneTraitSignificance),nSamples)) names(geneTraitSignificance) <- paste0("GS") names(GSPvalue) <- paste0("GS") ### 可視化基因與模塊、表型的相關(guān)性. #selectModule<-c("blue","green","purple","grey") ##可以選擇自己想要的模塊 selectModule <- modNames ## 全部模塊批量作圖 pdf("step4_gene-Module-trait-significance.pdf",width=7, height=1.5*ncol(MEs)) par(mfrow=c(ceiling(length(selectModule)/2),2)) #批量作圖開始 for(module in selectModule){ column <- match(module,selectModule) print(module) moduleGenes <- moduleColors==module verboseScatterplot(abs(geneModuleMembership[moduleGenes, column]), abs(geneTraitSignificance[moduleGenes, 1]), xlab = paste("Module Membership in", module, "module"), ylab = "Gene significance for trait", main = paste("Module membership vs. gene significance\n"), cex.main = 1.2, cex.lab = 1.2, cex.axis = 1.2, col = module) } dev.off()}

step4_Module-trait-relationship_heatmap.pdf

step4_Module-trait-relationship_boxplot.pdf

step4_gene-Module-trait-significance.pdf


⑤ WGCNA可視化:TOMplot 、Eigengene-adjacency-heatmap

WGCNA的標(biāo)配熱圖TOMplot / Network heapmap plot,描繪了分析中所有基因之間的拓?fù)渲丿B矩陣(TOM),顏色越深表示基因之間的相關(guān)性越大;
Eigengene-adjacency-heatmap 展示基因模塊之間的相關(guān)性,還可以加入表型信息,查看表型與哪些模塊相關(guān)性高

#########################  5. WGCNA可視化:TOMplot  Eigengene-adjacency-heatmap ##################################rm(list = ls())  load(file = 'step1_input.Rdata')load(file = "step2_power_value.Rdata")load(file = "step3_genes_modules.Rdata")load(file = "step4_design.Rdata")
if(T){ TOM=TOMsimilarityFromExpr(datExpr,power=power) dissTOM=1-TOM ## draw all genes if(T){ geneTree = net$dendrograms[[1]] plotTOM = dissTOM^7 diag(plotTOM)=NA png("step5_TOMplot_Network-heatmap.png",width = 800, height=600) TOMplot(plotTOM,geneTree,moduleColors, col=gplots::colorpanel(250,'red',"orange",'lemonchiffon'), main="Network heapmap plot") dev.off() } ### draw selected genes to save time...just for test... if(F){ nSelect =0.1*nGenes set.seed(123) select=sample(nGenes,size = nSelect) selectTOM = dissTOM[select,select] selectTree = hclust(as.dist(selectTOM),method = "average") selectColors = moduleColors[select] plotDiss=selectTOM^7 diag(plotDiss)=NA pdf("step5_select_TOMplot_Network-heatmap.pdf",width=8, height=6) TOMplot(plotDiss,selectTree,selectColors, col=gplots::colorpanel(250,'red',"orange",'lemonchiffon'), main="Network heapmap plot of selected gene") dev.off() }}

### 模塊相關(guān)性展示 Eigengene-adjacency-heatmapif(T){ MEs = moduleEigengenes(datExpr,moduleColors)$eigengenes MET = orderMEs(MEs) # 若添加表型數(shù)據(jù) if(T){ ## 連續(xù)型性狀 # MET = orderMEs(cbind(MEs,datTraits$groupNo)) ## 非連續(xù)型性狀,需將是否屬于這個表型進(jìn)行0,1數(shù)值化,已存于design中 design primed = as.data.frame(design[,3]) names(primed) = "primed" # Add the weight to existing module eigengenes MET = orderMEs(cbind(MEs, primed)) } pdf("step5_module_cor_Eigengene-dendrogram.pdf",width = 8,height = 10) plotEigengeneNetworks(MET, setLabels="", marDendro = c(0,4,1,4), # 留白:下右上左 marHeatmap = c(5,5,1,2), # 留白:下右上左 cex.lab = 0.8, xLabelsAngle = 90) dev.off()}

step5_select_TOMplot_Network-heatmap.pdf

step5_module_cor_Eigengene-dendrogram.pdf


⑥ 對感興趣模塊的基因進(jìn)行GO分析

這里選取了與表型相關(guān)度較高的"black","pink","royalblue"模塊進(jìn)行GO分析

#################### 6. 選擇感興趣基因模塊進(jìn)行GO分析 ####################rm(list = ls())  load(file = 'step1_input.Rdata')load(file = "step2_power_value.Rdata")load(file = "step3_genes_modules.Rdata")load(file = "step4_design.Rdata")
### 條件設(shè)置OrgDb = "org.Mm.eg.db" # "org.Mm.eg.db" "org.Hs.eg.db"genetype = "SYMBOL" # "SYMBOL" "ENSEMBL"table(moduleColors)choose_module <- c("black","pink","royalblue")
if(T){ library(clusterProfiler) library(org.Mm.eg.db) library(org.Hs.eg.db) gene_module <- data.frame(gene=colnames(datExpr), module=moduleColors) write.csv(gene_module,file = "step6_gene_moduleColors.csv",row.names = F, quote = F) tmp <- bitr(gene_module$gene,fromType = genetype, # "SYMBOL" "ENSEMBL" toType = "ENTREZID", OrgDb = OrgDb ) gene_module_entrz <- merge(tmp,gene_module, by.x=genetype, by.y="gene") choose_gene_module_entrz <- gene_module_entrz[gene_module_entrz$module %in% choose_module,] ###run go analysis formula_res <- compareCluster( ENTREZID~module, data = choose_gene_module_entrz, fun = "enrichGO", OrgDb = OrgDb, ont = "BP", #One of "BP", "MF", and "CC" or "ALL" pAdjustMethod = "BH", pvalueCutoff = 0.25, qvalueCutoff = 0.25 ) ###精簡GO富集的結(jié)果,去冗余 lineage1_ego <- simplify( formula_res, cutoff=0.5, by="p.adjust", select_fun=min ) save(gene_module, formula_res, lineage1_ego, file="step6_module_GO_term.Rdata") write.csv(lineage1_ego@compareClusterResult, file="step6_module_GO_term.csv") ### 繪制dotplot圖 dotp <- dotplot(lineage1_ego, showCategory=10, includeAll = TRUE, #將有overlap的結(jié)果也展示出來 label_format=90) ggsave(dotp,filename= "step6_module_GO_term.pdf", #device = cairo_pdf, width = 12, height = 15)}

step6_module_GO_term.pdf


⑦ 感興趣基因模塊繪制熱圖

這里選取了與primed表型相關(guān)度很高的royalblue模塊繪制熱圖

############################### 7.感興趣基因模塊繪制熱圖 ######################################rm(list = ls())  load(file = 'step1_input.Rdata')load(file = "step3_genes_modules.Rdata")table(moduleColors)
module = "royalblue"### 感興趣模塊畫熱圖 if(T){ dat=datExpr[,moduleColors==module] library(pheatmap) n=t(scale(dat)) #對基因做scale,并轉(zhuǎn)置表達(dá)矩陣為行為基因、列為樣本形式 # n[n>2]=2 # n[n< -2]= -2 # n[1:4,1:4] group_list=datTraits$group ac=data.frame(g=group_list) rownames(ac)=colnames(n) pheatmap::pheatmap(n, fontsize = 8, show_colnames =T, show_rownames = F, cluster_cols = T, annotation_col =ac, width = 8, height = 6, angle_col=45, main = paste0("module_",module,"-gene heatmap"), filename = paste0("step7_module_",module,"_Gene-heatmap.pdf")) }

step7_module_royalblue_Gene-heatmap.pdf


⑧ 導(dǎo)出基因至 VisANT 或 cytoscape

將感興趣模塊 "royalblue"的基因?qū)С?VisANT或cytoscape,cytoscape作圖請參閱RNA-seq入門實戰(zhàn)(十):PPI蛋白互作網(wǎng)絡(luò)構(gòu)建(下)——Cytoscape軟件的使用

################### 8.感興趣模塊基因?qū)С?VisANT or cytoscape ######################rm(list = ls())  load(file = 'step1_input.Rdata')load(file = "step2_power_value.Rdata")load(file = "step3_genes_modules.Rdata")module = "royalblue"if(T){  ### 提取感興趣模塊基因名  gene <- colnames(datExpr)   inModule <- moduleColors==module  modgene <- gene[inModule]    ### 模塊對應(yīng)的基因關(guān)系矩陣  TOM <- TOMsimilarityFromExpr(datExpr,power=power)  modTOM <- TOM[inModule,inModule]  dimnames(modTOM) <- list(modgene,modgene)
### 篩選連接度最大的top100基因 nTop = 100 IMConn = softConnectivity(datExpr[, modgene]) #計算連接度 top = (rank(-IMConn) <= nTop) #選取連接度最大的top100 filter_modTOM <- modTOM[top, top] # for visANT vis <- exportNetworkToVisANT(filter_modTOM, file = paste("step8_visANTinput-",module,".txt",sep = ""), weighted = T,threshold = 0) # for cytoscape cyt <- exportNetworkToCytoscape(filter_modTOM, edgeFile = paste("step8_CytoscapeInput-edges-", paste(module, collapse="-"), ".txt", sep=""), nodeFile = paste("step8_CytoscapeInput-nodes-", paste(module, collapse="-"), ".txt", sep=""), weighted = TRUE, threshold = 0.15, #weighted權(quán)重篩選閾值,可調(diào)整 nodeNames = modgene[top], nodeAttr = moduleColors[inModule][top])}
  • 將文件導(dǎo)入cytoscape后,可作圖如下:network connection in royalblue module


參考資料

WGCNA官方說明文檔:WGCNA: R package for performing Weighted Gene Co-expression Network Analysis ()

WGCNA文章鏈接:

算法—— A general framework for weighted gene co-expression network analysis - PubMed ()

R包——WGCNA: an R package for weighted correlation network analysis - PMC ()

官方英文教程:

GBMTutorialHorvath.doc ()

YeastTutorialHorvath.doc ()

FemaleLiver-02-networkConstr-man.pdf ()(含分步法構(gòu)建加權(quán)共表達(dá)網(wǎng)絡(luò))

注意事項:

WGCNA package: Frequently Asked Questions ()

優(yōu)秀的中文教程:GEO/task4-NPC at master · jmzeng1314/GEO · GitHub

一文看懂WGCNA 分析(2019更新版)手把手10分文章WGCNA復(fù)現(xiàn):小膠質(zhì)細(xì)胞亞群在腦發(fā)育時髓鞘形成的作用

GitHub - jmzeng1314/my_WGCNA

WGCNA分析,簡單全面的最新教程

    轉(zhuǎn)藏 分享 獻(xiàn)花(0

    0條評論

    發(fā)表

    請遵守用戶 評論公約

    類似文章 更多