【原】WGCNA實(shí)戰(zhàn)練習(xí)

生信修煉手冊 2019-12-24

展開全文

本文采用WGCNA官網(wǎng)的Tutirial 1的數(shù)據(jù)，對(duì)加權(quán)基因共表達(dá)網(wǎng)絡(luò)分析和后續(xù)的數(shù)據(jù)挖掘的具體操作進(jìn)行梳理，數(shù)據(jù)可以從官網(wǎng)下載，示意圖如下

整個(gè)分析流程可以分為以下幾個(gè)步驟

1. 數(shù)據(jù)預(yù)處理

這部分包括以下4個(gè)內(nèi)容

讀取基因表達(dá)量數(shù)據(jù)
對(duì)樣本和基因進(jìn)行過濾
讀取樣本表型數(shù)據(jù)
可視化樣本聚類樹和表型數(shù)據(jù)

官方的示例數(shù)據(jù)是一個(gè)小鼠的芯片表達(dá)譜數(shù)據(jù)，包含了135個(gè)雌性小鼠的數(shù)據(jù)，在提供的表達(dá)譜數(shù)據(jù)中，除了探針I(yè)D和樣本表達(dá)量之外，還有額外的探針注釋信息，在讀取原始數(shù)據(jù)時(shí)，需要把多余注釋信息去除，代碼如下

# 讀取文件
options(stringsAsFactors = FALSE)
femData = read.csv("LiverFemale3600.csv")

# 去除多余的注釋信息列
datExpr0 = as.data.frame(t(femData[, -c(1:8)]))
names(datExpr0) = femData$substanceBXH
rownames(datExpr0) = names(femData)[-c(1:8)]

對(duì)于基因的表達(dá)量數(shù)據(jù)，需要進(jìn)行過濾，對(duì)于基因而言，可以過濾缺失值或者低表達(dá)的基因，對(duì)于樣本而言，如果該樣本中基因缺失值很多，也需要過濾，WGCNA內(nèi)置了一個(gè)檢驗(yàn)基因和樣本的函數(shù)，通過該函數(shù)可以進(jìn)行一個(gè)基本過濾，代碼如下

gsg = goodSamplesGenes(datExpr0)
if (!gsg$allOK) {
datExpr0 = datExpr0[gsg$goodSamples, gsg$goodGenes]
}

goodSamples和goodGenes就是需要保留的基因和樣本?；A(chǔ)過濾之后，還可以看下是否存在離群值的樣本，通過樣本的聚類樹進(jìn)行判斷，代碼如下

plot(sampleTree, 
  main = "Sample clustering to detect outliers", 
  sub="", xlab="", cex.lab = 1.5,
  cex.axis = 1.5, cex.main = 2)

生成的圖片如下

從圖上可以看出，F(xiàn)2_221 這個(gè)樣本和其他樣本差距很大，可以將該樣本過濾掉。代碼如下

clust = cutreeStatic(
  sampleTree,
  cutHeight = 15,
  minSize = 10)
keepSamples = (clust==1)
datExpr = datExpr0[keepSamples, ]
nGenes = ncol(datExpr)
nSamples = nrow(datExpr)

表型數(shù)據(jù)中也包含了不需要的列，而且其樣本比表達(dá)譜的樣本多，需要根據(jù)表達(dá)譜的樣本提取對(duì)應(yīng)的表型數(shù)據(jù)，代碼如下

# 讀取文件
traitData = read.csv("ClinicalTraits.csv")

# 刪除多余的列
allTraits = traitData[, -c(31, 16)]
allTraits = allTraits[, c(2, 11:36) ]

# 樣本和表達(dá)譜的樣本保持一致
femaleSamples = rownames(datExpr)
traitRows = match(femaleSamples, allTraits$Mice)
datTraits = allTraits[traitRows, -1]
rownames(datTraits) = allTraits[traitRows, 1]

表達(dá)譜數(shù)據(jù)和表型數(shù)據(jù)準(zhǔn)備好之后，可以繪制樣本聚類樹和表型的熱圖，代碼如下

# 由于去除了樣本，重新對(duì)剩余樣本聚類
sampleTree2 = hclust(dist(datExpr), method = "average")
traitColors = numbers2colors(datTraits, signed = FALSE)
plotDendroAndColors(
  sampleTree2,
  traitColors,
  groupLabels = names(datTraits),
  main = "Sample dendrogram and trait heatmap")

生成的圖片如下

上半部分為樣本的聚類樹，下班部分為樣本對(duì)應(yīng)的表型的熱圖，順序和聚類樹中的順序一致，表達(dá)量從低到高，顏色從白色過渡到紅色，灰色代表缺失值。

2. 構(gòu)建共表達(dá)網(wǎng)絡(luò)，識(shí)別modules

在構(gòu)建共表達(dá)網(wǎng)絡(luò)時(shí)，將基因間的相關(guān)系數(shù)進(jìn)行乘方運(yùn)算來表征其相關(guān)性，首先需要確定乘方的值，代碼如下

# 設(shè)定一些列power梯度
powers = c(c(1:10), seq(from = 12, to=20, by=2))
sft = pickSoftThreshold(datExpr, powerVector = powers, verbose = 5)

在sft這個(gè)對(duì)象中保存了每個(gè)power值計(jì)算出來的網(wǎng)絡(luò)的特征，結(jié)構(gòu)如下

> str(sft)
List of 2
$ powerEstimate: num 6
$ fitIndices   :'data.frame':    15 obs. of  7 variables:
  ..$ Power         : num [1:15] 1 2 3 4 5 6 7 8 9 10 ...
  ..$ SFT.R.sq      : num [1:15] 0.0278 0.1264 0.3404 0.5062 0.6807 ...
  ..$ slope         : num [1:15] 0.345 -0.597 -1.03 -1.422 -1.716 ...
  ..$ truncated.R.sq: num [1:15] 0.456 0.843 0.972 0.973 0.94 ...
  ..$ mean.k.       : num [1:15] 747 254.5 111 56.5 32.2 ...
  ..$ median.k.     : num [1:15] 761.7 250.8 101.7 47.2 25.1 ...
  ..$ max.k.        : num [1:15] 1206 574 324 202 134 ...

其中powerEstimate就是最佳的power值，fitIndices保存了每個(gè)power對(duì)應(yīng)的網(wǎng)絡(luò)的特征。我們可以對(duì)不同power取值下，網(wǎng)絡(luò)的特征進(jìn)行可視化，代碼如下

plot(
  sft$fitIndices[,1],
  -sign(sft$fitIndices[,3])*sft$fitIndices[,2],
  xlab="Soft Threshold (power)",
  ylab="Scale Free Topology Model Fit,signed R^2",type="n",
  main = paste("Scale independence")
)
text(
  sft$fitIndices[,1],
  -sign(sft$fitIndices[,3])*sft$fitIndices[,2],
  labels=powers,
  cex=0.9,
  col="red"
)
abline(h=0.90, col="red")

生成的圖片如下

sft$fitIndices保存了每個(gè)power構(gòu)建的相關(guān)性網(wǎng)絡(luò)中的連接度的統(tǒng)計(jì)值，k就是連接度值，可以看到，對(duì)于每個(gè)power值，提供了max, median, max3種連接度的統(tǒng)計(jì)量，這里對(duì)連接度的均值進(jìn)行可視化，代碼如下

plot(
sft$fitIndices[,1],
sft$fitIndices[,5],
xlab="Soft Threshold (power)",
ylab="Mean Connectivity",
type="n",
main = paste("Mean connectivity")
)
text(
sft$fitIndices[,1],
sft$fitIndices[,5],
labels=powers,
cex=cex1,
col="red"
)

生成的圖片如下

確定好power值之后，可以直接構(gòu)建相關(guān)性網(wǎng)絡(luò)

net = blockwiseModules(
datExpr,
power = sft$powerEstimate,
TOMType = "unsigned",
minModuleSize = 30,
reassignThreshold = 0,
mergeCutHeight = 0.25,
numericLabels = TRUE,
pamRespectsDendro = FALSE,
saveTOMs = TRUE,
saveTOMFileBase = "femaleMouseTOM",
verbose = 3)

net對(duì)象中保存了所有相關(guān)性網(wǎng)絡(luò)和module的結(jié)果，可以將基因的聚類樹和對(duì)應(yīng)的module進(jìn)行可視化，代碼如下

mergedColors = labels2colors(net$colors)
plotDendroAndColors(
  net$dendrograms[[1]],
  mergedColors[net$blockGenes[[1]]],
  "Module colors",
  dendroLabels = FALSE,
  hang = 0.03,
  addGuide = TRUE,
  guideHang = 0.05
)

生成的圖片如下

上方為基因的聚類樹，聚類時(shí)的距離為1-TOM值，下方為基因?qū)?yīng)的modules。類似的，還可以結(jié)合基因間的距離，即1-TOM值，用熱圖展示，代碼如下

geneTree = net$dendrograms[[1]]
moduleColors = labels2colors(net$colors)
dissTOM = 1 - TOMsimilarityFromExpr(
  datExpr,
  power = sft$powerEstimate)
plotTOM = dissTOM ^ 7
diag(plotTOM) = NA
TOMplot(
  plotTOM,
  geneTree,
  moduleColors,
  main = "Network heatmap plot, all genes"
)

生成的圖片如下

在前面文章我們提到過，在識(shí)別module的過程中共，首先用dynamicTreeCut識(shí)別modules, 然后根據(jù)Module eigengene間的相關(guān)性合并modules,net`這個(gè)對(duì)象中保存了合并前和合并后的modules, 可以將二者畫在同一張圖上，可視化代碼如下

unmergedColors = labels2colors(net$unmergedColors)
mergedColors   = labels2colors(net$colors)
plotDendroAndColors(
  net$dendrograms[[1]],
  cbind(unmergedColors[net$blockGenes[[1]]], mergedColors[net$blockGenes[[1]]]),
  c("Dynamic Tree Cut" , "Merged colors"),
  dendroLabels = FALSE,
  hang = 0.03,
  addGuide = TRUE,
  guideHang = 0.05
)

生成的圖片如下

對(duì)于合并前的modules, 其相關(guān)性分析的結(jié)果可視化如下

unmergedColors = labels2colors(net$unmergedColors)
MEList = moduleEigengenes(datExpr, colors = unmergedColors)
MEs = MEList$eigengenes
MEDiss = 1-cor(MEs)
METree = hclust(as.dist(MEDiss), method = "average")
plot(METree,
     main = "Clustering of module eigengenes",
     xlab = "",
     sub = "")

生成的圖片如下

對(duì)于每個(gè)module而言，我們希望知道該module下對(duì)應(yīng)的基因，提取方式如下

> moduleColors = labels2colors(net$colors)
> unique(moduleColors)
[1] "grey"         "turquoise"    "grey60"       "yellow"       "tan"         
[6] "green"        "red"          "black"        "blue"         "midnightblue"
[11] "cyan"         "magenta"      "salmon"       "lightgreen"   "brown"       
[16] "purple"       "pink"         "greenyellow"  "lightcyan"
> head(names(datExpr)[moduleColors=="red"])
[1] "MMT00000159" "MMT00000793" "MMT00000840" "MMT00001154" "MMT00001245"
[6] "MMT00001260"

同樣我們也可以提取module對(duì)應(yīng)的基因表達(dá)量數(shù)據(jù)，繪制熱圖, 代碼如下

which.module="red"
plotMat(
t(scale(datExpr[,moduleColors==which.module ]) ),
nrgcols=30,
rlabels=F,
rcols=which.module,
main=which.module,
cex.main=2
)

生成的圖片如下

3. 篩選與表型相關(guān)的modules

本質(zhì)上是計(jì)算module的ME值與表型的相關(guān)系數(shù)，代碼如下

nGenes = ncol(datExpr)
nSamples = nrow(datExpr)
MEs0 = moduleEigengenes(datExpr, moduleColors)$eigengenes
MEs = orderMEs(MEs0)
moduleTraitCor = cor(
MEs,
datTraits,
use = "p"
)
moduleTraitPvalue = corPvalueStudent(
moduleTraitCor,
nSamples
)

可以對(duì)module和表型間的系數(shù)的結(jié)果進(jìn)行可視化，代碼如下

textMatrix =  paste(
signif(moduleTraitCor, 2),
"\n(",
signif(moduleTraitPvalue, 1),
")",
sep = ""
)
dim(textMatrix) = dim(moduleTraitCor)
labeledHeatmap(
Matrix = moduleTraitCor,
xLabels = names(datTraits),
yLabels = names(MEs),
ySymbols = names(MEs),
colorLabels = FALSE,
colors = blueWhiteRed(50),
textMatrix = textMatrix,
setStdMargins = FALSE,
cex.text = 0.5,
zlim = c(-1,1),
main = paste("Module-trait relationships")
)

生成的圖片如下

指定一個(gè)我們感興趣的表型，可以得到與其相關(guān)性最高的module, 代碼如下

> which.trait <- "weight_g"
> moduleTraitCor[, which.trait]
> moduleTraitCor[, which.trait]
     MEmagenta        MEblack    MEturquoise        MEgreen    MElightcyan
  -0.017418109   -0.312679561   -0.272907078    0.001339804   -0.128053858
        MEblue        MEbrown          MEred       MEsalmon       MEyellow
   0.314323101    0.591340840    0.509942529    0.432058666    0.219900538
  MElightgreen  MEgreenyellow       MEgrey60         MEpink       MEpurple
  -0.057215182   -0.022394396   -0.016705204   -0.051495573   -0.021167541
         MEtan         MEcyan MEmidnightblue         MEgrey
   0.269827166    0.181595161    0.193569095    0.089702947

以上結(jié)果中，和weight_g最相關(guān)的為module為MEred，當(dāng)然也可以自己指定一個(gè)閾值，篩選出多個(gè)候選的modules。在WGCNA中，對(duì)于基因定義了GS值，表征基因和表型之間的相關(guān)性，對(duì)于module而言，也可以用所有基因GS絕對(duì)值的平均數(shù)來表征該module與表型之間的相關(guān)性，代碼如下

moduleColors = labels2colors(net$colors)
which.trait <- "weight_g"
y <- datTraits[, which.trait]
GS <- as.numeric(cor(y ,datExpr, use="p"))
GeneSignificance <-  abs(GS)
ModuleSignificance <- tapply(
GeneSignificance,
moduleColors, mean, na.rm=T)
plotModuleSignificance(GeneSignificance, moduleColors)

生成的圖片如下

可以看到brown, red這兩個(gè)模塊和體重相關(guān)。對(duì)于ME和某一表型而言，還可以將數(shù)據(jù)合并，聚類展示，代碼如下

weight <- datTraits[, which.trait]
MEs0 = moduleEigengenes(datExpr, moduleColors)$eigengenes
MEs = orderMEs(MEs0)
MET = orderMEs(cbind(MEs, weight))
par(mar = c(4, 2, 1, 2), cex = 0.9)
plotEigengeneNetworks(
MET, "",
marDendro = c(0,4,1,2),
marHeatmap = c(3,4,1,2),
cex.lab = 0.8,
xLabelsAngle = 90
)

生成的圖片如下

4. 篩選關(guān)鍵基因

篩選出與表型高相關(guān)的modules之后，還可以對(duì)modules下的基因進(jìn)行進(jìn)一步篩選，主要根據(jù)GS值和MM值，代碼如下

datKME = signedKME(
datExpr,
MEs,
outputColumnName="MM.")
FilterGenes= abs(GS1)> .2 & abs(datKME$MM.brown)>.8

篩選出候選基因后，可以進(jìn)行下游的功能富集分析，使用clusterProfiler等R包，進(jìn)一步挖掘功能，之前的文章有詳細(xì)介紹如何進(jìn)行富集分析，這里就不展開了。

5. 導(dǎo)出module數(shù)據(jù), 繪制網(wǎng)絡(luò)圖

可以導(dǎo)出指定modules對(duì)應(yīng)的基因共表達(dá)網(wǎng)絡(luò)，方便可視化，代碼如下

TOM = TOMsimilarityFromExpr(datExpr, power = 6)
modules = c("brown", "red")
probes = names(datExpr)
inModule = is.finite(match(moduleColors, modules));
modProbes = probes[inModule];
modTOM = TOM[inModule, inModule];
dimnames(modTOM) = list(modProbes, modProbes)
cyt = exportNetworkToCytoscape(
modTOM,
edgeFile = paste("CytoscapeInput-edges-", paste(modules, collapse="-"), ".txt", sep=""),
nodeFile = paste("CytoscapeInput-nodes-", paste(modules, collapse="-"), ".txt", sep=""),
weighted = TRUE,
threshold = 0.02,
nodeNames = modProbes,
nodeAttr = moduleColors[inModule]
)

最終會(huì)生成以下兩個(gè)文件，可以導(dǎo)入cytoscape進(jìn)行繪圖

CytoscapeInput-edges-brown-red.txt
CytoscapeInput-nodes-brown-red.txt

當(dāng)然也支持導(dǎo)出VisANT軟件支持的格式，詳細(xì)用法請(qǐng)參閱官網(wǎng)的幫助文檔。

·end·

贊賞

共11人贊賞

午夜视频在线网站,日韩视频精品在线,中文字幕精品一区二区三区在线,在线播放精品,1024你懂我懂的旧版人,欧美日韩一级黄色片,一区二区三区在线观看视频

【原】WGCNA實(shí)戰(zhàn)練習(xí)

1. 數(shù)據(jù)預(yù)處理

2. 構(gòu)建共表達(dá)網(wǎng)絡(luò)，識(shí)別modules

3. 篩選與表型相關(guān)的modules

4. 篩選關(guān)鍵基因

5. 導(dǎo)出module數(shù)據(jù), 繪制網(wǎng)絡(luò)圖

2. 構(gòu)建共表達(dá)網(wǎng)絡(luò)，識(shí)別modules