本文采用WGCNA官網(wǎng)的Tutirial 1的數(shù)據(jù),對(duì)加權(quán)基因共表達(dá)網(wǎng)絡(luò)分析和后續(xù)的數(shù)據(jù)挖掘的具體操作進(jìn)行梳理, 數(shù)據(jù)可以從官網(wǎng)下載,示意圖如下 整個(gè)分析流程可以分為以下幾個(gè)步驟 1. 數(shù)據(jù)預(yù)處理這部分包括以下4個(gè)內(nèi)容
官方的示例數(shù)據(jù)是一個(gè)小鼠的芯片表達(dá)譜數(shù)據(jù),包含了135個(gè)雌性小鼠的數(shù)據(jù),在提供的表達(dá)譜數(shù)據(jù)中,除了探針I(yè)D和樣本表達(dá)量之外,還有額外的探針注釋信息,在讀取原始數(shù)據(jù)時(shí),需要把多余注釋信息去除,代碼如下 # 讀取文件
options(stringsAsFactors = FALSE)
femData = read.csv("LiverFemale3600.csv")
# 去除多余的注釋信息列
datExpr0 = as.data.frame(t(femData[, -c(1:8)]))
names(datExpr0) = femData$substanceBXH
rownames(datExpr0) = names(femData)[-c(1:8)] 對(duì)于基因的表達(dá)量數(shù)據(jù),需要進(jìn)行過濾,對(duì)于基因而言,可以過濾缺失值或者低表達(dá)的基因,對(duì)于樣本而言,如果該樣本中基因缺失值很多,也需要過濾,WGCNA內(nèi)置了一個(gè)檢驗(yàn)基因和樣本的函數(shù),通過該函數(shù)可以進(jìn)行一個(gè)基本過濾,代碼如下 gsg = goodSamplesGenes(datExpr0)
if (!gsg$allOK) {
datExpr0 = datExpr0[gsg$goodSamples, gsg$goodGenes]
} goodSamples和goodGenes就是需要保留的基因和樣本?;A(chǔ)過濾之后,還可以看下是否存在離群值的樣本,通過樣本的聚類樹進(jìn)行判斷,代碼如下 plot(sampleTree,
main = "Sample clustering to detect outliers",
sub="", xlab="", cex.lab = 1.5,
cex.axis = 1.5, cex.main = 2) 生成的圖片如下 從圖上可以看出,F(xiàn)2_221 這個(gè)樣本和其他樣本差距很大,可以將該樣本過濾掉。代碼如下 clust = cutreeStatic(
sampleTree,
cutHeight = 15,
minSize = 10)
keepSamples = (clust==1)
datExpr = datExpr0[keepSamples, ]
nGenes = ncol(datExpr)
nSamples = nrow(datExpr) 表型數(shù)據(jù)中也包含了不需要的列,而且其樣本比表達(dá)譜的樣本多,需要根據(jù)表達(dá)譜的樣本提取對(duì)應(yīng)的表型數(shù)據(jù),代碼如下 # 讀取文件
traitData = read.csv("ClinicalTraits.csv")
# 刪除多余的列
allTraits = traitData[, -c(31, 16)]
allTraits = allTraits[, c(2, 11:36) ]
# 樣本和表達(dá)譜的樣本保持一致
femaleSamples = rownames(datExpr)
traitRows = match(femaleSamples, allTraits$Mice)
datTraits = allTraits[traitRows, -1]
rownames(datTraits) = allTraits[traitRows, 1] 表達(dá)譜數(shù)據(jù)和表型數(shù)據(jù)準(zhǔn)備好之后,可以繪制樣本聚類樹和表型的熱圖,代碼如下 # 由于去除了樣本,重新對(duì)剩余樣本聚類
sampleTree2 = hclust(dist(datExpr), method = "average")
traitColors = numbers2colors(datTraits, signed = FALSE)
plotDendroAndColors(
sampleTree2,
traitColors,
groupLabels = names(datTraits),
main = "Sample dendrogram and trait heatmap") 生成的圖片如下 上半部分為樣本的聚類樹,下班部分為樣本對(duì)應(yīng)的表型的熱圖,順序和聚類樹中的順序一致,表達(dá)量從低到高,顏色從白色過渡到紅色,灰色代表缺失值。 2. 構(gòu)建共表達(dá)網(wǎng)絡(luò),識(shí)別modules在構(gòu)建共表達(dá)網(wǎng)絡(luò)時(shí),將基因間的相關(guān)系數(shù)進(jìn)行乘方運(yùn)算來表征其相關(guān)性,首先需要確定乘方的值,代碼如下 # 設(shè)定一些列power梯度
powers = c(c(1:10), seq(from = 12, to=20, by=2))
sft = pickSoftThreshold(datExpr, powerVector = powers, verbose = 5) 在 > str(sft)
List of 2
$ powerEstimate: num 6
$ fitIndices :'data.frame': 15 obs. of 7 variables:
..$ Power : num [1:15] 1 2 3 4 5 6 7 8 9 10 ...
..$ SFT.R.sq : num [1:15] 0.0278 0.1264 0.3404 0.5062 0.6807 ...
..$ slope : num [1:15] 0.345 -0.597 -1.03 -1.422 -1.716 ...
..$ truncated.R.sq: num [1:15] 0.456 0.843 0.972 0.973 0.94 ...
..$ mean.k. : num [1:15] 747 254.5 111 56.5 32.2 ...
..$ median.k. : num [1:15] 761.7 250.8 101.7 47.2 25.1 ...
..$ max.k. : num [1:15] 1206 574 324 202 134 ... 其中 plot(
sft$fitIndices[,1],
-sign(sft$fitIndices[,3])*sft$fitIndices[,2],
xlab="Soft Threshold (power)",
ylab="Scale Free Topology Model Fit,signed R^2",type="n",
main = paste("Scale independence")
)
text(
sft$fitIndices[,1],
-sign(sft$fitIndices[,3])*sft$fitIndices[,2],
labels=powers,
cex=0.9,
col="red"
)
abline(h=0.90, col="red") 生成的圖片如下 sft$fitIndices保存了每個(gè)power構(gòu)建的相關(guān)性網(wǎng)絡(luò)中的連接度的統(tǒng)計(jì)值, plot(
sft$fitIndices[,1],
sft$fitIndices[,5],
xlab="Soft Threshold (power)",
ylab="Mean Connectivity",
type="n",
main = paste("Mean connectivity")
)
text(
sft$fitIndices[,1],
sft$fitIndices[,5],
labels=powers,
cex=cex1,
col="red"
) 生成的圖片如下 確定好power值之后,可以直接構(gòu)建相關(guān)性網(wǎng)絡(luò) net = blockwiseModules(
datExpr,
power = sft$powerEstimate,
TOMType = "unsigned",
minModuleSize = 30,
reassignThreshold = 0,
mergeCutHeight = 0.25,
numericLabels = TRUE,
pamRespectsDendro = FALSE,
saveTOMs = TRUE,
saveTOMFileBase = "femaleMouseTOM",
verbose = 3)
mergedColors = labels2colors(net$colors)
plotDendroAndColors(
net$dendrograms[[1]],
mergedColors[net$blockGenes[[1]]],
"Module colors",
dendroLabels = FALSE,
hang = 0.03,
addGuide = TRUE,
guideHang = 0.05
) 生成的圖片如下 上方為基因的聚類樹,聚類時(shí)的距離為1-TOM值,下方為基因?qū)?yīng)的modules。類似的,還可以結(jié)合基因間的距離,即1-TOM值,用熱圖展示,代碼如下 geneTree = net$dendrograms[[1]]
moduleColors = labels2colors(net$colors)
dissTOM = 1 - TOMsimilarityFromExpr(
datExpr,
power = sft$powerEstimate)
plotTOM = dissTOM ^ 7
diag(plotTOM) = NA
TOMplot(
plotTOM,
geneTree,
moduleColors,
main = "Network heatmap plot, all genes"
) 生成的圖片如下 在前面文章我們提到過,在識(shí)別module的過程中共,首先用 unmergedColors = labels2colors(net$unmergedColors)
mergedColors = labels2colors(net$colors)
plotDendroAndColors(
net$dendrograms[[1]],
cbind(unmergedColors[net$blockGenes[[1]]], mergedColors[net$blockGenes[[1]]]),
c("Dynamic Tree Cut" , "Merged colors"),
dendroLabels = FALSE,
hang = 0.03,
addGuide = TRUE,
guideHang = 0.05
) 生成的圖片如下 對(duì)于合并前的modules, 其相關(guān)性分析的結(jié)果可視化如下 unmergedColors = labels2colors(net$unmergedColors)
MEList = moduleEigengenes(datExpr, colors = unmergedColors)
MEs = MEList$eigengenes
MEDiss = 1-cor(MEs)
METree = hclust(as.dist(MEDiss), method = "average")
plot(METree,
main = "Clustering of module eigengenes",
xlab = "",
sub = "") 生成的圖片如下 對(duì)于每個(gè)module而言,我們希望知道該module下對(duì)應(yīng)的基因,提取方式如下 > moduleColors = labels2colors(net$colors)
> unique(moduleColors)
[1] "grey" "turquoise" "grey60" "yellow" "tan"
[6] "green" "red" "black" "blue" "midnightblue"
[11] "cyan" "magenta" "salmon" "lightgreen" "brown"
[16] "purple" "pink" "greenyellow" "lightcyan"
> head(names(datExpr)[moduleColors=="red"])
[1] "MMT00000159" "MMT00000793" "MMT00000840" "MMT00001154" "MMT00001245"
[6] "MMT00001260" 同樣我們也可以提取module對(duì)應(yīng)的基因表達(dá)量數(shù)據(jù),繪制熱圖, 代碼如下 which.module="red"
plotMat(
t(scale(datExpr[,moduleColors==which.module ]) ),
nrgcols=30,
rlabels=F,
rcols=which.module,
main=which.module,
cex.main=2
) 生成的圖片如下 3. 篩選與表型相關(guān)的modules本質(zhì)上是計(jì)算module的ME值與表型的相關(guān)系數(shù),代碼如下 nGenes = ncol(datExpr)
nSamples = nrow(datExpr)
MEs0 = moduleEigengenes(datExpr, moduleColors)$eigengenes
MEs = orderMEs(MEs0)
moduleTraitCor = cor(
MEs,
datTraits,
use = "p"
)
moduleTraitPvalue = corPvalueStudent(
moduleTraitCor,
nSamples
) 可以對(duì)module和表型間的系數(shù)的結(jié)果進(jìn)行可視化,代碼如下 textMatrix = paste(
signif(moduleTraitCor, 2),
"\n(",
signif(moduleTraitPvalue, 1),
")",
sep = ""
)
dim(textMatrix) = dim(moduleTraitCor)
labeledHeatmap(
Matrix = moduleTraitCor,
xLabels = names(datTraits),
yLabels = names(MEs),
ySymbols = names(MEs),
colorLabels = FALSE,
colors = blueWhiteRed(50),
textMatrix = textMatrix,
setStdMargins = FALSE,
cex.text = 0.5,
zlim = c(-1,1),
main = paste("Module-trait relationships")
) 生成的圖片如下 指定一個(gè)我們感興趣的表型,可以得到與其相關(guān)性最高的module, 代碼如下 > which.trait <- "weight_g"
> moduleTraitCor[, which.trait]
> moduleTraitCor[, which.trait]
MEmagenta MEblack MEturquoise MEgreen MElightcyan
-0.017418109 -0.312679561 -0.272907078 0.001339804 -0.128053858
MEblue MEbrown MEred MEsalmon MEyellow
0.314323101 0.591340840 0.509942529 0.432058666 0.219900538
MElightgreen MEgreenyellow MEgrey60 MEpink MEpurple
-0.057215182 -0.022394396 -0.016705204 -0.051495573 -0.021167541
MEtan MEcyan MEmidnightblue MEgrey
0.269827166 0.181595161 0.193569095 0.089702947 以上結(jié)果中,和 moduleColors = labels2colors(net$colors)
which.trait <- "weight_g"
y <- datTraits[, which.trait]
GS <- as.numeric(cor(y ,datExpr, use="p"))
GeneSignificance <- abs(GS)
ModuleSignificance <- tapply(
GeneSignificance,
moduleColors, mean, na.rm=T)
plotModuleSignificance(GeneSignificance, moduleColors) 生成的圖片如下 可以看到 weight <- datTraits[, which.trait]
MEs0 = moduleEigengenes(datExpr, moduleColors)$eigengenes
MEs = orderMEs(MEs0)
MET = orderMEs(cbind(MEs, weight))
par(mar = c(4, 2, 1, 2), cex = 0.9)
plotEigengeneNetworks(
MET, "",
marDendro = c(0,4,1,2),
marHeatmap = c(3,4,1,2),
cex.lab = 0.8,
xLabelsAngle = 90
) 生成的圖片如下 4. 篩選關(guān)鍵基因篩選出與表型高相關(guān)的modules之后,還可以對(duì)modules下的基因進(jìn)行進(jìn)一步篩選,主要根據(jù)GS值和MM值,代碼如下 datKME = signedKME(
datExpr,
MEs,
outputColumnName="MM.")
FilterGenes= abs(GS1)> .2 & abs(datKME$MM.brown)>.8 篩選出候選基因后,可以進(jìn)行下游的功能富集分析,使用 5. 導(dǎo)出module數(shù)據(jù), 繪制網(wǎng)絡(luò)圖可以導(dǎo)出指定modules對(duì)應(yīng)的基因共表達(dá)網(wǎng)絡(luò),方便可視化,代碼如下 TOM = TOMsimilarityFromExpr(datExpr, power = 6)
modules = c("brown", "red")
probes = names(datExpr)
inModule = is.finite(match(moduleColors, modules));
modProbes = probes[inModule];
modTOM = TOM[inModule, inModule];
dimnames(modTOM) = list(modProbes, modProbes)
cyt = exportNetworkToCytoscape(
modTOM,
edgeFile = paste("CytoscapeInput-edges-", paste(modules, collapse="-"), ".txt", sep=""),
nodeFile = paste("CytoscapeInput-nodes-", paste(modules, collapse="-"), ".txt", sep=""),
weighted = TRUE,
threshold = 0.02,
nodeNames = modProbes,
nodeAttr = moduleColors[inModule]
) 最終會(huì)生成以下兩個(gè)文件,可以導(dǎo)入cytoscape進(jìn)行繪圖
當(dāng)然也支持導(dǎo)出 ·end· |
|