
Python: Setting and Changing the Working Directory, Reading Data, and Inspecting Data

 公彥棟 2017-09-30
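import os
os.getcwd()  # show the current working directory
# Raw string keeps the backslash literal. Use ASCII quotes (' and " are interchangeable in Python) and an ASCII-only path.
os.chdir(r"F:\python_test")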
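The directory change above can be tried safely without touching a real project folder. The following is a minimal sketch, assuming a temporary directory as a stand-in for `F:\python_test`; the names `original_dir`, `project_dir`, and `changed_ok` are illustrative and not from the original post.

```python
import os
import tempfile

# Remember the starting directory so the change can be undone afterwards.
original_dir = os.getcwd()

# A temporary directory stands in for a real project folder (hypothetical).
with tempfile.TemporaryDirectory() as project_dir:
    os.chdir(project_dir)
    # samefile compares the underlying directories, ignoring symlinks or
    # differences in how the path is spelled.
    changed_ok = os.path.samefile(os.getcwd(), project_dir)
    # Move back before the temporary directory is deleted.
    os.chdir(original_dir)

# A raw string and an escaped string spell the same Windows path.
same_spelling = r"F:\python_test" == "F:\\python_test"

print(changed_ok, same_spelling)
```

Restoring the original directory at the end keeps later relative paths, such as the file names passed to `pd.read_table`, pointing where they did before.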

# Python plotting library
import matplotlib.pyplot as plt
# Numerical python library (pronounced "num-pie")
import numpy as np
# Dataframes in Python
import pandas as pd
# Statistical plotting library we'll use
import seaborn as sns
# This is necessary to show the plotted figures inside the notebook -- "inline" with the notebook cells
%matplotlib inline
##### Read the file
shalek2013_expression = pd.read_table('GSE41265_allGenesTPM.txt.gz',
                                      index_col=0,
                                      compression='gzip')
shalek2013_expression.head()  # view the first rows
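To see what `index_col=0` and `compression='gzip'` do without downloading the real data set, the read step can be reproduced on a tiny file. This is a sketch under stated assumptions: the table contents and the file name `toy_expression.txt.gz` are made up, and `pd.read_csv(..., sep="\t")` is used as the equivalent of `pd.read_table`.

```python
import gzip
import os
import tempfile

import pandas as pd

# Tiny made-up expression table: genes in rows, samples in columns.
text = (
    "GENE\tS1\tS2\tP1\n"
    "geneA\t0.0\t5.5\t3.0\n"
    "geneB\t12.0\t0.0\t7.25\n"
)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "toy_expression.txt.gz")
    # Write the table as gzip-compressed text.
    with gzip.open(path, "wt", encoding="utf-8") as handle:
        handle.write(text)

    # index_col=0 turns the first column (gene names) into the row index;
    # compression="gzip" decompresses the file while reading.
    toy_expression = pd.read_csv(path, sep="\t", index_col=0, compression="gzip")

print(toy_expression.shape)
print(toy_expression.head())
```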
##### Set display options
pd.options.display.max_columns = 50
pd.options.display.max_rows = 50
shalek2013_expression.head()
shalek2013_expression.shape  # view the dimensions of the data
##### Read the annotation (metadata) file
shalek2013_metadata = pd.read_table('~/Downloads/GSE41265_series_matrix.txt', 
                                    skiprows=33, 
                                    index_col=0)
shalek2013_metadata
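The `skiprows` argument drops a fixed number of leading lines before the table starts. The sketch below illustrates this on an in-memory file; the contents are invented for illustration (two header lines instead of the 33 skipped in the post) and only mimic the `!`-prefixed labels of a series matrix file.

```python
import io

import pandas as pd

# Made-up file: two free-text header lines, then a tab-delimited table whose
# row labels start with "!".
raw = (
    "!Series_title\ttoy example\n"
    "!Series_summary\tnot real data\n"
    "!Sample_title\tS1\tS2\n"
    "!Sample_source\tcell\tcell\n"
    "!Sample_organism\tmouse\tmouse\n"
)

# skiprows=2 discards the two header lines; the next line supplies the column
# names, and index_col=0 makes the first column the row index.
toy_metadata = pd.read_csv(io.StringIO(raw), sep="\t", skiprows=2, index_col=0)

print(toy_metadata)
```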
#### Transpose

shalek2013_metadata = shalek2013_metadata.T
shalek2013_metadata
# shalek2013_metadata.index and shalek2013_metadata.columns hold the row names and column names; they correspond to rownames and colnames in R
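A small example may make the effect of `.T` on `index` and `columns` concrete. This is a sketch with an invented two-by-two table; the labels are placeholders, not values from the real metadata.

```python
import pandas as pd

# Rows are metadata fields, columns are samples (as read from the file).
toy = pd.DataFrame(
    {"S1": ["cell", "mouse"], "S2": ["cell", "human"]},
    index=["source", "organism"],
)

# Transposing swaps the row labels and the column labels.
transposed = toy.T

print(list(toy.index), list(toy.columns))
print(list(transposed.index), list(transposed.columns))
```

After the transpose each sample is a row, which is the layout most pandas operations on observations expect.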
#### Clean up the column names
[x.strip('!') for x in shalek2013_metadata.columns]
# The same result can be obtained with a named function
def remove_exclamation(x):
    return x.strip('!')
shalek2013_metadata.columns.map(remove_exclamation)
#### Assign the cleaned names back
shalek2013_metadata.columns = shalek2013_metadata.columns.map(lambda x: x.strip('!'))
shalek2013_metadata.head(8)  # show the first 8 rows
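The column clean-up can be written in several equivalent ways. The sketch below compares them on a made-up frame; `Index.str.strip` is an alternative not used in the original post, shown only for comparison.

```python
import pandas as pd

toy = pd.DataFrame([[1, 2]], columns=["!Sample_title", "!Sample_source"])

# Three ways to strip "!" from both ends of every column name.
via_list = [name.strip("!") for name in toy.columns]
via_map = toy.columns.map(lambda name: name.strip("!"))
via_str = toy.columns.str.strip("!")

# None of the calls above modifies toy; the result has to be assigned back.
toy.columns = via_str

print(list(toy.columns))
```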
#### Plot and save the figure
sns.boxplot(data=shalek2013_expression)
# gcf = Get current figure
fig = plt.gcf()
fig.savefig('shalek2013_expression_boxplot.pdf')
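##### Boolean filtering
# NOTE: expression_logged is never defined in the original post; a log2(x + 1) transform is assumed here
expression_logged = np.log2(shalek2013_expression + 1)
expression_logged < 10
expression_at_most_10 = expression_logged[expression_logged < 10]
expression_at_most_10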
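Indexing a DataFrame with a boolean DataFrame keeps the original shape and replaces every entry where the condition is false with NaN. The sketch below shows this on invented counts; the log2(x + 1) transform is an assumption, since the post does not show how `expression_logged` was built.

```python
import numpy as np
import pandas as pd

# Made-up expression values for three genes in two samples.
toy_counts = pd.DataFrame(
    {"S1": [0.0, 3.0, 2047.0], "S2": [1.0, 4095.0, 7.0]},
    index=["geneA", "geneB", "geneC"],
)

# Assumed transform: log2(x + 1), so that zero counts map to zero.
toy_logged = np.log2(toy_counts + 1)

# Boolean DataFrame of the same shape: True where the value is below 10.
mask = toy_logged < 10

# Entries where the mask is False become NaN; the shape is unchanged.
masked = toy_logged[mask]

print(mask)
print(masked)
```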
#### Quality control (QC): pandas reductions work down columns by default; pass axis=1 to operate across each row
genes_of_interest = (expression_logged > 1).sum(axis=1) >= 3
expression_filtered_by_all_samples = expression_logged.loc[genes_of_interest]  # row selection
print(expression_filtered_by_all_samples.shape)
expression_filtered_by_all_samples.head()
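The gene filter above counts, for each gene, how many samples exceed a threshold and keeps genes seen in at least three samples. A minimal sketch on invented values shows the difference between `axis=1` (one count per gene) and the default `axis=0` (one count per sample).

```python
import pandas as pd

# Made-up log-expression values: genes in rows, samples in columns.
toy_logged = pd.DataFrame(
    {
        "S1": [2.0, 0.0, 5.0],
        "S2": [3.0, 0.5, 0.0],
        "S3": [1.5, 4.0, 0.0],
        "P1": [2.5, 0.0, 6.0],
    },
    index=["geneA", "geneB", "geneC"],
)

above = toy_logged > 1

# axis=1 sums across the columns of each row: samples expressing each gene.
samples_per_gene = above.sum(axis=1)
# Default axis=0 sums down each column: genes expressed in each sample.
genes_per_sample = above.sum()

# Keep genes expressed above the threshold in at least 3 samples.
keep = samples_per_gene >= 3
filtered = toy_logged.loc[keep]

print(samples_per_gene.tolist())
print(genes_per_sample.tolist())
print(list(filtered.index))
```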
sns.boxplot(data=expression_filtered_by_all_samples)
# gcf = Get current figure
fig = plt.gcf()
fig.savefig('expression_filtered_by_all_samples_boxplot.pdf')
##### QC on the columns (cells)
pooled_ids = [x for x in expression_logged.columns if x.startswith('P')]
### the list comprehension shows how concise Python code can be
pooled = expression_logged[pooled_ids]  # [] selects columns by default, rows need .loc; equivalent to expression_logged.loc[:, pooled_ids]
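The column selection by name prefix can be checked on a small frame. This sketch uses invented sample names starting with `S` (single cells) and `P` (pooled), following the naming the post relies on.

```python
import pandas as pd

toy = pd.DataFrame(
    {"S1": [1, 2], "S2": [3, 4], "P1": [5, 6]},
    index=["geneA", "geneB"],
)

# Collect column names by their first letter.
pooled_ids = [name for name in toy.columns if name.startswith("P")]
single_ids = [name for name in toy.columns if name.startswith("S")]

# Plain brackets with a list select columns; .loc[:, ...] does the same.
by_brackets = toy[single_ids]
by_loc = toy.loc[:, single_ids]

# Selecting a row by label requires .loc.
one_row = toy.loc["geneB"]

print(pooled_ids, single_ids)
print(one_row.tolist())
```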
####### The QC above used all samples; the following code refers to single cells only
single_cell=[x for x in expression_logged.columns if x.startswith('S')]
expression_by_single_cells=expression_logged[single_cell]
gene_select=(expression_by_single_cells>1).sum(axis=1)>3
expression_filtered_by_singles=expression_by_single_cells.loc[gene_select]
assert expression_filtered_by_singles.shape == (6312, 21)
