This is the first GATK variant-calling workflow I have learned: short variants in the human germline, covering both SNPs and INDELs. I wrote a SnakeMake pipeline that runs from the FASTQ files to the final VEP-annotated VCF file. For an introduction to VCF, see the previous post, "Gene Sequence Variation Information: VCF (Variant Call Format)".

The workflow code is at https:///BioQuest/smkhgs or https://github.com/BioQuestX/smkhgs

README

GATK best practices workflow

Pipeline summary

SnakeMake workflow for Human Germline short variants (SNP+INDEL)

Reference
- Reference genome related files and GATK bundle files (GATK)
- VEP variation annotation files (VEP)

Prepare
- Mark duplicates (samblaster)
- Generates recalibration table for Base Quality Score Recalibration (BaseRecalibrator)
- Apply base quality score recalibration (ApplyBQSR)
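
The two BQSR steps above can be sketched as a pair of command lines. This is a minimal Python sketch, not the workflow's actual rule code: the helper name `bqsr_commands` and all file paths are hypothetical, and only the sample ID is taken from the `raw` directory of this project. The GATK flags used (`-I`, `-R`, `--known-sites`, `--bqsr-recal-file`, `-O`) are the standard ones for these tools.

```python
import shlex
from typing import List


def bqsr_commands(bam: str, ref: str, known_sites: List[str],
                  table: str, out_bam: str) -> List[List[str]]:
    """Return argv lists for BaseRecalibrator followed by ApplyBQSR."""
    recal = ["gatk", "BaseRecalibrator", "-I", bam, "-R", ref]
    for vcf in known_sites:
        # One --known-sites flag per VCF of known polymorphic sites.
        recal += ["--known-sites", vcf]
    recal += ["-O", table]

    # ApplyBQSR reads the table written by BaseRecalibrator.
    apply = ["gatk", "ApplyBQSR", "-R", ref, "-I", bam,
             "--bqsr-recal-file", table, "-O", out_bam]
    return [recal, apply]


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    for cmd in bqsr_commands(
        bam="results/prepared/SRR24443168.dedup.bam",
        ref="resources/genome.fasta",
        known_sites=["resources/dbsnp.vcf.gz"],
        table="results/prepared/SRR24443168.recal.table",
        out_bam="results/prepared/SRR24443168.bqsr.bam",
    ):
        print(shlex.join(cmd))
```

Building the commands as argument lists keeps quoting out of the picture; they can be handed to `subprocess.run` directly or printed for inspection as above.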

Quality control report
- Alignment report (MultiQC)

Call
- Call germline SNPs and indels via local re-assembly of haplotypes (HaplotypeCaller)
- Import VCFs to GenomicsDB (GenomicsDBImport)
- Perform joint genotyping on one or more samples pre-called with HaplotypeCaller (GenotypeGVCFs)
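
As a rough illustration of how the three calling steps chain together, the sketch below builds one command per step in Python. The function names and file paths are hypothetical, and using `config/captured_regions.bed` as the `-L` interval file is an assumption about how this project is configured. The flags themselves (`-ERC GVCF`, `-V`, `--genomicsdb-workspace-path`, `-L`, and the `gendb://` prefix) are standard GATK4 usage.

```python
from typing import List


def haplotypecaller_cmd(bam: str, ref: str, gvcf: str) -> List[str]:
    """Per-sample calling in GVCF mode, so samples can be joint-genotyped later."""
    return ["gatk", "HaplotypeCaller", "-R", ref, "-I", bam,
            "-O", gvcf, "-ERC", "GVCF"]


def genomicsdb_import_cmd(gvcfs: List[str], workspace: str,
                          intervals: str) -> List[str]:
    """Collect the per-sample GVCFs into one GenomicsDB workspace."""
    cmd = ["gatk", "GenomicsDBImport"]
    for gvcf in gvcfs:
        cmd += ["-V", gvcf]  # one -V per sample
    # GenomicsDBImport needs at least one interval to operate on.
    cmd += ["--genomicsdb-workspace-path", workspace, "-L", intervals]
    return cmd


def genotype_gvcfs_cmd(ref: str, workspace: str, out_vcf: str) -> List[str]:
    """Joint genotyping, reading the workspace through the gendb:// prefix."""
    return ["gatk", "GenotypeGVCFs", "-R", ref,
            "-V", f"gendb://{workspace}", "-O", out_vcf]


if __name__ == "__main__":
    samples = ["SRR24443168", "SRR24443169"]
    ref = "resources/genome.fasta"  # hypothetical path
    gvcfs = [f"results/called/{s}.g.vcf.gz" for s in samples]  # hypothetical
    for s, g in zip(samples, gvcfs):
        print(" ".join(haplotypecaller_cmd(f"results/prepared/{s}.bam", ref, g)))
    print(" ".join(genomicsdb_import_cmd(gvcfs, "results/called/db",
                                         "config/captured_regions.bed")))
    print(" ".join(genotype_gvcfs_cmd(ref, "results/called/db",
                                      "results/called/all.vcf.gz")))
```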

Filter
- Select SNP or INDEL variants from a VCF file (SelectVariants)
- Build a recalibration model to score variant quality for filtering purposes (VariantRecalibrator)
- Apply a score cutoff to filter variants based on a recalibration table (ApplyVQSR)
- Merge all the VCF files (Picard)
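
The filter stage runs the same three tools once for SNPs and once for INDELs, then merges the results. The Python sketch below shows that shape under stated assumptions: the helper names, output file naming, resource tags, annotation list, and the 99.0 sensitivity default are all illustrative choices, not values taken from this workflow. The `picard MergeVcfs I=... O=...` form assumes the `picard` wrapper script is on the PATH.

```python
from typing import Dict, List


def vqsr_commands(mode: str, ref: str, in_vcf: str, prefix: str,
                  resources: Dict[str, str], annotations: List[str],
                  sensitivity: str = "99.0") -> List[List[str]]:
    """Build SelectVariants, VariantRecalibrator and ApplyVQSR argv lists
    for one variant type; mode is "SNP" or "INDEL"."""
    if mode not in ("SNP", "INDEL"):
        raise ValueError("mode must be 'SNP' or 'INDEL'")
    stem = f"{prefix}.{mode.lower()}"
    selected, recal = f"{stem}.vcf.gz", f"{stem}.recal"
    tranches, filtered = f"{stem}.tranches", f"{stem}.vqsr.vcf.gz"

    select = ["gatk", "SelectVariants", "-R", ref, "-V", in_vcf,
              "--select-type-to-include", mode, "-O", selected]

    recalibrate = ["gatk", "VariantRecalibrator", "-R", ref, "-V", selected]
    for tag, path in resources.items():
        # tag looks like "hapmap,known=false,training=true,truth=true,prior=15.0"
        recalibrate += [f"--resource:{tag}", path]
    for name in annotations:
        recalibrate += ["-an", name]
    recalibrate += ["-mode", mode, "-O", recal, "--tranches-file", tranches]

    apply = ["gatk", "ApplyVQSR", "-R", ref, "-V", selected,
             "--recal-file", recal, "--tranches-file", tranches,
             "--truth-sensitivity-filter-level", sensitivity,
             "-mode", mode, "-O", filtered]
    return [select, recalibrate, apply]


def merge_cmd(vcfs: List[str], out_vcf: str) -> List[str]:
    """Merge the filtered SNP and INDEL files with Picard MergeVcfs."""
    return ["picard", "MergeVcfs"] + [f"I={v}" for v in vcfs] + [f"O={out_vcf}"]


if __name__ == "__main__":
    # Hypothetical resource file and annotations, for illustration only.
    res = {"hapmap,known=false,training=true,truth=true,prior=15.0":
           "resources/hapmap.vcf.gz"}
    for mode in ("SNP", "INDEL"):
        for cmd in vqsr_commands(mode, "resources/genome.fasta",
                                 "results/called/all.vcf.gz",
                                 "results/filtered/all", res, ["QD", "FS", "SOR"]):
            print(" ".join(cmd))
    print(" ".join(merge_cmd(["results/filtered/all.snp.vqsr.vcf.gz",
                              "results/filtered/all.indel.vqsr.vcf.gz"],
                             "results/filtered/all.vqsr.vcf.gz")))
```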

Annotation
- Annotate variant calls with VEP (VEP)

SnakeMake Report

Outputs
.
├── config
│   ├── captured_regions.bed
│   ├── config.yaml
│   └── samples.tsv
├── dag.svg
├── logs
│   ├── annotate
│   ├── call
│   ├── filter
│   ├── prepare
│   ├── qc
│   ├── ref
│   └── trim
├── raw
│   ├── SRR24443168.fastq.gz
│   └── SRR24443169.fastq.gz
├── README.md
├── report
│   ├── fastp_multiqc_data
│   ├── fastp_multiqc.html
│   ├── prepare_multiqc_data
│   ├── prepare_multiqc.html
│   └── vep_report.html
├── results
│   ├── called
│   ├── filtered
│   ├── prepared
│   ├── trimmed
│   └── vep_annotated.vcf.gz
├── workflow
│   ├── envs
│   ├── report
│   ├── rules
│   ├── schemas
│   ├── scripts
│   └── Snakefile
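
The two annotation outputs listed above, `results/vep_annotated.vcf.gz` and `report/vep_report.html`, could come from a single VEP invocation along these lines. This is a hedged Python sketch: the helper `vep_cmd`, the input path, and the choice of offline cache mode are assumptions, not the workflow's confirmed settings. The VEP options used (`--vcf`, `--compress_output`, `--stats_file`, `--cache`, `--offline`, `--dir_cache`, `--fork`, `--force_overwrite`) are documented command-line flags.

```python
import shlex
from typing import List, Optional


def vep_cmd(in_vcf: str,
            out_vcf: str = "results/vep_annotated.vcf.gz",
            stats_html: str = "report/vep_report.html",
            cache_dir: Optional[str] = None,
            forks: int = 1) -> List[str]:
    """Build a VEP argv list that writes bgzipped VCF plus an HTML summary."""
    cmd = ["vep",
           "--input_file", in_vcf,
           "--output_file", out_vcf,
           "--vcf",                        # write VCF rather than VEP's tab format
           "--compress_output", "bgzip",
           "--stats_file", stats_html,
           "--cache", "--offline",         # assumption: a local cache is installed
           "--force_overwrite"]
    if cache_dir is not None:
        cmd += ["--dir_cache", cache_dir]
    if forks > 1:
        cmd += ["--fork", str(forks)]
    return cmd


if __name__ == "__main__":
    # Hypothetical input path, for illustration only.
    print(shlex.join(vep_cmd("results/filtered/all.vqsr.vcf.gz", forks=4)))
```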

Directed Acyclic Graph

Reference
- GATK best practices workflow: https://gatk./hc/en-us/sections/360007226651-Best-Practices-Workflows
- GATK: https://software./gatk/
- VEP: https://www./info/docs/tools/vep/index.html
- fastp: https://github.com/OpenGene/fastp
- BWA mem2: http://bio-bwa./
- samblaster: https://github.com/GregoryFaust/samblaster
- BaseRecalibrator: https://gatk./hc/en-us/articles/13832708374939-BaseRecalibrator
- ApplyBQSR: https://github.com/GregoryFaust/samblaster
- HaplotypeCaller: https://gatk./hc/en-us/articles/13832687299739-HaplotypeCaller
- GenomicsDBImport: https://gatk./hc/en-us/articles/13832686645787-GenomicsDBImport
- GenotypeGVCFs: https://gatk./hc/en-us/articles/13832766863259-GenotypeGVCFs
- SelectVariants: https://gatk./hc/en-us/articles/13832694334235-SelectVariants
- VariantRecalibrator: https://gatk./hc/en-us/articles/13832694334235-VariantRecalibrator
- ApplyVQSR: https://gatk./hc/en-us/articles/13832694334235-ApplyVQSR
- Picard: https://broadinstitute./picard
- MultiQC: https://