Difference between revisions of “Bioinformatics”

Revision as of 20:01, 7 February 2026 (Sat)

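Bioinformatics is a highly interdisciplinary field that combines biology, computer science, information engineering, mathematics, and statistics to develop methods and software tools for storing, retrieving, organizing, and analyzing biological data, particularly genome sequences and protein structures.
With the completion of the Human Genome Project (HGP) and the rapid spread of next-generation sequencing (NGS), biology has shifted from a traditional “observational science” to a data-intensive “information science”.
The core task of bioinformatics is to turn massive, fragmented biological data (such as A/T/C/G sequences) into meaningful biological insight (such as disease mechanisms, evolutionary relationships, and drug targets), which makes it a cornerstone of modern precision medicine and drug discovery.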

Bioinformatics

In Silico Biology

A bridge between “code” and “life”

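Field profile
Core components	Biology + computer science + statistics
Experiment type	Dry lab
Main data	DNA/RNA sequences, protein structures
Core databases	NCBI (GenBank), PDB
Common tool stack
Programming languages	Python, R, Linux Shell
Alignment tools	BLAST, BWA, Bowtie
Variant analysis	GATK, Mutect2
Structure prediction	AlphaFold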

Three Core Areas (The Big Three)

Although bioinformatics covers a wide range of topics, its core workflows largely follow the central dogma (DNA -> RNA -> Protein).

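Omics	Core question	Typical analysis tasks
Genomics	“What do I have?” Studies the DNA sequence itself and its variation.	Sequence assembly, variant calling (SNPs/indels), phylogenetic tree construction.
Transcriptomics	“What am I doing?” Studies gene expression levels.	Differential expression (DE) analysis, single-cell RNA-seq (scRNA-seq) clustering, pathway enrichment analysis (GO/KEGG).
Proteomics	“What do I look like?” Studies protein structure and function.	Protein structure prediction (AlphaFold), molecular docking, protein-protein interaction (PPI) networks.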
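
The central dogma flow that organizes the table above (DNA -> RNA -> Protein) can be sketched in a few lines of Python. This is a toy illustration only: the function names are hypothetical, the input is assumed to be a coding-strand DNA string read in frame from its first base, and `CODON_TABLE` is deliberately partial (unknown codons are reported as `X`).

```python
# Partial codon table (standard genetic code); only a few codons are listed.
CODON_TABLE = {
    "AUG": "M",  # methionine, also the usual start codon
    "UUU": "F",  # phenylalanine
    "GGC": "G",  # glycine
    "GCU": "A",  # alanine
    "AAA": "K",  # lysine
    "UAA": "*", "UAG": "*", "UGA": "*",  # stop codons
}


def transcribe(coding_dna: str) -> str:
    """Return the mRNA for a coding-strand DNA sequence (T replaced by U)."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Translate mRNA codon by codon until a stop codon or the end of input."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE.get(mrna[i:i + 3], "X")  # X = not in the toy table
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    mrna = transcribe("ATGTTTGGCTAA")
    print(mrna, translate(mrna))
```

In practice a library such as Biopython would supply the full codon table; the hand-written dictionary here only keeps the example self-contained.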

From Data to Clinic: The NGS Analysis Workflow

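In clinical diagnostics (for example cancer and inherited disease), bioinformatics is mainly responsible for processing the raw data produced by high-throughput sequencing (NGS).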

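Raw data: .fastq files produced by the sequencer, which can contain hundreds of millions of short sequences (reads) together with their quality scores.
Alignment/Mapping: Short reads are aligned, like jigsaw pieces, to the human reference genome (such as hg38), producing a .bam file.
Variant calling: Algorithms identify sites where the sample differs from the reference genome, producing a .vcf (Variant Call Format) file.
Annotation and interpretation: Databases such as ClinVar and gnomAD are used to label the clinical significance of these variants (benign/pathogenic), leading to the final clinical report.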
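
To make the two text formats in the steps above more concrete, here is a minimal sketch of reading a FASTQ record and a VCF data line in plain Python. It assumes four-line FASTQ records with Phred+33 quality encoding and tab-separated VCF data lines; the names `FastqRecord`, `Variant`, `read_fastq`, and `parse_vcf_line` are hypothetical, and the sample strings are made up. Real pipelines would normally rely on established libraries rather than hand-written parsers.

```python
import io
from typing import Iterator, List, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    name: str
    sequence: str
    qualities: List[int]  # one Phred score per base


class Variant(NamedTuple):
    chrom: str
    pos: int  # 1-based position, as written in the VCF text
    ref: str
    alts: List[str]


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a four-line-per-read FASTQ stream (Phred+33 assumed)."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:  # end of input (a blank line also stops this sketch)
            return
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # the '+' separator line is ignored
        quality_line = handle.readline().rstrip("\n")
        if not header.startswith("@") or len(sequence) != len(quality_line):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        yield FastqRecord(header[1:], sequence, [ord(c) - 33 for c in quality_line])


def parse_vcf_line(line: str) -> Variant:
    """Parse the first five fixed columns of a VCF data line (not a '#' header)."""
    chrom, pos, _variant_id, ref, alt = line.rstrip("\n").split("\t")[:5]
    return Variant(chrom, int(pos), ref, alt.split(","))


if __name__ == "__main__":
    demo_fastq = io.StringIO("@read1\nACGT\n+\nII5!\n")  # made-up record
    for record in read_fastq(demo_fastq):
        print(record.name, record.qualities)
    print(parse_vcf_line("chr1\t12345\t.\tA\tG,T\t50\tPASS\t.\n"))  # made-up line
```

The .bam step is left out on purpose: BAM is a compressed binary format, so it is usually read through a dedicated library instead of line-by-line parsing.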

       Key Concepts

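1. Pipeline: A bioinformatics analysis is rarely done by a single program. Multiple tools are chained together (for example QC -> Trim -> Map -> Call) into an automated workflow, commonly managed with tools such as Nextflow or Snakemake.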
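
The idea of chaining stages can be sketched with ordinary Python functions. This is not how Nextflow or Snakemake are written or configured; it is a toy model under stated assumptions: reads are `(bases, Phred scores)` tuples and are never empty or longer than the reference, `REFERENCE` is a made-up string, mapping is ungapped best-Hamming-distance placement, and every mismatch is reported as a variant. All names are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Read = Tuple[str, List[int]]        # (bases, Phred scores)
Alignment = Tuple[int, str]         # (0-based start on REFERENCE, bases)
VariantCall = Tuple[int, str, str]  # (1-based position, ref base, alt base)

REFERENCE = "ACGTACGTTAGC"  # hypothetical toy reference, not real genome data
MIN_QUALITY = 20            # illustrative threshold, an assumption


def qc(reads: List[Read]) -> List[Read]:
    """QC: keep reads whose mean quality reaches the threshold."""
    return [r for r in reads if sum(r[1]) / len(r[1]) >= MIN_QUALITY]


def trim(reads: List[Read]) -> List[Read]:
    """Trim: drop trailing low-quality bases from each read."""
    trimmed = []
    for bases, quals in reads:
        end = len(quals)
        while end > 0 and quals[end - 1] < MIN_QUALITY:
            end -= 1
        if end > 0:
            trimmed.append((bases[:end], quals[:end]))
    return trimmed


def hamming(x: str, y: str) -> int:
    return sum(a != b for a, b in zip(x, y))


def map_reads(reads: List[Read]) -> List[Alignment]:
    """Map: place each read at the ungapped offset with the fewest mismatches."""
    alignments = []
    for bases, _ in reads:
        starts = range(len(REFERENCE) - len(bases) + 1)
        best = min(starts, key=lambda s: hamming(bases, REFERENCE[s:s + len(bases)]))
        alignments.append((best, bases))
    return alignments


def call_variants(alignments: List[Alignment]) -> List[VariantCall]:
    """Call: report every mismatch against the reference, deduplicated."""
    calls = set()
    for start, bases in alignments:
        for offset, base in enumerate(bases):
            ref_base = REFERENCE[start + offset]
            if base != ref_base:
                calls.add((start + offset + 1, ref_base, base))
    return sorted(calls)


def run_pipeline(data, stages: Sequence[Callable]):
    """Feed the output of each stage into the next one, in order."""
    for stage in stages:
        data = stage(data)
    return data
```

Workflow managers add what this sketch lacks: they track files between steps, rerun only what changed, and schedule independent steps in parallel.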

2. Algorithm: The core of bioinformatics. For example, dynamic programming is used for sequence alignment, hidden Markov models (HMMs) for gene prediction, and deep learning for protein structure prediction.
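
As one instance of dynamic programming for sequence alignment, the sketch below computes a Needleman-Wunsch global alignment score. The scoring values (match +1, mismatch -1, linear gap -1) are illustrative assumptions and the function name is hypothetical; heuristic tools such as BLAST use different, faster strategies.

```python
def needleman_wunsch_score(a: str, b: str, match: int = 1,
                           mismatch: int = -1, gap: int = -1) -> int:
    """Return the optimal global alignment score of a and b (linear gap cost)."""
    # prev[j] holds the best score for aligning a[:i-1] with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # aligning a[:i] against an empty prefix of b
        for j in range(1, len(b) + 1):
            diagonal = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            gap_in_b = prev[j] + gap      # a[i-1] aligned to a gap
            gap_in_a = curr[j - 1] + gap  # b[j-1] aligned to a gap
            curr.append(max(diagonal, gap_in_b, gap_in_a))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(needleman_wunsch_score("GATTACA", "GCATGCG"))
```

Only two rows of the table are kept, so memory grows with the shorter dimension, while time still scales with the product of the two sequence lengths. Recovering the alignment itself, not just the score, would require keeping the full table or traceback pointers.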

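3. Databases: The “granary” of bioinformatics. They include primary databases (which store raw data, such as GenBank and SRA) and secondary databases (which store curated knowledge, such as UniProt, KEGG, and OMIM).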

       Academic References

[1] Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. (1990). Basic local alignment search tool (BLAST). J Mol Biol.
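[Comment]: One of the most cited papers in biology. The BLAST algorithm made fast comparison of large sequence collections possible and is a foundational tool of bioinformatics.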

[2] Lander ES, et al. (2001). Initial sequencing and analysis of the human genome. Nature.
[Comment]: Publication of the Human Genome Project (HGP) draft sequence, marking biology's entry into the era of omics and big data.

[3] Jumper J, Evans R, Pritzel A, et al. (2021). Highly accurate protein structure prediction with AlphaFold. Nature.
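[Comment]: A landmark for artificial intelligence. It delivered a major advance on the roughly 50-year-old “protein folding problem” of predicting structure from sequence, and demonstrated the potential of AI in bioinformatics.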

           Computational Biology · Knowledge Map

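Parent categories	Biology • Computer science • Interdisciplinary studies
Technology drivers	NGS (sequencing) • AI (deep learning) • Cloud computing
Applications	Drug discovery • Genetic counseling • Evolutionary analysis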

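@@ Line 3: / Line 3: @@
      <div style="margin-bottom: 30px; border-bottom: 1.2px solid #e2e8f0; padding-bottom: 25px;">
          <p style="font-size: 1.1em; margin: 10px 0; color: #334155; text-align: justify;">
-             <strong>Bioinformatics</strong> is a highly interdisciplinary field that combines biology, computer science, information engineering, mathematics, and statistics to develop methods and software tools for storing, retrieving, organizing, and analyzing biological data, particularly <strong>[[基因组|genome]]</strong> sequences and <strong>[[蛋白质|protein]]</strong> structures.
+             <strong>Bioinformatics</strong> is a highly interdisciplinary field that combines [[生物学|biology]], [[计算机科学|computer science]], [[信息工程|information engineering]], [[数学|mathematics]], and [[统计学|statistics]] to develop methods and software tools for storing, retrieving, organizing, and analyzing biological data, particularly <strong>[[基因组|genome]]</strong> sequences and <strong>[[蛋白质|protein]]</strong> structures.
              <br>With the completion of the <strong>[[人类基因组计划|Human Genome Project]]</strong> (HGP) and the rapid spread of <strong>[[二代测序|next-generation sequencing]]</strong> (NGS), biology has shifted from a traditional “observational science” to a data-intensive “information science”.
-             <br>The core task of bioinformatics is to turn massive, fragmented biological data (such as A/T/C/G sequences) into meaningful biological insight (such as disease mechanisms, evolutionary relationships, and drug targets), which makes it a cornerstone of modern <strong>[[精准医学|precision medicine]]</strong> and drug discovery.
+             <br>The core task of bioinformatics is to turn massive, fragmented biological data (such as A/T/C/G sequences) into meaningful biological insight (such as disease mechanisms, evolutionary relationships, and drug targets), which makes it a cornerstone of modern <strong>[[精准医学|precision medicine]]</strong> and [[新药研发|drug discovery]].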
          </p>
      </div>
@@ Line 48: / Line 48: @@
                  <tr>
                      <th style="text-align: left; padding: 6px 12px; background-color: #f8fafc; color: #475569; border-bottom: 1px solid #e2e8f0;">Programming languages</th>
-                     <td style="padding: 6px 12px; border-bottom: 1px solid #e2e8f0; color: #1e40af;">[[Python]], [[R语言|R]], Linux Shell</td>
+                     <td style="padding: 6px 12px; border-bottom: 1px solid #e2e8f0; color: #1e40af;">[[Python]], [[R语言|R]], [[Linux]] Shell</td>
                  </tr>
                  <tr>
@@ Line 56: / Line 56: @@
                  <tr>
                      <th style="text-align: left; padding: 6px 12px; background-color: #f8fafc; color: #475569; border-bottom: 1px solid #e2e8f0;">Variant analysis</th>
-                     <td style="padding: 6px 12px; border-bottom: 1px solid #e2e8f0; color: #0f172a;">GATK, Mutect2</td>
+                     <td style="padding: 6px 12px; border-bottom: 1px solid #e2e8f0; color: #0f172a;">[[GATK]], Mutect2</td>
                  </tr>
                   <tr>
@@ Line 80: / Line 80: @@
                  <td style="padding: 10px; border: 1px solid #cbd5e1; font-weight: 600;">[[基因组学]]<br>(Genomics)</td>
                  <td style="padding: 10px; border: 1px solid #cbd5e1;"><strong>“What do I have?”</strong><br>Studies the DNA sequence itself and its variation.</td>
-                 <td style="padding: 10px; border: 1px solid #cbd5e1;"><strong>Sequence assembly</strong>, <strong>variant calling</strong> (SNPs/indels), phylogenetic tree construction.</td>
+                 <td style="padding: 10px; border: 1px solid #cbd5e1;"><strong>[[序列组装|Sequence assembly]]</strong>, <strong>[[变异检测|variant calling]]</strong> (SNPs/indels), [[系统发育树|phylogenetic tree]] construction.</td>
              </tr>
              <tr>
                  <td style="padding: 10px; border: 1px solid #cbd5e1; font-weight: 600;">[[转录组学]]<br>(Transcriptomics)</td>
                  <td style="padding: 10px; border: 1px solid #cbd5e1;"><strong>“What am I doing?”</strong><br>Studies gene expression levels.</td>
-                 <td style="padding: 10px; border: 1px solid #cbd5e1;"><strong>Differential expression analysis</strong> (DE), single-cell RNA-seq (scRNA-seq) clustering, pathway enrichment (GO/KEGG).</td>
+                 <td style="padding: 10px; border: 1px solid #cbd5e1;"><strong>[[差异表达分析|Differential expression analysis]]</strong> (DE), [[单细胞测序|single-cell sequencing]] (scRNA-seq) clustering, [[通路富集分析|pathway enrichment analysis]] (GO/KEGG).</td>
              </tr>
              <tr>
                  <td style="padding: 10px; border: 1px solid #cbd5e1; font-weight: 600;">[[蛋白质组学]]<br>(Proteomics)</td>
                  <td style="padding: 10px; border: 1px solid #cbd5e1;"><strong>“What do I look like?”</strong><br>Studies protein structure and function.</td>
-                 <td style="padding: 10px; border: 1px solid #cbd5e1;"><strong>Structure prediction</strong> (AlphaFold), molecular docking, protein-protein interaction (PPI) networks.</td>
+                 <td style="padding: 10px; border: 1px solid #cbd5e1;"><strong>[[蛋白质结构预测|Protein structure prediction]]</strong> (AlphaFold), [[分子对接|molecular docking]], protein-protein interaction (PPI) networks.</td>
              </tr>
          </table>
@@ Line 101: / Line 101: @@
      <div style="background-color: #f0f9ff; border-left: 5px solid #1e40af; padding: 15px 20px; margin: 20px 0; border-radius: 4px;">
          <ul style="margin: 0; padding-left: 20px; color: #334155;">
-             <li style="margin-bottom: 12px;"><strong>Raw data:</strong> <code>.fastq</code> files produced by the sequencer, which can contain hundreds of millions of short sequences (reads) together with their quality scores.</li>
+             <li style="margin-bottom: 12px;"><strong>Raw data:</strong> <code>.fastq</code> files produced by the sequencer, which can contain hundreds of millions of short sequences (reads) together with their quality scores (Quality Score).</li>
-             <li style="margin-bottom: 12px;"><strong>Alignment/Mapping:</strong> Short reads are aligned, like jigsaw pieces, to the human reference genome (such as hg38), producing a <code>.bam</code> file.</li>
+             <li style="margin-bottom: 12px;"><strong>Alignment/Mapping:</strong> Short reads are aligned, like jigsaw pieces, to the human [[参考基因组|reference genome]] (such as hg38), producing a <code>.bam</code> file.</li>
-             <li style="margin-bottom: 12px;"><strong>Variant calling:</strong> Sites where the sample differs from the reference genome are identified, producing a <code>.vcf</code> file.</li>
+             <li style="margin-bottom: 12px;"><strong>Variant calling:</strong> Algorithms identify sites where the sample differs from the reference genome, producing a <code>.vcf</code> file (Variant Call Format).</li>
-             <li style="margin-bottom: 0;"><strong>Annotation and interpretation:</strong> Databases such as ClinVar and gnomAD are used to label the clinical significance of these variants (benign/pathogenic), leading to the final clinical report.</li>
+             <li style="margin-bottom: 0;"><strong>Annotation and interpretation:</strong> Databases such as [[ClinVar]] and [[gnomAD]] are used to label the clinical significance of these variants (benign/pathogenic), leading to the final clinical report.</li>
          </ul>
      </div>
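@@ Line 112: / Line 112: @@
          <p style="margin: 12px 0; border-bottom: 1px solid #e2e8f0; padding-bottom: 10px;">
-             <strong>1. Pipeline:</strong> A bioinformatics analysis is rarely done by a single program. Multiple tools are chained together (for example QC -> Trim -> Map -> Call) into an automated workflow, commonly managed with tools such as Nextflow or Snakemake.
+             <strong>1. Pipeline:</strong> A bioinformatics analysis is rarely done by a single program. Multiple tools are chained together (for example QC -> Trim -> Map -> Call) into an automated workflow, commonly managed with tools such as [[Nextflow]] or [[Snakemake]].
          </p>
          <p style="margin: 12px 0; border-bottom: 1px solid #e2e8f0; padding-bottom: 10px;">
-             <strong>2. Algorithm:</strong> The core of bioinformatics. For example, <strong>dynamic programming</strong> is used for sequence alignment, <strong>hidden Markov models</strong> (HMMs) for gene prediction, and <strong>deep learning</strong> for protein structure prediction.
+             <strong>2. Algorithm:</strong> The core of bioinformatics. For example, <strong>[[动态规划|dynamic programming]]</strong> is used for sequence alignment, <strong>[[隐马尔可夫模型|hidden Markov models]]</strong> (HMMs) for gene prediction, and <strong>[[深度学习|deep learning]]</strong> for protein structure prediction.
          </p>
          <p style="margin: 12px 0;">
-             <strong>3. Databases:</strong> The “granary” of bioinformatics. They include primary databases (which store raw data, such as GenBank and SRA) and secondary databases (which store curated knowledge, such as UniProt, KEGG, and OMIM).
+             <strong>3. Databases:</strong> The “granary” of bioinformatics. They include primary databases (which store raw data, such as [[GenBank]] and SRA) and secondary databases (which store curated knowledge, such as [[UniProt]], [[KEGG]], and [[OMIM]]).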
          </p>
      </div>
@@ Line 128: / Line 128: @@
          <p style="margin: 12px 0; border-bottom: 1px solid #e2e8f0; padding-bottom: 10px;">
-             [1] <strong>Altschul SF, et al. (1990).</strong> <em>Basic local alignment search tool (BLAST).</em> <strong>[[J Mol Biol]]</strong>. <br>
+             [1] <strong>Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. (1990).</strong> <em>Basic local alignment search tool (BLAST).</em> <strong>[[J Mol Biol]]</strong>. <br>
-             <span style="color: #475569;">[Comment]: One of the most cited papers in biology. The BLAST algorithm made fast comparison of large sequence collections possible and is a cornerstone tool of bioinformatics.</span>
+             <span style="color: #475569;">[Comment]: One of the most cited papers in biology. The BLAST algorithm made fast comparison of large sequence collections possible and is a foundational tool of bioinformatics.</span>
          </p>
@@ Line 138: / Line 138: @@
          <p style="margin: 12px 0;">
-             [3] <strong>Jumper J, et al. (2021).</strong> <em>Highly accurate protein structure prediction with AlphaFold.</em> <strong>[[Nature]]</strong>. <br>
+             [3] <strong>Jumper J, Evans R, Pritzel A, et al. (2021).</strong> <em>Highly accurate protein structure prediction with AlphaFold.</em> <strong>[[Nature]]</strong>. <br>
              <span style="color: #475569;">[Comment]: A landmark for artificial intelligence. It delivered a major advance on the roughly 50-year-old “protein folding problem” of predicting structure from sequence, and demonstrated the potential of AI in bioinformatics.</span>
          </p>
@@ Line 154: / Line 154: @@
              <tr style="border-bottom: 1px solid #f1f5f9;">
                  <td style="width: 85px; background-color: #f8fafc; color: #334155; font-weight: 600; padding: 10px 12px; text-align: right; vertical-align: middle;">Technology drivers</td>
-                 <td style="padding: 10px 15px; color: #334155;">[[NGS]] (sequencing) • [[AI]] (deep learning) • Cloud computing</td>
+                 <td style="padding: 10px 15px; color: #334155;">[[NGS]] (sequencing) • [[AI]] (deep learning) • [[云计算|Cloud computing]]</td>
              </tr>
              <tr>
