BMC Frontier Perspectives | Research highlights from Jianbing Yan's group
Journal: Genome Biology
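For this issue of BMC Frontier Perspectives, we are honored to have members of the group of Jianbing Yan (严建兵), an Editorial Board Member of Genome Biology, discuss five important papers recently published in Genome Biology and BMC Biology.
About Jianbing Yan's group
Jianbing Yan's group works mainly on maize genomics and molecular breeding. The group has built a maize association-mapping platform, constructed multiple artificial segregating populations, developed the corresponding analysis methods, and generated a high-quality maize reference genome and variation map. These materials and data are widely used by research groups in China and abroad. Building on these resources, the group has explored the genetic basis of key quality and agronomic traits in maize, identified multiple key functional genes and their regulatory networks, and developed a series of functional molecular markers that have been applied to breeding new maize varieties. The group's other interest is plant single-cell sequencing technology and its applications. It has developed techniques to isolate and sequence maize tetrads, single nuclei and single embryo sacs, and has applied them to questions in plant reproductive development, genetic recombination and reprogramming. More information is available at www.maizego.org
Professor Jianbing Yan currently serves on the editorial boards of Genome Biology, Plant Journal, Science China Life Sciences, Theoretical and Applied Genetics, and Journal of Integrative Plant Biology, among other journals. He received support from the National Science Fund for Distinguished Young Scholars in 2015 and was named a Changjiang Scholar Distinguished Professor by the Ministry of Education in 2016.
Papers discussed in this issue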
#1. Title: Leveraging biological and statistical covariates improves the detection power in epigenome-wide association testing
Reviewer: Yingjie Xiao (肖英杰, Associate Professor)
Research
Genome Biology
Date: 2020-04-06
Commentary:
Epigenome-wide association studies (EWAS) are an effective statistical approach for relating the vast number of epigenetic marks across the genome to phenotypic traits. False discovery rate (FDR) control is the standard correction for multiple testing in EWAS, but traditional FDR procedures do not consider auxiliary covariates and therefore lose statistical power. Huang et al. used 61 public human EWAS datasets to evaluate how much five covariate-adaptive FDR control methods improve power. The authors developed an omnibus test to assess the informativeness of 17 statistical and biological covariates. They found that statistical covariates are generally informative across EWAS datasets, whereas the informativeness of biological covariates depends strongly on the structure of the individual dataset. Further analyses showed that independent hypothesis weighting (IHW) and covariate adaptive multiple testing (CAMT) are overall more powerful than the traditional ST procedure, especially when signals are sparse. Validation against age- and smoking-associated differentially methylated positions (DMPs) identified in independent studies showed that adaptive FDR control with informative covariates recovers these DMPs more effectively. Finally, the authors propose that EWAS results from different or similar datasets can themselves serve as covariates for FDR correction, which likewise markedly improves power. In summary, to address multiple testing, a common problem in the era of big data, this study proposes integrating statistical and biological knowledge, offering a new way to mine the biological mechanisms behind important traits more effectively.
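To make the weighting idea concrete, here is a minimal Python sketch of the classical Benjamini-Hochberg (BH) step-up procedure and a weighted variant in which each p-value is divided by a normalized weight before BH is applied. The p-values and weights are invented for illustration, and the fixed weights merely stand in for what a covariate-adaptive method would estimate from an informative covariate. This is not the implementation of IHW, CAMT or any other method evaluated by Huang et al.

```python
def benjamini_hochberg(pvalues, alpha=0.05):
    """Return the sorted indices of hypotheses rejected by the BH step-up procedure."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        # Step-up rule: keep the largest rank whose p-value is under alpha * rank / m.
        if pvalues[idx] <= alpha * rank / m:
            cutoff = rank
    return sorted(order[:cutoff])


def weighted_bh(pvalues, weights, alpha=0.05):
    """Weighted BH: divide each p-value by its weight (rescaled to mean 1), then run BH.

    Weights must be positive. Larger weights favor hypotheses that the
    covariate suggests are more likely to be true signals.
    """
    mean_weight = sum(weights) / len(weights)
    scaled = [min(1.0, p / (w / mean_weight)) for p, w in zip(pvalues, weights)]
    return benjamini_hochberg(scaled, alpha)


# Hypothetical p-values for eight CpG sites (illustrative only).
toy_pvalues = [0.001, 0.012, 0.02, 0.03, 0.3, 0.6, 0.7, 0.9]
# Hypothetical weights: pretend a covariate marks the first four sites as promising.
toy_weights = [2.0, 2.0, 2.0, 2.0, 0.5, 0.5, 0.5, 0.5]

plain_hits = benjamini_hochberg(toy_pvalues)
weighted_hits = weighted_bh(toy_pvalues, toy_weights)
print("BH rejections:", plain_hits)
print("Weighted BH rejections:", weighted_hits)
```

In this toy setting the weighted procedure rejects more hypotheses than plain BH because the up-weighted sites really do carry the small p-values. With an uninformative covariate the weights would not help and could reduce power.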
#2. Title: Gapless assembly of maize chromosomes using long-read technologies
Reviewer: Shenshen Wu (吴伸伸, Master's student)
Short Report
Genome Biology
Date: 2020-05-20
Commentary:
Tandem repeat arrays pose a major challenge for genome assembly because they often exceed the read length of current sequencing technologies. In maize, beyond the arrays enriched in centromeres and ribosomal DNA (rDNA), two abundant classes of knob repeats further hamper complete assembly. Liu et al. generated a new maize inbred line, B73-Ab10 (BC6F5), by backcrossing a line carrying Ab10, which has centromere-like properties and preferentially segregates to progeny, to the B73 inbred six times and then selfing it five times. Using both PacBio and Nanopore sequencing together with an optical map-based merging pipeline, they assembled the B73-Ab10 genome into only 63 contigs with a contig N50 of 162 Mb. The assembly includes gapless chromosomes 3 (236 Mb) and 9 (162 Mb), as well as the 53 Mb Ab10 meiotic drive haplotype. The data also reveal the internal structure of seven centromeres and five heterochromatic knobs, showing that the major tandem repeat arrays (CentC, knob180 and TR-1) are discontinuous. Together, the study describes an automated, merging-based assembly approach that yields gapless maize chromosomes and markedly improves contiguity throughout the genome, including centromere and knob regions.
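For readers unfamiliar with the contig N50 statistic quoted above, the following short Python sketch shows how it is usually computed: contigs are sorted from longest to shortest, and N50 is the length of the contig at which the running total first reaches half of the assembly size. The contig lengths here are made-up toy values, not data from the B73-Ab10 assembly.

```python
def contig_n50(lengths):
    """Return the N50 of a collection of contig lengths.

    N50 is the length L such that contigs of length >= L together
    cover at least half of the total assembly length.
    """
    if not lengths:
        raise ValueError("at least one contig length is required")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare 2 * running with total to stay in integer arithmetic.
        if 2 * running >= total:
            return length


# Hypothetical contig lengths in Mb (illustrative only).
toy_contigs = [100, 60, 40, 30, 20]
toy_n50 = contig_n50(toy_contigs)
print("N50 of toy assembly (Mb):", toy_n50)
```

A higher N50 with fewer contigs, as reported for B73-Ab10, indicates that more of the genome sits in long uninterrupted sequences.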
#3. Title: Multiplexed capture of spatial configuration and temporal dynamics of locus-specific 3D chromatin by biotinylated dCas9
Reviewer: Yong Peng (彭勇, PhD student)
Method
Genome Biology
Date: 2020-03-05
Commentary:
In eukaryotes, the three-dimensional organization of the genome mediates chromatin interactions that regulate gene transcription, which in turn controls growth and development. Depending on resolution, the genome can be divided into different structural units, including compartments, topologically associating domains (TADs) and chromatin loops. Chromatin loops mediate enhancer-promoter interactions that control tissue- and developmental stage-specific gene expression. However, resolving high-resolution chromatin structure at individual cis-regulatory elements, and relating the action of those elements to gene activity, remains challenging.
Hi-C and ChIA-PET have enabled genome-wide maps of chromatin interactions, but they often lack the resolution needed to evaluate the spatiotemporal organization of locus-specific interactions. Previously, a dCas9-based capture method for studying specific loci was developed. By co-expressing dCas9 carrying a biotin acceptor site, the bacterial biotin ligase BirA and target-specific sgRNAs, the locus-bound dCas9 complex is biotinylated in vivo and then isolated by streptavidin-based affinity purification. The proteins acting at the locus and its long-range DNA interactions can then be identified, the latter with 3C-based methods. That method allows unbiased analysis of locus-specific chromatin interactions, but because it requires large numbers of cells it is not suitable for primary tissues or rare cell populations.
Building on that work, this study expresses BirA and dCas9 from a single transcript, uses C-terminally biotinylated dCas9 and pools multiple sgRNAs. These changes greatly improve capture efficiency while still guiding dCas9 accurately to its genomic targets. The redesigned system enables quantitative analysis of the spatial configuration of a few to hundreds of enhancers or promoters in a single experiment, so that cis-regulatory elements can be compared within and between gene clusters. Multiplexed analysis of erythroid super-enhancers reveals distinct hierarchical structures among super-enhancers and unique patterns of super-enhancer-gene interactions. High-throughput capture of promoter regions identified the regulatory role of enhancer-promoter loops in gene transcription and cell differentiation. The method provides a powerful tool for further dissecting the function of cis-regulatory elements in the genome.
#4. Title: NetConfer: a web application for comparative analysis of multiple biological networks
Reviewer: Songtao Gui (桂松涛, postdoctoral researcher)
Software
BMC Biology
Date: 2020-05-19
Commentary:
Complex biological systems and the interactions among their components are commonly represented as graph theory-based networks. Tools that enable efficient comparative analysis of networks built from different omics data therefore help to uncover relationships between those data and to identify variations specific to particular biological systems. In this study, Nagpal et al. present NetConfer, a web application that integrates multiple network comparison methods and organizes them into workflows: assessing the similarity of network components, identifying and comparing key nodes, comparing shortest paths, inferring and comparing community structures, and comparing network cliques. To demonstrate the tool, Nagpal et al. analyzed gut microbial association networks in multiple sclerosis and found that the microbiome is affected more by treatment than by the disease itself, suggesting that it might serve as a valuable marker for evaluating the effectiveness of interferon versus Copaxone. They also analyzed time-series gene expression networks of human macrophages infected with pathogenic and non-pathogenic Mycobacterium tuberculosis and identified candidate genes that may be perturbed by the infection. NetConfer was designed with researchers with limited programming expertise in mind and offers user-friendly job management and result visualization. A stand-alone version is also provided for offline processing of large networks. Overall, NetConfer is a practical tool for the comparison, analysis and mining of multiple biological networks.
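Two of the comparison types listed above, component similarity and shortest-path comparison, can be illustrated with a small Python sketch. It uses only the standard library and two hypothetical four-node networks. The function names and the choice of Jaccard similarity over edge sets are assumptions made for illustration and do not describe how NetConfer itself is implemented.

```python
from collections import deque


def edge_jaccard(edges_a, edges_b):
    """Jaccard similarity between the undirected edge sets of two networks."""
    set_a = {frozenset(edge) for edge in edges_a}
    set_b = {frozenset(edge) for edge in edges_b}
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 1.0


def shortest_path_length(edges, source, target):
    """Breadth-first search distance between two nodes, or None if unreachable."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, distance = queue.popleft()
        if node == target:
            return distance
        for neighbor in adjacency.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, distance + 1))
    return None


# Two hypothetical networks over the same nodes (illustrative only).
network_1 = [("A", "B"), ("B", "C"), ("C", "D")]
network_2 = [("A", "B"), ("B", "C"), ("A", "D")]

similarity = edge_jaccard(network_1, network_2)
path_1 = shortest_path_length(network_1, "A", "D")
path_2 = shortest_path_length(network_2, "A", "D")
print("Edge Jaccard similarity:", similarity)
print("A-to-D distance in network 1 vs network 2:", path_1, path_2)
```

Even though the two toy networks share half of their edges, the distance between A and D differs, which is the kind of rewiring that a shortest-path comparison is meant to surface.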
#5. Title: Accounting for cell type hierarchy in evaluating single cell RNA-seq clustering
Reviewer: Shuyan Li (李书彦, Master's student)
Short Report
Genome Biology
Date: 2020-05-25
Commentary:
Compared with bulk RNA-seq, single-cell RNA-seq (scRNA-seq) reveals cell-to-cell heterogeneity and resolves cell-specific transcription and regulation, and it is widely used in studies of development, differentiation, disease etiology and environmental response. Clustering cells into types is a key step in scRNA-seq data analysis. To evaluate a clustering method, the common practice is to compare its output with reference labels obtained with high confidence from another source, such as marker genes. The most widely used measures for this comparison are the adjusted Rand index (ARI) and normalized mutual information (NMI). Unlike many other clustering problems, however, the true cluster structure of a cell population is often hierarchical: distances and correlations between different cell groups are not uniform. Traditional measures ignore this hierarchy, so their assessments of clustering performance can be biased. Building on the traditional RI and NMI, the researchers derived weights from the cell type hierarchy and developed two new metrics, the weighted Rand index (wRI) and weighted normalized mutual information (wNMI). In a toy example, two clustering results contain the same number of misassigned cells: in result 1 the cells are misassigned to a closely related subgroup within the same major group, whereas in result 2 they are assigned to a different major group. ARI and NMI give the two results identical scores, whereas wRI and wNMI distinguish them and rate result 1 as better. The researchers then applied the new metrics to five popular cell clustering methods: monocle, CIDR, Seurat, TSCAN and SC3. All five perform better than the traditional measures suggest, with the largest improvements for CIDR and TSCAN, while Seurat and SC3 perform similarly overall under the new metrics. The work provides a new tool for future single-cell data analysis.
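The toy comparison described above can be reproduced in spirit with a short Python sketch. It computes the plain (unadjusted) Rand index and then a variant in which wrongly merging two cells costs less when their true types belong to the same major group. The labels, the two-level hierarchy encoded in the label names, and the 0.5 penalty are all hypothetical choices made for illustration. The weighted variant is a simplified stand-in and not the wRI formula from the paper.

```python
from itertools import combinations


def rand_index(truth, predicted, merge_penalty=None):
    """Pairwise agreement between two labelings.

    A pair split by the clustering although the cells share a true type
    costs 1. A pair merged although the true types differ costs 1 by
    default, or merge_penalty(type_a, type_b) when a function is given.
    """
    pairs = list(combinations(range(len(truth)), 2))
    penalty = 0.0
    for i, j in pairs:
        same_truth = truth[i] == truth[j]
        same_pred = predicted[i] == predicted[j]
        if same_truth and not same_pred:
            penalty += 1.0
        elif same_pred and not same_truth:
            penalty += 1.0 if merge_penalty is None else merge_penalty(truth[i], truth[j])
    return 1.0 - penalty / len(pairs)


def hierarchy_penalty(type_a, type_b):
    """Hypothetical hierarchy: the first letter of a label is its major group."""
    return 0.5 if type_a[0] == type_b[0] else 1.0


# Twelve toy cells: subtypes a1 and a2 share major group "a"; b1 and b2 share "b".
truth = ["a1"] * 3 + ["a2"] * 3 + ["b1"] * 3 + ["b2"] * 3
result_1 = ["a2"] + truth[1:]  # one a1 cell placed in the sibling subtype a2
result_2 = ["b1"] + truth[1:]  # the same cell placed in the distant subtype b1

plain_1 = rand_index(truth, result_1)
plain_2 = rand_index(truth, result_2)
weighted_1 = rand_index(truth, result_1, hierarchy_penalty)
weighted_2 = rand_index(truth, result_2, hierarchy_penalty)
print("Plain Rand index:", plain_1, plain_2)
print("Hierarchy-weighted variant:", weighted_1, weighted_2)
```

The plain index cannot tell the two results apart because both contain the same number of disagreeing pairs, while the weighted variant scores result 1 higher because its error stays within one major group.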
(Source: ScienceNet.cn, 科学网)