Patrick Zhao, Ph.D.
Problem 1: Advancing Bioinformatics for Plant Genotype-Phenotype (G2P) Association Discovery
Plants have evolved highly sophisticated mechanisms and efficient phenotypes (i.e., traits) to capture light energy, to survive harsh environmental conditions, including biotic and abiotic stresses and life-threatening pathogen and pest attacks, and to complete their life cycles in competition with other organisms. Plant genotypes are translated into phenotypes through cell signal transduction, gene expression regulated by transcription factors, small RNAs and epigenetic modifications, metabolic reactions, and other developmental pathways. Understanding these genotype-phenotype (G2P) relationships is a fundamental grand challenge in biology, and advancing bioinformatics to relate genomes to phenomes is central to meeting it. One long-term research goal in the Zhao group is to advance bioinformatics for plant G2P association discovery via integrative analyses of genome-scale biological networks and genome-wide association studies (GWAS).
Problem 2: Bioinformatics, Machine Learning and Big Biodata Analytics
Modern molecular biology often requires systems-level analysis to understand how cells, tissues, and organs develop and function. With the rise of high-throughput 'omics' technologies, such as next-generation sequencing (NGS), data acquisition is no longer a barrier. The new challenge lies in processing heterogeneous data sets of unprecedented scale (also referred to as "big data") into information that can be applied in data-driven scientific discovery.
Approach 1: We have three integrative strategies: 1) to advance bioinformatics to decipher genome-scale signal transduction, metabolism, and gene regulation networks (focusing on transcription factors, small RNAs and epigenetic modifications) in model and crop plants; 2) to develop innovative models and algorithms that enable high-accuracy genome-wide marker-trait linkage mapping in plants, using advanced mixed models that incorporate gene-by-gene (GxG) epistatic effects and genotype-by-environment (GxE) interaction effects; and 3) to develop an integrative knowledge discovery platform to facilitate the integration, deciphering, and mapping of genotype-phenotype associations. The resulting biological model-based algorithms, tools, web services, and data resources will facilitate the reverse engineering of 'omics' data into complex plant biological and genetic network models to decipher plant phenotypes from genotypes.
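As a reading aid for strategy 2, the following is a generic textbook-style form of a linear mixed model with GxG and GxE terms. It is a sketch under stated assumptions, not the group's actual model: the symbols, the grouping of effects, and the covariance matrices \Sigma_k are all illustrative choices that do not appear in the text above.

```latex
% Illustrative only: a generic linear mixed model with epistatic (GxG)
% and genotype-by-environment (GxE) random effects.
\[
  \mathbf{y} \;=\; \mathbf{X}\boldsymbol{\beta}
    \;+\; \mathbf{Z}_{g}\,\mathbf{u}_{g}
    \;+\; \mathbf{Z}_{gg}\,\mathbf{u}_{gg}
    \;+\; \mathbf{Z}_{ge}\,\mathbf{u}_{ge}
    \;+\; \boldsymbol{\varepsilon},
\]
\[
  \mathbf{u}_{k} \sim \mathcal{N}\!\left(\mathbf{0},\, \sigma_{k}^{2}\,\boldsymbol{\Sigma}_{k}\right)
  \ \text{for } k \in \{g,\, gg,\, ge\},
  \qquad
  \boldsymbol{\varepsilon} \sim \mathcal{N}\!\left(\mathbf{0},\, \sigma_{\varepsilon}^{2}\,\mathbf{I}\right).
\]
```

Here y is the vector of trait measurements, X\beta collects fixed effects (for example, the marker being tested and environment covariates), u_g holds genotype main effects, u_gg holds GxG epistatic effects, u_ge holds GxE interaction effects, and each Z is the design matrix linking observations to the corresponding effects.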
Approach 2: Combining domain knowledge in bioinformatics, computational biology, data science and plant biology, we have been developing, and will continue to develop, large-scale bioinformatics and data resources (e.g., bioinformatics web servers and biological databases), novel machine learning-based methods for sequence analysis, 'omics' data integration and data mining, and an innovative graph search-empowered integrative knowledge discovery platform to extract biological insights from 'big data' and strengthen basic plant science, plant genomics and translational genomics research. Big data also brings technical challenges: huge workloads and limited memory. To cope with these challenges, we rely on 1) next-generation data analysis, data mining and knowledge discovery technologies such as artificial intelligence/machine learning-based data analytics, 2) high-performance computing clusters consisting of CPUs and GPUs, and 3) data manipulation tactics, namely a) parallel computing and b) data dividing/assembly methods.
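The "data dividing/assembly" and parallel computing tactics named above can be illustrated with a minimal Python sketch. It is not the group's pipeline: the function names, the chunk size, and the toy workload (counting G and C bases per sequence) are hypothetical choices made for illustration only. A thread pool is used to keep the example short; for CPU-bound work one would more likely swap in a process pool or a cluster scheduler.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Iterator, List, Sequence


def split_into_chunks(items: Sequence[str], chunk_size: int) -> Iterator[Sequence[str]]:
    """Divide step: yield consecutive slices of at most chunk_size items."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    for start in range(0, len(items), chunk_size):
        yield items[start:start + chunk_size]


def gc_counts(chunk: Sequence[str]) -> List[int]:
    """Hypothetical per-chunk workload: count G and C bases in each sequence."""
    return [sum(base in "GCgc" for base in seq) for seq in chunk]


def process_in_chunks(items: Sequence[str], chunk_size: int = 2,
                      max_workers: int = 4) -> List[int]:
    """Divide the input, process chunks concurrently, then assemble in order."""
    chunks = split_into_chunks(items, chunk_size)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map returns results in input order, so the assembly
        # step is a simple concatenation of the per-chunk results.
        per_chunk = pool.map(gc_counts, chunks)
        return [value for chunk_result in per_chunk for value in chunk_result]
```

Calling `process_in_chunks(["ACGT", "GGCC", "ATAT"], chunk_size=2)` returns one count per input sequence, in the original order, regardless of which worker finished first.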
- Development and Curation of the Alfalfa Breeder's Toolbox
- Advancing bioinformatics for plant genotype-phenotype (G2P) association discovery
- Development of methods, tools and databases for genome-scale biological network analysis
- Development of methods and tools for analyzing traits through omics-wide association studies
- Development of methods and tools for analyzing genetic variant HapMap data for accession identification and genotyping
- Advancing bioinformatics for genome-wide analysis of small signaling peptides in Medicago truncatula with an emphasis on macronutrient regulation of root and nodule development
- Development of MtSSPdb: an Integrative Database of Medicago truncatula Small Signaling Peptides
- Development of an integrative platform to study gene function and genome evolution in legumes
- Development of novel methods and bioinformatics tools for the understanding of plant small RNA:mRNA interactions or protein-DNA interactions for fast gene discovery and large-scale trait genotyping through the use of genetic screens and crop genetic engineering.