The Samuel Roberts Noble Foundation, Inc.

Sumner Group: Metabolomics (continued)

We have used HPLC/PDA/MS/MS for the profiling and identification of saponins in legumes. Initial studies were performed with alfalfa because many of its saponins have been identified. Alfalfa served as a learning set for the HPLC/PDA/MS/MS characterization of saponins. We then applied this approach to the identification of approximately 20 previously unreported saponins in M. truncatula.[61] The identified saponins represent the end products of the triterpenoid biosynthetic pathway. The identified saponins, known entry points into the triterpene pathway and putative enzymes mined from the EST databases are currently being used to construct a biosynthetic pathway for these important natural products.[62]

The primary limitation of an HPLC/MS approach when compared to GC/MS is its lower separation efficiency. The separation efficiencies of GC/MS capillary columns can exceed 100,000 theoretical plates, while those of HPLC columns are on the order of 10,000 to 40,000; thus, HPLC has a lower ability to separate complex mixtures. Because HPLC is generally used to separate compounds that are not amenable to GC/MS, it provides complementary rather than competitive data.

High Resolution Accurate Mass Measurements
Metabolic profiling can also be performed by utilizing high resolution and accurate mass measurements. These analyses are generally employed in a batch mode and rely on instrument resolution to differentiate components in a mixture. Resolution is a mass spectrometric instrumental performance parameter defined as the peak mass divided by its peak width at half height (M/ΔM). Increasing resolution makes possible the separation and differentiation of species progressively closer in mass. Increased resolution is also directly related to mass accuracy. Higher resolutions result in narrower peaks and greater precision in assigning mass-to-charge ratios.

Resolutions in the range of 10,000 to 20,000 are achievable with modern time-of-flight mass spectrometry (TOFMS). Fourier transform ion cyclotron resonance mass spectrometry (FTICRMS) is more costly but capable of resolutions exceeding 100,000. Resolutions exceeding 10,000 can provide low to sub-parts-per-million (ppm) mass accuracies. One ppm is equivalent to a mass accuracy of 0.001 Da for a molecular weight of 1,000 Da.
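The relationship between ppm accuracy, absolute mass tolerance, and peak width can be made concrete with a few lines of arithmetic. The following is a minimal Python sketch; the function names are illustrative assumptions and do not refer to any instrument software.

```python
def ppm_to_da(mass_da: float, ppm: float) -> float:
    """Absolute mass tolerance (Da) corresponding to a relative accuracy in ppm."""
    return mass_da * ppm * 1e-6


def ppm_error(measured_da: float, theoretical_da: float) -> float:
    """Relative mass error in ppm between a measured and a theoretical mass."""
    return (measured_da - theoretical_da) / theoretical_da * 1e6


def peak_width_da(mass_da: float, resolution: float) -> float:
    """Peak width at half height (Da) implied by a resolution M/dM."""
    return mass_da / resolution


# 1 ppm at a molecular weight of 1,000 Da, as stated in the text
tolerance = ppm_to_da(1000.0, 1.0)

# Peak widths at 1,000 Da for resolutions of 10,000 and 100,000
width_at_10k = peak_width_da(1000.0, 10_000)
width_at_100k = peak_width_da(1000.0, 100_000)

print(f"1 ppm at 1000 Da = {tolerance:.4f} Da")
print(f"peak width at R = 10,000: {width_at_10k:.3f} Da")
print(f"peak width at R = 100,000: {width_at_100k:.3f} Da")
```

Note that the peak width at a resolution of 10,000 is much larger than the 1 ppm tolerance; ppm-level accuracy comes from locating the center of a narrow peak precisely, not from the width itself.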

Most molecules have unique exact (accurate) masses because the isotopes of all elements except carbon-12 have non-integer masses. For example, the mass of the most abundant isotope of C is 12.00000 Da, H is 1.007825 Da, N is 14.00307 Da, and O is 15.99491 Da. [63] The nominal masses of glutamine and lysine are the same (146 Da), but the accurate mass of Gln (C5H10N2O3) is 146.06912 Da and the accurate mass of Lys (C6H14N2O2) is 146.10551 Da. The resolution, or resolving power, required to differentiate these two species is M/ΔM = 146/0.03639, or approximately 4,012; thus, these two species can be differentiated at 10,000 resolving power and 1 ppm mass accuracy.
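The glutamine and lysine calculation can be reproduced directly from the isotope masses quoted above. This is a minimal sketch in Python; the formula parser is a simplifying assumption that handles only plain formulas such as C5H10N2O3 (no brackets or charges).

```python
import re

# Monoisotopic masses (Da) as quoted in the text
MONOISOTOPIC = {"C": 12.00000, "H": 1.007825, "N": 14.00307, "O": 15.99491}


def exact_mass(formula: str) -> float:
    """Sum monoisotopic masses for a simple empirical formula."""
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += MONOISOTOPIC[element] * (int(count) if count else 1)
    return total


def required_resolution(mass: float, delta: float) -> float:
    """Resolving power M/dM needed to separate two peaks delta apart."""
    return mass / delta


gln = exact_mass("C5H10N2O3")
lys = exact_mass("C6H14N2O2")
# The text uses the shared nominal mass (146 Da) as M
resolution = required_resolution(146.0, lys - gln)

print(f"Gln {gln:.5f} Da, Lys {lys:.5f} Da, required M/dM about {resolution:.0f}")
```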

We have used accurate mass measurements obtained by matrix-assisted laser desorption ionization time-of-flight mass spectrometry (MALDI-TOFMS) to differentiate and profile saponins from M. truncatula roots. An example is provided (Fig. 11) showing the MALDI-TOFMS spectrum of a solid-phase extract of M. truncatula root tissue. In this spectrum, we can identify multiple saponins.

Figure 11
Figure 11. High resolution (i.e., greater than 10,000) MALDI-TOFMS analysis of an alfalfa saponin extract. Accurate mass measurements are obtained by internal calibration and used to rapidly identify multiple saponins.

The major limitation of high resolution accurate mass profiling is its inability to differentiate isomeric species with the same empirical formula. An example of isomers would be glucose (C6H12O6) and galactose (C6H12O6). In GC/MS and LC/MS methods, the isomers generally have different elution times that allow for differentiation even though the observed masses are identical. Another limitation of accurate mass measurement performed in a batch mode is its susceptibility to changes in matrix composition. For example, variations in salt levels lead to variations in the observed cation adducts, i.e., [M+H]+, [M+Na]+ and [M+K]+, of the same metabolite and complicate the quantification and interpretation of the data. Variations in other matrix components can result in competitive ionization or ion suppression.
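Both limitations can be illustrated numerically. The sketch below, in Python, computes the expected m/z values of the three singly charged adducts for a hexose. The adduct masses are assumed reference values (atomic monoisotopic mass minus one electron) and are not taken from the text; the neutral mass is built from the isotope masses quoted earlier.

```python
# Assumed monoisotopic masses (Da) of the singly charged cations
ADDUCT_MASS = {
    "[M+H]+": 1.00728,
    "[M+Na]+": 22.98922,
    "[M+K]+": 38.96316,
}


def adduct_mz(neutral_mass: float) -> dict:
    """Expected m/z of each singly charged cation adduct of one metabolite."""
    return {name: neutral_mass + mass for name, mass in ADDUCT_MASS.items()}


# C6H12O6 from the isotope masses quoted in the text
HEXOSE_MASS = 6 * 12.00000 + 12 * 1.007825 + 6 * 15.99491

glucose = adduct_mz(HEXOSE_MASS)
galactose = adduct_mz(HEXOSE_MASS)  # same empirical formula, same masses

for name, mz in glucose.items():
    print(f"{name}: {mz:.5f}")
print("isomers distinguishable by mass alone:", glucose != galactose)
```

One metabolite therefore appears as up to three peaks whose relative intensities shift with the salt content of the sample, while the two isomers produce identical peak lists.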

Capillary Electrophoresis
The past ten years have seen countless demonstrations of the potential of capillary electrophoresis (CE) in many areas of biotechnology. [64-71] Capillary electrophoresis offers higher separation efficiencies than are attainable in HPLC and uses simple instrumentation. The fundamental difference between traditional slab format electrophoresis and CE is that the separation is achieved inside small fused-silica capillaries, typically 25-100 µm in internal diameter, as first demonstrated by Jorgenson and co-workers. [72] CE can be performed in several different modes. The most common is capillary zone electrophoresis (CZE), which is similar to traditional paper electrophoresis, where charged species migrate in a continuous electrolyte based on their net charge and size. Many different detection strategies have been explored in CE. [73] UV detection is the most common; however, the narrow internal diameter of the separation capillary results in lower sensitivity relative to HPLC with UV detection. An extremely sensitive detection scheme is laser-induced fluorescence (LIF) following derivatization with a suitable fluorophore, which allows sub-femtomole levels to be quantified. Indirect detection schemes have also been devised that allow the detection of species not amenable to traditional detection techniques because they lack specific chemical properties, such as a chromophore.

Carbohydrates
Capillary electrophoresis has proven a valuable tool in the analysis of carbohydrates. [74,75] Initial studies on the utility of CE in metabolic profiling appear promising. We have explored the use of CE for carbohydrate profiling of M. truncatula tissue extracts. Carbohydrates present a significant challenge in detection, since most sugars do not possess a strong chromophore or fluorophore. To overcome this problem, samples are derivatized with 4-aminobenzonitrile (4-ABN) by reductive amination. This forms a stable product, and the reaction occurs rapidly upon heating.[76] Once derivatized, the extracts are analyzed in less than 20 minutes. Carbohydrate profiles of various M. truncatula tissue extracts obtained using CZE are shown in Fig. 12(a). Differences in carbohydrate accumulation for various tissue regions are evident. The concentration detection limits are in the low ppm range for monosaccharides. If carbohydrates need to be analyzed at lower concentrations, 8-aminopyrene-1,3,6-trisulfonate (APTS) can be used to label them at trace levels.[77] The fluorescence properties of APTS are well matched to the emission wavelengths of the argon ion laser, which is the laser most commonly used in LIF-CE. LIF-CE allows carbohydrate profiling at parts-per-billion levels. A major limitation of all derivatization chemistry is the lack of a universal reagent that derivatizes all reducing and non-reducing sugars equally.

Figure 12
Figure 12. Metabolic profiling by capillary electrophoresis. (a) Comparative carbohydrate profiles of M. truncatula tissue obtained using 4-aminobenzonitrile derivatization, capillary electrophoresis with a 150 mM borate buffer, pH = 9, and on-column UV detection at 214 nm. (b) Anion profile from M. truncatula using capillary electrophoresis and indirect UV detection. The separation buffer was 5 mM K2CrO4, 1% Waters OFM-Anion BT, pH 8.0.

An alternative to derivatizing carbohydrates is the use of indirect photometric detection. In this method, a detectable co-ion is added to the buffer system, generating a steady-state absorbance signal in the detector. As the analyte ions migrate past the detector window, they displace the detectable co-ion and cause a decrease, or negative response, in the detector signal. This method provides universal detection of all anions or cations. Since most carbohydrates are not ionized at neutral pH, high pH electrolytes (pH~12) are required to induce partial ionization. Soga and Heiger demonstrated the utility of indirect detection for the routine analysis of carbohydrates. [78] The lower sensitivity of indirect detection yields detection limits in the 20-100 ppm range for most simple sugars, which is suitable for profiling the highly abundant carbohydrate species, e.g., glucose, fructose, and sucrose, found in most plant tissue.

Anionic Species
CE possesses great potential for profiling small ionic species. The utilization of CE for ion analysis has proven to be competitive with HPLC in the areas of inorganic ions [79] and organic acids. [80] An anion profile from an aqueous extract of M. truncatula tissue obtained using CE with indirect detection is illustrated [Fig. 12(b)]. Several of the anions are detectable even at a 1:100 dilution of the extract. The analytical sensitivities in ion analysis are quite good with detection limits in the sub- to low-ppm range for most ions. Even lower concentrations of ions can be observed by applying sample stacking injection techniques. [81,82] Drawbacks to ion analysis by CE include a limited dynamic range and potential interference by highly concentrated ions in the sample.

Bioinformatics and Statistical Processing of Profile Data
Metabolite levels are expected to vary in response to genetic perturbation, biotic elicitation, and environmental stimuli; however, living organisms also show inherent quantitative variation in metabolite levels. This biological variation is usually substantial, so statistical methods are needed to determine whether observed changes are significant. Statistics begins with experimental design, including replicate sampling, controls, replicate analyses, and the application of statistical tests. The data can then be used to calculate values such as the mean and standard deviation, and general statistical tests such as Student's t-test can be performed to eliminate erroneous data. F-ratios can then be used to determine whether or not a change is significant at a given confidence level.
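As a rough illustration of these calculations, the following Python sketch computes the mean, a t statistic, and a one-way F-ratio for one metabolite measured in replicate. The peak areas are invented for illustration and are not data from this work; the t statistic shown is the Welch (unequal variance) form, which is an assumption about which variant is wanted.

```python
from statistics import mean, variance


def welch_t(a, b):
    """t statistic for two groups without assuming equal variances."""
    standard_error = (variance(a) / len(a) + variance(b) / len(b)) ** 0.5
    return (mean(a) - mean(b)) / standard_error


def f_ratio(groups):
    """One-way F-ratio: between-group over within-group mean square."""
    values = [v for g in groups for v in g]
    grand_mean = mean(values)
    k, n = len(groups), len(values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical replicate peak areas for a single metabolite
control = [10.0, 11.0, 9.0, 10.0]
treated = [14.0, 15.0, 13.0, 14.0]

t_stat = welch_t(control, treated)
f_stat = f_ratio([control, treated])

print(f"means: {mean(control):.1f} vs {mean(treated):.1f}")
print(f"t = {t_stat:.3f}, F = {f_stat:.1f}")
```

The computed F-ratio would then be compared with a tabulated critical value for the chosen confidence level and degrees of freedom. For two groups of equal size and equal variance, as here, F equals the square of t.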

A single metabolite profile can potentially yield 300 to 500 distinct components. The wealth of information contained in these data sets is accessible only if the multivariate data can be processed and visualized in a timely manner. [83,84] To achieve this, we need automated computer applications that tell us, first, whether or not samples are statistically similar or different and, second, what the specific chemical composition of the differences or similarities is. For example, we would like to compare the GC/MS, LC/MS, or CE profiles of a sample set in an automated mode and to be directed to the component(s) that are statistically different. Identification of the specific metabolites corresponding to the statistically different peaks would then suggest the function of the altered gene or the response of the biological system. If identification is not desired, then the differences could be used as a unique or unbiased means of differentiation, phenotyping, genotyping, or relationship determination.

To simplify the task, we have incorporated data mining techniques to reduce the complexity of the data set and to visualize the data. Currently, we use several approaches to statistically process and visualize the data, including principal component analysis (PCA) and hierarchical cluster analysis (HCA). Future efforts will also seek to incorporate self-organizing maps (SOMs) and neural networks. Our ultimate goal is to integrate these profiles directly with metabolic pathways or networks.

Bioinformatic efforts are being pursued in-house and through external collaborations. Our primary external collaboration involves a joint project with The Virginia Bioinformatics Institute.

Chemical Complexity
Profiling the metabolome is challenging. The genome and transcriptome basically consist of four nucleotides with similar chemical properties; therefore, a global profiling approach is reasonably achieved. The proteome is substantially more complex, but is still composed of a limited set of twenty-two primary amino acids. Despite this greater complexity, two-dimensional gel electrophoresis (2-DE) can differentiate a large number of proteins in a single analysis, with several thousand being routine and 10,000 representing the upper boundary.[19] When one surveys the metabolome, the chemical complexity is significantly greater. The chemical properties of metabolites range from ionic inorganic species to hydrophilic carbohydrates and sophisticated secondary natural products to hydrophobic lipids. The chemical diversity and complexity of the metabolome make it extremely challenging to profile ALL of the metabolome simultaneously. Currently, we do not believe that a single analytical technique provides the ability to profile all of the metabolome.

Dynamic Range
A technological challenge encountered in metabolomics is dynamic range. Dynamic range defines the concentration boundaries of an analytical determination over which the instrumental response is linear to the analyte concentration. The dynamic range of many techniques can be severely limited by the sample matrix or the presence of interfering and competing compounds. This is one of the most difficult issues to address in metabolomics. Most analytical mass spectrometric methods have dynamic ranges of 10^4 to 10^6 for individual components; however, this range is commonly and significantly reduced by the presence of other chemical components. In other words, the presence of some excessive metabolites can cause significant or severe chemical interferences that limit the range in which other metabolites may be successfully profiled. For example, large abundances of primary metabolites such as sugars often interfere with our ability to profile secondary metabolites such as flavonoids by GC/MS. The positive aspect of this dilemma is that many of the highly expressed metabolites are often unique in differing tissues or organisms, thus providing an exclusive basis for differentiation. These exclusive compounds are often referred to as biomarkers and one could view metabolomics as an array of biomarkers.

Principal Component Analysis
Principal component analysis is one of the oldest and most widely used multivariate techniques.[85] The concept behind PCA is to describe the variability in a set of multivariate data in terms of a set of uncorrelated variables, each of which is a particular linear combination of the original variables. More simply stated, PCA is an attempt to explain a set of complex data in terms of a smaller number of dimensions than one originally starts with. The mathematical approach used in PCA involves determining the eigenvalues and eigenvectors of a square symmetric matrix of sums of squares and cross products. The eigenvector associated with the largest eigenvalue has the same direction as the first principal component. The eigenvector associated with the second largest eigenvalue determines the direction of the second principal component. The sum of the eigenvalues equals the trace of the square matrix, and the maximum number of eigenvectors equals the number of rows (or columns) of this matrix. Plotting the principal components associated with the two largest eigenvalues provides a rapid means of visualizing similarities or differences. A PCA plot of replicate GC/MS analyses of different M. truncatula tissues is shown (Fig. 13).

Figure 13
Figure 13. Principal component analysis of repetitive GC/MS profiles of M. truncatula root (R), stem (S) and leaves (L). The first and second principal components of each GC/MS analysis were calculated and plotted. The relative distance between points is a measure of similarity or difference. The clustering shows good reproducibility within the independent tissues but clear differentiation of tissues. The results also show that roots and stems are more similar to each other than to leaves.

The plot illustrates good correlation among the replicate measurements, and the root, leaf, and stem tissues are clearly differentiated. This approach will be used to determine similarities, differences, and relationships in a large number of T-DNA activation tagged mutants and various M. truncatula ecotypes.
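The eigenvalue calculation described in this section can be sketched in a few dozen lines of Python. This is only an illustration under stated assumptions: the peak areas are invented, and power iteration with deflation is used to keep the example free of external libraries (a numerical library's symmetric eigensolver would normally be used instead).

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def center(rows):
    """Subtract the column mean from every variable."""
    means = [sum(col) / len(rows) for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]


def sscp(centered):
    """Square symmetric matrix of sums of squares and cross products."""
    cols = list(zip(*centered))
    return [[dot(ci, cj) for cj in cols] for ci in cols]


def top_eigenpair(matrix, iterations=500):
    """Largest eigenvalue and its unit eigenvector by power iteration."""
    v = [1.0 / (i + 1) for i in range(len(matrix))]  # arbitrary start vector
    for _ in range(iterations):
        w = [dot(row, v) for row in matrix]
        norm = dot(w, w) ** 0.5
        v = [x / norm for x in w]
    return dot(v, [dot(row, v) for row in matrix]), v


def eigenpairs(matrix):
    """All eigenpairs in decreasing order, removing each one by deflation."""
    m = [row[:] for row in matrix]
    pairs = []
    for _ in range(len(m)):
        value, vec = top_eigenpair(m)
        pairs.append((value, vec))
        m = [[m[i][j] - value * vec[i] * vec[j] for j in range(len(m))]
             for i in range(len(m))]
    return pairs


# Hypothetical peak areas (three metabolites) for replicate tissue profiles
labels = ["R1", "R2", "S1", "S2", "L1", "L2"]
profiles = [[9, 3, 1], [10, 3, 1], [8, 5, 2], [8, 6, 2], [2, 4, 9], [2, 5, 10]]

x = center(profiles)
matrix = sscp(x)
pairs = eigenpairs(matrix)
eigenvalues = [value for value, _ in pairs]
pc1, pc2 = pairs[0][1], pairs[1][1]
trace = sum(matrix[i][i] for i in range(len(matrix)))

# Coordinates of each sample on the first two principal components
scores = {name: (dot(row, pc1), dot(row, pc2)) for name, row in zip(labels, x)}
for name, (s1, s2) in scores.items():
    print(f"{name}: PC1 = {s1:7.3f}, PC2 = {s2:7.3f}")
print(f"sum of eigenvalues = {sum(eigenvalues):.3f}, trace = {trace:.3f}")
```

Plotting the two score columns against each other gives the kind of display shown in Fig. 13, with replicates falling close together and distinct tissues falling apart.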

Self-Organizing Maps (SOMs)
Self-organizing maps (SOMs) are gaining popularity due to their enhanced ability to differentiate and visualize data relative to PCA.[86,87] SOMs are unsupervised neural learning algorithms in which the weight vectors of neurons are initially set at random and then iteratively adjusted until they best represent the input data. Recently, SOMs have been applied to the correlation of GC/MS data to compare the morphology of eighty-eight species of ants.[88]
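The training loop described above can be sketched as follows. This is a minimal one-dimensional SOM in Python under several assumptions that are not from the text: a chain of four nodes, a Gaussian neighborhood, linearly decaying learning rate and neighborhood width, and invented profiles scaled to the range 0 to 1.

```python
import math
import random


def best_matching_unit(weights, x):
    """Index of the node whose weight vector is closest to sample x."""
    return min(range(len(weights)),
               key=lambda i: sum((w - v) ** 2 for w, v in zip(weights[i], x)))


def train_som(data, n_nodes=4, epochs=200, lr0=0.5, sigma0=1.5, seed=0):
    """Train a 1-D chain of nodes; all parameter values are illustrative."""
    rng = random.Random(seed)
    dim = len(data[0])
    # Weight vectors start at random values
    weights = [[rng.random() for _ in range(dim)] for _ in range(n_nodes)]
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                # learning rate decays
        sigma = sigma0 * (1.0 - frac) + 0.01   # neighborhood shrinks
        for x in rng.sample(data, len(data)):  # present samples in random order
            bmu = best_matching_unit(weights, x)
            for i, w in enumerate(weights):
                # Nodes near the winner on the chain move more strongly
                h = math.exp(-((i - bmu) ** 2) / (2.0 * sigma ** 2))
                weights[i] = [wj + lr * h * (xj - wj) for wj, xj in zip(w, x)]
    return weights


# Hypothetical normalized profiles (three peaks each)
samples = {
    "R1": [0.90, 0.30, 0.10], "R2": [1.00, 0.30, 0.10],
    "L1": [0.20, 0.40, 0.90], "L2": [0.20, 0.50, 1.00],
}
weights = train_som(list(samples.values()))
assignment = {name: best_matching_unit(weights, x) for name, x in samples.items()}
print(assignment)
```

Samples assigned to the same or neighboring nodes are the ones the map considers similar; a two-dimensional grid of nodes is the more common layout for visualization.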

Hierarchical Cluster Analysis (HCA)
Hierarchical cluster analysis (HCA) also provides a method of determining relationships among different data sets. [89] HCA involves the progressive pair-wise comparison of large data sets to determine relationships or relative similarity/homogeneity. The HCA output is usually visualized as a phylogenetic-type tree, or dendrogram, in which branch lengths are proportional to correlation coefficients. More closely related data sets appear closer together in the phylogenetic tree. The output provides an easy visualization of the interrelationships and relationship distances within sample sets.
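The progressive pair-wise comparison can be sketched in Python as a simple agglomerative procedure. The choices here are assumptions made for illustration, not the method used in this work: distance is taken as one minus the Pearson correlation coefficient, clusters are joined by average linkage, and the profiles are invented.

```python
from statistics import mean


def pearson(a, b):
    """Pearson correlation coefficient between two profiles."""
    ma, mb = mean(a), mean(b)
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = (sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b)) ** 0.5
    return num / den


def hca(profiles):
    """Return the merges as (cluster_a, cluster_b, distance), closest first."""
    def distance(a, b):
        return 1.0 - pearson(profiles[a], profiles[b])

    clusters = [(name,) for name in profiles]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Average linkage: mean distance over all member pairs
                d = mean([distance(a, b)
                          for a in clusters[i] for b in clusters[j]])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges


# Hypothetical peak areas (four metabolites) for root and leaf replicates
profiles = {
    "R1": [10, 8, 2, 1], "R2": [11, 8, 2, 1],
    "L1": [2, 1, 9, 10], "L2": [2, 1, 10, 10],
}
merges = hca(profiles)
for a, b, d in merges:
    print(f"join {a} + {b} at distance {d:.3f}")
```

The list of merges and their distances is the information a dendrogram displays: replicates join first at small distances, and the two tissues join last at a large distance.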

The above bioinformatic tools provide methods of determining differences or similarities in datasets. The next step is to incorporate metabolomic data with other expression information, including mRNA and proteins, to infer gene function. To accomplish this, metabolomic data sets must be integrated and correlated in a global manner with genetic and enzymatic data, pathways assembled into systems, and literature references incorporated as learning tools to annotate existing data to yield in silico biological information.[90] Approaches and tools are now available for modeling metabolic systems and are vital in understanding metabolism.[91] The challenge of the future is to integrate these approaches and obtain complete integrated functional genomic systems to understand and visualize biology.[92,93]

Interfering or competing analytes can often lower performance and/or bias MS profiling techniques. For example, it is difficult to profile oligosaccharides in the presence of peptides or amino acids by LC/MS. The reason is that amino acids have higher proton affinities than oligosaccharides and, therefore, yield higher abundances of the charged species necessary for mass measurement. Another problem in electrospray ionization mass spectrometry (ESI/MS) is salts. Low levels of ionic species (i.e., >10^-4 M) are known to reduce the ionization efficiency in ESI/MS and significantly interfere with profiling all species.[35] Different analytical approaches have been developed to improve dynamic range and to minimize complications.
