An article published today in the open-access journal GigaScience provides data that effectively triple the number of plant species with available genomic data. This enormous amount of work comes from the growing efforts of the scientific community to sequence more plant genomes, both to help understand their complex evolution and to provide practical information for improving agricultural yields. To date, about 350 land plant genomes have been sequenced. The desire for more plant genome sequences was recently highlighted by the announcement of the 10KP project, which ultimately aims to sequence 10,000 plant genomes to resolve the evolution of all the major branches of the plant tree of life. The work here provides images, raw sequence data, assembled chloroplast genomes, and preliminary nuclear genome assemblies, all of which are freely available. Effectively, this work is a digital representation of an entire botanical garden.
Researchers from the China National GeneBank, BGI, and the Forestry Bureau of Ruili, China, have collected and sequenced 761 samples representing 689 vascular plant species from 137 families and 49 orders. The plant samples come from in and around the 500-hectare Ruili Botanical Garden, in a subtropical part of China bordering Myanmar. Because it is located in a biologically rich part of China, the garden is committed to protecting endangered and Chinese endemic plants, including preserving and archiving these germplasm resources to aid their long-term conservation. This project is the world's first scientific and systematic attempt to digitize an entire botanical garden based on genomic information as well as voucher information.
On the scientific potential of this resource, BGI Chief Executive Officer and author Xun Xu highlights that: "The current understanding of plant evolution and diversity in a phylogenetic context is limited by the lack of genomic information across phylogenetically diverse species. This innovative project brings a new way of thinking about the digitization of all plant species to enhance evolutionary and ecological research in botanical gardens."
Overall, the investigators produced 54 terabytes of sequence data, with an average sequencing depth of 60X per species. In addition to the main challenge of sequencing the DNA of this many species, another major task was carrying out species identification, digitizing specimen images, and creating new herbarium specimens for storage in the new China National GeneBank (CNGB) herbarium in Shenzhen. So far, of the 761 samples sequenced, the sequence and chloroplast data have allowed 257 plants to be identified to species level and 504 to family level. Deep learning has also been applied successfully to 181 species, allowing them to be identified at species level.
Author Ting Yang says this is "the largest amount of data I've ever processed. During the analysis of the data, I think the biggest challenges are the sequencing and the checking of the results." This required the researchers to check the sequence data of each of the 761 samples individually and to compare the chloroplast gene sequences against the herbarium specimens for species identification.
Another difficulty, simply in reaching the point where the sequencing work can be done, is collecting all the samples. Author Jinpu Wei states: "We collaborated with Ruili forestry experts to collect plant material distributed across the Ruili area to create a digital botanical garden. After 45 days of exhausting effort, we collected 1,093 plant materials. Although it was a challenge for us to transport the materials properly, we finally managed to ensure the high quality of these plant materials for future research."
Corresponding author Xin Liu added that the project "is a major project to fine-tune and standardize the sampling, methodologies, data accumulation and analysis techniques for large-scale genome projects like 10KP (the 10,000 Plant Genomes Project). From this project we have accumulated significant and useful experience for subsequent sample collection, sequencing and assembly. At the same time, the data obtained from this study can be put to effective use in upcoming genome projects."
Although they constructed only one sequencing library for each species, the authors were able to assemble draft genomes for 17 of them, reflecting the quality of the data and its potential for reuse. Researchers from the Chinese University of Hong Kong have already assembled the genomes of species that are of particular interest to them. The potential for the broader research community to study the species that interest them, improve other genomes, develop tools and methods, and provide educational opportunities for new generations of scientists is enormous.
First author Huan Liu added that "the genomic characterization will provide a large amount of basic data for assembling plant genomes, which will be an excellent start for the 10KP project. At the same time, it provides a good basis for future studies of the mechanisms linking macroscopic ecology and biodiversity to the microscopic molecular level."
To promote data sharing that goes beyond the sequence data alone, the researchers have also made digital images available and provide access to the herbarium. The herbarium (HCNGB) serves as a database of living plants that records the status of the species grown in the Ruili Botanical Garden and monitors the condition of each species.
All of the digital data generated here (images, raw sequence data, assembled chloroplast genomes and preliminary nuclear genome assemblies) are available through the NCBI SRA, GigaScience's GigaDB database, and the China National GeneBank's CNSA. In addition, to make the data searchable and to allow the genomic data and species identifications to be updated, the metadata are indexed and linked via DataCite and GigaDB. All resources are released without restrictions under a CC0 waiver. Author Dr. Sunil Kumar Sahu stressed that this is the most important legacy of the project: "This dataset is of great importance for plant scientists and, more importantly, it can serve as a starting point for future planetary-scale genome sequencing projects, including the Earth BioGenome Project (EBP) and the 10,000 Plant Genomes (10KP) project."