If you used my computational methods in your research, I would be thrilled if you cited me in your article.
The emergence of next-generation sequencing has revealed significant levels of transcriptional activity within both unannotated and annotated regions of the genome, leading to the construction of novel transcripts. These novel transcripts may be located in genic regions (antisense, overlapping an intron, or overlapping an exon) or in intergenic regions. More broadly, they can be either coding or noncoding. Hence, one of the main tasks is to functionally characterize these novel transcripts and to determine whether they are coding or noncoding (ncRNA). Although the functions of coding genes have been studied for many years, efforts to characterize the functions of noncoding transcripts have begun only recently. Several lines of evidence implicating ncRNAs in the control of development, growth, and disease have been reported so far. ncRNAs can perform their functions through different mechanisms, such as chromatin modification (epigenetic control of gene expression), RNA-protein interactions, and transcriptional interference. We propose a support vector machine classifier that can classify novel transcripts as coding or noncoding with over 98% accuracy. Several sequential and structural features have been compiled for training the classifier. The classifier has been used to classify novel assembled transcripts from RNA-sequencing pipelines for soybean and Arabidopsis. However, it can be adapted to any species.
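As a rough illustration of what a "sequential feature" for such a classifier might look like, the sketch below computes three simple values from a transcript sequence: the longest forward-strand open reading frame (ORF), the fraction of the transcript it covers, and GC content. These particular features and the function names are assumptions chosen for illustration; they are not the feature set compiled for the classifier described above, and the support vector machine itself is not shown.

```python
# Illustrative sequence features for coding-potential prediction.
# The features and names here are assumptions made for this sketch,
# not the features used by the classifier described in the text.

STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf_length(seq: str) -> int:
    """Length (nt) of the longest ATG..stop ORF on the forward strand."""
    seq = seq.upper().replace("U", "T")
    best = 0
    for frame in range(3):
        start = None  # position of the ATG that opened the current ORF
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                best = max(best, i + 3 - start)  # include the stop codon
                start = None
    return best


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in the sequence (0.0 for an empty one)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def feature_vector(seq: str) -> list:
    """[longest ORF length, ORF coverage of the transcript, GC content]."""
    orf = longest_orf_length(seq)
    coverage = orf / len(seq) if seq else 0.0
    return [float(orf), coverage, gc_content(seq)]
```

A vector like `feature_vector("ATGAAATAG")` could then be passed, together with labels for known coding and noncoding transcripts, to any SVM implementation for training.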
Developing soybean seeds accumulate oils, proteins, and carbohydrates that are used as oxidizable substrates providing metabolic precursors and energy during seed germination, in a process that has been intensively studied at the biochemical, but not yet at the genomic level. Seed maturation also involves highly regulated processes that are only partially understood. RNA sequencing was used to provide comprehensive information concerning transcriptional and post-transcriptional events that take place in developing soybean embryos. Distinct classes of alternatively spliced isoforms were detected and corresponding changes in their levels on a global scale during soybean embryo development were distinguished, using bioinformatics tools. Novel and known splice variants (SV) involved in various metabolic and developmental processes, including central carbon and nitrogen metabolism, induction of maturation and dormancy, and splicing itself were identified. The SVs were analyzed in further detail for their coding potential, conservation, and their protein domains using machine-learning tools. Coding and noncoding SVs were detected, including transcripts where alterations in individual domains had occurred over the time-course of embryo development. Changes in subcellular localization of the resulting proteins, protein-protein, enzyme-substrate interactions, and/or regulation of protein activities in developing oilseed embryos may occur as a result of these alternative splicing activities.
Background
Transcriptomics reveals the existence of transcripts of different coding potential and strand orientation. Alternative splicing (AS) can yield proteins with altered number and types of functional domains, suggesting the global occurrence of transcriptional and post-transcriptional events. Many biological processes, including seed maturation and desiccation, are regulated post-transcriptionally (e.g., by AS), leading to the production of more than one coding or noncoding sense transcript from a single locus.
Results
We present an integrated computational framework to predict isoform-specific functions of plant transcripts. This framework includes a novel plant-specific weighted support vector machine classifier called CodeWise, which predicts the coding potential of transcripts with over 96% accuracy, and several other tools enabling global sequence similarity, functional domain, and co-expression network analyses. First, this framework was applied to all detected transcripts (103,106), of which 13% were predicted by CodeWise to be noncoding RNAs in developing soybean embryos. Second, to investigate the role of AS during soybean embryo development, a population of 2,938 alternatively spliced and differentially expressed splice variants was analyzed and mined with respect to timing of expression. Conserved domain analyses revealed that AS resulted in global changes in the number, types, and extent of truncation of functional domains in protein variants. Isoform-specific co-expression network analysis using ArrayMining and clustering analyses revealed specific sub-networks and potential interactions among the components of selected signaling pathways related to seed maturation and the acquisition of desiccation tolerance. These signaling pathways involved abscisic acid- and FUSCA3-related transcripts, several of which were classified as noncoding and/or antisense transcripts and were co-expressed with corresponding coding transcripts. Noncoding and antisense transcripts likely play important regulatory roles in seed maturation- and desiccation-related signaling in soybean.
Conclusions
This work demonstrates how our integrated framework can be implemented to make experimentally testable predictions regarding the coding potential, co-expression, co-regulation, and function of transcripts and proteins related to a biological process of interest.
Developing soybean seeds accumulate oils, proteins, and carbohydrates that are used as oxidizable substrates providing metabolic precursors and energy during seed germination. The accumulation of these storage compounds in developing seeds is highly regulated at multiple levels, including the transcriptional and post-transcriptional levels. RNA sequencing was used to provide comprehensive information about transcriptional and post-transcriptional events that take place in developing soybean embryos. Bioinformatics analyses led to the identification of different classes of alternatively spliced isoforms and corresponding changes in their levels on a global scale during soybean embryo development. Alternative splicing was associated with transcripts involved in various metabolic and developmental processes, including central carbon and nitrogen metabolism, induction of maturation and dormancy, and splicing itself. Detailed examination of selected RNA isoforms revealed alterations in individual domains that could result in changes in subcellular localization of the resulting proteins, protein-protein and enzyme-substrate interactions, and regulation of protein activities. Different isoforms may play an important role in regulating developmental and metabolic processes occurring at different stages in developing oilseed embryos.
Soybean (Glycine max) seeds are an important source of seed storage compounds, including protein, oil, and sugar used for food, feed, chemical, and biofuel production. We assessed detailed temporal transcriptional and metabolic changes in developing soybean embryos to gain a systems biology view of developmental and metabolic changes and to identify potential targets for metabolic engineering. Two major developmental and metabolic transitions were captured enabling identification of potential metabolic engineering targets specific to seed filling and to desiccation. The first transition involved a switch between different types of metabolism in dividing and elongating cells. The second transition involved the onset of maturation and desiccation tolerance during seed filling and a switch from photoheterotrophic to heterotrophic metabolism. Clustering analyses of metabolite and transcript data revealed clusters of functionally related metabolites and transcripts active in these different developmental and metabolic programs. The gene clusters provide a resource to generate predictions about the associations and interactions of unknown regulators with their targets based on “guilt-by-association” relationships. The inferred regulators also represent potential targets for future metabolic engineering of relevant pathways and steps in central carbon and nitrogen metabolism in soybean embryos and drought and desiccation tolerance in plants.
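The "guilt-by-association" idea can be sketched as follows: a transcript of unknown function whose expression profile over the developmental time course closely tracks that of a known gene becomes a candidate for involvement in the same process. The example below uses Pearson correlation with a fixed threshold, which is one common choice and an assumption here; the gene names and expression values are made up for illustration and are not data from the study.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat profile carries no co-expression signal
    return cov / sqrt(var_x * var_y)


def co_expressed_partners(profiles, query, threshold=0.9):
    """Names whose profile correlates with `query` at or above `threshold`."""
    reference = profiles[query]
    return sorted(
        name
        for name, profile in profiles.items()
        if name != query and pearson(reference, profile) >= threshold
    )


# Hypothetical expression values at five developmental stages.
profiles = {
    "known_oil_gene": [1.0, 2.0, 4.0, 8.0, 9.0],
    "unknown_regulator": [0.5, 1.1, 2.0, 4.2, 4.4],
    "unrelated_gene": [5.0, 1.0, 4.0, 2.0, 3.0],
}

partners = co_expressed_partners(profiles, "known_oil_gene")
print(partners)
```

In this toy data only `unknown_regulator` rises in step with `known_oil_gene`, so it is the only candidate returned. Real analyses typically cluster thousands of profiles rather than querying one gene at a time.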
Multi-class pattern classification has a variety of applications and can be achieved using artificial neural networks (ANNs). There are two major system architectures for using ANNs in multi-class pattern classification: a single ANN and multiple ANNs. Regardless of the architecture used, one of the main concerns with ANNs is that, as the number of pattern classes and training datasets grows, the training time increases dramatically, which renders the ANN infeasible. In this paper, the vast computational power of Graphics Processing Units (GPUs) is utilized to mitigate this problem. Different architectures and different methods of feeding pattern classes are implemented on a GPU platform. Different methods are proposed to achieve maximum parallelism and thereby maximize throughput. Our implementation exceeds the state of the art in the literature in terms of speed and the accurate use of GPU resources. As a result, the proposed approach's run time is about 75% shorter than that of previous approaches. In the multi-ANN architecture, due to the inherent parallelism of the proposed implementation, the execution time of a system for a digit recognition application is reduced from seven hours on a CPU to about 4 seconds on a GPU.
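To make the multi-ANN architecture concrete, the sketch below trains one small network per pattern class (class versus rest) and predicts by taking the class whose network gives the highest score. It is a CPU-only toy under strong simplifying assumptions: each "network" is a single logistic neuron, the data are made-up 2-D points, and all function names are hypothetical. It is not the GPU implementation described above; it only shows why the per-class networks are independent of one another and can therefore be trained in parallel.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def score(net, x):
    """Output of one single-neuron network for input vector x."""
    weights, bias = net
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train_binary_net(samples, targets, epochs=500, lr=0.5):
    """Train one logistic neuron by stochastic gradient descent."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, t in zip(samples, targets):
            err = score((weights, bias), x) - t
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err
    return weights, bias


def train_multi_ann(samples, labels, classes):
    """One network per class. Each training call is independent of the
    others, which is what makes the architecture easy to parallelize."""
    return {
        c: train_binary_net(samples, [1.0 if l == c else 0.0 for l in labels])
        for c in classes
    }


def predict(nets, x):
    """Pick the class whose network responds most strongly."""
    return max(nets, key=lambda c: score(nets[c], x))


# Made-up, well-separated 2-D training points for three classes.
samples = [(3.0, 0.0), (3.0, 0.5), (0.0, 3.0), (0.5, 3.0), (-3.0, -3.0), (-2.5, -3.0)]
labels = ["A", "A", "B", "B", "C", "C"]
nets = train_multi_ann(samples, labels, ["A", "B", "C"])

predictions = [predict(nets, p) for p in [(2.8, 0.2), (0.2, 2.8), (-2.8, -2.9)]]
print(predictions)
```

In a single-ANN architecture the same decision would come from one network with one output per class; the multi-ANN form trades shared hidden features for fully independent training jobs.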
In recent years, parameter variations have presented critical challenges for the manufacturability and yield of integrated circuits. In this paper, a new method is proposed for improving the timing yield of field programmable gate array (FPGA) devices affected by random and systematic within-die variation. By selecting an appropriate configuration from a set of functionally equivalent configurations, the average critical path delay is reduced under conditions of large random and systematic variation, taking spatial correlation into account. Compared to the previous approach, which is limited to a fixed placement, our method improves timing yield by attempting several placements and routings, without lengthy placement and routing phases, to handle systematic variations and spatial correlation. The average critical path delay is reduced by 7% compared to the previous work over 20 MCNC benchmarks.
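The intuition behind choosing among functionally equivalent configurations can be illustrated with a small Monte Carlo experiment. In the sketch below, each simulated die gets per-cell delay multipliers made of a linear gradient across the die (a stand-in for systematic, spatially correlated variation) plus independent Gaussian noise (random variation). Each configuration is reduced to the list of cells its critical path occupies. The variation model, the parameter values, and all names are assumptions made for illustration; this is not the method or the benchmark setup of the paper.

```python
import random


def sample_die(size, rng, sigma_random=0.05, max_gradient=0.2):
    """Per-cell delay multipliers: systematic left-to-right gradient
    plus independent random noise for every cell."""
    slope = rng.uniform(-max_gradient, max_gradient)
    return [
        [
            1.0 + slope * (col / (size - 1) - 0.5) + rng.gauss(0.0, sigma_random)
            for col in range(size)
        ]
        for _ in range(size)
    ]


def path_delay(die, path):
    """Critical path delay: sum of the delays of the cells on the path."""
    return sum(die[row][col] for row, col in path)


def timing_yield(configs, n_dies, target, size=8, seed=0):
    """Fraction of dies meeting `target` with a fixed configuration
    (configs[0]) versus with the best configuration chosen per die."""
    rng = random.Random(seed)
    fixed_ok = selected_ok = 0
    for _ in range(n_dies):
        die = sample_die(size, rng)
        delays = [path_delay(die, path) for path in configs]
        fixed_ok += delays[0] <= target
        selected_ok += min(delays) <= target
    return fixed_ok / n_dies, selected_ok / n_dies


# Three hypothetical placements of the same 8-cell critical path,
# in the left, middle, and right columns of the die.
configs = [[(row, col) for row in range(8)] for col in (0, 3, 7)]

fixed_yield, selected_yield = timing_yield(configs, n_dies=2000, target=8.2)
print(fixed_yield, selected_yield)
```

Because the per-die selection always includes the fixed configuration as one of its options, the selected yield can never be lower than the fixed yield in this model; how much it improves depends on how strongly the variation differs across the regions the configurations occupy.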