OUR ONGOING PROJECTS & APPLICATIONS
Viruses are everywhere and have shaped the world of today, as they have been around much longer than humans. They are very basic in organization and can undergo major genetic changes through mutation and recombination. They lack proofreading mechanisms, which makes them more prone to mutations and thus drives viral evolution. Viruses have a tremendous impact on many major fields of science and biotechnology. Beyond the obvious medical implications, they affect livestock, agriculture and even the climate. Worldwide, billions are invested in pesticides and antivirals that affect production biotech. To date, though, there is no antiviral drug or vaccine available. In this project the students will have the opportunity to work in the lab and express the recombinant enzymes Dengue helicase and Dengue polymerase. The proteins will be used in assays to evaluate new inhibitor compounds via our network of collaborators in the UK and US. Bioinformatics will be used to design new algorithms that will allow the genetic analysis of viruses via the Trojan horse of 3D structure. Remarkably, viral strains of the same family share only about 20% sequence identity. Traditional sequence alignments are of no use here. One of the major drawbacks of modern bioinformatics is that protein similarity and BLAST searches are still based only on the primary amino acid sequence. Primary sequence searches are inadequate, as they fail to provide a realistic fingerprint for the query protein. Protein function is much more closely related to protein structure and physicochemical properties than to amino acid sequence. After all, protein structure is much more conserved than sequence in nature. To address this flaw, a novel platform will be developed that is capable of performing fast similarity searches using both secondary structural and physicochemical information.
Our tool will use secondary structural information from the PDB database when available. If the query protein is not indexed in the RCSB PDB database, our tool will automatically determine the secondary structure elements of the given protein by performing 'on the fly' secondary structure prediction. All physicochemical information will be stored in custom-designed, dedicated databases. All query proteins are then searched against the RCSB PDB secondary structure elements and the physicochemical databases. Hits will be scored, ranked and returned to the user via a well-organised and user-friendly graphical user interface.
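The scoring and ranking step described above can be illustrated with a minimal sketch. It is not the platform's actual algorithm: the k-mer profile of the secondary-structure string, the use of mean Kyte-Doolittle hydropathy as the only physicochemical descriptor, the weighting `ss_weight`, and all function and entry names are assumptions made for illustration.

```python
from collections import Counter
from math import sqrt

# Kyte-Doolittle hydropathy scale (one possible physicochemical descriptor).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def ss_profile(ss, k=3):
    """Count overlapping k-mers in a secondary-structure string (H/E/C)."""
    return Counter(ss[i:i + k] for i in range(len(ss) - k + 1))

def cosine(a, b):
    """Cosine similarity between two k-mer count profiles."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def mean_hydropathy(seq):
    """Average hydropathy over the residues found in the KD table."""
    values = [KD[aa] for aa in seq if aa in KD]
    return sum(values) / len(values) if values else 0.0

def similarity(query, entry, ss_weight=0.7):
    """Weighted mix of structural and physicochemical similarity (assumed weights)."""
    ss_score = cosine(ss_profile(query["ss"]), ss_profile(entry["ss"]))
    diff = abs(mean_hydropathy(query["seq"]) - mean_hydropathy(entry["seq"]))
    phys_score = 1.0 - min(diff / 9.0, 1.0)  # 9.0 is the width of the KD scale
    return ss_weight * ss_score + (1.0 - ss_weight) * phys_score

def rank_hits(query, database):
    """Score every database entry against the query, best hit first."""
    hits = [(entry["id"], similarity(query, entry)) for entry in database]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)

# Hypothetical toy entries; real entries would come from the PDB or a predictor.
toy_db = [
    {"id": "toyB", "seq": "DDEEKKRRNN", "ss": "CEEEECEEEC"},
    {"id": "toyA", "seq": "MKVLAAGIVG", "ss": "CHHHHHHHCC"},
]
toy_query = {"id": "query", "seq": "MKVLAAGIVG", "ss": "CHHHHHHHCC"}
hits = rank_hits(toy_query, toy_db)
```

In this toy case the helical entry `toyA` ranks first because it matches the query in both its secondary-structure profile and its hydropathy, whereas `toyB` shares neither.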
Flaviviridae is a family of viruses that infect vertebrates. Distinct viral structures of this family are visible in thin sections of infected tissue. The size of the virion has been estimated by filtration. Virions of the Flaviviridae family are enveloped and slightly pleomorphic during their life cycle. They are spherical in shape and usually 40-60 nm in diameter. Their nucleocapsids are isometric and sometimes penetrated by stain. The usual size of the nucleocapsids is 25-30 nm in diameter and they have polyhedral symmetry. Virions of the Flaviviridae family contain one molecule of linear positive-sense single-stranded RNA. The total genome length is 9500-12500 nt. The 5' end of the genome has a cap or a genome-linked protein (VPg). The 3' end usually has no poly(A) tract (except in some strains of the tick-borne encephalitis complex of flaviviruses, which have a poly(A) tract). Their nucleic acid material is fully encapsidated and solely genomic. The genome of Flaviviridae features a 5' end that encodes structural proteins, whereas the non-structural proteins, including protease, helicase and polymerase, are encoded at the 3' end. To date, neither specific antiviral treatments nor vaccines are available for these infections. Thus there is an urgent need for new therapies. Herein, an effort will be made to shed light on the genetic, evolutionary and structural features of the viral helicase enzymes, towards the establishment of a versatile drug design platform for potent antiviral agents. The proposed project will involve full phylogenetic and biostatistical analysis of viral genomes and comparative/homology modeling of helicase enzymes. Eventually a drug design platform will be established and a series of drug-like inhibitors of the viral helicase enzyme will be scored and evaluated in silico.
Molecular evolution is the study of evolution at the level of DNA, RNA and proteins. In the 1960s, researchers from molecular biology, evolutionary biology and population genetics began to use the available data on the structure and function of nucleic acids and proteins to approach and understand evolutionary questions. Recent advances in genomics (including whole-genome sequencing), in proteomics (including high-throughput protein characterization) and in bioinformatics (including the storage and analysis of vast amounts of data) have given a real push to studies in this field. In this project the students will investigate similarities and differences between orthologous genes, i.e. the evolutionarily conserved genes in different species. It is well known that the coding region of a gene is generally its most conserved area; therefore we begin our exploration from this region. This analysis gives important information on how well conserved a specific gene is across evolutionary history. The regulation of orthologous genes interests us as well. The assumption is that if two genes are regulated by the same or equivalent mechanisms, they should share conservation in their untranslated areas. Hence, we study the untranslated regions of the orthologous genes. This gives important information on potential regulatory sites and guides further experimental work. To see a bigger part of the picture, we study the protein-protein interactions of the orthologous sequences and their conservation or lack of it. This piece of information distinguishes the stronger, i.e. the most conserved, interactions from the weaker, i.e. the less conserved, ones.
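As an illustration of how conservation might be compared between coding and untranslated regions, the following minimal sketch computes percent identity over already-aligned sequence pairs. The function names, the region labels and the choice to count a one-sided gap as a mismatch are assumptions made for this example; the alignment itself is assumed to come from an external tool.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of identical columns in a pairwise alignment.

    Columns where both sequences have a gap are ignored; a gap facing a
    residue counts as a mismatch (an assumption of this sketch).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = identical = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap and b == gap:
            continue
        compared += 1
        if a == b:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0

def conservation_by_region(regions):
    """Map each region name to the percent identity of its aligned pair."""
    return {name: percent_identity(a, b) for name, (a, b) in regions.items()}

# Hypothetical aligned fragments of one orthologous gene pair.
toy_regions = {
    "CDS": ("ATGGCC", "ATGGCC"),
    "3UTR": ("AT-GA", "ATCGT"),
}
toy_conservation = conservation_by_region(toy_regions)
```

Comparing the per-region values for many ortholog pairs would show whether the untranslated regions of a gene are unusually well conserved relative to its coding region.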
Although many tools have been designed for structural variation analysis, there are no guidelines to help bench scientists choose the best tool for their particular data set. What is needed is a tool which can identify the type and quality of the sequencing reads, assemble them against a reference genome, identify SNPs, short indels and large indels, and determine whether any annotated genes are associated with the large indels. It is also desirable that the tool have a user-friendly interface with which bench scientists can easily examine structural variants in their accession(s) of interest that need to be validated. The long-term goal would be to scale up this process to analyse multiple genomes at a time (for example, multiple time points, treatments, or related individuals). Finally, the tools will be ranked on the user-friendliness of their interface. Some of the tools that the students are going to benchmark (the list is not exhaustive) are: Delly, InGap, BreakDancer, GenomeSTRIP, Tigra-SV, GASV, Hydra, SVMerge, SVDetect, CNVSeq, CNVnator and Pindel.
Statistical analysis of genomic data. Given the rapid advances in genomics and bioinformatics that have taken place in the past few years, there is a growing need to analyse vast amounts of data and interpret the results. Large-scale cancer genome studies have successfully applied some preliminary integration approaches. To this end, we aim to analyse multi-source data using association statistics to estimate pairwise as well as group dependencies in the data. Multivariate analysis and likelihood-based inference will also be employed to estimate data patterns. The ultimate goal is to establish a methodology to integrate different data sets which measure multiple genomic features and to discover or validate findings that would not be discovered by analysing each data set independently. Good programming skills are a prerequisite (e.g. R statistical software, C/C++).
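A minimal sketch of the pairwise part of such an analysis is given below, using the Pearson correlation coefficient as one possible association statistic. The choice of statistic, the function names and the feature names are assumptions for illustration only; the project text does not specify them.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equally long numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)

def pairwise_associations(features):
    """Correlation for every unordered pair of named features."""
    return {
        (a, b): pearson(features[a], features[b])
        for a, b in combinations(sorted(features), 2)
    }

# Hypothetical measurements of three genomic features on the same four samples.
toy_features = {
    "copy_number": [1, 2, 3, 4],
    "expression": [2, 4, 6, 8],
    "methylation": [4, 3, 2, 1],
}
toy_assoc = pairwise_associations(toy_features)
```

Group dependencies and likelihood-based inference go beyond this sketch; it only shows how measurements from several sources on the same samples can be screened pair by pair.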
By analogy with genomics, the term proteomics emerged to describe high-throughput protein analysis. Different cell types, different developmental stages and different environmental conditions give rise to different proteomes. Proteomics is instrumental in the discovery of biomarkers, that is, proteins that indicate a particular disease. In this project the student will develop new algorithms and software for the proteomics scientific community by analyzing our own in-house data.
The aim is ambitious: to delay frailty by developing a set of measures and tools, together with recommendations to reduce its onset. To achieve these objectives, FrailSafe will combine state-of-the-art information technologies and data mining techniques with high-level expertise in the field of health and ageing. The project lasts 3 years and is funded by the European research programme Horizon 2020. Our collaborator from the University of Patras, Costas, demonstrates one of the many capabilities of the FrailSafe smart vest: accurately detecting a fall incident.
FINAL YEAR STUDENTS INTERESTED IN RESEARCH PROJECTS WITHIN THE GROUP, PLEASE CONTACT US
Bioinformatics Software developed by our group
Drugster is a de novo Drug Design platform, which through the versatility of the elite software that it incorporates can efficiently exploit single or multiple processor workstations and achieve high performance through novel and faster custom-made routines. Drugster is a freeware platform aimed to assist scientists in the field of Computer Aided Drug Design (CADD). It facilitates the use of other freeware applications (PDB2PQR, Gromacs, Ligbuilder, Dock) in order to create a pipeline for producing high quality results.
During the past few years, pharmacophore modeling has become one of the key components in computer-aided drug design and in modern drug discovery. DrugOn is a fully interactive pipeline designed to exploit the advantages of modern programming and overcome the command line barrier with two friendly environments for the user (either novice or experienced in the field of Computer Aided Drug Design) to perform pharmacophore modeling through an efficient combination of the PharmACOphore, Gromacs, Ligbuilder and PDB2PQR suites.
The Drugena suite is a pioneering platform that employs state-of-the-art computational biology methods in the fight against neurodegenerative diseases using antibody-drug conjugates (ADCs). Drugena encompasses an up-to-date structural database of specialized antibodies for neurological disorders and the NCI database, with over 96 million entities, for the in silico development of ADCs. The pipeline of the Drugena suite is divided into several steps and modules that work closely together under a user-friendly graphical interface.
The Antisoma application has been developed to present the information found with this new approach concerning the antibodies of the species. It provides information about both the filtered and the unfiltered data in a fast and clearly legible way. It is intended to maintain a complete and proper database of immunoglobulins and the information concerning them, and to help the relevant scientists in their work. Antisoma has been developed in the MATLAB environment. It builds on a custom-made database, which interacts with the user via a user-friendly interface and provides all the information needed.
With the extensive use of microarray technology as a potential prognostic and diagnostic tool, the comparison and reproducibility of results obtained from different platforms is of interest. The integration of those datasets can yield more informative results that hold across numerous datasets and microarray platforms. We developed a novel integration technique for microarray gene-expression data derived from different studies, for the purpose of two-way Bayesian partition modelling, which estimates co-expression profiles under subsets of genes and between biological samples or experimental conditions. The suggested methodology transforms disparate gene-expression data onto a common probability scale to obtain inter-study-validated gene signatures.
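To make the idea of a common probability scale concrete, here is a minimal sketch that maps each study's values to their empirical cumulative probabilities. This is a simple stand-in chosen for illustration, not the Bayesian partition model described above; the function name and the toy values are assumptions.

```python
from bisect import bisect_right

def to_probability_scale(values):
    """Map each value to its empirical cumulative probability within its study.

    A value v is replaced by the fraction of values in the same study that
    are less than or equal to v, so the result always lies in (0, 1].
    """
    ordered = sorted(values)
    n = len(ordered)
    return [bisect_right(ordered, v) / n for v in values]

# Hypothetical expression values for the same three genes on two platforms
# that report intensities on very different numeric scales.
study_one = to_probability_scale([10, 30, 20])
study_two = to_probability_scale([0.1, 0.3, 0.2])
```

After the transformation the two toy studies are directly comparable, since both express each gene's level as a rank-based probability rather than a platform-specific intensity.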
Protein structure is more conserved than sequence in nature. With this in mind, we developed a novel methodology that significantly improves conventional homology modelling when sequence identity is low, by taking into consideration 3D structural features of the template, such as size and shape. Herein, our new approach was applied to the homology modelling of the RNA-dependent RNA polymerase (RdRp) of dengue (type II) virus. The RdRp of dengue was chosen due to the low sequence similarity shared between the dengue virus polymerase and the available templates, while purposely avoiding the use of the actual X-ray structure that is available for the dengue RdRp.
The term 'molecular cartography' encompasses a family of computational methods for two-dimensional transformation of protein structures and analysis of their physicochemical properties. The underlying algorithms comprise multiple manual steps, whereas the few existing implementations typically restrict the user to a very limited set of molecular descriptors. Structuprint is an efficient application, implementing a molecular cartography algorithm for protein surfaces.
The Gene Ontology Consortium makes an extensive effort to gather all protein-function pairs in a standard format and to produce a well-structured vocabulary that presents all known biological functions in a hierarchical, controlled structure. Taggo takes advantage of the Gene Ontology (GO) to extract the proteins' main attributes. It can discard annotations that are supported by less reliable evidence codes (ECs), it is extremely fast, and it offers the convenience of searching the ten most general categories of each term.
Liquid Chromatography-Mass Spectrometry (LC-MS) is a commonly used method to detect protein-protein interactions and elucidate complex protein mixtures. The focus of this work is the visualization of the large data sets produced by LC-MS, specifically the chromatogram and the mass spectra that correspond to its peaks. The Brukin2d software, developed with MATLAB 7.4, uses the compound data exported from the Bruker 'DataAnalysis' program and depicts the mean mass spectra of all the chromatogram compounds from one LC-MS run in one 2D contour plot. Each spot in the plot represents one peptide mass.
GIBA is an effective and easy-to-use tool for the detection of protein complexes through clustering on protein-protein interaction networks. Extensive experiments showed that GIBA produces higher-quality approximations of protein complexes than other methods.
The algorithm is hierarchical and performs successive min-cuts until it identifies dense subgraphs. The stopping criterion of the AHC depends on the initial graph density and is adjusted to each case accordingly. That is, when the input data form a dense protein interaction network, the stopping criterion of the AHC is stricter and leads to the selection of denser subgraphs as candidate protein complexes.
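The general idea of splitting a network by successive minimum cuts until the pieces are dense can be sketched as follows. This is an illustration under assumptions, not the group's implementation: the Stoer-Wagner procedure is used here as one standard way to find a global minimum cut, the density threshold is passed in as a plain parameter (the exact rule linking it to the initial graph density is not given in the text), and all names are hypothetical.

```python
def min_cut(nodes, edges):
    """Stoer-Wagner global minimum cut; returns (cut weight, one side of the cut)."""
    w = {u: {} for u in nodes}
    for u, v in edges:  # unweighted edges, no self-loops assumed
        w[u][v] = w[u].get(v, 0) + 1
        w[v][u] = w[v].get(u, 0) + 1
    groups = {u: {u} for u in nodes}  # original nodes merged into each supernode
    active = list(nodes)
    best_weight, best_side = float("inf"), set()
    while len(active) > 1:
        start = active[0]
        order = [start]
        weights = {v: w[start].get(v, 0) for v in active if v != start}
        while weights:  # repeatedly add the most tightly connected node
            nxt = max(weights, key=weights.get)
            cut_of_phase = weights.pop(nxt)
            order.append(nxt)
            for v in weights:
                weights[v] += w[nxt].get(v, 0)
        s, t = order[-2], order[-1]
        if cut_of_phase < best_weight:
            best_weight, best_side = cut_of_phase, set(groups[t])
        groups[s] |= groups[t]  # merge t into s
        for v, weight in w[t].items():
            if v != s:
                w[s][v] = w[s].get(v, 0) + weight
                w[v][s] = w[v].get(s, 0) + weight
            del w[v][t]
        del w[t]
        active.remove(t)
    return best_weight, best_side

def density(nodes, edges):
    """Fraction of possible edges that are present."""
    n = len(nodes)
    return 0.0 if n < 2 else 2 * len(edges) / (n * (n - 1))

def dense_subgraphs(nodes, edges, threshold):
    """Split recursively by minimum cuts until each part reaches the density threshold."""
    node_set = set(nodes)
    inner = [(u, v) for u, v in edges if u in node_set and v in node_set]
    if len(node_set) < 3 or density(node_set, inner) >= threshold:
        return [node_set]
    _, side = min_cut(sorted(node_set), inner)
    rest = node_set - side
    return dense_subgraphs(side, inner, threshold) + dense_subgraphs(rest, inner, threshold)

# Hypothetical toy network: two fully connected groups joined by a single edge.
group_a, group_b = ["a1", "a2", "a3", "a4"], ["b1", "b2", "b3", "b4"]
toy_edges = [(u, v) for g in (group_a, group_b) for i, u in enumerate(g) for v in g[i + 1:]]
toy_edges.append(("a4", "b1"))
toy_clusters = dense_subgraphs(group_a + group_b, toy_edges, threshold=0.9)
```

On the toy network the single bridging edge is the minimum cut, so one split is enough to recover the two fully connected groups. A stricter threshold for denser inputs, as the description above suggests, would simply be passed in by the caller.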
GAppi performs clustering in protein-protein interaction networks to identify protein complexes. The algorithm has been tested exhaustively with experimental datasets coming from online protein interaction databases and individual experiments and has been compared with five other effective techniques in order to demonstrate its efficiency and superior performance. Results showed that GAppi produces feasible and very efficient solutions compared to other techniques.
GOmir (by using up to four different databases) introduces, for the first time, miRNA predicted targets accompanied by (a) full gene description, (b) functional analysis and (c) detailed gene ontology clustering. Additionally, a reverse search initiated by a potential target can also be conducted. The GOmir module JTarget integrates microRNA target prediction and functional analysis by combining the predicted target genes from the TargetScan, miRanda, RNAhybrid and PicTar computational tools, and by providing a full gene description and functional analysis for each target gene.
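One simple way to combine target lists from several prediction tools is to record, for each gene, which tools predicted it and to keep genes supported by a minimum number of tools. The sketch below illustrates that idea only; the text does not state how JTarget merges its inputs, and the function name, the `min_tools` parameter and the gene identifiers are hypothetical.

```python
def combine_predictions(predictions, min_tools=1):
    """Merge per-tool target lists into (gene, supporting tools) pairs.

    Genes are ordered by the number of supporting tools (most first),
    then alphabetically.
    """
    support = {}
    for tool, genes in predictions.items():
        for gene in set(genes):
            support.setdefault(gene, set()).add(tool)
    combined = [
        (gene, sorted(tools))
        for gene, tools in support.items()
        if len(tools) >= min_tools
    ]
    return sorted(combined, key=lambda item: (-len(item[1]), item[0]))

# Hypothetical predicted targets of one miRNA from the four tools named above.
toy_predictions = {
    "TargetScan": ["GENE_A", "GENE_B"],
    "miRanda": ["GENE_A", "GENE_C"],
    "RNAhybrid": ["GENE_A"],
    "PicTar": ["GENE_B"],
}
toy_union = combine_predictions(toy_predictions)
toy_consensus = combine_predictions(toy_predictions, min_tools=2)
```

With `min_tools=1` the result is the union of all predictions; raising it trades sensitivity for agreement between tools.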
FED & SAFE Suites
The tools that perform this analysis are: 1. Fusion Events Database (FED), a database for the maintenance and retrieval of fusion data in both prokaryotic and eukaryotic organisms, and 2. Software for the Analysis of Fusion Events (SAFE), a computational platform implemented for the automated detection, filtering and visualization of fusion events. Fusion analysis has been used to identify putative protein-protein interactions in completely sequenced genomes of various prokaryotes and eukaryotes.
A suite of computer programs has been developed under the general name Thetis, for monitoring structural changes during molecular dynamics (MD) simulations on proteins. Conformational analysis includes estimation of structural similarities during the simulation and analysis of the secondary structure with emphasis on helices. In contrast to available freeware dealing with MD snapshots, Thetis can be used on a series of consecutive MD structures thus allowing a detailed conformational analysis over the time course of the simulation.