Final Year Projects:

Students interested in research projects, please contact me.


Viruses are everywhere and have shaped the world of today, as they have been around much longer than humans. They are very basic in organization and can undergo major genetic changes through mutation and recombination. Many viruses, RNA viruses in particular, lack proofreading mechanisms, which makes them more prone to mutation and thus drives viral evolution. Viruses have a tremendous impact on many major fields of science and biotechnology: beyond the obvious medical implications, they affect livestock, agriculture and even the climate. Worldwide, billions are invested in pesticides and antivirals, with a direct effect on biotechnological production. To date, though, there is no specific antiviral drug against dengue, and vaccine options remain limited. In this project the students will have the opportunity to work in the lab and express recombinant Dengue helicase and Dengue polymerase. The proteins will be used in assays to evaluate new inhibitor compounds through our network of collaborators in the UK and US. Bioinformatics will be used to design new algorithms that allow the genetic analysis of viruses using their 3D structure as a Trojan horse. It is remarkable that viral strains of the same family may share only around 20% sequence identity, a level at which traditional sequence alignments are of little use.
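To make the identity figure concrete: percent identity is usually computed from a pairwise alignment as the share of aligned positions carrying the same residue. The Python sketch below assumes the two sequences are already aligned (gaps written as `-`) and ignores gapped columns; other conventions divide by the full alignment length or by the shorter sequence, so reported values depend on the definition used. The function name `percent_identity` and the toy sequences are illustrative only.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap.

    Assumes both inputs come from the same pairwise alignment (equal length,
    gaps as '-'). This is one of several common identity definitions.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns where both sequences have a residue.
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    matches = sum(1 for a, b in pairs if a == b)
    return 100.0 * matches / len(pairs)
```

For example, `percent_identity("MKT-LV", "MRTALI")` compares five ungapped columns, three of which match, and returns `60.0`.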
One of the major drawbacks of modern bioinformatics is that protein similarity and BLAST searches are still based only on the primary amino acid sequence. Primary sequence searches are often inadequate, as they fail to provide a realistic fingerprint of the query protein. Protein function is much more closely related to protein structure and physicochemical properties than to amino acid sequence; after all, protein structure is much more conserved in nature than sequence. To bridge this gap, a novel platform will be developed that can perform fast similarity searches using both secondary structure and physicochemical information. Our tool will use secondary structure information from the RCSB PDB when available. If the query protein is not in the RCSB PDB, the tool will automatically determine its secondary structure elements by performing ‘on the fly’ secondary structure prediction. All physicochemical information will be stored in custom-designed, dedicated databases. Each query protein is then searched against the RCSB PDB secondary structure and physicochemical databases. Hits will be scored, ranked and returned to the user through a well-organised, user-friendly graphical user interface.
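The scoring and ranking step can be illustrated with a minimal Python sketch. It assumes each protein is represented by its sequence plus a three-state secondary structure string (H = helix, E = strand, C = coil), as could be obtained from PDB annotation or a predictor. It combines a Needleman-Wunsch global alignment score over the secondary structure strings with a physicochemical term based on mean Kyte-Doolittle hydrophobicity, and ranks database entries by the weighted sum. The scoring parameters, the 0.7 weight and all function names are hypothetical choices for illustration, not the platform's actual scoring scheme.

```python
# Kyte-Doolittle hydrophobicity scale (one-letter amino acid codes).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def nw_score(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Needleman-Wunsch global alignment score with a linear gap penalty."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr[j] = max(diag, prev[j] + gap, curr[j - 1] + gap)
        prev = curr
    return prev[-1]


def ss_similarity(ss_a: str, ss_b: str, match: int = 2) -> float:
    """Secondary structure alignment score, scaled so identical strings give 1.0."""
    if not ss_a or not ss_b:
        return 0.0
    return nw_score(ss_a, ss_b, match=match) / (match * max(len(ss_a), len(ss_b)))


def mean_hydrophobicity(seq: str) -> float:
    """Mean Kyte-Doolittle value; unknown residue codes are skipped."""
    values = [KYTE_DOOLITTLE[aa] for aa in seq.upper() if aa in KYTE_DOOLITTLE]
    return sum(values) / len(values) if values else 0.0


def phys_similarity(seq_a: str, seq_b: str) -> float:
    """1.0 for equal mean hydrophobicity, 0.0 at the full width of the scale (9.0)."""
    diff = abs(mean_hydrophobicity(seq_a) - mean_hydrophobicity(seq_b))
    return 1.0 - diff / 9.0


def combined_score(query, entry, weight: float = 0.7) -> float:
    """Weighted sum of structural and physicochemical similarity (hypothetical weighting)."""
    (q_seq, q_ss), (e_seq, e_ss) = query, entry
    return weight * ss_similarity(q_ss, e_ss) + (1 - weight) * phys_similarity(q_seq, e_seq)


def rank_hits(query, database: dict, weight: float = 0.7) -> list:
    """Score every (sequence, ss_string) entry and return hits, best first."""
    scored = [(name, combined_score(query, entry, weight)) for name, entry in database.items()]
    return sorted(scored, key=lambda hit: hit[1], reverse=True)
```

With a query identical to one database entry, that entry is returned first with a combined score of 1.0. A production search would replace the linear scan with an indexed lookup, but the scoring idea is the same.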


Flaviviridae is a family of viruses that infect vertebrates, many of which are transmitted by arthropods such as mosquitoes and ticks. Distinct viral structures of this family are visible in thin sections of infected tissue. The size of the virion has been estimated by filtration. Flaviviridae virions are enveloped, slightly pleomorphic and spherical, usually 40-60 nm in diameter. Their nucleocapsids are isometric and sometimes penetrated by stain; they are usually 25-30 nm in diameter and have polyhedral symmetry.
Flaviviridae virions contain a single molecule of linear, positive-sense, single-stranded RNA, with a total genome length of 9500-12500 nt. In members of the genus Flavivirus the 5' end of the genome carries a cap, whereas hepaciviruses and pestiviruses lack one. The 3' end usually has no poly(A) tract (some strains of the tick-borne encephalitis complex are an exception). The nucleic acid is fully encapsidated and consists solely of genomic RNA. The structural proteins are encoded in the 5' part of the genome, whereas the non-structural proteins, including the protease, helicase and polymerase, are encoded in the 3' part. For many of these infections, neither specific antiviral treatments nor vaccines are available to date. Thus there is an urgent need for new therapies.
Herein, an effort will be made to shed light on the genetic, evolutionary and structural features of viral helicase enzymes, towards the establishment of a versatile drug design platform for potent antiviral agents. The proposed project will involve full phylogenetic and biostatistical analysis of viral genomes and comparative (homology) modeling of helicase enzymes. Eventually, a drug design platform will be established and a series of drug-like inhibitors of the viral helicase will be scored and evaluated in silico.
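A phylogenetic analysis of viral genomes typically starts from pairwise evolutionary distances between aligned sequences, which are then passed to a tree-building method. The sketch below uses the Jukes-Cantor (JC69) correction, d = -3/4 ln(1 - 4p/3), where p is the proportion of differing sites. The choice of JC69, the function names and the toy sequences are assumptions for illustration; the project may well use a richer substitution model. Sites with gaps or ambiguous bases are skipped.

```python
import math
from itertools import combinations

NUCLEOTIDES = set("ACGT")


def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing sites among columns where both sequences have A/C/G/T."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    sites = [(x, y) for x, y in zip(seq_a.upper(), seq_b.upper())
             if x in NUCLEOTIDES and y in NUCLEOTIDES]
    if not sites:
        raise ValueError("no comparable sites")
    return sum(1 for x, y in sites if x != y) / len(sites)


def jukes_cantor(p: float) -> float:
    """JC69 distance; undefined (returned as infinity) when p >= 0.75."""
    if p >= 0.75:
        return math.inf
    # Equivalent to -3/4 * ln(1 - 4p/3).
    return 0.75 * math.log(1.0 / (1.0 - 4.0 * p / 3.0))


def distance_matrix(seqs: dict) -> dict:
    """Symmetric JC69 distance matrix keyed by (name_a, name_b)."""
    dist = {(name, name): 0.0 for name in seqs}
    for a, b in combinations(seqs, 2):
        d = jukes_cantor(p_distance(seqs[a], seqs[b]))
        dist[(a, b)] = dist[(b, a)] = d
    return dist
```

For two toy sequences differing at one of ten sites (p = 0.1), the JC69 distance is about 0.107, slightly larger than p because the correction accounts for multiple substitutions at the same site.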


Molecular evolution is the study of evolution at the level of DNA, RNA and proteins. In the 1960s, researchers from molecular biology, evolutionary biology and population genetics took advantage of the available data on the structure and function of nucleic acids and proteins to address evolutionary questions. Recent advances in genomics (including whole-genome sequencing), in proteomics (including high-throughput protein characterization) and in bioinformatics (including the storage and analysis of vast amounts of data) have given real impetus to the field. In this project the students will investigate similarities and differences between orthologous genes, i.e. genes in different species that derive from a common ancestral gene through speciation. It is well known that the coding region is generally the most conserved part of a gene, so we begin our exploration there. This analysis shows how well conserved a specific gene has been over evolutionary history. We are also interested in the regulation of orthologous genes. The assumption is that if two genes are regulated by the same or equivalent mechanisms, they should share conservation in their untranslated regions. Hence, we study the untranslated regions of orthologous genes; this points to potential regulatory sites and guides further experimental work. To see a bigger part of the picture, we study the protein-protein interactions of the orthologous sequences and whether or not they are conserved. This distinguishes the stronger, i.e. more conserved, interactions from the weaker, i.e. less conserved, ones.
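One simple way to compare conservation between the coding region and the untranslated regions is to score each column of a multiple alignment of the orthologs and average the scores over each region. The Python sketch below scores a column as the fraction of sequences carrying its most common residue, so gaps count against conservation. This measure, the function names and the region coordinates in the example are illustrative assumptions; in practice the CDS and UTR boundaries would come from the genome annotation, and entropy-based or phylogeny-aware scores are common alternatives.

```python
from collections import Counter


def column_conservation(alignment: list) -> list:
    """Fraction of sequences sharing the most common residue in each column."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all sequences must be aligned to the same length")
    scores = []
    for i in range(length):
        column = [seq[i] for seq in alignment]
        residues = [c for c in column if c != "-"]
        if not residues:
            scores.append(0.0)
            continue
        top_count = Counter(residues).most_common(1)[0][1]
        # Dividing by the full column size means gaps lower the score.
        scores.append(top_count / len(column))
    return scores


def region_conservation(alignment: list, start: int, end: int) -> float:
    """Mean column conservation over alignment columns [start, end)."""
    scores = column_conservation(alignment)[start:end]
    if not scores:
        raise ValueError("empty region")
    return sum(scores) / len(scores)
```

For the toy alignment `["ATGAAA", "ATGAAC", "ATGA-G"]`, treating the hypothetical columns 0-3 as coding and 4-5 as untranslated, `region_conservation(aln, 0, 4)` returns 1.0 and `region_conservation(aln, 4, 6)` returns 0.5.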


Although many tools have been designed for structural variation analysis, there are no guidelines to help bench scientists choose the best tool for their particular data set. What is needed is a tool that can identify the type and quality of the sequencing reads, map them to a reference genome, identify SNPs, short indels and large indels, and determine whether any annotated genes are associated with the large indels. The tool should also have a user-friendly interface with which bench scientists can easily examine the structural variants that need to be validated in their accession(s) of interest. The long-term goal is to scale this process up to analyse multiple genomes at a time (for example, multiple time points, treatments, or related individuals). Finally, the tools will be ranked on the user-friendliness of their interface.
Some of the tools that the students are going to benchmark (the list is not exhaustive) are: Delly, InGap, BreakDancer, Genome STRiP, Tigra-SV, GASV, Hydra, SVMerge, SVDetect, CNV-seq, CNVnator and Pindel.
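Benchmarking the callers usually means comparing each tool's calls with a validated truth set. The sketch below assumes calls have already been parsed into (chromosome, start, end, type) records; it counts a call as correct when it overlaps a truth-set event of the same type by at least 50% of both lengths (reciprocal overlap), and reports precision, recall and F1. The 50% threshold is a common convention but an assumption here, as are the record layout and function names; parsing each tool's specific output format is left out, and zero-length events such as insertions would need a breakpoint-distance rule instead.

```python
from typing import List, NamedTuple, Tuple


class SV(NamedTuple):
    chrom: str
    start: int  # 0-based, half-open interval [start, end)
    end: int
    svtype: str  # e.g. "DEL", "DUP", "INV"


def reciprocal_overlap(a: SV, b: SV, min_fraction: float = 0.5) -> bool:
    """True if a and b share chromosome and type and overlap by min_fraction of both lengths."""
    if a.chrom != b.chrom or a.svtype != b.svtype:
        return False
    overlap = min(a.end, b.end) - max(a.start, b.start)
    if overlap <= 0:
        return False
    return (overlap >= min_fraction * (a.end - a.start)
            and overlap >= min_fraction * (b.end - b.start))


def benchmark(calls: List[SV], truth: List[SV],
              min_fraction: float = 0.5) -> Tuple[float, float, float]:
    """Precision, recall and F1 of one caller's output against a truth set."""
    correct_calls = sum(
        1 for c in calls if any(reciprocal_overlap(c, t, min_fraction) for t in truth))
    recovered = sum(
        1 for t in truth if any(reciprocal_overlap(c, t, min_fraction) for c in calls))
    precision = correct_calls / len(calls) if calls else 0.0
    recall = recovered / len(truth) if truth else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

Running `benchmark` once per tool on the same truth set gives directly comparable numbers for the ranking.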


Statistical analysis of genomic data. Given the rapid advances in genomics and bioinformatics over the past few years, there is a growing need to analyse vast amounts of data and interpret the results. Large-scale cancer genome studies have successfully applied some preliminary integration approaches. Building on this, we aim to analyse multi-source data using association statistics to estimate pairwise as well as group dependencies in the data. Multivariate analysis and likelihood-based inference will also be employed to identify patterns in the data. The ultimate goal is to establish a methodology for integrating data sets that measure multiple genomic features, in order to discover or validate findings that would not emerge from analysing each data set independently. Good programming skills (e.g. in R or C/C++) are a prerequisite.
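As a first pass, pairwise dependencies between genomic features measured on the same samples can be estimated with a correlation matrix. The Python sketch below computes the Pearson correlation for every pair of features; the feature names and values are hypothetical, and real integration work would add rank-based or mutual-information measures, multiple-testing correction and the multivariate models mentioned above. In R, the built-in `cor()` function returns the same Pearson coefficients for a numeric matrix.

```python
import math
from itertools import combinations


def pearson(x: list, y: list) -> float:
    """Pearson correlation coefficient; NaN if either variable is constant."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length vectors with at least 2 samples")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def pairwise_associations(features: dict) -> dict:
    """Pearson correlation for every pair of features (values aligned by sample)."""
    return {(a, b): pearson(features[a], features[b])
            for a, b in combinations(features, 2)}
```

For hypothetical per-sample values where methylation falls exactly as expression rises, the pair `("expression", "methylation")` receives a coefficient of -1.0.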


By analogy with genomics, the term proteomics emerged to describe high-throughput protein analysis. Different cell types, developmental stages and environmental conditions give rise to different proteomes. Proteomics is instrumental in the discovery of biomarkers, that is, proteins that indicate a particular disease. In this project the student will develop new algorithms and software for the proteomics community by analysing our own in-house data.