Our work in bioinformatics builds on existing methodologies for large-scale data analysis and develops novel algorithms for processing and merging complex biological data from multiple sources, such as gene expression data, sequence information, protein-protein interaction data and clinico-pathological data. Our ultimate goals include (1) a better understanding of the molecular mechanisms governing cellular processes, especially those involved in diseases such as cancer, and (2) the development of computational tools that can be used for diagnostic or prognostic purposes based on genomic data. Our research activities and ongoing projects in this area focus on theoretical and computational algorithms that address biological questions in the areas described below.
Research in Gene Expression Data Processing
Gene transcription and translation products (mRNA molecules and proteins) are the main factors that define, at any given moment, the status of the cell, the tissue and ultimately the whole organism, through complex interaction networks. Technologies that monitor the expression levels of large numbers of genes, such as microarrays and deep sequencing, have led to an explosion of available data on a genome-wide scale. However, transforming these data into knowledge that supports biological or medical inference is a challenging task. Microarrays, for example, produce large amounts of noisy data. Using these data requires a combination of biological knowledge, statistics, machine learning and efficient algorithms that can select the informative features.
Our lab work in this area focuses on the development of novel feature extraction and clustering/classification algorithms which can be used for the discrimination of different groups of samples, e.g. normal vs disease samples, different disease samples or different subgroups of a disease. Furthermore, selected features can serve as molecular markers for the prediction of the outcome of a disease and for the identification of unknown processes that are involved in the generation and progression of a disease.
Implemented methodologies include information theoretic approaches, multidimensional scaling, signal-to-noise statistical and neural network methods as well as ad hoc algorithms. We also develop new and modify existing clustering (k-NN, hierarchical, SOM) and classification (SVM, Bayesian networks, ANN) algorithms for the categorization of expression data. We aim to study the complexity and performance of these algorithms on gene expression data as functions of the type of input data, size of input set, degree of correlation between input features, number of categories, signal-to-noise ratio in the data etc.
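As an illustration of the signal-to-noise idea mentioned above, the following minimal sketch ranks genes by one commonly used signal-to-noise statistic, the difference of the class means divided by the sum of the class standard deviations. It is not the lab's implementation: the function names, the gene names and the toy expression values are all hypothetical, chosen only to show the shape of the computation.

```python
from statistics import mean, stdev


def signal_to_noise(values_a, values_b):
    """Return (mean_a - mean_b) / (sd_a + sd_b) for two groups of samples."""
    spread = stdev(values_a) + stdev(values_b)
    if spread == 0:
        return 0.0  # no variation in either class: treat the gene as uninformative
    return (mean(values_a) - mean(values_b)) / spread


def rank_genes(expression, labels, class_a, class_b):
    """Rank genes by the absolute signal-to-noise score between two classes.

    expression: dict mapping gene name -> list of values, one per sample
    labels:     list of class labels, in the same sample order
    """
    scores = {}
    for gene, values in expression.items():
        group_a = [v for v, lab in zip(values, labels) if lab == class_a]
        group_b = [v for v, lab in zip(values, labels) if lab == class_b]
        scores[gene] = signal_to_noise(group_a, group_b)
    # Largest absolute score first: both up- and down-regulated genes matter.
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


# Hypothetical toy data: three normal and three tumour samples.
labels = ["normal", "normal", "normal", "tumour", "tumour", "tumour"]
expression = {
    "GENE_A": [1.0, 1.2, 0.8, 3.0, 3.2, 2.8],  # clearly higher in tumour
    "GENE_B": [2.0, 2.5, 1.5, 2.1, 2.4, 1.6],  # no real difference
    "GENE_C": [5.0, 5.1, 4.9, 4.0, 4.2, 3.8],  # lower in tumour
}

ranking = rank_genes(expression, labels, "tumour", "normal")
for gene, score in ranking:
    print(f"{gene}\t{score:+.2f}")
```

The top-ranked genes from such a score could then be passed as selected features to a clustering or classification step. In practice the cut-off would need to be validated, for example by cross-validation or permutation testing.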
Research in Recognition and Functional Categorization of miRNAs
MicroRNAs (miRNAs) belong to a recently identified group of the large family of non-coding RNAs. The mature miRNA is usually 19–27 nucleotides long and is derived from a larger precursor that folds into an imperfect stem-loop structure. The mode of action of the mature miRNA in mammalian systems depends on complementary base pairing, primarily to the 3' UTR of the target mRNA, which leads to the inhibition of translation and/or the degradation of the mRNA. According to recent estimates, while over 30% of vertebrate genomes is transcribed (2), only 1% consists of coding genes, suggesting that the rest must be various types of non-coding RNA genes.
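The base-pairing principle described above can be sketched as a simple sequence search. The example below assumes the widely used "seed" simplification, in which nucleotides 2–8 of the mature miRNA must pair perfectly with the target. The text does not state this rule, so it is an assumption here, and real target predictors use many additional features. The function names and the UTR sequence are hypothetical.

```python
# Watson-Crick pairing for RNA (G:U wobble pairs are deliberately ignored here).
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Return the reverse complement of an RNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def seed_sites(mirna, utr):
    """Return 0-based start positions of perfect seed matches in a 3' UTR.

    The seed is taken as nucleotides 2-8 of the miRNA (an assumption of this
    sketch); a site is any occurrence of its reverse complement in the UTR.
    """
    seed = mirna[1:8]  # nucleotides 2-8 in 1-based numbering
    site = reverse_complement(seed)
    positions = []
    start = utr.find(site)
    while start != -1:
        positions.append(start)
        start = utr.find(site, start + 1)  # allow overlapping matches
    return positions


# Example 22-nt mature miRNA and a made-up UTR fragment containing two sites.
mirna = "UGAGGUAGUAGGUUGUAUAGUU"
utr = "AAGCUACCUCAUUGCCCUACCUCGA"

print(reverse_complement(mirna[1:8]))
print(seed_sites(mirna, utr))
```

A perfect seed match is neither necessary nor sufficient for a functional target, so a list of sites like this is best read as a set of candidates for further computational or experimental filtering.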
Our lab work in the miRNA field focuses on the development of computational methods and tools for (1) the identification of novel miRNA genes, (2) the identification of the mature part of the miRNA and (3) the accurate prediction of miRNA targets. Towards this goal we combine computational and experimental approaches, in collaboration with Kriton Kalantidis at IMBB-FORTH and the Department of Biology, University of Crete.
- Classification of Astrocytic Tumours into their Malignancy Grades using Neural Networks
- Identification of Informative Genes for Class Prediction in Cancer
- Expression Profiling for Breast Cancer Incidence – The Prognochip project
Inference and Modelling of Biological Networks
Our lab work in this field focuses on the development and application of computational methods and tools for the identification and quantitative modelling of biological networks. We are particularly interested in identifying the interaction networks between genes and their regulators, which determine the concerted action of different mRNAs or proteins and, eventually, the status of the cell. Genes generally do not act independently; their action has to be considered within a framework of interacting modules that form a complex network. Identifying those modules and understanding the relationships between them is a priority for the study of any disease and, more generally, for a better understanding of cellular functions.
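To make the notion of modules concrete, the sketch below builds a very simple co-expression network: two genes are linked when the absolute Pearson correlation of their expression profiles reaches a threshold, and modules are taken as the connected components of the resulting graph. This is a deliberately naive stand-in for network inference, not the lab's method. The threshold of 0.9, the function names and the toy profiles are all assumptions made for illustration.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def coexpression_edges(profiles, threshold=0.9):
    """Return (gene1, gene2, r) for every gene pair with |r| >= threshold."""
    edges = []
    for g1, g2 in combinations(sorted(profiles), 2):
        r = pearson(profiles[g1], profiles[g2])
        if abs(r) >= threshold:
            edges.append((g1, g2, r))
    return edges


def connected_modules(genes, edges):
    """Group genes into modules, defined here as connected components."""
    neighbours = {gene: set() for gene in genes}
    for g1, g2, _ in edges:
        neighbours[g1].add(g2)
        neighbours[g2].add(g1)
    seen, result = set(), []
    for gene in sorted(genes):
        if gene in seen:
            continue
        stack, component = [gene], []
        seen.add(gene)
        while stack:  # depth-first traversal of one component
            current = stack.pop()
            component.append(current)
            for other in neighbours[current]:
                if other not in seen:
                    seen.add(other)
                    stack.append(other)
        result.append(sorted(component))
    return result


# Hypothetical expression profiles over five conditions.
profiles = {
    "g1": [1, 2, 3, 4, 5],
    "g2": [2, 4, 6, 8, 10],  # rises with g1
    "g3": [5, 4, 3, 2, 1],   # falls as g1 rises (anti-correlated)
    "g4": [3, 1, 4, 1, 3],
    "g5": [6, 2, 8, 2, 6],   # follows g4
}

edges = coexpression_edges(profiles)
modules = connected_modules(profiles, edges)
print(modules)
```

Correlation alone cannot distinguish direct regulation from indirect association, nor does it give direction, so a network of this kind is only a starting point for quantitative modelling of gene-regulator interactions.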