GWENDIA research goals

GWENDIA: Grid Workflow Efficient Enactment for Data Intensive Applications

Workflow management is a very active research area that has received special attention from the distributed computing community in recent years. In many scientific areas, including the application areas considered in this project, complex data processing procedures are needed to analyse huge amounts of data. GWENDIA aims at providing efficient workflow management systems to handle and process large amounts of scientific data on large scale distributed infrastructures such as grids. This is a multi-disciplinary project which gathers researchers in computer science (distributed systems, scheduling) and researchers in the life sciences community (medical image analysis, drug discovery).

The project objectives are twofold. In computer science, GWENDIA aims at efficiently exploiting distributed infrastructures to deal with the huge and still increasing amount of scientific data acquired in radiology and biology centres. In particular, we will focus on representing and managing large data flows using distributed resources, within a time that is acceptable to the operators. In the life sciences, GWENDIA aims at dealing with distributed, heterogeneous, and evolving large scale databases, at representing complex data analysis procedures that take the medical or biological context into account, and at exploiting computer science tools to design, at low cost, scientifically challenging experiments with a real impact for the community.

This study will be based on two very large scale grid infrastructures: the Grid'5000 French national research infrastructure and the EGEE European production infrastructure.

GWENDIA will provide a workflow description framework including data composition operators useful for describing application data flows. The project also includes the design of workflow scheduling algorithms optimized for efficiently distributing the computation load over a grid infrastructure, taking the data constraints into account. The scheduling strategies developed will be implemented by reusing existing software components such as the DIET middleware and the MOTEUR workflow manager. This research will be guided by the requirements of two application areas in the life sciences: medical image analysis and in silico drug discovery. Concrete use cases will be implemented and deployed on grid infrastructures in both areas. The GWENDIA project aims at enabling scientific production in both areas by providing transparent access to grid infrastructures for coherently and efficiently processing these data-intensive applications.

This research project does not directly involve industry. Yet workflow management has been a very active area for industry over the past years and, with the industry uptake of grid technologies, there will probably be significant industrial interest in grid-enabled workflow managers. In particular, INRIA/GRAAL is collaborating with IBM, which is one of the major developers of the BPEL workflow language. The two application areas considered also have concrete social and industrial benefits. Automated medical image analysis is increasingly needed in clinics, and in silico drug discovery is likely to have a huge economic impact, raising strong interest in the pharmaceutical industry.

public_namespace/presentation.txt · Last modified: 2011/05/19 11:29 (external edit)