The HADAS research project is guided by the new challenges introduced by the continuous production of huge, distributed and heterogeneous data. Such data require technologies that support activities such as capturing, integrating, searching/querying, filtering, indexing, recording, preserving and annotating. We propose to contribute to the development of new largely distributed, scalable, adaptive and intelligent data and knowledge management infrastructures that include these technologies. During the next period, we will focus on:
- Management of massive datasets, particularly focusing on:
  - Adaptive and distributed storage and cache for storing large heterogeneous datasets.
  - Indexing data on the fly to facilitate efficient data manipulation.
  - Economy- and energy-oriented integration of big dataset management: economic cost model.
  - Quality-based continuous data/event stream processing and composition.
- Adaptive querying systems:
  - Declarative hybrid languages for expressing data (stream) processing.
  - Learning-based distributed query optimization for efficient (continuous) query evaluation with scarce metadata.
  - Query operators for on-the-fly data reorganization facilitating future data manipulations.
  - Service Level Agreement guided optimization of continuous and mobile queries.
These data technologies have to be largely distributed and deployed over different types of architectures (grids, peer-to-peer networks, sensor networks, cloud infrastructures). We will adopt a service-based approach and develop new data models, algorithms and services that fulfill properties such as efficiency, adaptivity, reliability, robustness, security, confidentiality and privacy. This vision is now well accepted, as we have to consider large and heterogeneous data sets, huge numbers of connected devices with data management capabilities, and increasing numbers of users and applications. In such a vision, well adapted for the Internet of objects, security of data and services is a big issue, especially data provenance, which could be managed using physical tagging systems.
Sustainable mobility and urban systems such as smart cities; clean, safe and efficient energy technologies such as smart grids; and data markets for extracting business value from data are examples of applications that call for an intelligent, adaptive, efficient and scalable data and knowledge management infrastructure. The smart grids domain we have chosen to explore is very promising, as the management of data (where to put the data, what to do with it, which data to collect, integrate and summarize, how to access it efficiently, …) is the foundation for developing intelligent metering systems and adaptive supervisory control able to handle huge numbers of events and alerts. We will also test the reliability and robustness of our data technologies when collecting healthcare data.
Our agenda falls within the scientific research directions on Information and Communication Technologies given by the H2020 program. We identified two of them for the group: (i) Advanced Cloud Infrastructures and Services, and (ii) Big Data Innovation and take-up and Big Data. It also concerns the societal challenge "Secure, clean and efficient energy".