Frederic Dumonceaux presented part of his Ph.D. work, entitled Materializing Data Cubes as Partitions of Sets of Tuples, at BDA’2014 (journées Bases de Données Avancées).
Abstract: In the last decade, several partition-based algorithms were introduced to summarize data cubes in a multidimensional modeling context. Summarization process contains in itself many critical issues occurring when querying a data warehouse. One concrete example is the ability to group similar queries to set up a materialization/caching policy improving response time whenever the result of a query is broadly close from a precomputed result. Indeed, while the number of involved dimensions linearly increases, data complexity of the query then grows exponentially and stored some views summarizing tuples aggregates is likely to speed up the overall query. The cornerstone is therefore the ability to handle aggregates of cells as well as a dedicated algebra which reflects navigation within a cube and query processing using descriptors and filtering. Moreover, although lots of framework were proposed to efficiently manage and process OLAP queries in many different ways, none of those handle their results as first-class objects to be managed and queried. In this paper, we propose an algebra whose base objects are partitions defined over the set of facts gathered in a data warehouse. We will annotate them to leave a contextual information to express at once the result of many multidimensional queries. We will also outline its benefits in the whole building and querying process.
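To make the idea of "partitions defined over the set of facts" concrete, here is a minimal Python sketch. It is not the algebra from the paper: the fact table, the dimension names (`city`, `year`) and the helper `partition_by` are hypothetical, chosen only to illustrate how grouping facts by a subset of dimensions yields a partition whose blocks are annotated with a descriptor.

```python
from collections import defaultdict

def partition_by(facts, dims):
    """Group fact ids into blocks sharing the same values on `dims`.

    Returns a dict mapping a descriptor (tuple of dimension values)
    to the block (set of fact ids) it annotates. Hypothetical helper,
    not taken from the paper.
    """
    blocks = defaultdict(set)
    for fact_id, fact in facts.items():
        descriptor = tuple(fact[d] for d in dims)
        blocks[descriptor].add(fact_id)
    return dict(blocks)

# Hypothetical fact table: fact id -> dimension values and a measure.
facts = {
    1: {"city": "Nantes", "year": 2013, "sales": 10},
    2: {"city": "Nantes", "year": 2014, "sales": 20},
    3: {"city": "Paris", "year": 2014, "sales": 5},
}

by_city = partition_by(facts, ["city"])
by_city_year = partition_by(facts, ["city", "year"])

# An aggregate per block can be computed once and reused by later queries.
sales_by_city = {
    descriptor: sum(facts[i]["sales"] for i in block)
    for descriptor, block in by_city.items()
}
```

In this sketch, grouping by more dimensions can only split blocks further, so `by_city_year` refines `by_city`; that refinement order is the kind of structure a partition-based view of a cube can exploit.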
In October 2013, together with the GDD research group, we organised the 29th BDA (Bases de Données Avancées) conference in Nantes, the main annual scientific meeting of the French database research community. Videos of the talks (keynotes, tutorials and standard sessions) are available through the conference website; most presentations are in French. Keynotes and tutorials include talks by Patrick Valduriez, Serge Abiteboul, Stratos Idreos, Stéphane Frénot and Stéphane Grumbach, on various aspects of “big data management” and personal data.
F. Dumonceaux presents his paper A First Attempt to Computing Generic Set Partitions: Delegation to an SQL Query Engine at the DEXA’2014 conference. Full paper.
Abstract: Partitions are a very common and useful way of organizing data, in data engineering and data mining. However, partitions currently lack efficient and generic data management functionalities. This paper proposes advances in the understanding of this problem, as well as elements for solving it. We formulate the task as efficient processing, evaluating and optimizing queries over set partitions, in the setting of relational databases. We first demonstrate that there is no trivial relational modeling for managing collections of partitions. We formally motivate a relational encoding and show that one cannot express all the operators of the partition lattice and set-theoretic operations as queries of the relational algebra. We provide multiple evidence of the inefficiency of FO queries. Our experimental results enforce this evidence. We claim that there is a strong requirement for the design of a dedicated system to manage set partitions, or at least to supplement an existing data management system, to which both data persistence and query processing could be delegated.
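The two operators of the partition lattice mentioned in the abstract can be sketched in a few lines of Python. This is an illustrative in-memory version under our own assumptions (a partition is represented as a set of `frozenset` blocks; the function names `meet` and `join` are ours), not the relational encoding studied in the paper.

```python
from itertools import product

def meet(p, q):
    """Coarsest common refinement: the non-empty pairwise block intersections."""
    return {a & b for a, b in product(p, q) if a & b}

def join(p, q):
    """Finest common coarsening: merge overlapping blocks transitively."""
    blocks = []  # pairwise-disjoint blocks built so far
    for block in list(p) + list(q):
        merged = set(block)
        untouched = []
        for existing in blocks:
            if existing & merged:
                merged |= existing  # chain through the shared elements
            else:
                untouched.append(existing)
        blocks = untouched + [merged]
    return {frozenset(b) for b in blocks}

# Two partitions of {1, 2, 3, 4, 5}.
p = {frozenset({1, 2}), frozenset({3, 4}), frozenset({5})}
q = {frozenset({1, 2}), frozenset({3}), frozenset({4, 5})}

refinement = meet(p, q)
coarsening = join(p, q)
```

The meet only needs pairwise intersections, whereas the join has to follow chains of overlapping blocks of arbitrary length. That transitive step is a plausible illustration of why some lattice operators are awkward to state as plain relational algebra queries without recursion.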
F. Dumonceaux presents part of his Ph.D. work, “An algebraic approach to ensemble clustering”, at the Int. Conf. on Pattern Recognition (ICPR’2014). Full paper.
Abstract: In clustering, consensus clustering aims at providing a single partition fitting a consensus from a set of independently generated. Common procedures, which are mainly statistical and graph-based, are recognized for their robustness and ability to scale-up. In this paper, we provide a complementary and original viewpoint over consensus clustering, by means of algebraic definitions which allow to ascertain the nature of available inferences in a systematic approach (e.g. a knowledge base). We found our approach on the lattice of partitions, for which we shall disclose how some operators can be added with the aim to express a formula representing the consensus. We show that adopting an incremental approach may assist to retain significant amount of aggregate data which fits well with the set of input clusterings. Beyond that ability to model formulae, we also note that its potential cannot be easily captured through such a logical system. It is due to the volatile nature of handling partitions which finally impacts on ability to draw some valuable conclusions.
We organized a scientific workshop in Nantes in May 2012 on the then-emerging question of the Open Data movement. The goal of the workshop was to make progress on the following questions:
- Which computer science research topics does the Open Data movement renew, and how (new issues and new solutions)?
- What should computer science research learn from other fields of Open Data?
The proceedings were published as open data on the arXiv repository and are available through the workshop website.