FONDECYT Project
Project: Effective and Efficient retrieval in multimedia databases.
Responsible: Benjamin Bustos.
Similarity search in multimedia database systems is becoming increasingly important, due to a rapidly growing amount of available multimedia data like images, audio files, video clips, 3D objects, time series, and text documents. As we see progress in the fields of acquisition, storage, and dissemination of various multimedia formats, the application of effective and efficient database management systems becomes indispensable in order to handle these formats. Application domains for multimedia databases include molecular biology, medicine, geographical information systems, Computer Aided Design/Computer Aided Manufacturing (CAD/CAM), and virtual reality.
Many of these practical applications have in common that the objects of the database are modeled in a metric space, i.e., it is possible to define a non-negative real-valued function on pairs of objects, called a metric, that satisfies the properties of strict positiveness, symmetry, and the triangle inequality. The main motivations for modeling multimedia databases as metric spaces are: the modeling is usually fast and easily parameterizable; there are many metric functions that can be efficiently computed; and metric spaces are easily indexable by metric access methods.
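For reference, the three properties named above can be written out as follows. The symbols are our own notation, not taken from the project text: X denotes the universe of objects and d the metric.

```latex
d : X \times X \to \mathbb{R}_{\ge 0}, \qquad \text{for all } x, y, z \in X:
\begin{align*}
  & d(x, y) > 0 \ \text{ if } x \neq y, \quad d(x, x) = 0
      && \text{(strict positiveness)} \\
  & d(x, y) = d(y, x)
      && \text{(symmetry)} \\
  & d(x, z) \le d(x, y) + d(y, z)
      && \text{(triangle inequality)}
\end{align*}
```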
A recent proposal to improve the effectiveness (i.e., the quality of the retrieved answer) of similarity search resorts to the use of combinations of metrics. Instead of using a single metric to compare two objects, the search system uses a linear combination of metrics (also known as a multi-metric) to compute the (dis)similarity between two objects. This novel framework for searching in multimedia databases has been shown to provide considerable improvements in the effectiveness of the search. However, there are still many research topics within this framework that must be addressed for it to reach its full potential.
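The idea of a multi-metric can be illustrated with a minimal sketch. It assumes that objects are plain numeric vectors and that every component metric is applied to the same vector; in practice each metric would typically compare a different feature representation of the object. The function names (`multi_metric`, `knn`) and the choice of Manhattan and Euclidean distances are illustrative assumptions, not part of the project.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]
Metric = Callable[[Vector, Vector], float]


def manhattan(a: Vector, b: Vector) -> float:
    """L1 distance between two vectors of equal length."""
    return sum(abs(x - y) for x, y in zip(a, b))


def euclidean(a: Vector, b: Vector) -> float:
    """L2 distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def multi_metric(metrics: List[Metric], weights: List[float]) -> Metric:
    """Return the linear combination sum_i w_i * d_i (hypothetical helper).

    Non-negative weights keep symmetry and the triangle inequality intact.
    """
    if len(metrics) != len(weights):
        raise ValueError("need exactly one weight per metric")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")

    def combined(a: Vector, b: Vector) -> float:
        return sum(w * d(a, b) for w, d in zip(weights, metrics))

    return combined


def knn(query: Vector, database: List[Vector], dist: Metric, k: int) -> List[Vector]:
    """Sequential-scan k-nearest-neighbor search under the given distance."""
    return sorted(database, key=lambda obj: dist(query, obj))[:k]


if __name__ == "__main__":
    d = multi_metric([manhattan, euclidean], [0.5, 0.5])
    print(knn((0, 0), [(3, 4), (1, 1), (0, 2)], d, k=2))
```

Changing the weights changes the ranking of the answer, which is why the choice of weights matters for effectiveness.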
This research project proposes to study several improvements to the multi-metric framework. Our goal is to produce several novel algorithms and methods aimed at improving the effectiveness as well as the efficiency of similarity search in multimedia databases based on the multi-metric approach. The main ideas that we plan to pursue are: new methods for computing the combinations of metrics, several new index structures for supporting multi-metric spaces, a novel approach that uses sets of indices to improve the efficiency of the search in multi-metric spaces, and new algorithms for similarity search with improved space cost. Additionally, we plan to study other novel trends for searching in multimedia databases.
To achieve our goals, we plan to develop and implement several novel algorithms and data structures, and then to perform exhaustive experimental evaluations of the proposed methods compared with state-of-the-art techniques, in terms of both their efficiency (CPU or I/O time) and their effectiveness (using tools that come from the information retrieval community). With these extensive evaluations, we expect to assess the real gains provided by the new techniques for searching in multimedia databases.
We divide our proposed ideas into two main areas: those oriented to improving the effectiveness of the search, and those oriented to improving the efficiency of the search.
Improving the effectiveness of multimedia databases is related to:
- Optimal size of the training set used to compute the weights.
- New methods for computing the query-dependent weights for multi-metric spaces.
Improving the efficiency of multimedia databases, in turn, requires in-depth analysis of:
- Set of indices for multi-metric spaces with binary weights.
- A pivot-based version of M3-tree.
- Index for multi-metric spaces based on a Voronoi partition.
- Distance cache for general metric access methods.
- k-NN search algorithm with improved space cost.
- Indexing techniques for non-metric spaces.
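As background for the efficiency topics above, the following sketch shows the generic pivot-based filtering idea that metric access methods rely on: precomputed distances to a few pivot objects give, via the triangle inequality, a lower bound on the distance to the query, so some objects can be discarded without computing their real distance. This is a textbook illustration under our own assumptions (Euclidean vectors, hypothetical function names), not one of the index structures proposed in the project.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]
Metric = Callable[[Vector, Vector], float]


def euclidean(a: Vector, b: Vector) -> float:
    """L2 distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def build_pivot_table(database: List[Vector], pivots: List[Vector],
                      dist: Metric) -> List[List[float]]:
    """Precompute the distance from every object to every pivot."""
    return [[dist(obj, p) for p in pivots] for obj in database]


def range_search(query: Vector, radius: float, database: List[Vector],
                 pivots: List[Vector], table: List[List[float]],
                 dist: Metric) -> Tuple[List[Vector], int]:
    """Return the objects within `radius` of `query`, and how many
    object distances had to be computed after pivot filtering."""
    q_to_pivots = [dist(query, p) for p in pivots]
    results: List[Vector] = []
    computed = 0
    for obj, row in zip(database, table):
        # Triangle inequality: |d(q, p) - d(o, p)| <= d(q, o) for every pivot p.
        lower_bound = max(abs(qp - op) for qp, op in zip(q_to_pivots, row))
        if lower_bound > radius:
            continue  # safely discarded without computing d(q, o)
        computed += 1
        if dist(query, obj) <= radius:
            results.append(obj)
    return results, computed


if __name__ == "__main__":
    db = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0), (10.0, 10.0)]
    pivots = [(0.0, 0.0)]
    table = build_pivot_table(db, pivots, euclidean)
    print(range_search((0.5, 0.0), 1.0, db, pivots, table, euclidean))
```

The filter never discards a true answer, so the result equals that of a sequential scan; only the number of distance computations changes.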
As a result of this research project, we expect to make further advances in the theoretical foundations of multimedia databases, to produce several scientific publications presenting our results, and to produce implementations of the techniques we develop. The methods that we plan to research are general and can be used with any particular multimedia database. That is, we are not restricting ourselves to proposing solutions for a specific multimedia data type; our results could be applied to any kind of multimedia database.
STUDENTS
- Sebastian Kreft
- Jaime Veliz
- Victor Sepulveda

