A key feature of a database management system (DBMS) is its ability to process declarative queries. Users pose queries to the database system which specify what the desired query result should be, not how to compute this result. For any user query, there are typically many different query execution plans consisting of different operators combined in different orders. The query optimizer of a database system must choose the most efficient query execution plan by enumerating candidate plans and estimating their cost using a cost model, which relies on statistics that capture the different properties of the data in the database.
Emerging applications that require database support (e.g., multimedia systems, Web databases) incorporate notions of proximity, ranking, and user feedback that have serious impact on the query model, which cannot be adequately handled by current query processing approaches embodied in traditional DBMSs. Our work focuses on enriching query processors/optimizers to handle Information Retrieval (IR) style queries that are common in these applications. These queries do not have exact answers, requiring the ranking of the results according to their suitability as an answer. Some examples of these types of queries are the following.
Consider a video database that contains several extracted video features, e.g., color histograms, texture, and edge orientation. These features are extracted for each video frame and are stored in separate tables. Each feature is indexed using a high-dimensional index for a faster query response. An example query is one that retrieves the top- k frames most similar to a given image based on both color and texture. Alternatively, consider a family interested in buying a house with a nearby school with the objective of minimizing the total cost. For simplicity, consider a cost function that sums the price of the house and 5-year school tuition. Searching against two web databases, HOUSES and SCHOOLS, the family may wish to find the top 10 houses and schools, ranked in decreasing combination of price of the house and the school tuition such that the distance between the house and the school is less than some value.
Since, in many emerging applications such as the above, "ranking" the query results according to some user criteria is an integral part of the query semantics, it is essential to efficiently support ranking by database systems. Furthermore, as more of these applications migrate to hand-held devices, sensor network and the Internet, query processing paradigms have to adopt to the continuous changes and fluctuations in the computing environments that are typical of these platforms. Consequently, it is necessary to provide adaptive processing mechanisms for ranking queries.
The goal of this line of research is to develop a formal and generalized ranked-retrieval framework that adapts to non-traditional processing environments and is applicable to a variety of data models. We are primarily interested in developing algorithms and frameworks to leverage the current capabilities of database systems to handle ranked retrieval efficiently and adaptively. Current research is along three lines: (1) enabling ranked retrieval in current relational query processors as a first-class functionality; (2) extending rank-aware query processing beyond relational data models to text and XML databases, which are becoming an integral part of current applications infrastructure; and (3) introducing new rank-aware query processing paradigms that offer a high degree of adaptability and scalability.
We have proposed several practical query operators with various physical implementations to be integrated in relational query processors ([1], [2] , [3]). We have studied developing novel algebraic frameworks and cost estimation models to fully integrate the proposed operators and algorithm in real-world database engines ([4] , [5]). We developed a prototype, RankSQL, based on PostgreSQL that integrate all the proposed algorithms and frameworks, RankSQL has bee demonstrated at the 2005 International Conference on Very Large Databases [6].

Database Research Group
David R. Cheriton School of Computer Science
University of Waterloo
Waterloo, Ontario, Canada N2L 3G1
Tel: 519-888-4567
Fax: 519-885-1208
Contact | Feedback: db-webmaster@cs.uwaterloo.ca | Database Research Group