UniQue: An Approach for Unified and Efficient Querying of Heterogeneous Web Data Sources

Markku Laine, Jari Kleimola, Petri Vuorimaa
Department of Computer Science, Aalto University, Finland
{markku.laine, jari.kleimola, petri.vuorimaa}@aalto.fi


Governments, organizations, and people are publishing open data on the Web more than ever before. Consuming the data, however, requires substantial effort from web mashup developers, as they have to familiarize themselves with a diversity of data formats and query techniques specific to each data source. While several solutions have been proposed to improve web querying, none of them covers the aforementioned aspects in a developer-friendly and efficient manner. Therefore, we devised a unified querying (UniQue) approach and a proxy-based implementation that provides a uniform and declarative interface for querying heterogeneous data sources across the Web. Besides hiding the differences between the underlying data formats and query techniques, UniQue heavily embraces open W3C standards to minimize the learning effort required by developers. Pursuing this further, we propose the Unified Query Language (UQL), which combines the expressiveness of CSS Selectors and XPath into a single, flexible selector language. We show that the adoption of UniQue and UQL can effectively streamline web querying, leverage developers’ existing knowledge, and reduce generated network traffic compared to the current state-of-the-art approach.


Mashup Applications, Web Querying, Web Standards, XML Technologies.


Web APIs*

Table 1. Query parameters to the web API calls

| Query parameter | Description | Web API |
| --- | --- | --- |
| | Contains or points to the data (inline text or absolute URL, respectively) to be converted. | P, DC, E, A |
| format | Indicates the format of the data to be converted. Valid values: html, json (default), xml, and csv. | P, DC, A |
| header | Indicates the presence or absence of the header line in CSV data. Valid values: present (default) and absent. | P, DC, A |
| separator | Specifies the character to separate fields on header and record lines in CSV data. Default value: , (comma). | P, DC, A |
| quote | Specifies the character to quote fields on header and record lines in CSV data. Default value: " (double quotation mark). | P, DC, A |
| rfc7159 | Indicates whether the JSON parser should be IETF RFC 7159 compliant (instead of IETF RFC 4627). Valid values: yes (default) and no. | P, DC, A |
| rt | Indicates whether the resulting XML data should be round-trippable. Valid values: yes (default) and no. | P, DC, A |
| query | Contains the query (Selectors, XPath, UniQue, or XQuery) to be executed against the data. Default value: / (forward slash). | P, QC, A |
| ns-* | Registers an XML namespace to be used in the query. For example, ns-atom=http://www.w3.org/2005/Atom | P, A |
| wrap | Indicates whether the resulting XML data should be wrapped with the <response> element. Valid values: yes (default) and no. | P, A |
| analysis | Indicates whether the analysis should be shown instead of the resulting XML data. Valid values: yes and no (default). | A |
| mode | Indicates the mode of the Analyzer web API. Valid values: processor (default), dataconverter, queryconverter, and echoer. | A |
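
To illustrate what the CSV-related parameters in Table 1 (header, separator, quote) mean, here is a minimal Python sketch that reads CSV text using the same three options. It relies on Python's standard csv module and is not UniQue's converter. The function name `parse_csv` and the `field1`, `field2`, ... names used for header-less data are assumptions made for this example only.

```python
import csv
import io


def parse_csv(text, header="present", separator=",", quote='"'):
    """Parse CSV text into a list of dicts, mirroring the header,
    separator, and quote parameters (defaults as listed in Table 1)."""
    rows = list(csv.reader(io.StringIO(text), delimiter=separator, quotechar=quote))
    if not rows:
        return []
    if header == "present":
        # The first line carries the field names.
        names, records = rows[0], rows[1:]
    else:
        # Hypothetical naming scheme for data without a header line.
        names = [f"field{i + 1}" for i in range(len(rows[0]))]
        records = rows
    return [dict(zip(names, record)) for record in records]


# Semicolon-separated data; the quoted field contains the separator.
sample = 'name;grade\n"Smith; Anna";7a\n'
records = parse_csv(sample, separator=";")
print(records)

# The same kind of data without a header line.
headerless = parse_csv("a,b\n", header="absent")
print(headerless)
```

The quote character matters whenever a field contains the separator itself, as in the first sample record.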

*For testing the web APIs, we recommend the Paw REST client.
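
For scripted testing, a request URL could be assembled from the query parameters in Table 1 roughly as follows. This is a sketch under stated assumptions: the endpoint `BASE_URL` and the name of the data parameter (`DATA_PARAM`) are not given in this document and are placeholders, as is the feed URL in the example call. Only format, query, ns-*, and wrap come from Table 1.

```python
from urllib.parse import urlencode

# Placeholders: neither the endpoint nor the name of the parameter that
# carries the data is stated in this document.
BASE_URL = "https://example.org/unique/processor"
DATA_PARAM = "data"


def build_request_url(source, query="/", fmt="json", namespaces=None, wrap="yes"):
    """Build a GET URL from the documented query parameters.

    source     -- inline text or absolute URL of the data
    query      -- query to execute against the data (default: /)
    fmt        -- html, json (default), xml, or csv
    namespaces -- mapping of prefix -> URI, sent as ns-<prefix>=<URI>
    wrap       -- yes (default) or no
    """
    params = {DATA_PARAM: source, "format": fmt, "query": query, "wrap": wrap}
    for prefix, uri in (namespaces or {}).items():
        params[f"ns-{prefix}"] = uri
    # urlencode percent-encodes reserved characters such as / and :
    return f"{BASE_URL}?{urlencode(params)}"


url = build_request_url(
    "https://example.org/feed.xml",  # placeholder data source
    query="//atom:entry/atom:title",
    fmt="xml",
    namespaces={"atom": "http://www.w3.org/2005/Atom"},
)
print(url)
```

The resulting URL can be pasted into any REST client or fetched with an HTTP library once the placeholders are replaced with the real endpoint and parameter name.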


Climber Stalker is a real-world mashup web application taking advantage of the UniQue web APIs.
Online demo: http://www.lumisade.fi/climberstalker/


UniQue is released under the terms of the MIT license.


 author={Markku Laine and Jari Kleimola and Petri Vuorimaa},
 title={UniQue: An Approach for Unified and Efficient Querying of Heterogeneous Web Data Sources},
 booktitle={Proceedings of the 12th International Conference on Web Information Systems and Technologies},


The authors thank William Candillon and Dr. Ghislain Fourny at 28.io for their exceptional support with XQuery.