UniQue: An Approach for Unified and Efficient Querying of Heterogeneous Web Data Sources

Markku Laine, Jari Kleimola, Petri Vuorimaa
Department of Computer Science, Aalto University, Finland
{markku.laine, jari.kleimola, petri.vuorimaa}@aalto.fi

Abstract

Governments, organizations, and people are publishing open data on the Web more than ever before. Consuming the data, however, requires substantial effort from web mashup developers, as they have to familiarize themselves with a diversity of data formats and query techniques specific to each data source. While several solutions have been proposed to improve web querying, none of them addresses the aforementioned aspects in a developer-friendly and efficient manner. Therefore, we devised a unified querying (UniQue) approach and a proxy-based implementation that provides a uniform and declarative interface for querying heterogeneous data sources across the Web. Besides hiding the differences between the underlying data formats and query techniques, UniQue heavily embraces open W3C standards to minimize the learning effort required by developers. Pursuing this further, we propose Unified Query Language (UQL), which combines the expressiveness of CSS Selectors and XPath into a single, flexible selector language. We show that the adoption of UniQue and UQL can effectively streamline web querying, leverage developers' existing knowledge, and reduce the generated network traffic compared to the current state-of-the-art approach.
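The UQL grammar itself is not reproduced on this page. As a hypothetical illustration of how CSS Selectors and XPath can be bridged, the Python sketch below translates a small subset of CSS (type, #id, and .class selectors joined by descendant or child combinators) into XPath 1.0, the same general idea that the cssselect library implements. The function css_to_xpath and its output form are assumptions made for illustration; this is not UQL or UniQue code.

```python
import re

# One compound selector: optional type (tag name or *), then #id / .class parts.
_COMPOUND = re.compile(r"^(?P<tag>[A-Za-z][\w-]*|\*)?(?P<rest>(?:[#.][\w-]+)*)$")
_PART = re.compile(r"([#.])([\w-]+)")


def _compound_to_xpath(compound):
    """Translate e.g. 'li.item' or 'div#main' into a single XPath step."""
    m = _COMPOUND.match(compound)
    if not m:
        raise ValueError(f"unsupported selector: {compound!r}")
    step = m.group("tag") or "*"
    for kind, name in _PART.findall(m.group("rest")):
        if kind == "#":
            step += f"[@id='{name}']"
        else:
            # Whole-word class match, so class="item active" matches .item.
            step += ("[contains(concat(' ', normalize-space(@class), ' '), "
                     f"' {name} ')]")
    return step


def css_to_xpath(selector):
    """Hypothetical CSS-subset -> XPath translator (illustrative sketch only)."""
    # Surround '>' with spaces so it becomes its own token.
    tokens = selector.replace(">", " > ").split()
    if not tokens:
        raise ValueError("empty selector")
    xpath, axis = "", "//"          # descendant combinator by default
    for token in tokens:
        if token == ">":
            axis = "/"              # child combinator
            continue
        xpath += axis + _compound_to_xpath(token)
        axis = "//"
    return xpath
```

For example, css_to_xpath("div#main p") yields //div[@id='main']//p. Translating into XPath lets a single XPath engine evaluate both notations, which is one plausible route to a combined selector language.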

Keywords

Mashup Applications, Web Querying, Web Standards, XML Technologies.

Paper

Web APIs*

Table 1. Query parameters to the web API calls
Web API key: P = Processor, DC = Data Converter, QC = Query Converter, E = Echoer, A = Analyzer.

Query parameter | Description | Web API
data (required) | Contains or points to the data to be converted (inline text or an absolute URL, respectively). | P, DC, E, A
format | Indicates the format of the data to be converted. Valid values: html, json (default), xml, and csv. | P, DC, A
header | Indicates the presence or absence of a header line in CSV data. Valid values: present (default) and absent. | P, DC, A
separator | Specifies the character that separates fields on header and record lines in CSV data. Default value: , (comma). | P, DC, A
quote | Specifies the character used to quote fields on header and record lines in CSV data. Default value: " (double quotation mark). | P, DC, A
rfc7159 | Indicates whether the JSON parser should comply with IETF RFC 7159 (instead of IETF RFC 4627). Valid values: yes (default) and no. | P, DC, A
rt | Indicates whether the resulting XML data should be round-trippable. Valid values: yes (default) and no. | P, DC, A
query | Contains the query (Selectors, XPath, UniQue, or XQuery) to be executed against the data. Default value: / (forward slash). | P, QC, A
ns-* | Registers an XML namespace to be used in the query, for example ns-atom=http://www.w3.org/2005/Atom. | P, A
wrap | Indicates whether the resulting XML data should be wrapped in a <response> element. Valid values: yes (default) and no. | P, A
analysis | Indicates whether the analysis should be shown instead of the resulting XML data. Valid values: yes and no (default). | A
mode | Indicates the mode of the Analyzer web API. Valid values: processor (default), dataconverter, queryconverter, and echoer. | A
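The parameters in Table 1 are passed to the web APIs as query parameters. The Python sketch below shows one way a client might assemble such a call, assuming the APIs accept GET requests. The endpoint BASE_URL and the helper build_request_url are hypothetical placeholders, since the actual service addresses are not listed here. The enumerated values are copied from Table 1, and parameters that are not given are simply omitted so that the server-side defaults apply.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; the real UniQue web API addresses are not given here.
BASE_URL = "https://example.org/unique/processor"

# Enumerated values copied from Table 1; other parameters are free-form.
ALLOWED_VALUES = {
    "format": {"html", "json", "xml", "csv"},
    "header": {"present", "absent"},
    "rfc7159": {"yes", "no"},
    "rt": {"yes", "no"},
    "wrap": {"yes", "no"},
    "analysis": {"yes", "no"},
    "mode": {"processor", "dataconverter", "queryconverter", "echoer"},
}


def build_request_url(base_url, data, query=None, namespaces=None, **options):
    """Assemble a GET URL for a UniQue-style web API call (illustrative sketch).

    data       -- inline text or absolute URL of the source data (required)
    query      -- query expression; if omitted, the server default "/" applies
    namespaces -- {prefix: namespace URI}, sent as ns-<prefix> parameters
    options    -- other Table 1 parameters, e.g. format="csv", separator=";"
    """
    if not data:
        raise ValueError("'data' is required")
    params = {"data": data}
    if query is not None:
        params["query"] = query
    for prefix, uri in (namespaces or {}).items():
        params[f"ns-{prefix}"] = uri
    for name, value in options.items():
        allowed = ALLOWED_VALUES.get(name)
        if allowed is not None and value not in allowed:
            raise ValueError(f"invalid value {value!r} for parameter {name!r}")
        params[name] = value
    # urlencode percent-encodes reserved characters such as '/', ':' and ','.
    return f"{base_url}?{urlencode(params)}"
```

For example, build_request_url(BASE_URL, "http://example.org/feed.xml", query="//atom:title", namespaces={"atom": "http://www.w3.org/2005/Atom"}) sends the ns-atom registration from Table 1 alongside the query. Percent-encoding matters here because inline CSV data and XPath expressions contain slashes, colons, and line breaks.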

*For testing the web APIs, we recommend the Paw REST client.

Demo

Climber Stalker is a real-world mashup web application taking advantage of the UniQue web APIs.
Online demo: http://www.lumisade.fi/climberstalker/

Downloads

UniQue is released under the terms of the MIT license.

BibTeX

@conference{webist16,
 author={Markku Laine and Jari Kleimola and Petri Vuorimaa},
 title={UniQue: An Approach for Unified and Efficient Querying of Heterogeneous Web Data Sources},
 booktitle={Proceedings of the 12th International Conference on Web Information Systems and Technologies},
 year={2016},
 pages={84--94},
 doi={10.5220/0005764100840094},
 isbn={978-989-758-186-1},
}

Acknowledgments

The authors thank William Candillon and Dr. Ghislain Fourny at 28.io for their exceptional support with XQuery.