UniQue: An Approach for Unified and Efficient Querying of Heterogeneous Web Data Sources
Abstract
Governments, organizations, and people are publishing open data on the Web more than ever before. Consuming the data, however, requires substantial effort from web mashup developers, as they have to familiarize themselves with a diversity of data formats and query techniques specific to each data source. While several solutions have been proposed to improve web querying, none of them covers the aforementioned aspects in a developer-friendly and efficient manner. Therefore, we devised a unified querying (UniQue) approach and a proxy-based implementation that provides a uniform and declarative interface for querying heterogeneous data sources across the Web. Besides hiding the differences between the underlying data formats and query techniques, UniQue heavily embraces open W3C standards to minimize the learning effort required by developers. Pursuing this further, we propose the Unified Query Language (UQL), which combines the expressiveness of CSS Selectors and XPath into a single and flexible selector language. We show that the adoption of UniQue and UQL can effectively streamline web querying, leverage developers’ existing knowledge, and reduce generated network traffic compared to the current state-of-the-art approach.
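UQL builds on the observation that many CSS selectors have a direct XPath 1.0 equivalent; the Query Converter web API exposes such a translation. To make the idea concrete, here is a minimal sketch that translates a small, hypothetical subset of Selectors 3.0 (type and universal selectors, `#id`, `.class`, descendant and child combinators) into XPath 1.0. It is an illustration under those assumptions, not the UniQue Query Converter itself, and the function names are our own.

```python
import re

# Illustrative sketch only: translates a small subset of CSS Selectors 3.0
# (type/universal selectors, #id, .class, descendant and child combinators)
# into XPath 1.0. It is not the UniQue Query Converter.

# A simple selector: optional element name or "*", then any #id/.class parts.
_SIMPLE = re.compile(r"(\*|[A-Za-z_][\w-]*)?((?:[#.][\w-]+)*)$")


def _simple_to_xpath(simple: str) -> str:
    match = _SIMPLE.match(simple)
    if not simple or not match:
        raise ValueError(f"unsupported selector: {simple!r}")
    step = match.group(1) or "*"
    for kind, name in re.findall(r"([#.])([\w-]+)", match.group(2)):
        if kind == "#":
            step += f"[@id='{name}']"
        else:
            # Match a whole token in the space-separated class attribute.
            step += f"[contains(concat(' ', normalize-space(@class), ' '), ' {name} ')]"
    return step


def css_to_xpath(selector: str) -> str:
    # re.split with one capture group yields [simple, combinator, simple, ...],
    # where the combinator is ">" (child) or None (descendant).
    parts = re.split(r"\s*(>)\s*|\s+", selector.strip())
    xpath = "//" + _simple_to_xpath(parts[0])
    for i in range(1, len(parts), 2):
        axis = "/" if parts[i] == ">" else "//"
        xpath += axis + _simple_to_xpath(parts[i + 1])
    return xpath


print(css_to_xpath("#main p"))  # //*[@id='main']//p
```

The `concat`/`normalize-space` idiom is the usual way to express CSS class matching in XPath 1.0, which lacks a token-list test. Full translators, such as `GenericTranslator().css_to_xpath()` in the cssselect Python library, cover far more of the Selectors grammar.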
Keywords
Mashup Applications, Web Querying, Web Standards, XML Technologies.
Paper
Web APIs*
- Processor (P). The main web API that implements the UniQue approach.
  Endpoint: http://unique.28.io/processor.xq
- Data Converter (DC). A utility web API for converting HTML, JSON, XML, or CSV into XML.
  Endpoint: http://unique.28.io/dataconverter.xq
- Query Converter (QC). A utility web API for translating Selectors 3.0 or UniQue 1.0 queries into equivalent XPath 1.0 expressions.
  Endpoint: http://unique.28.io/queryconverter.xq
- Echoer (E). A utility web API for echoing input data.
  Endpoint: http://unique.28.io/echoer.xq
- Analyzer (A). A utility web API used in the evaluation. It simulates the other web APIs and measures the processing time of each individual task.
  Endpoint: http://unique.28.io/analyzer.xq
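Because each web API is addressed by an endpoint plus query parameters, a client mainly needs to assemble and URL-encode a request. The following minimal sketch assumes that parameters are passed in the query string of an HTTP GET request; it only builds the URL and does not assume the endpoints are still online. The `build_request_url` helper and the example.org data URL are hypothetical.

```python
from urllib.parse import urlencode

# Endpoints as listed on this page; their current availability is not assumed.
ENDPOINTS = {
    "processor": "http://unique.28.io/processor.xq",
    "dataconverter": "http://unique.28.io/dataconverter.xq",
    "queryconverter": "http://unique.28.io/queryconverter.xq",
    "echoer": "http://unique.28.io/echoer.xq",
    "analyzer": "http://unique.28.io/analyzer.xq",
}


def build_request_url(api: str, params: dict[str, str]) -> str:
    """Build a GET URL for a UniQue web API (hypothetical helper)."""
    if api not in ENDPOINTS:
        raise ValueError(f"unknown web API: {api!r}")
    # Per the parameter table, every API except the Query Converter requires `data`.
    if api != "queryconverter" and "data" not in params:
        raise ValueError("the 'data' parameter is required")
    # urlencode percent-encodes reserved characters such as ':', '/', and '>'.
    return f"{ENDPOINTS[api]}?{urlencode(params)}"


# Hypothetical request: select item titles from an XML feed on example.org.
url = build_request_url(
    "processor",
    {"data": "http://example.org/feed.xml", "format": "xml", "query": "item > title"},
)
print(url)
```

Encoding matters here because both `data` (an absolute URL) and `query` (a selector with spaces, `>`, or `#`) contain characters that would otherwise break the query string.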
Query parameter | Description | Web API |
---|---|---|
data (required) | Contains or points to the data to be converted (inline text or an absolute URL, respectively). | P, DC, E, A |
format | Indicates the format of the data to be converted. Valid values: html, json (default), xml, and csv. | P, DC, A |
header | Indicates the presence or absence of the header line in CSV data. Valid values: present (default) and absent. | P, DC, A |
separator | Specifies the character to separate fields on header and record lines in CSV data. Default value: , (comma). | P, DC, A |
quote | Specifies the character to quote fields on header and record lines in CSV data. Default value: " (double quotation mark). | P, DC, A |
rfc7159 | Indicates whether the JSON parser should be IETF RFC 7159 compliant (instead of IETF RFC 4627). Valid values: yes (default) and no. | P, DC, A |
rt | Indicates whether the resulting XML data should be round-trippable. Valid values: yes (default) and no. | P, DC, A |
query | Contains the query (Selectors, XPath, UniQue, or XQuery) to be executed against the data. Default value: / (forward slash). | P, QC, A |
ns-* | Registers an XML namespace to be used in the query. For example, ns-atom=http://www.w3.org/2005/Atom | P, A |
wrap | Indicates whether the resulting XML data should be wrapped with the <response> element. Valid values: yes (default) and no. | P, A |
analysis | Indicates whether the analysis should be shown instead of the resulting XML data. Valid values: yes and no (default). | A |
mode | Indicates the mode of the Analyzer web API. Valid values: processor (default), dataconverter, queryconverter, and echoer. | A |
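The `header`, `separator`, and `quote` parameters describe how CSV input is parsed before conversion to XML. The XML structure actually produced by the Data Converter is not documented here, so the following sketch uses an assumed layout (`<csv>`, `<record>`, and `<field name="…">` elements, plus positional `field1`, `field2`, … names when the header is absent). It shows only how the three parameters map onto Python's standard `csv` module.

```python
import csv
import io
import xml.etree.ElementTree as ET


def csv_to_xml(data: str, header: str = "present", separator: str = ",", quote: str = '"') -> str:
    """Convert CSV text to XML using an assumed <csv>/<record>/<field> layout."""
    if header not in ("present", "absent"):
        raise ValueError("header must be 'present' or 'absent'")
    rows = list(csv.reader(io.StringIO(data), delimiter=separator, quotechar=quote))
    if header == "present":
        # The first line supplies the field names.
        names, records = (rows[0], rows[1:]) if rows else ([], [])
    else:
        # Without a header line, fall back to hypothetical positional names.
        width = max((len(row) for row in rows), default=0)
        names, records = [f"field{i + 1}" for i in range(width)], rows
    root = ET.Element("csv")
    for row in records:
        record = ET.SubElement(root, "record")
        # Keeping the name in an attribute avoids invalid XML element names.
        for name, value in zip(names, row):
            ET.SubElement(record, "field", name=name).text = value
    return ET.tostring(root, encoding="unicode")


print(csv_to_xml("peak;height\nMont Blanc;4808\n", separator=";"))
```

With the defaults (`header=present`, `separator=,`, `quote="`) the sketch mirrors the default values listed in the table.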
Demo
Climber Stalker is a real-world mashup web application taking advantage of the UniQue web APIs.
Online demo: http://www.lumisade.fi/climberstalker/
Downloads
UniQue is released under the terms of the MIT license.
BibTeX
@conference{webist16, author={Markku Laine and Jari Kleimola and Petri Vuorimaa}, title={UniQue: An Approach for Unified and Efficient Querying of Heterogeneous Web Data Sources}, booktitle={Proceedings of the 12th International Conference on Web Information Systems and Technologies}, year={2016}, pages={84-94}, doi={10.5220/0005764100840094}, isbn={978-989-758-186-1}, }
Acknowledgments
The authors thank William Candillon and Dr. Ghislain Fourny at 28.io for their exceptional support with XQuery.