Friday, February 13th
Preconference schedule on a separate page.
Saturday, February 14th
chaired by Mohamed Zergaoui (morning) and Jirka Kosek (afternoon)
9:00 | Registration desk opens
9:30 | Opening and sponsors presentation
9:55 | Parallel Processing in the Saxon XSLT Processor – Michael Kay (Saxonica)
10:40 | Parallel XSLT Processing of Large Documents – Jakub Malý (Barclays)
11:10 | Coffee break
11:40 | Semantic Hybridization: Mixing RDFa and JSON-LD – R. Alexander Miłowski (School of Information, University of California, Berkeley)
12:10 | Using DocBook to Produce a Polyvalent Academic Work – Murray Maloney (Muzmo), Robert J. Glushko (UC Berkeley), R. Alexander Milowski (UC Berkeley)
12:40 | Generation of a “semantic” eBook: all you need is XML – Vincent Gros (Hachette Livre), Jean-Claude Moissinac (Télécom ParisTech) and Luc Audrain (Hachette Livre)
13:10 | Lunch
14:40 | Standards update: XInclude 1.1 – Norman Walsh (MarkLogic)
15:00 | Standards update: XProc 2.0 – Norman Walsh (MarkLogic)
15:20 | The convergence between EPUB and the Open Web Platform – Felix Sasaki (DFKI/W3C)
15:40 | Coffee break
16:10 | Publishing with CSS Paged Media – A review of existing tools and techniques – Andreas Jung (ZOPYX)
16:40 | Future of publishing – panel discussion: Andreas Jung, Patrick Gundlach, Michael Miller, Romain Deltour, Vincent Gros and Jirka Kosek
17:40 | Closing of the day
19:30 | Social dinner & Demo Jam
Sunday, February 15th
chaired by James Fuller (morning) and Mohamed Zergaoui (afternoon)
9:30 | Registration desk opens
10:00 | Opening of the second day
10:10 | Building Security Analytics solution using Native XML Database – Mansi Sheth (Veracode Inc)
10:40 | Node search preceding node construction – XQuery inviting non-XML technologies – Hans-Juergen Rennau (Traveltainmen GmbH)
11:10 | Coffee break
11:40 | Native XML Databases: Death or Coming of Age – Xavier Franc (Qualcomm Technologies Inc) and Michael Paddon (Qualcomm Technologies Inc)
12:10 | A Unified Approach to Design and Implement data-centric and document-centric XML Web Applications – Christine Vanoirbeek (EPFL), Stéphane Sire (Oppidoc) and Houda Chabbi (HEFR)
12:40 | Graphical User Interface Tool for Designing Model-Based User Interfaces with UIML – Anne Brüggemann-Klein, Lyuben Dimitrov and Marouane Sayih (Technische Universität München)
13:10 | Lunch
14:40 | Survey State Model (SSM) – XML Authoring of electronic questionnaires – Jose Lloret (Robert Gordon University) and Nirmalie Wiratunga (Robert Gordon University)
15:10 | Schematron for Information Architects – George Bina (Syncro Soft/oXygen XML Editor)
15:40 | TXSTEP – an integrated XML-based scripting language for scholarly text data processing – Wilhelm Ott (University of Tuebingen) and Tobias Ott (pagina publication technologies ltd.)
16:10 | Coffee break
16:40 | In-Browser XML Document Streaming – Emmanouil Potetsianakis (Telecom ParisTech) and Cyril Concolato (Telecom ParisTech)
17:10 | Standards update: XPath/XQuery/XSLT/XML Schema – Michael Kay (Saxonica)
17:40 | Closing of the conference
Session details
Parallel Processing in the Saxon XSLT Processor
Michael Kay (Saxonica)
One of the supposed benefits of using declarative languages (like XSLT) is the potential for parallel execution, taking advantage of the multi-core processors that are now available in commodity hardware.
This paper describes recent developments in one popular XSLT processor, Saxon, which start to exploit this potential. It outlines the challenges in implementing parallel execution, and reports on the benefits that have been observed.
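As a flavour of what such parallelism can look like in stylesheet code (a sketch, not taken from the paper; the element names are invented), Saxon-EE offers an extension attribute, saxon:threads, asking the processor to evaluate the iterations of an xsl:for-each in parallel:

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:saxon="http://saxon.sf.net/">
  <xsl:template match="/catalogue">
    <report>
      <!-- Saxon-EE extension: process each chapter in a separate thread -->
      <xsl:for-each select="chapter" saxon:threads="4">
        <summary title="{@title}">
          <xsl:value-of select="count(.//para)"/>
        </summary>
      </xsl:for-each>
    </report>
  </xsl:template>
</xsl:stylesheet>
```

Because xsl:for-each is declarative (no iteration may depend on another), the results can be assembled in document order regardless of which thread finishes first.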
Parallel XSLT Processing of Large Documents
Jakub Malý (Barclays)
After the introduction of streaming in XSLT 3.0, new possibilities and applications for XSLT opened up. Streaming stylesheets can process documents with bounded memory consumption, even large documents that would not fit into memory when a non-streaming processor is used. With memory consumption bounded and disk space de facto unlimited (and SSD drives providing fast access to stored data), CPU speed can become the bottleneck in many scenarios. However, contemporary commodity machines have 2, 4 or more CPU cores, of which present-day XSLT processors usually use only one, leaving the CPU underutilized. In this paper, we discuss scenarios in which better CPU utilization can be achieved by processing the input file in parallel. As our experiments show, such an approach can significantly increase performance (i.e. shorten run time), by up to ~35% in our experimental setting.
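For readers new to XSLT 3.0 streaming, a minimal sketch (not from the paper; the vocabulary is invented) of a stylesheet that totals a value from an arbitrarily large log in a single forward pass:

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Declare the default mode streamable: the input is read once,
       front to back, with only a constant amount of state in memory -->
  <xsl:mode streamable="yes"/>

  <xsl:template match="/log">
    <total>
      <!-- A single downward selection, so the expression is streamable -->
      <xsl:value-of select="sum(entry/@bytes)"/>
    </total>
  </xsl:template>
</xsl:stylesheet>
```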
Semantic Hybridization: Mixing RDFa and JSON-LD
R. Alexander Miłowski (School of Information, University of California, Berkeley)
JSON-LD and RDFa are being promoted for use on the Web to augment and annotate information. Yet, each format has its optimal use for encoding particular kinds of information. This paper describes a hybrid approach where JSON-LD and RDFa can be used together to provide optimal encoding while retaining connections to document locations.
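As a rough illustration of the kind of hybrid the abstract describes (the precise linking mechanism is the subject of the paper; this sketch simply shares an identifier between the two encodings):

```xml
<html>
  <head>
    <!-- JSON-LD: a compact graph, detached from document structure -->
    <script type="application/ld+json">
    { "@context": "http://schema.org",
      "@id": "#article",
      "@type": "Article",
      "datePublished": "2015-02-14" }
    </script>
  </head>
  <body>
    <!-- RDFa: annotations anchored to the document locations themselves,
         tied to the JSON-LD graph via the shared identifier -->
    <article resource="#article" typeof="schema:Article"
             prefix="schema: http://schema.org/">
      <h1 property="schema:name">Semantic Hybridization</h1>
    </article>
  </body>
</html>
```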
Using DocBook to Produce a Polyvalent Academic Work
Murray Maloney (Muzmo), Robert J. Glushko (UC Berkeley), R. Alexander Milowski (UC Berkeley)
Creating many different versions or customizations by configuring a collection of components is a desirable goal in many domains. A single automobile production line can support the assembly of customized variations of a car model. Software product line engineering enables the creation of many similar software systems from a shared set of software assets. In this article we discuss how a collection of content elements can create a family of related texts whose different members are generated according to configurations of variables found in the content markup. This markup is created by the author, but anyone can create a particular edition of the text by defining a configuration file at book-building time, and a reader can do this interactively at reading-time by making selections from a configuration control widget. We call this configurable collection of content elements a “polyvalent” document: “Poly” means “more than one” or “many” – “valent” means “having combining power.”
There are some common challenges in all of these domains. The first is to distinguish the components that are contained in every manifestation, typically called the core, base, or platform, from those that vary, typically called the features, options, or supplements. The second is to organize the variable components to indicate the different customizations, versions, or editions that can be built by selectively combining optional components with the required ones. The third challenge is to convey to the builders, users, or others who want to use the variable components any dependencies or constraints that might exist, since not every possible combination will be feasible or sensible.
In this paper, we examine the facility with which DocBook was coerced into supporting a polyvalent text, and the challenges encountered. We observe the parallels and disjunctions among the vocabularies used in book production, the suitability of XHTML and CSS as content delivery agents, the varying capabilities of current ePub3 readers, and the suitability of relying upon CSS and JavaScript in an ePub context.
Generation of a “semantic” eBook: all you need is XML
Vincent Gros (Hachette Livre), Jean-Claude Moissinac (Télécom ParisTech) and Luc Audrain (Hachette Livre)
EPUB is the widely adopted standard format compatible with modern reading devices and tablet applications. In June 2014, the minor release EPUB 3.0.1 added support for microdata and RDFa for semantic enrichment, in addition to the existing semantic inflection provided by the epub:type attribute. But producing semantic eBooks “by hand” can be time-consuming and expensive for a publisher. We therefore propose an automated generation process based on an XML workflow, inspired by Schematron processing, with XSLT transformations driven by an XProc pipeline. We experiment with this approach to produce a French wine guide in EPUB format.
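A minimal sketch of such a pipeline in XProc 1.0 (file names are hypothetical; the paper describes the actual workflow):

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="1.0">
  <p:input port="source"/>   <!-- the publisher's XML content -->
  <p:output port="result"/>

  <!-- Check editorial/business rules before enrichment -->
  <p:validate-with-schematron>
    <p:input port="schema">
      <p:document href="editorial-rules.sch"/>
    </p:input>
  </p:validate-with-schematron>

  <!-- Inject epub:type / RDFa annotations via XSLT -->
  <p:xslt>
    <p:input port="stylesheet">
      <p:document href="enrich.xsl"/>
    </p:input>
  </p:xslt>
</p:declare-step>
```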
Standards update: XInclude 1.1
Norman Walsh (MarkLogic)
New features in XInclude 1.1 are designed to make it easier to deal with general problems of XML transclusion (ID and IDREF fixup, for example) and to improve support for including non-XML content.
Unfortunately, implementation experience convinced the Working Group that the solution proposed in the Candidate Recommendation draft was poorly conceived. This session will summarize the new approach taken in the recently published Last Call draft.
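For context, two of the XInclude 1.1 features mentioned above in a small sketch (file names hypothetical): parse="text" pulls in non-XML content verbatim, and set-xml-id assigns a fresh xml:id to the included element to support ID fixup:

```xml
<doc xmlns:xi="http://www.w3.org/2001/XInclude">
  <!-- Include a non-XML resource verbatim as text -->
  <programlisting><xi:include href="example.c" parse="text"/></programlisting>

  <!-- Reuse the same chapter twice without xml:id clashes -->
  <xi:include href="chapter.xml" set-xml-id="chap2"/>
</doc>
```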
Standards update: XProc 2.0
Norman Walsh (MarkLogic)
Practical experience with XProc 1.0 revealed a number of problematic areas in the language. In XProc 2.0, the Working Group is attempting to address the most conspicuous usability problems, including:
- Better support for non-XML documents in the pipeline
- Radical simplification of parameters
- Support for attribute and text value templates
- Support for arbitrary XDM values in variables and options
- Better inter-step communication (document metadata)
- A suite of syntactic shortcuts for common idioms
The First Public Working Draft, published in December, begins to show the direction these improvements will take.
The convergence between EPUB and the Open Web Platform
Felix Sasaki (DFKI/W3C)
Most ebooks can be thought of as miniature Web sites encapsulated as a unit. If this is so, why can’t you open an ebook in a Web browser? Why can’t any small Web site function as an electronic book in an ebook reader?
The technical issues are relatively small, although not insignificant. From a business and political perspective, EPUB and Web development have happened in separate organizations – IDPF and W3C respectively. But IDPF and W3C have been moving closer. The Digital Publishing Interest Group at W3C works in areas important to IDPF members; IDPF is involved in metadata and accessibility work that matters to W3C. Now, IDPF and W3C have agreed to work together to produce a convergence, provisionally referred to as EPUB-WEB.
The talk gives an update on the status and explores what the work might mean for electronic book developers, for Web site publishers, for producing print, and for the end user.
Publishing with CSS Paged Media – A review of existing tools and techniques
Andreas Jung (ZOPYX)
Apart from workflows built on top of Adobe InDesign, XSL-FO is the de facto technical standard for automatic typesetting of XML content in the publishing industry. XSL-FO is showing its age, however, and is not much loved by XML solution providers. Professional XSL-FO engines like Antenna House deliver very good typographic results, but producing high-quality documents with XSL-FO is usually a hard and time-consuming business.
The author worked with XSL-FO for a year in 2007, as part of a publishing project at Haufe-Lexware in which HTML content delivered through a content-retrieval application (installed as a desktop application on customer PCs) had to be converted to PDF, RTF and DOC on the fly as part of an extended print/export functionality. The only feasible way to build cross-platform PDF generation that would run both on customer desktops and on server installations was based on the converter XINC (an Apache FOP derivative). Already at that time it was possible to use CSS with some extensions to create PDF documents with a professional look & feel, with support for headers and footers, footnotes, reasonable table layouts etc. However, the underlying XSL-FO functionality was never exposed to the developers, since the open-source converter csstoxslfo could be used to transform the existing HTML content, together with a stylesheet, into XSL-FO and run the conversion on top of the generated FO document.
Building Security Analytics solution using Native XML Database
Mansi Sheth (Veracode Inc)
The trove of ever-expanding metadata we collect on a daily basis poses the challenge of mining information out of this data store to help drive our business analytics solutions. The least lossy format for this metadata is XML, so it became crucial to use a technology that provides sophisticated support for XML-specific query languages. This paper will discuss how Veracode is using the native XML database (NXD) BaseX to solve various use cases across multiple departments. It will discuss in depth how BaseX is incorporated, its architecture and its ecosystem. It will also touch on lessons learned along the way, including approaches that were tried and didn’t work so well.
Node search preceding node construction – XQuery inviting non-XML technologies
Hans-Juergen Rennau (Traveltainmen GmbH)
We propose an approach that complements XPath navigation with a node search which does not require node construction. Node search is based on a set of external properties (a “p-face”) which a node may assume in the context of a node collection. Being external, these properties can be retrieved without node construction, and being stored outside the nodes, they can be maintained and queried by non-XML technologies, e.g. relational and NoSQL databases. A small set of concepts, carefully aligned with the XQuery data model, allows the seamless integration of various non-XML technologies driving node selection, without introducing any dependencies of XQuery code on any particular technology. A first implementation of the concepts is presented.
Native XML Databases: Death or Coming of Age
Xavier Franc (Qualcomm Technologies Inc) and Michael Paddon (Qualcomm Technologies Inc)
Back in the early 2000s Native XML Databases (NXDbs) promised a bright future. Today, adoption falls short of those promises and some have even suggested that the technology is in decline. We analyze this “trough of disillusionment” with reference to the Gartner Hype Cycle and compare it to the history of relational databases.
Despite the slow maturation of NXDbs, XML itself is now well established and here to stay. The need for repository systems able to store and query large document collections is likely to increase as ever more XML data is generated. XML is an important technology in the field of publishing and digital preservation, and is especially effective when it comes to handling a mix of text and data.
Native XML Databases might represent only a niche market, but may prove to be irreplaceable in some applications.
To illustrate this, we present a use case in which an NXDb turned out to be a good solution for a massive data warehouse problem. Why other solutions fell short is discussed, as well as the specific features that made the product we selected, the Qualcomm Qizx database, suitable. To conclude, we provide our thoughts about which features are desirable or even possibly indispensable in an NXDb from the perspective of making native XML Databases more attractive to application designers.
A Unified Approach to Design and Implement data-centric and document-centric XML Web Applications
Christine Vanoirbeek (EPFL), Stéphane Sire (Oppidoc) and Houda Chabbi (HEFR)
The paper addresses the development of XML-based web applications that deal indifferently with structured document-centric or data-centric information. It focuses on a fundamental aspect – the validation of information – that provides substantial benefits at two levels: (i) helping developers design and implement error-free applications and (ii) enabling end users to provide information that meets the requirements of the underlying application information model. The paper proposes an approach based on RELAX NG and Schematron. It also describes an implementation of the validation process in the eXist-db database environment using the Oppidum XQuery development framework.
Graphical User Interface Tool for Designing Model-Based User Interfaces with UIML
Anne Brüggemann-Klein, Lyuben Dimitrov and Marouane Sayih (Technische Universität München)
Graphical user interfaces can be designed using various types of editors. Almost all types are based on one of the following two principles: What You See is What You Mean (WYSIWYM) or What You See is What You Get (WYSIWYG). The WYSIWYG tools are the most sophisticated ones and are usually constrained to concrete platforms or formats. On the other hand, WYSIWYM editors come with a presentation neutral way of capturing the semantics of the content, rather than the exact presentation. This principle provides a significant advantage when authoring XML-based user interface documents in terms of device and platform independence. The WYSIWYM paradigm alleviates the stress of exactly displaying the information that is being conveyed, and thus does not require maintaining precise formatting between the Editing View and the Final Result.
In this paper we present a graphical user interface editor based on the WYSIWYM principle for designing model-based user interfaces with the User Interface Markup Language (UIML) as a modeling language, XSLT for transformation and XForms for presentation. We provide a proof of concept and show the applicability of using XML technologies for designing graphical user interfaces for End-user Development.
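To give a flavour of UIML's presentation-neutral style (a hand-written sketch, not taken from the paper; the part classes and property names are illustrative):

```xml
<uiml>
  <interface>
    <structure>
      <!-- What the UI *means*: a form containing a field and a button;
           no pixel positions, fonts or toolkit widgets appear here -->
      <part id="loginForm" class="Form">
        <part id="userName" class="TextInput"/>
        <part id="submit" class="Button"/>
      </part>
    </structure>
    <style>
      <!-- Presentation-neutral properties, mapped to a concrete
           platform (here, XForms via XSLT) at rendering time -->
      <property part-name="submit" name="label">Log in</property>
    </style>
  </interface>
</uiml>
```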
Survey State Model (SSM) – XML Authoring of electronic questionnaires
Jose Lloret (Robert Gordon University) and Nirmalie Wiratunga (Robert Gordon University)
CAI (computer-assisted interviewing) systems use questionnaires as the instruments to conduct survey research. XML constitutes a formal way to represent the features of questionnaires, which include content coverage, personalisation aspects and, importantly, routing functionalities. In this paper we conduct a comparative analysis of different XML approaches to questionnaire modelling. Our findings suggest that existing language formalisms tend to cover content well but often fail to model routing aspects.
In particular, the popular hierarchical approach to modelling routing functionality has drawbacks in terms of its ability to facilitate questionnaire logic validation, its ease of understanding by domain experts and its flexibility to enable refinements to questionnaires.
Accordingly, we introduce the SSM XML language, based on a state-transition model, to address these shortcomings. We present our results from testing SSM on a sample of real-world surveys from Pexel Research Services in the UK. We use the distribution of SSM’s vocabulary over this sample to demonstrate SSM’s applicability and its coverage of questionnaire constructs and effective routing support.
Schematron for Information Architects
George Bina (Syncro Soft/oXygen XML Editor)
Schematron is a different kind of XML schema language: it focuses not on the grammar of the document but on rules that the structure and the content should follow. It is used successfully in industry to enforce business rules on XML documents. Although it contains only 21 elements and a few attributes, many people without a technical background are intimidated by the thought of learning Schematron.
“Information architect” is an emerging profession in the XML information domain that defines a role responsible for the overall structure of information models. Such a person should try to achieve consistent writing styles, structures and reuse decisions, and should communicate and enforce the information model to information developers. While information architects will be able to understand the business needs, they will not necessarily be experts in XSLT, XPath, Schematron and other XML technologies.
As one of the missions of an information architect is to enforce an information model and the use of a consistent structure and style, we can immediately infer that Schematron will be a great tool to master – but we cannot expect these people to become Schematron experts. Over many years of experience with Schematron I discovered that, if we follow a set of best-practice rules, we can make Schematron accessible to anyone, thus enabling information architects to express business rules that will govern the XML information created by XML authors.
We can structure Schematron rules so that people who are not Schematron experts can create business rules easily, and we can take this idea further and build the business rules into a style guide, thus single-sourcing the prose and the rules that automatically enforce it. These ideas are materialized as an open-source project on GitHub.
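A flavour of how simple such style-guide rules can look (an illustrative sketch, not from the paper; element names are hypothetical):

```xml
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron">
  <sch:pattern>
    <!-- One style-guide statement, one rule: the assertion text can be
         the same prose that appears in the style guide itself -->
    <sch:rule context="section">
      <sch:assert test="title">Every section must start with a title.</sch:assert>
      <sch:assert test="count(*) > 1">A section must contain content
        beyond its title.</sch:assert>
    </sch:rule>
  </sch:pattern>
</sch:schema>
```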
TXSTEP – an integrated XML-based scripting language for scholarly text data processing
Wilhelm Ott (University of Tuebingen) and Tobias Ott (pagina publication technologies ltd.)
With TXSTEP, we present and put up for discussion a preliminary version of a new, powerful XML-based tool for scholarly research in the text-based humanities.
The purpose of TXSTEP is not to provide another toolbox containing ready-made solutions for pre-defined problems. Such tools are adequate for many purposes, but we see no urgency to add a further one to the existing packages of this kind.
Instead, TXSTEP has been designed as a high-performance scripting environment for the serious humanities scholar and other professionals in text data processing who face problems not easily solvable by XSLT or other means. TXSTEP gives them complete control over every detail of the data processing part of their projects.
TXSTEP's architecture is based on the Tübingen System of Text Processing tools, TUSTEP, whose current version is the result of more than 40 years of experience in supporting humanities projects at the University of Tübingen and beyond.
The proposed paper will include a short demo of TXSTEP's functionality, showing solutions also for tasks which cannot easily be performed by existing XML tools.
In-Browser XML Document Streaming
Emmanouil Potetsianakis (Telecom ParisTech) and Cyril Concolato (Telecom ParisTech)
Over the past few years, in-browser streaming of audiovisual content has become a commodity. Due to the diverse nature of possible audiovisual applications, there is often a need for accompanying descriptors and other metadata, such as semantic annotations, captions, etc. The metadata need to be sent in a timely fashion along with the multimedia content. The use of XML in such cases is common, and the usual approach for in-browser transmission of such data is via AJAX. Even though AJAX can be sufficient for many services, it gives little consideration to offline or live scenarios. With MP4Box and MP4Box.js we are able to synchronously stream and consume XML and multimedia data, packaged in MP4 containers, with a standard browser. Accompanying XML documents can be transmitted as a whole, or progressively (in fragments). In this paper, we define the use cases for this technology, analyse the requirements and present the mechanisms of MP4Box and MP4Box.js for XML end-to-end transmission inside the browser.
Standards update: XPath/XQuery/XSLT/XML Schema
Michael Kay (Saxonica)
This talk will provide an update on what’s happening on the standards scene for the key XML technologies, in particular XML Schema, XSLT, and XQuery. This won’t be a long catalog of new features; more an analysis of the state of play. What have been the key developments over the last few years, and what impact have they made on real life? What work is currently in progress, and what is likely to happen next? What requirements are being addressed, and what are the remaining obstacles to interoperability?