Schedule for Friday
chaired by yours truly, Mohamed Zergaoui (morning), and Jirka Kosek (afternoon)
9:30 | Registration desk opens |
10:00 | Opening and sponsors presentation |
10:20 | Assisted Structured Authoring using Conditional Random Fields | Bert Willems (FontoXML)
10:50 | XML Success Story: Creating and Integrating Collaboration Solutions to Improve the Documentation Process | Steven Higgs (Syncro Soft)
11:20 | Coffee break |
11:50 | xqerl: XQuery 3.1 Implementation in Erlang | Zachary Dean
12:10 | XML Tree Models for Efficient Copy Operations | Michael Kay (Saxonica)
12:40 | Using Maven with XML Projects | Christophe Marchand (Oxiane) and Matthieu Ricaud-Dussarget (Editions Lefebvre Sarrut)
13:10 | Lunch |
14:40 | Varieties of XML Merge: Concurrent versus Sequential | Tejas Barhate (DeltaXML Ltd) and Nigel Whitaker (DeltaXML Ltd)
15:10 | Including XML Markup in the Automated Collation of Literary Texts | Elli Bleeker (Huygens ING), Bram Buitendijk (Huygens ING), Ronald Haentjens Dekker (Huygens ING) and Astrid Kulsdom (Huygens ING)
15:40 | Diff with XQuery | James Fuller (MarkLogic)
16:10 | Coffee break |
16:40 | Multi-layered content modelling to the rescue | Erik Siegel (Xatapult)
17:10 | Combining graph and tree: writing SHAX, obtaining SHACL, XSD and more | Hans-Juergen Rennau (parsQube GmbH)
17:40 | Closing of the day |
19:00 | Social dinner |
Session details
Assisted Structured Authoring using Conditional Random Fields
Bert Willems, FontoXML
Authoring structured content with rich semantic markup is repetitive, time-consuming and error-prone. Many Subject Matter Experts (SMEs) struggle to apply the correct markup. This paper proposes a mechanism to partially automate this task using Conditional Random Fields (CRF), a machine-learning algorithm. It also proposes an architecture for continuously improving the CRF model in production using a feedback loop.
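As a purely hypothetical illustration (not taken from the paper), such a tagger could label tokens in plain prose with element names, which the editor then folds into inline markup:

    <!-- plain sentence as typed by the SME -->
    <p>Run the cleanup.sh script before restarting the server.</p>

    <!-- hypothetical CRF output: token labels folded into
         DocBook-style inline markup -->
    <p>Run the <filename>cleanup.sh</filename> script before
       restarting the server.</p>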
» Read paper in proceedings
» slides
XML Success Story: Creating and Integrating Collaboration Solutions to Improve the Documentation Process
Steven Higgs (Syncro Soft)
This paper discusses many of the challenges that Syncro Soft (the makers of the Oxygen XML suite of products) faced when trying to improve the collaboration part of their documentation process, and provides details about their solutions for addressing those challenges. By developing new collaboration products, refining internal processes, and integrating creative collaboration solutions into existing products, they found ways to effectively improve the quality of their documentation, simplify various procedures, and increase the efficiency of their entire collaboration process.
» Read paper in proceedings
» slides
xqerl: XQuery 3.1 Implementation in Erlang
Zachary Dean
xqerl is an open-source XQuery 3.1 processor written in the Erlang programming language. It compiles XQuery modules into Erlang modules that run in the Erlang virtual machine (BEAM). The goal of the xqerl project is to allow fault-tolerant, concurrent systems to be written completely – or almost completely – in the XQuery language.
This paper will introduce xqerl and some of its current and future features.
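For context, here is a small generic XQuery 3.1 snippet (maps, arrays and the arrow operator) of the kind such a processor compiles; it is not xqerl-specific:

    (: XQuery 3.1 main module: map and array constructors, arrow operator :)
    let $talks := map {
      "morning"   : [ "CRF authoring", "xqerl", "Tree models" ],
      "afternoon" : [ "XML merge", "Collation", "SHAX" ]
    }
    return $talks("morning") => array:size()  (: yields 3 :)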
» Read paper in proceedings
» slides
XML Tree Models for Efficient Copy Operations
Michael Kay (Saxonica)
A large class of XML transformations involve making fairly small changes to a document. The functional nature of the XSLT and XQuery languages means that data structures must be immutable, so these operations generally involve physically copying the unchanged parts of the document, which is expensive in time and memory. Although efficient techniques for avoiding these overheads are well known for data structures such as maps (for example, immutable hash tries), they are difficult to apply to the XDM data model because of two closely related features of that model: it exposes node identity (so a copy of a node is distinguishable from the original), and it allows navigation upwards in the tree (towards the root) as well as downwards. This paper proposes mechanisms to circumvent these difficulties.
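To illustrate the overhead in question (a generic XQuery Update example, not the paper's proposal): the transform expression below must physically copy the whole document even though only one attribute changes.

    (: XQuery Update 'copy/modify/return': $d is a full copy of the input :)
    copy $d := doc("catalogue.xml")
    modify replace value of node $d/catalogue/@date with current-date()
    return $d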
» Read paper in proceedings
» slides
Using Maven with XML Projects
Christophe Marchand (Oxiane) and Matthieu Ricaud-Dussarget (Editions Lefebvre Sarrut)
This paper explains a way to standardize XML projects using Maven conventions. It shows dependency management between XSL/XQuery libraries and the projects that use them, and covers unit testing, automated builds and continuous integration, all done the Maven way.
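A minimal sketch of the convention in question, assuming an XSLT library published as an ordinary Maven artifact (the coordinates are invented for illustration):

    <!-- pom.xml fragment: an XSLT library as a regular Maven dependency -->
    <dependency>
      <groupId>com.example.xml</groupId>      <!-- hypothetical coordinates -->
      <artifactId>common-xslt</artifactId>
      <version>1.2.0</version>
    </dependency>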
» Read paper in proceedings
» slides
Varieties of XML Merge: Concurrent versus Sequential
Tejas Barhate (DeltaXML Ltd) and Nigel Whitaker (DeltaXML Ltd)
Merging XML documents is a tricky operation but is required to consolidate or synchronize two or more independent edit paths or versions. As XML tools become more powerful and able to handle many of the peculiarities of real data, so the possibility of achieving a genuine, intelligent merge of XML data sets becomes a reality. The complexity of XML places demands on tools to work intelligently in order to preserve the essential structure of the original document and also represent the changes.
This paper discusses the different varieties of merge for XML. Merging multiple derivatives of a single ancestor (concurrent merge) may be the most obvious application, but there is also a need for sequential merge when a document has been passed around between two or more authors in a sequential manner. Another important, and perhaps less well understood, application is ‘graft’, where the changes between two documents or data sets are applied to a third, different (though similar) document or data set.
There are similarities between these applications, but gaining an understanding of how they differ and where each is appropriate is necessary to make best use of automated processing of XML.
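A schematic illustration of the concurrent case (constructed for this summary, not taken from the paper):

    <!-- ancestor A -->
    <para>The quick fox.</para>
    <!-- author 1, editing from A -->
    <para>The quick brown fox.</para>
    <!-- author 2, also editing from A -->
    <para>The quick fox jumps.</para>
    <!-- concurrent (three-way) merge of both edit paths -->
    <para>The quick brown fox jumps.</para>

In a sequential merge, by contrast, the document passes from one author to the next, so each version is derived from the previous one rather than from a shared ancestor.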
Including XML Markup in the Automated Collation of Literary Texts
Elli Bleeker (Huygens ING), Bram Buitendijk (Huygens ING), Ronald Haentjens Dekker (Huygens ING) and Astrid Kulsdom (Huygens ING)
XML plays a key role in textual and literary research. Scholars use it to express their understanding and interpretation of a text, and to create textual models for further analysis. One of the most widely used frameworks for textual modelling is the TEI Guidelines, a standard for text encoding developed by the text editing community. One important aspect of textual research is the study of variance, both within and between versions of a text. Capturing textual variance in TEI-XML, especially in the more complex cases, poses several challenges for computational modelling.
To deal with this level of complexity during the modelling and processing of text, the hypergraph provides a powerful data structure. The present paper concentrates on how a hypergraph handles textual and structural variation. The data structure of a hypergraph makes optimal use of the information-rich XML files that are used in the field of textual scholarship. HyperCollate, the collation tool that uses a hypergraph to model and process textual variation, can therefore deal with different types and complex cases of textual variation.
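A small TEI-style example of the kind of in-document variation a collation tool must handle (illustrative only):

    <!-- the author first wrote 'dark', then substituted 'stormy';
         both readings are part of the text's genesis -->
    <p>It was a
      <subst>
        <del>dark</del>
        <add>stormy</add>
      </subst>
      night.</p>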
» Read paper in proceedings
» slides
Diff with XQuery
James Fuller (MarkLogic)
An introduction to an XQuery implementation of X-Diff (xdiff.xqy), based on http://pages.cs.wisc.edu/~yuanwang/papers/xdiff.pdf.
The presentation will be broken down as follows: a condensed overview of the X-Diff algorithm and how it contrasts with other approaches; the XQuery implementation; example usage; and a summary of pros and cons. The presentation will be example-driven and tools-oriented, focusing mainly on practical application.
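A hypothetical invocation sketch; the module URI and function name here are assumptions, not the actual xdiff.xqy API:

    (: hypothetical usage; the real xdiff.xqy interface may differ :)
    import module namespace xdiff = "http://example.org/xdiff"
      at "xdiff.xqy";

    xdiff:diff(doc("v1.xml"), doc("v2.xml"))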
Multi-layered content modelling to the rescue
Erik Siegel (Xatapult)
Some companies, for instance educational publishers, have content for a multitude of different products. These products share some content aspects but unfortunately also differ wildly in things like structure, metadata and markup. From an information modelling perspective, a different model would be required for each of them.
Different models would mean different schemas, authoring tool configurations, publication and conversion pipelines, and so on: a huge investment, and hard to maintain.
A way to circumvent this is by introducing multi-layered content modelling:
1) You start off with a base content model that is flexible enough to cater for the needs of all the products. This could be a standard like DocBook or DITA.
2) On top of that you add a second layer of modelling to cater for the product differences. This layer must be able to describe and validate the product-specific content.
This talk will focus on what is needed and what such an extra content modelling layer could look like. Coming from an actual implementation of this idea, examples will be shown and do's, don'ts and tips will be discussed.
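As one plausible realization of the second layer (an assumption, not necessarily the approach taken in the talk), product-specific rules could be expressed as a Schematron overlay on a DocBook base model:

    <!-- Schematron as a product-specific layer over a generic DocBook base:
         every exercise section must contain an answer list -->
    <sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron"
                queryBinding="xslt2">
      <sch:ns prefix="db" uri="http://docbook.org/ns/docbook"/>
      <sch:pattern>
        <sch:rule context="db:section[@role = 'exercise']">
          <sch:assert test="db:orderedlist[@role = 'answers']">
            An exercise section must contain an answer list.
          </sch:assert>
        </sch:rule>
      </sch:pattern>
    </sch:schema>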
» Read paper in proceedings
» slides
Combining graph and tree: writing SHAX, obtaining SHACL, XSD and more
Hans-Juergen Rennau (parsQube GmbH)
The Shapes Constraint Language (SHACL) is a data modeling language for describing and validating RDF data. This paper introduces SHAX, which is an XML syntax for SHACL. SHAX documents are easy to write and understand. They can not only be translated into executable SHACL, but also into an XSD describing XML data equivalent to the RDF data constrained by the SHACL model. Similarly, SHAX can be translated into a JSON Schema describing a JSON representation of the data. SHAX may thus be viewed as an abstract data modeling language which does not prescribe a concrete representation language (RDF, XML, JSON, …), but can be translated into concrete models validating concrete model instances.
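To give a feel for the XSD side of such a translation (an illustrative fragment, not actual SHAX output), the schema below states in XSD terms what a SHACL property shape with sh:minCount 1 and sh:maxCount 1 on a string-valued name would state in RDF terms:

    <!-- illustrative XSD: a person has exactly one string-valued name -->
    <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
      <xs:element name="person">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="name" type="xs:string"
                        minOccurs="1" maxOccurs="1"/>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
    </xs:schema>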