Schedule for Saturday
chaired by yours truly, Jirka Kosek (morning), and Mohamed Zergaoui (afternoon)
9:00 | Registration desk opens
9:30 | Opening of the last conference day and sponsor presentations
10:00 | Excellent XProc 3.0 | Erik Siegel, Norman Walsh, Achim Berndzen and Gerrit Imsieke (XProc Next Community Group)
10:30 | XProc in XSLT – Why and Why Not | Liam Quin (Barefoot Computing)
11:00 | Coffee break
11:30 | Merging The Swedish Code of Statutes (SFS) | Ari Nordström (Karnov Group)
12:00 | JLIFF, Creating a JSON Serialization of OASIS XLIFF 2 | David Filip (Trinity College Dublin), Phil Ritchie (Vistatec) and Robert van Engelen (Genivia)
12:30 | History and the Future of Markup | Michael Piotrowski (University of Lausanne)
12:50 | Lunch
14:20 | Splitting XML Documents at Milestone Elements Using the XSLT Upward Projection Method | Gerrit Imsieke (le-tex publishing services)
14:50 | Sonar XSL | Jim Etevenard (OXiane)
15:20 | Copy-fitting for Fun and Profit | Tony Graham (Antenna House)
15:50 | Coffee break
16:20 | RDFe – expression-based mapping of XML documents to RDF triples | Hans-Juergen Rennau (parsQube)
16:50 | Trialling a new JATS-XML workflow for scientific publishing | Tamir Hassan (Round-Trip PDF Solutions)
17:20 | On the Specification of Invisible XML | Steven Pemberton (CWI)
17:50 | Closing of the conference
Session details
Excellent XProc 3.0
Erik Siegel, Norman Walsh, Achim Berndzen and Gerrit Imsieke (XProc Next Community Group)
Since 2017, the W3C XProc Next Community Group has been busy specifying a new version of the XProc standard, version 3.0. The overall goals are improved usability and alignment with advances in other standards, most importantly XPath 3.1. By XML Prague 2019 we plan to have a core specification ready for "last call". This talk will present an overview of the XProc language and its principles and link these to the changes made in V3.0.
» slides
XProc in XSLT – Why and Why Not
Liam Quin (Barefoot Computing)
Version 3 of XSLT includes the ability to call a secondary XSLT transformation from an XPath expression, using fn:transform(). It's also possible in most XSLT implementations to call out to system functions, so that one can arrange to run an arbitrary external program from within a stylesheet. The EXPath facilities to read and write files, process ZIP archives, create and remove files and so forth also exist.
This integration means that it’s often easier to call fn:transform() from a wrapper stylesheet than to use an XProc pipeline. Doing this, however, brings both advantages and disadvantages.
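As a minimal sketch of the wrapper approach (the stylesheet file names here are hypothetical, and error handling is omitted), two pipeline steps can be chained from a single driver stylesheet:

```xslt
<xsl:variable name="step1" select="
  transform(map {
    'stylesheet-location': 'step1.xsl',  (: hypothetical first step :)
    'source-node': /
  })?output"/>
<xsl:variable name="step2" select="
  transform(map {
    'stylesheet-location': 'step2.xsl',  (: hypothetical second step :)
    'source-node': $step1
  })?output"/>
```

fn:transform() returns a map; the principal result document is available under the key 'output', which is what makes this kind of step-by-step chaining straightforward.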
This paper explores the technique in more detail and illustrates some things that cannot be achieved, together with techniques to mitigate the limitations. It concludes with suggestions on when it is better to move to XProc or other tools.
» Read paper in proceedings
» slides
Merging The Swedish Code of Statutes (SFS)
Ari Nordström (Karnov Group)
In 2017, Karnov Group, a Danish legal publisher, bought Norstedts Juridik, a Swedish legal publisher, and set about merging their document sources. Both companies publish SFS, the Swedish Code of Statutes, online and in print, and both from in-house XML sources based on PDF sources provided by Regeringskansliet, i.e. the Swedish government.
But the companies use separate tag sets and their own interpretations of the semantics in the PDFs. They also enrich the basic law text with extensive annotations, links to (and from) caselaw, and so on. Norstedts also publishes the so-called blue book, a yearly printed book of the entire in-force SFS.
It doesn't make sense to continue maintaining the two SFS sources separately, of course. Instead, we want, essentially, the sum of the two, with annotations, missing versions (significant gaps exist in the version histories), and so on added.
This paper is about merging the SFS content into a single XML source. Basically, we convert both sources into a single exchange format, compare those using Delta XML’s XML Compare, and then do the actual merge based on the diff. Finally, we convert the merged content to a future editing format.
JLIFF, Creating a JSON Serialization of OASIS XLIFF 2
David Filip (Trinity College Dublin), Phil Ritchie (Vistatec) and Robert van Engelen (Genivia)
We will report on developments parallel to OASIS XLIFF 2.0 and XLIFF 2.1. We are going to cover the challenges and victories of porting a complex, multimodal, modular, and extensible XML business vocabulary – XLIFF 2 – into JSON, via an abstract object model that captures the business logic of best-practice-based localization interchange in a way independent of its original XML serialization.
How would you deal with namespace-based modules and extensions in JSON, which has no namespace support? How do you model spans, and how do you exchange XSD datatypes?
Using JLIFF JSON schema and JLIFF examples, we will demonstrate how JLIFF is capable of lossless exchange with XLIFF based pipelines.
JLIFF is in flux: we consider the JLIFF JSON schema near technical stability, and we have only just started creating the prose specification for JLIFF. Yet, as of summer 2018, we are sure that we have a feature-complete JLIFF that can be used for real-time automation and for exchanging defined XLIFF data model fragments among conformant agents without breaking data integrity. More importantly, JLIFF is attracting early adopters in the industry and real-time translation exchange API implementations.
» Read paper in proceedings
» slides
History and the Future of Markup
Michael Piotrowski (University of Lausanne)
It is clear that the heyday of XML is over, and XML is now confronted with competitors that represent, in some respects, steps backwards. In this paper we argue that at this point it is particularly important to study the history of markup, both to avoid reinventing the wheel and to reflect on, and possibly re-evaluate, design choices in order to advance the state of the art.
» Read paper in proceedings
» slides
Splitting XML Documents at Milestone Elements Using the XSLT Upward Projection Method
Gerrit Imsieke (le-tex publishing services)
Creating chunks out of a larger XML tree is easy if the splitting points correspond to natural structural units, such as chapters of a book. When the splitting points are buried at varying levels in deeply nested markup, however, the task becomes more difficult.
Examples of this problem include: splitting mostly flat HTML at headings (which are sometimes wrapped in divs); splitting paragraphs at line breaks (even when highlighting markup stretches across the line breaks); transforming tabulated lines (which may also contain markup around the tabs) into proper tables; and splitting chapters and sections at page breaks.
The presented solution does this quite elegantly using XSLT 2.0 grouping and tunneled parameters.
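For the simplest, flat case, the core of such a split can be sketched with XSLT 2.0 grouping (the element names here are hypothetical; the paper's upward projection method additionally handles milestones buried at varying depths, which this sketch does not):

```xslt
<xsl:template match="body">
  <!-- Start a new chunk at every h1 milestone element -->
  <xsl:for-each-group select="*" group-starting-with="h1">
    <chunk>
      <xsl:apply-templates select="current-group()"/>
    </chunk>
  </xsl:for-each-group>
</xsl:template>
```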
It will be discussed how the performance of this method scales with input size and/or the number of split points, whether xsl:evaluate can be used to create a generic, configurable method, and whether the method works with streaming XSLT processing.
An interactive visualization of how this method works will be included in the presentation.
» Read paper in proceedings
» slides
Sonar XSL
Jim Etevenard (OXiane)
A Schematron-based SonarQube plugin for XSL code quality measurement.
» Read paper in proceedings
» slides
Copy-fitting for Fun and Profit
Tony Graham (Antenna House)
Copy-fitting is the fitting of words into the space available for them or, sometimes, adjusting the space available to fit the words. Copy-fitting is included in "Extensible Stylesheet Language (XSL) Requirements Version 2.0" and is a common feature of making real-world documents. This talk describes an ongoing internal project for automatically finding and fixing problems that are amenable to copy-fitting in XSL-FO.
RDFe – expression-based mapping of XML documents to RDF triples
Hans-Juergen Rennau (parsQube)
RDFe is an XML language for mapping XML documents to RDF triples. The name suffix "e" stands for expression and hints at the key concept, which is the use of XPath expressions mapping semantic relationships between subjects and objects to structural relationships between XML nodes. More precisely, RDF properties are represented by XPath expressions evaluated in the context of an XML node which represents the triple subject and yielding XDM value items which represent the triple object. The expressiveness of XPath 3.1 enables the semantic interpretation of XML resources of any structure and content. Required XPath expressions can be simplified by the definition of a dynamic context whose variables and functions are referenced by the expressions.
Semantic relationships can cross document boundaries, and new XML document URIs can be discovered in the content of input documents, so that RDFe is capable of gleaning linked data. As XPath extension functions may support the parsing of non-XML resources (JSON, CSV, HTML), RDFe can also be used for mapping mixtures of XML and non-XML resources to RDF graphs.
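The core idea can be illustrated with a hypothetical mapping fragment (illustration only, not actual RDFe syntax): a subject is selected by one expression, and each property binds an IRI to an XPath expression evaluated with that subject node as context:

```xml
<!-- Hypothetical illustration of expression-based mapping,
     not actual RDFe syntax: for each article element (the subject),
     the XPath expression in @expr yields the object of a triple. -->
<subject selector="//article">
  <property iri="dc:title"   expr="string(title)"/>
  <property iri="dc:creator" expr="string-join(authors/author/name, ', ')"/>
</subject>
```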
» Read paper in proceedings
» slides
Trialling a new JATS-XML workflow for scientific publishing
Tamir Hassan (Round-Trip PDF Solutions)
For better or for worse, PDF is the standard for the exchange of scholarly articles. Over the past few years, there have been a number of efforts to try to move towards better structured workflows, typically based on XML, but they have not gained widespread traction in the academic community for a number of reasons. This paper describes our experience in trialling a new “hybrid” PDF/XML workflow for the proceedings of the workshops held at the ACM Symposium on Document Engineering 2018.
» Read paper in proceedings
» slides
On the Specification of Invisible XML
Steven Pemberton (CWI)
Invisible XML (ixml) is a method for treating non-XML documents as if they were XML.
This paper describes the production of the specification of ixml. An interesting aspect of this specification is that ixml is itself an application of ixml: the grammar describes itself, and therefore can be used to parse itself, and thus produce an XML representation of the grammar. We discuss the decisions taken to produce the most useful XML version possible.