Schedule for Friday
chaired by yours truly Mohamed Zergaoui (morning) and Jirka Kosek (afternoon)
9:30 | Registration desk opens | |
10:00 | Opening and sponsors presentation | |
10:15 | XML Authoring as a Service | George Bina (Syncro Soft) |
10:30 | XPath 3.1 in the Browser | John Lumley (jwL Research), Debbie Lockett (Saxonica) and Michael Kay (Saxonica) |
11:10 | Coffee break | |
11:40 | Soft validation in an editor environment | Martin Middel (FontoXML) |
12:10 | Improving text quality with automatic majority editions | Liam Quin (W3C) |
12:40 | Checking documents for DTP with the free online service data2check | Mehrschad Zaeri Esfahani (parsQube GmbH), Hauke Brandes (parsQube GmbH) and Manuel Montero Pineda (data2type GmbH) |
13:10 | Lunch | |
14:40 | The State of XProc | Norman Walsh |
15:10 | W3C ITS 2.0 in OASIS XLIFF 2.1 | David Filip (ADAPT Centre) |
15:40 | Coffee break | |
16:10 | Projection and Streaming: Compared, Contrasted, and Synthesized | Michael Kay (Saxonica) |
16:40 | The HTML5.1 DTD | Marcus Reichardt |
17:10 | Closing of the day | |
19:00 | Social dinner and Demo Jam | |
Session details
XML Authoring as a Service
George Bina (Syncro Soft)
In today’s world, we use more and more services. When we start a project, we can host it on GitHub; if we want to communicate within that project, we might create a Slack channel; to automate some tasks, we can set up Travis to run some scripts; to publish content on the web, we can use GitHub Pages; and so on. In this session, we will explore how an XML authoring service can fit into this world of services and look at various possible use cases.
XPath 3.1 in the Browser
John Lumley (jwL Research), Debbie Lockett (Saxonica) and Michael Kay (Saxonica)
This paper discusses the implementation of an XPath 3.1 processor with high levels of standards compliance that runs entirely within modern browsers. The runtime engine Saxon-JS, written in JavaScript and developed by Saxonica, is used to run pre-compiled XSLT 3.0 stylesheets. It is extended with a dynamic XPath parser and a converter to the Saxon-JS compilation format. This extension supports both XSLT’s xsl:evaluate instruction and a JavaScript API, XPath.evaluate(), which supports XPath outside an XSLT context.
Soft validation in an editor environment
Martin Middel (FontoXML)
To allow the use of Schematron in a quickly changing environment like an editor, understandability is crucial. An author may not be an XML expert, so they must be guided in resolving the messages generated by Schematron.
Good understandability rests on two pillars: performance and user interface. An author needs constant feedback on the current state of the soft validation. The author must know which places in the document need attention, and how to resolve them. The report must then update as soon as possible so that the author can see the result of a modification they just made.
To ensure good performance, a number of technical problems have been solved; the solutions include a novel dependency tracking system.
Improving text quality with automatic majority editions
Liam Quin (W3C)
This paper describes a method to improve the accuracy of OCR-scanned text documents prior to conversion to XML. It also describes approaches to converting such documents to XML and considers the point in the conversion process at which it makes the most sense to start using XML tools.
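The abstract does not spell out the method, so the following is only an illustrative sketch of one plausible reading of a "majority edition": given several OCR readings of the same passage, keep the reading that most witnesses agree on at each position. It assumes the readings are already aligned token by token, which real OCR output usually is not. The function name `majority_edition` and the sample strings are hypothetical and do not come from the paper.

```python
from collections import Counter
from itertools import zip_longest


def majority_edition(witnesses):
    """Combine several OCR readings of one passage by per-token majority vote.

    Assumption: the witnesses are already aligned, so the n-th token of each
    witness corresponds to the same word in the source text.
    """
    tokenised = [witness.split() for witness in witnesses]
    result = []
    # Pad shorter witnesses with empty strings so every position has a vote
    # from every witness.
    for readings in zip_longest(*tokenised, fillvalue=""):
        # most_common(1) returns the reading with the highest count; on a tie
        # the reading seen first wins.
        winner, _count = Counter(readings).most_common(1)[0]
        if winner:
            result.append(winner)
    return " ".join(result)
```

Calling `majority_edition(["the quick brown fox", "the qu1ck brown fox", "the quick brovvn fox"])` lets the two correct readings outvote each isolated OCR error. A real pipeline would need a proper alignment step before voting.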
Checking documents for DTP with the free online service data2check
Mehrschad Zaeri Esfahani (parsQube GmbH), Hauke Brandes (parsQube GmbH) and Manuel Montero Pineda (data2type GmbH)
With the help of the free service “data2check”, Word, InDesign and EPUB files can be checked for correctness.
Word documents are checked for compliance with the styles used (paragraph and character styles). In this way, important copy-editing guidelines of a publishing house or a company can be checked. Furthermore, a document can be examined for Word-specific components that make further processing much more difficult (e.g. text boxes, pictures, charts, etc.). InDesign documents can be checked for created paragraph and character styles, but also for constructs that make the export from InDesign to the EPUB format impossible.
After completion of a check, any errors found can be tracked with the help of comments in the output document. In addition, the uploaded data is stored as XML files in a database, where it is available at any time.
The State of XProc
Norman Walsh
Although the working group has been closed by the W3C, the use cases for XProc remain as prevalent as ever. A community effort to revise the XProc specification and extend the scope of XProc is under way and will have just finished a two-day workshop when XML Prague begins. We’ll give a brief status report and outline short- and long-term plans.
W3C ITS 2.0 in OASIS XLIFF 2.1
David Filip (ADAPT Centre)
XLIFF is the XML Localization Interchange File Format. The current OASIS Standard version is XLIFF Version 2.0. XLIFF Version 2.1 [csprd01] concluded its first public review period on 25 November 2016. The major new features of XLIFF 2.1 compared to XLIFF 2.0 are native ITS 2.0 support and the Advanced Validation feature via NVDL and Schematron. The Advanced Validation feature for XLIFF 2 was first introduced at FEISGILTT 2014 and covered extensively at XML London 2016. In this paper and presentation, we look in detail at the native ITS 2.0 support feature.
In this paper and XML Prague presentation, we will explain in detail how XLIFF 2.1 supports the W3C ITS 2.0 metadata categories, and which ITS data in XLIFF 2.1 are or are not accessible to generic ITS Processors despite the use of the W3C namespace.
Projection and Streaming: Compared, Contrasted, and Synthesized
Michael Kay (Saxonica)
This paper describes, compares, and contrasts two techniques designed to enable an XML document to be processed without building an entire tree representation of the document in memory. Document projection analyses a query to determine which parts of the document are relevant to the query, and discards everything else during source document parsing. Streaming attempts to execute a stylesheet “on the fly” while the source document is being read.
For both techniques, the paper describes the way that they are implemented in the Saxon XSLT and XQuery engine.
Performance results are given for both techniques, based on the queries in the XMark benchmark applied to a 118 MB source document.
The paper concludes with a discussion of ideas for combining the benefits of both techniques and getting more synergy between them.
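The difference between the two techniques can be sketched with Python's standard `xml.etree.ElementTree.iterparse`. This is not how Saxon implements either technique. It is a minimal illustration, and the sample document, the function names `stream_names` and `project`, and the hand-written `keep` set are all assumptions made for the example. In a real projection, the set of relevant elements would be derived by analysing the query rather than written by hand.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample document, loosely shaped like auction data.
SAMPLE = b"""<site>
  <item id="i1"><name>lamp</name><description>long text</description></item>
  <item id="i2"><name>desk</name><description>more long text</description></item>
</site>"""


def stream_names(source):
    """Streaming: emit each item name as soon as the item has been read,
    then discard the item's content so it does not accumulate in memory."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "item":
            yield elem.findtext("name")
            elem.clear()  # drop children and text that are no longer needed


def project(source, keep=frozenset({"site", "item", "name"})):
    """Projection: build a tree during parsing, but drop every element
    that the (assumed) query can never touch."""
    stack = []
    root = None
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            if root is None:
                root = elem
            stack.append(elem)
        else:
            stack.pop()
            # After the pop, stack[-1] is the parent of the finished element.
            if stack and elem.tag not in keep:
                stack[-1].remove(elem)
    return root
```

With this sketch, `list(stream_names(io.BytesIO(SAMPLE)))` produces the names without keeping item content around, while `project(io.BytesIO(SAMPLE))` returns a smaller tree that an ordinary, non-streaming query can then navigate freely.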
The HTML5.1 DTD
Marcus Reichardt
Based on W3C’s HTML5.1 specification, a new SGML DTD for HTML5.1 with broad applicability for validation and normalization is developed.
A formal grammar for HTML is as useful as ever in any HTML-centric workflow, such as content authoring, preservation of web content and digital heritage, establishing a baseline for defining contractual obligations of web authors, agencies, and other content providers delivering web content, and archival of legal documents in web-mediated business or administrative transactions.
The DTD presented here is biased towards security and forward-engineering tasks in that it dispenses with legacy practices such as uppercase HTML markup support and unsafe handling of script data. Due to SGML’s flexibility, with only very minor changes to the DTD and the accompanying SGML declaration, the DTD can also be employed for parsing legacy web content for applications such as crawlers, extracting web content for further processing, web scraping, etc.
Moreover, DTD customization can also be used for emerging web markup practices such as HTML custom elements and Google’s AMP profile of HTML. A formal and well-understood, rather than ad hoc, process for making web content available to robust XML back-end processing pipelines is also valuable for the XML community.
While SGML has largely been treated as a legacy technique by the web community for years now, this work also shows that not only is SGML capable of describing and parsing HTML5 precisely and elegantly, it’s also the only game in town when it comes to formal processing of web content based on an international standard.