Schedule for Friday
chaired by yours truly, Mohamed Zergaoui (morning), and Jirka Kosek (afternoon)
9:00 | Registration desk opens
9:30 | Opening and sponsors presentation
9:40 | Writing more robust XSLT stylesheets by understanding and leveraging the XDM data model | Abel Braaksma (Exselt)
10:10 | Task Abstraction for XPath Derived Languages | Debbie Lockett (Saxonica) and Adam Retter (Evolved Binary)
10:40 | Ex-post rule match selection: A novel approach to XSLT-based Schematron validation | David Maus (Herzog August Bibliothek Wolfenbüttel)
11:00 | Coffee break
11:30 | DBpedia: Global and Unified Access to Knowledge Graphs | Sebastian Hellmann (Universität Leipzig)
12:00 | Authoring Domain Specific Languages in Spreadsheets Using XML Technologies | Alan Painter (HSBC France)
12:30 | How to configure an editor | Martin Middel (FontoXML)
13:00 | Lunch
14:30 | Discover the Power of SQF | Octavian Nadolu (Syncro Soft/Oxygen XML) and Nico Kutscherauer (data2type)
15:00 | Tagdiff: a diffing tool for highlighting differences in the tagging of text-oriented XML documents | Cyril Briquet (Cyril Briquet consulting)
15:30 | Merge and Graft: Two Twins That Need To Grow Apart | Robin La Fontaine and Nigel Whitaker (DeltaXML)
16:00 | Coffee break
16:30 | The Design and Implementation of FusionDB | Adam Retter (Evolved Binary)
17:00 | xqerl_db: Database Layer in xqerl | Zachary Dean (xqerl)
17:20 | An XSLT Compiler written in XSLT: can it perform? | Michael Kay and John Lumley (Saxonica)
17:50 | Closing of the day
19:30 | Social dinner & DemoJam
Session details
Writing more robust XSLT stylesheets by understanding and leveraging the XDM data model
Abel Braaksma (Exselt)
The XDM, or in full the XQuery and XPath Data Model 3.1, is the all-encompassing data model for processing XML with XQuery, XPath and XSLT.
This data model can be challenging to understand fully and as a result, many XSLT or XQuery authors write their stylesheets and queries without any type safety, leading to hard-to-diagnose bugs down the line.
This paper attempts to lift the veil, explaining the basics of the data model as an almost-hierarchical model by comparing it to classic type systems in C/C# and Java, emphasizing pitfalls and challenges.
Special attention will be given to overlapping types and to the concepts of casting (which is not quite the same as in general-purpose languages), subtype substitution and type promotion. These three concepts will be explained with example XSLT stylesheets, ultimately leading to a more stable design and more readable programming code.
Finally, the paper addresses how the data model applies differently to a stylesheet and its XML processing under schema-aware processors, and how and why one stylesheet may work in a basic processor but not in a schema-aware one.
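To give a flavour of the kind of type-aware coding the abstract refers to, here is a minimal, hedged sketch (not taken from the paper; the element names and the function are invented for illustration) in which declared types make subtype substitution and type promotion visible:

<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:f="urn:example:functions">

  <!-- Declaring xs:double parameters: callers may pass an xs:integer or
       xs:decimal value, which the processor accepts via numeric type promotion. -->
  <xsl:function name="f:ratio" as="xs:double">
    <xsl:param name="num" as="xs:double"/>
    <xsl:param name="den" as="xs:double"/>
    <xsl:sequence select="$num div $den"/>
  </xsl:function>

  <xsl:template match="item">
    <!-- Subtype substitution: an xs:integer value satisfies the declared
         xs:decimal type, because xs:integer derives from xs:decimal. -->
    <xsl:variable name="qty" as="xs:decimal" select="xs:integer(@quantity)"/>
    <!-- Both arguments are promoted to xs:double for the function call. -->
    <xsl:value-of select="f:ratio($qty, 3)"/>
  </xsl:template>

</xsl:stylesheet>

With the "as" declarations in place, a non-numeric @quantity fails right at the variable declaration instead of surfacing as a confusing error further downstream.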
Task Abstraction for XPath Derived Languages
Debbie Lockett (Saxonica) and Adam Retter (Evolved Binary)
XPDLs (XPath Derived Languages) such as XQuery and XSLT have been pushed beyond the scope envisaged by their designers; perversions such as processing Binary Streams, File System Navigation, and Asynchronous Browser DOM Mutation have all been witnessed. Many of these novel applications of XPDLs intentionally incorporate non-sequential and/or concurrent evaluation, and embrace side effects to achieve their purpose. This paper examines current approaches used in XPDL implementations, reviews solutions offered by non-XPDLs, and then presents a novel solution for XPDLs – EXPath Tasks.
» Read paper in proceedings
» slides
Ex-post rule match selection: A novel approach to XSLT-based Schematron validation
David Maus (Herzog August Bibliothek Wolfenbüttel)
SchXslt is a Schematron processor written entirely in XSLT. It follows the principal design of Rick Jelliffe’s “skeleton” implementation and compiles a Schematron 2016 schema to an XSLT stylesheet. The goal of the SchXslt project is to improve validation performance by using the xsl:next-match instruction to implement a new validation strategy. This paper discusses the principal design of an XSLT-based Schematron processor and introduces ex-post rule match selection as a novel validation strategy.
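As a rough illustration of the mechanism the abstract mentions (a hedged sketch, not code emitted by SchXslt), two compiled rule templates for the same context can be chained with xsl:next-match so that every matching rule gets to report, and the effective match can be selected afterwards:

<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Highest-priority rule for "chapter"; xsl:next-match hands the same node
       on to the next matching rule instead of shadowing it. -->
  <xsl:template match="chapter" mode="validate" priority="2">
    <fired rule="chapter-has-title" ok="{exists(title)}"/>
    <xsl:next-match/>
  </xsl:template>

  <!-- Lower-priority rule for the same context, which still gets to report. -->
  <xsl:template match="chapter" mode="validate" priority="1">
    <fired rule="chapter-not-empty" ok="{exists(*)}"/>
  </xsl:template>

</xsl:stylesheet>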
» Read paper in proceedings
» slides
DBpedia: Global and Unified Access to Knowledge Graphs
Sebastian Hellmann (Universität Leipzig)
DBpedia is a decentralized community-driven effort which aims to provide multilingual, open and structured knowledge extracted from Wikipedia. DBpedia is language-driven and, over the past decade, the DBpedia network has grown to 22 DBpedia language chapters, including major European languages such as English, German and French, but also minor ones such as Czech, Polish and Bulgarian. The network also spans beyond the boundaries of the European Union and it currently provides DBpedia chapters for Japanese, Indonesian, Korean, Russian, Ukrainian and Arabic. Over the years, DBpedia has become one of the most widely used datasets and one of the central interlinking hubs in the Linked Open Data (LOD) cloud. All these efforts have made DBpedia provide “Global and Unified Access to Knowledge Graphs”.
In the presentation, I walk the audience through the evolution, features, development, deployment, use cases, and benefits of the DBpedia technology, as well as open issues and challenges in the context of DBpedia and Linked Open Data. DBpedia has strong roots in and relations to XML and XML-related technologies, which will also be discussed.
Authoring Domain Specific Languages in Spreadsheets Using XML Technologies
Alan Painter (HSBC France)
Domain Specific Languages (DSLs) have been shown to be useful in the development of information systems. DSLs can be designed to be authored, manipulated and validated by business experts (BEs) and subject matter experts (SMEs). Because BEs are known to be comfortable working with spreadsheet applications, the ability to author DSLs within spreadsheets makes the DSL authoring process even more engaging for BEs. Today’s most popular spreadsheet applications are implemented using XML documents and, for this reason, XML technologies are well suited for reading DSL definitions within spreadsheets. The part of a DSL implementation that is usually considered to require the most effort is the generation of artifacts or code from the DSL description, and XML technologies are well placed for generating these technical artifacts. In this paper, I will first motivate the usage of DSLs by describing some of their utility in information systems development. I’ll then go on to describe how XML technologies can be used for reading DSLs within spreadsheets and for generating technical artifacts. To conclude, I’ll present some real-world examples of DSL usage via XML technologies.
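By way of illustration only (a minimal sketch under my own assumptions, not code from the paper): because an .xlsx workbook is a ZIP of XML parts, a worksheet can be read with plain XSLT once it has been unzipped. The part name xl/worksheets/sheet1.xml and the output vocabulary below are assumptions, and shared-string resolution is omitted.

<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:s="http://schemas.openxmlformats.org/spreadsheetml/2006/main">

  <!-- Turn each spreadsheet row into a generic <record> element; a real DSL
       reader would map cell references to named DSL fields and resolve
       shared-string indexes against xl/sharedStrings.xml. -->
  <xsl:template name="xsl:initial-template">
    <dsl-definitions>
      <xsl:for-each select="doc('xl/worksheets/sheet1.xml')/s:worksheet/s:sheetData/s:row">
        <record row="{@r}">
          <xsl:for-each select="s:c">
            <cell ref="{@r}" value="{s:v}"/>
          </xsl:for-each>
        </record>
      </xsl:for-each>
    </dsl-definitions>
  </xsl:template>

</xsl:stylesheet>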
» Read paper in proceedings
» slides
How to configure an editor
Martin Middel (FontoXML)
A case study of how the XML manipulation layer of FontoXML has grown. This paper will discuss a number of key decisions we’ve made and where the XML manipulation layer of FontoXML will move to. We hope that this paper will give an insight into how a small team of JavaScript developers with a moderate knowledge of XML technologies created the platform on which a large number of XML editors have been and are being built.
» Read paper in proceedings
» slides
Discover the Power of SQF
Octavian Nadolu (Syncro Soft/Oxygen XML) and Nico Kutscherauer (data2type)
In this presentation, you will discover the new additions to the SQF language and how you can create useful and interesting quick fixes for the Schematron rules defined in your project. It will include examples of quick fixes for several types of projects, show how abstract quick fixes enable easier creation of specific fixes, and show how XSLT code can be used for more complex fixes. You will also discover some use cases for quick fixes that extend your Schematron scope.
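For readers unfamiliar with SQF (Schematron Quick Fixes), the following is a minimal hedged sketch, my own example rather than one from the presentation, of a Schematron rule whose assertion points at a quick fix that adds a missing attribute:

<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron"
            xmlns:sqf="http://www.schematron-quickfix.com/validator/process">
  <sch:pattern>
    <sch:rule context="image">
      <!-- The assertion references the quick fix by its id. -->
      <sch:assert test="@alt" sqf:fix="add-alt">An image should carry an alt attribute.</sch:assert>
      <sqf:fix id="add-alt">
        <sqf:description>
          <sqf:title>Add an empty alt attribute</sqf:title>
        </sqf:description>
        <!-- Adds an alt attribute with an empty value to the matched element. -->
        <sqf:add node-type="attribute" target="alt" select="''"/>
      </sqf:fix>
    </sch:rule>
  </sch:pattern>
</sch:schema>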
» Read paper in proceedings
» slides
Tagdiff: a diffing tool for highlighting differences in the tagging of text-oriented XML documents
Cyril Briquet (Cyril Briquet consulting)
Finding differences in the tagging of two XML documents can be very useful when evaluating the output of tagging algorithms. The tool described in this paper, called tagdiff, is an original attempt to take into account the specifics of text-oriented XML documents (such as the importance of lateral proximity and the unimportance of ancestor/child relationships) and of algorithmic tagging (tagged XML documents that are not valid, or XML schemas too relaxed to be useful). It contrasts with general-purpose diffing tools such as the well-known (but XML-oblivious) diff, with more recent tools that compare XML trees, and with tools offering rich graphical user interfaces that are not suited to long lines of XML content. In practice, the two compared XML documents are split into small typed segments of text or markup that are aligned with one another, compared, and printed next to one another in two columns. Effort is made to ease visual inspection of differences.
» Read paper in proceedings
» slides
Merge and Graft: Two Twins That Need To Grow Apart
Robin La Fontaine and Nigel Whitaker (DeltaXML)
Software developers are familiar with merge, for example pulling together changes from one branch into another in a version control system. Graft is a more selective process, pulling changes from selected commits onto another branch. These two processes are often implemented in the same way, but they are not the same: there are subtle but important differences.
Git and Mercurial have different ways to determine the correct ancestor when performing a merge operation, and graft operations use yet another way to determine the ancestor. Once the ancestor is determined, the built-in line-based algorithm relies on the underlying diff3 process. This diff3 algorithm accepts non-conflicting changes and relies on the user to resolve conflicts. We will examine the details of this process and suggest that the symmetrical treatment that diff3 uses is appropriate for merge but not necessarily optimal for the graft operation. We will look at examples of tree-based structures such as XML and JSON to show how different approaches are appropriate for merge and graft.
» Read paper in proceedings
» slides
The Design and Implementation of FusionDB
Adam Retter (Evolved Binary)
FusionDB is a new multi-model database system which was designed for the demands of the current data and information age. FusionDB has a strong XML heritage; indeed, one of its models is that of a Native XML store. Whilst at the time of writing there are several Open Source, and at least one commercial, Native XML Database systems available, we believe that FusionDB offers some unique properties, and its multi-model foundation for storing heterogeneous types of data opens new possibilities for cross-model queries. This paper discusses FusionDB’s raison d’être, issues that we had to overcome, and details its high-level design and architecture.
» Read paper in proceedings
» slides
xqerl_db: Database Layer in xqerl
Zachary Dean (xqerl)
xqerl, an open-source XQuery 3.1 processor written in Erlang, now has an internal database layer for persistently storing XML and JSON data. This paper will discuss the internal workings of the database and future work that is still to be done concerning document and value indexing.
An XSLT Compiler written in XSLT: can it perform?
Michael Kay and John Lumley (Saxonica)
Over the past 18 months we have been working on a new compiler for XSLT, written in XSLT itself. At the time of writing, this is nearing functional completeness: it can handle over 95% of the applicable test cases in the W3C XSLT suite. In this paper we’ll give a brief outline of the structure of this compiler, comparing and contrasting it with the established Saxon compiler written in Java.
Having got close to functional completeness, we now need to assess the compiler’s performance, and the main part of this paper will be concerned with the process of getting the compiler to a point where the performance requirements are satisfied.
Because the compiler is, at one level, simply a fairly advanced XSLT 3.0 stylesheet, we hope that the methodology we describe for studying and improving its performance will be relevant to anyone else who has the task of creating performant XSLT 3.0 stylesheets.