I’m pleased to report on some important steps forward regarding a couple of specifications that are close to our hearts.
On 27th March, EIOPA published the latest draft of the Solvency II taxonomy, making use of both the January Public Working Draft of the Table Linkbase Specification, and the Taxonomy Package Specification.
Moving to a recent PWD of the Table Linkbase specification is an important step for the development of both the specification and the taxonomy, as it means that the taxonomy draft can benefit from improved tool support, and the specification from real world feedback.
Meanwhile, my colleague Jon Siddle continues to work tirelessly with the XII Rendering Working Group to complete the remaining work on the Table Linkbase Specification. The latest edition of IBR magazine included an article by Jon explaining how the specification expands the boundaries of what can be achieved with XBRL (see p27 of the March edition).
It’s also been an important few days in the world of XBRL for Corporate Actions. The final version of the 2012 Corporate Actions taxonomy was published on Monday as a Taxonomy Package. Just a few days earlier, it was announced that Citi have started using the Corporate Actions taxonomy for filing dividend announcements to the Depository Trust & Clearing Corporation (DTCC).
XBRL International has announced the publication of a new Public Working Draft of the Table Linkbase specification. This specification forms a key component of the Solvency II and CRD IV XBRL reporting projects. This release is the first Public Working Draft since 2011, and represents a significant step forward in the maturity and quality of the specification.
Projects that have looked to adopt the Table Linkbase specification have been held back by a lack of recent public releases of the specification, creating interoperability problems as projects have adopted customised versions of the published schemas and standards.
The latest release of the specification has been driven forward by the efforts of CoreFiling staff, and in particular, Jon Siddle. CoreFiling contributions have included the introduction of an XML serialised “infoset” for defining and testing the conformance of Table Linkbase processors, and the refactoring of the specification into three separate models (Definition, Structural and Rendering) to give a clear separation between syntax and semantics.
These improvements to the foundation of the specification will accelerate the development of the standard towards becoming an XBRL International Recommendation, and will help address the interoperability issues that have beset early adopters of the specification.
In my previous post, I looked at how a lack of clear best practice around the naming of concepts and elements has contributed to the confusion around sign conventions in XBRL. I believe that another contributing factor is that the sign conventions used in financial statements are not trivial, and actually quite subtle.
Let’s take another look at the example from my first post:
| | 2011 £’000 | 2010 £’000 |
| Turnover | 518 | 498 |
| Cost of sales | (321) | (299) |
| Gross profit | 197 | 199 |
| Administrative expenses | (211) | (105) |
| Operating profit/(loss) | (14) | 94 |
You might also encounter a different presentation of exactly the same data:
| | 2011 £’000 | 2010 £’000 |
| Turnover | 518 | 498 |
| Cost of sales | 321 | 299 |
| Gross profit | 197 | 199 |
| Administrative expenses | 211 | 105 |
| Operating profit/(loss) | (14) | 94 |
Different jurisdictions seem to converge on one approach or the other, but the point is that either approach is valid. The same is not true in XBRL. When it comes to signs in XBRL, there’s a right way to do it, and a wrong way to do it.
In the above examples, we changed the sign of certain numbers on the statement, but we did not change the meaning. If you change the sign of a number reported in XBRL you will always change the meaning.
When humans read financial statements, they use domain knowledge and context to correctly understand the figures. I know that companies do not usually report a negative cost of sales (domain knowledge), and a quick check of the figures above and below (context) confirms that in neither case are the suppliers paying the company!
XBRL facts are designed to be understood independently, without the need for context or domain knowledge.
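To make the point concrete, here is a minimal Python sketch that turns either presentation of the 2011 column into XBRL values. The set of expense line items and the `expenses_in_brackets` flag are hypothetical stand-ins for the domain knowledge and context a human reader brings; once that knowledge has been applied, both presentations yield identical facts.

```python
# Line items that a reader knows are costs (hypothetical domain knowledge).
EXPENSE_ITEMS = {"Cost of sales", "Administrative expenses"}


def parse_displayed(text: str) -> int:
    """Parse a displayed figure such as '518' or '(321)' into a signed integer."""
    text = text.strip()
    if text.startswith("(") and text.endswith(")"):
        return -int(text[1:-1])
    return int(text)


def to_xbrl(statement: dict, expenses_in_brackets: bool) -> dict:
    """Convert displayed figures to XBRL values, given the statement's presentation convention."""
    facts = {}
    for item, shown in statement.items():
        value = parse_displayed(shown)
        # Where costs are shown in brackets, a bracketed cost is a positive cost in XBRL.
        if item in EXPENSE_ITEMS and expenses_in_brackets:
            value = -value
        facts[item] = value
    return facts


first = {"Turnover": "518", "Cost of sales": "(321)", "Gross profit": "197",
         "Administrative expenses": "(211)", "Operating profit/(loss)": "(14)"}
second = {"Turnover": "518", "Cost of sales": "321", "Gross profit": "197",
          "Administrative expenses": "211", "Operating profit/(loss)": "(14)"}

print(to_xbrl(first, expenses_in_brackets=True) == to_xbrl(second, expenses_in_brackets=False))  # prints True
```

The knowledge of which convention a statement uses lives outside the figures themselves; the XBRL facts that come out the other end carry no such dependency.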
To illustrate the issue, imagine the accounts above had an additional line item:
| | 2011 £’000 | 2010 £’000 |
| Taxation | 100 | (50) |
In one year the company paid tax, and in the other it received a tax credit, but which was which? In the context of the first table, I’d expect this to represent a tax credit of 100 and a tax charge of 50, but in the context of the second table, I’d assume the opposite meaning. Without the context, it’s completely ambiguous.
By contrast, the sign to be used in XBRL is completely prescribed. Ask the question, “What was the taxation?” If you answer “100”, then tag a positive number. If you answer “actually, there was a tax credit of 100” then tag a negative number.
In the last two posts we’ve seen that tagging a value with the correct sign in XBRL is easy, provided that:
If you’ve been following XBRL for a while, you might be surprised that I’ve got this far with no mention of balance attributes. We can’t avoid them forever, so in my next post I’ll be looking at whether they have anything to add, or if they merely contribute to the confusion.
In my previous article, I demonstrated a simple technique for getting the correct sign when tagging a number in XBRL. You may have noticed that I was somewhat casual with the notion of concepts having a “name”. If you’re familiar with the details of XBRL, you’ll know that concepts have an “element name” and typically have at least one label. Which of these was I referring to?
It is common practice in XBRL to use the standard label to give a concept a human readable name. The purpose of a name is to unambiguously identify the meaning of a concept, and part of that meaning is the sign convention. Making a profit and making a loss are two very different things, and if the name of the concept doesn’t make it clear which of these things the concept represents, then it’s not a very good name.
Examples of good names would include:
Examples of bad names would include:
(the last one is borderline – you might reasonably assume that a positive “change” is an increase, but it’s not explicit, and it’s not the sign convention that you’d expect to see used when displaying the concept on a Cash Flow statement)
A more unconventional name like “(Increase)/Decrease in Accounts Receivable” would also be acceptable but note that this is a different concept to one called “Increase/(Decrease) in Accounts Receivable”.
If the idea that a concept should have a name, and that that name should make it clear what the concept means, is sounding a bit obvious, then good – it is obvious!
A concept also needs to have an element name. This serves a different purpose, which is to provide a unique identifier for the concept in an XML document. Human readability is not the primary concern, although most implementations have chosen to use meaningful names (e.g. ProfitBeforeTax), rather than arbitrarily generated identifiers (e.g. “c1234”).
XML imposes some constraints on what constitutes a legal element name, most importantly disallowing spaces and most punctuation. This means that we can’t simply use the standard label as an element name. Most implementations have adopted an approach of taking the standard label, stripping out punctuation and removing some connective words such as “and”. This approach is encouraged by FRTA, although an exact rule is not spelt out.
The approach has the unfortunate side effect of turning clear concept names (i.e. standard labels) into rather more ambiguous element names. For example:
| Concept Name | Element Name |
| Profit/(Loss) | ProfitLoss |
| Increase/(Decrease) in Accounts Receivable | IncreaseDecreaseInAccountsReceivable |
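The label-to-element-name approach described above can be sketched in a few lines of Python. FRTA does not spell out an exact rule, so the punctuation handling and the single connective word dropped here (“and”) are assumptions; the sketch does reproduce the element names in the table.

```python
import re

# Connective words dropped when forming element names (assumed list; FRTA gives no exact rule).
STOP_WORDS = {"and"}


def conventional_element_name(label: str) -> str:
    """Strip punctuation, drop connective words and CamelCase what is left."""
    words = re.sub(r"[^A-Za-z0-9]+", " ", label).split()
    return "".join(w[0].upper() + w[1:] for w in words if w.lower() not in STOP_WORDS)


for label in ["Profit/(Loss)", "Increase/(Decrease) in Accounts Receivable"]:
    print(label, "->", conventional_element_name(label))
```

Note how the negated half of each label survives into the element name, which is where the ambiguity comes from.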
Such names undermine the notion that XBRL concepts have a clear and unalterable meaning, and that that meaning includes the sign convention. I suspect that elements such as the above have caused at least some of the confusion about how signs work in XBRL.
There is a very simple approach that would remove this confusion, although it has not made it into any published best practice that I am aware of: drop the portions of the label that indicate the negated meaning when forming an element name. For example:
| Concept Name | Element Name |
| Profit/(Loss) | Profit |
| Increase/(Decrease) in Accounts Receivable | IncreaseInAccountsReceivable |
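Here is a Python sketch of this alternative, assuming that the negated meaning is always the bracketed part of the label (optionally preceded by a slash). The function name is hypothetical. As a side effect, the unconventional “(Increase)/Decrease in Accounts Receivable” gets a distinct element name, which is correct because it is a different concept.

```python
import re


def proposed_element_name(label: str) -> str:
    """Form an element name after dropping the bracketed (negated) part of the label."""
    # Remove "/(Loss)", " (Decrease)", "(Increase)" and similar bracketed parts.
    positive = re.sub(r"/?\s*\([^)]*\)", "", label)
    words = re.sub(r"[^A-Za-z0-9]+", " ", positive).split()
    return "".join(w[0].upper() + w[1:] for w in words if w.lower() != "and")


for label in ["Profit/(Loss)",
              "Increase/(Decrease) in Accounts Receivable",
              "(Increase)/Decrease in Accounts Receivable"]:
    print(label, "->", proposed_element_name(label))
```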
If you’re uneasy about this approach, remember that the element is just a unique identifier. It is not intended to be a descriptive label, so the fact that it does not spell out the meaning of a negative value is unimportant.
In my view, the confusion around signs in XBRL has been fuelled by a number of details of the implementation of XBRL at the SEC. In the SEC implementation, preparers submit not only an instance document, but also an extension taxonomy allowing preparers to customise the taxonomy to better match their financial statements.
The SEC rule (33-9002) that enabled the use of XBRL for SEC filings requires filers to change the labels of standard concepts in the US GAAP taxonomy to match those on the company’s financial statements. You can argue about whether that’s a good idea or not, but doing so opens the door to confusion around sign conventions.
The text of the rule gives the example of a company relabeling “Gross Profit” as “Gross Margin” as they are “definitionally the same”. Seems harmless enough, but what if the line item in your financial statements is “(Increase)/Decrease in Accounts Receivable”? Should you change the standard label of the US-GAAP concept from “Increase/(Decrease) in Accounts Receivable” to “(Increase)/Decrease in Accounts Receivable”? In my view doing so is absolutely unacceptable: an increase in accounts receivable is not the same as a decrease in accounts receivable, so changing the name of a concept in this way is very misleading.
The SEC system does provide an appropriate way to handle this situation (negating labels), but the guidance in the Edgar Filer Manual could be clearer. Rule 6.11.1 instructs filers to “Assign a label of an element used in an instance the same text as the corresponding line item in the original HTML/ASCII document”, but nowhere does this rule suggest that assigning a standard label that implies the opposite sign convention is unacceptable. Rule 6.11.6 explains how to use negating labels, but does not explain what you should do with the standard label.
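The difference between changing a concept’s meaning and changing only its presentation can be illustrated with a simplified Python model of a negating label. This is not how EDGAR renders filings; the `display` function is hypothetical and only shows that a negating label flips the sign shown on the statement, while the reported fact (and the concept’s standard label) stay exactly as they were.

```python
def display(value: int, negated_label: bool) -> str:
    """Show a fact value on a statement, bracketing negatives; a negating label flips the sign shown."""
    shown = -value if negated_label else value
    return f"({abs(shown):,})" if shown < 0 else f"{shown:,}"


# Fact for "Increase/(Decrease) in Accounts Receivable": receivables rose by 50,000.
increase_in_receivables = 50_000
print(display(increase_in_receivables, negated_label=False))  # prints 50,000
print(display(increase_in_receivables, negated_label=True))   # prints (50,000)
```

The second line is what a filer presenting “(Increase)/Decrease in Accounts Receivable” wants to see, and it is achieved without touching either the fact or the standard label.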
I believe that much of the confusion around XBRL sign conventions could be removed by clearly documenting two pieces of best practice:
One of the things that has continued to surprise me with the adoption of XBRL is the amount of discussion that the question of tagging figures with the correct sign can generate. Brendan Mullan recently managed to start no fewer than 12 separate threads on this topic on the xbrl-public list, some of which resulted in significant further discussion.
Marking up figures in electronic format is not a new phenomenon, and I’m not aware of any other domains that have managed to get so tangled up in sign issues. What is it about the application of XBRL to financial reports that causes such difficulty? I have some ideas, but first let’s look at how to do it right.
Let’s consider the following extract from a Profit and Loss statement:
| | 2011 £’000 | 2010 £’000 |
| Turnover | 518 | 498 |
| Cost of sales | (321) | (299) |
| Gross profit | 197 | 199 |
| Administrative expenses | (211) | (105) |
| Operating profit/(loss) | (14) | 94 |
There’s a really simple way to get the sign right in XBRL, every time. Simply take the name of the concept that you’re using to tag the figure, and turn it into a question by prefixing it with “What was the… ”
For example, suppose our concept is called “Cost of Sales”.
Question: What was the Cost of Sales?
Answer: £321,000
Even though the figure in the accounts is shown as “(321)”, you wouldn’t answer that question by saying “minus £321,000”, would you? So we tag a positive number.
On the other hand, if in your answer you need to correct the question, then the sign should be negative:
Question: “What was the Operating Profit?”
Answer: “Actually, there was a loss of £14,000.”
Our answer is the opposite of the question that was asked, so we’d tag a negative number against a concept called “Operating Profit”.
Sometimes the concept name will be more explicit about the sign convention. For example, you might have a concept called “Increase (Decrease) in Accounts Receivable”. In this case, just ignore the bit in brackets, so your question becomes:
Question: “What was the Increase in Accounts Receivable?”
If your answer starts, “actually there was a decrease…” then you should tag a negative number. Otherwise, you should tag a positive number.
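The technique is mechanical enough to sketch in Python. The function names are hypothetical, and the judgement about whether your answer “corrects” the question still has to come from a person reading the accounts; the code only captures the two rules above: ignore the bracketed part of the name, and negate when the answer contradicts the question.

```python
import re


def question_for(label: str) -> str:
    """Turn a concept name into a 'What was the ...?' question, ignoring any bracketed part."""
    subject = re.sub(r"/?\s*\([^)]*\)", "", label)
    return f"What was the {' '.join(subject.split())}?"


def tagged_value(amount: int, answer_corrects_question: bool) -> int:
    """Tag the magnitude as positive, unless answering required correcting the question."""
    return -abs(amount) if answer_corrects_question else abs(amount)


print(question_for("Increase (Decrease) in Accounts Receivable"))
print(tagged_value(321_000, answer_corrects_question=False))  # Cost of Sales: prints 321000
print(tagged_value(14_000, answer_corrects_question=True))    # "Actually, a loss": prints -14000
```

Note that the brackets around 321 in the accounts never enter into it.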
It really is that simple. Nothing to do with balance attributes, negated labels, calculation weights or any of that stuff.
There are a number of reasons why what should have been a really straightforward issue has become confused into something much more complicated. I’ll address these in a series of follow-up articles:
There has been a lot of discussion recently in the XBRL community about the use of XBRL for very large datasets. There are a number of misconceptions around about the practicalities of working with large instances, and some confusion about the extent to which different approaches to processing XBRL can improve performance. This article attempts to shine some light on the problem, and propose ways in which performance could be improved when working with large datasets.
When XML was gaining popularity at the turn of the century, there were many people who complained that it was an inherently inefficient way to work with data. For anyone with experience of packing data into binary structures to minimise storage and memory usage, the idea of using XML tags around text representations of data seemed extremely wasteful.
The reality is that whilst XML is inherently inefficient relative to packed binary formats, or even CSV, computing power and memory capacity had reached the point where, for most everyday data sets, the performance implications of this inefficiency were negligible, and were outweighed by the benefits of working with self-describing data that could be processed using standard validators and tools.
As computers have continued to evolve, the cut-off for how much data it is reasonable to handle using XML has increased. For example, the core of many XML applications is the Document Object Model (DOM). Memory requirements for DOM are of the order of ten times the size of the XML document. In a world where computers with several gigabytes of RAM are commonplace, processing XML documents that are tens of megabytes in size has become feasible, but documents that are more than a few hundred megabytes in size remain problematic.
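Using the rough factor of ten quoted above, a back-of-the-envelope estimate shows where the limit falls (the document sizes are illustrative, not measurements):

```latex
M_{\mathrm{DOM}} \approx 10\,S
\qquad
S = 50\ \mathrm{MB} \;\Rightarrow\; M_{\mathrm{DOM}} \approx 500\ \mathrm{MB},
\qquad
S = 300\ \mathrm{MB} \;\Rightarrow\; M_{\mathrm{DOM}} \approx 3\ \mathrm{GB}
```

The second case already consumes most of the memory on a machine with a few gigabytes of RAM, before any XBRL-level processing has been done.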
For such datasets, there are essentially two options:
XBRL, being built on XML, suffers from the same inefficiency of representation, and the same challenges in processing. In fact, in many cases, the problems are more acute as XBRL is not particularly efficient in its use of XML. This is particularly noticeable for heavily dimensional data, where each <context> element is only used by a small number of facts.
As noted above, many processors are built around the Document Object Model (DOM), or some other DOM-like interface such as libxml or Python’s lxml[1]. The key feature of such interfaces is that the XML document is parsed into an in-memory representation allowing random access to all information that was in the XML. A “universal truth” that is often cited by people who know just enough to be dangerous is that “DOM is really inefficient”. Whilst it is true that the memory overhead of the DOM is significant, whether it is an efficient way to solve a particular problem depends on the nature of the problem and on the alternative approaches available.
The standard alternative to a DOM-like approach is a stream-based approach such as SAX. SAX presents an XML document as a series of events, and it is up to the consuming application to extract the useful items of data as the events are received, and typically, store the extracted information in some in-memory representation.
The key to a stream-based approach being more efficient than a DOM-based approach is how much information you store in memory as a result of the parse, and the key to that is whether you can know in advance what subset of information you want from the XML document.
When working with an XBRL document, you generally don’t need all the information that’s in the XML model. What you want is an XBRL model. You don’t want to work in terms of elements and attributes, you want to work in terms of facts, concepts, labels, dimensions, etc. In an ideal world you could SAX-parse your XBRL document straight into an XBRL model, and there would be no need for a DOM-style, in-memory representation of the XML.
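As a rough illustration of the stream-based idea, the following Python sketch uses the standard library’s SAX parser to keep only three things per simple fact (concept, `contextRef` and value) and never builds a tree of the document. It is a deliberately simplified toy: contexts, units, `decimals`, tuples and footnotes are all ignored, the sample is an abbreviated (not valid) instance, and the `ex` namespace is made up.

```python
import io
import xml.sax
from xml.sax.handler import feature_namespaces

XBRLI_NS = "http://www.xbrl.org/2003/instance"


class FactCollector(xml.sax.ContentHandler):
    """Record (concept, contextRef, value) for simple facts and discard everything else."""

    def __init__(self):
        super().__init__()
        self.facts = []
        self._current = None  # (concept, contextRef) of the fact being read
        self._text = []

    def startElementNS(self, name, qname, attrs):
        uri, local = name
        context_ref = attrs.get((None, "contextRef"))
        # Simple facts carry a contextRef; xbrli:* elements are infrastructure, not facts.
        if context_ref is not None and uri != XBRLI_NS:
            self._current = (f"{{{uri}}}{local}", context_ref)
            self._text = []

    def characters(self, content):
        if self._current is not None:
            self._text.append(content)

    def endElementNS(self, name, qname):
        # Simple facts have no child elements, so the next end tag closes the fact.
        if self._current is not None:
            concept, context_ref = self._current
            self.facts.append((concept, context_ref, "".join(self._text).strip()))
            self._current = None


def collect_facts(xml_bytes: bytes) -> list:
    handler = FactCollector()
    parser = xml.sax.make_parser()
    parser.setFeature(feature_namespaces, True)
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(xml_bytes))
    return handler.facts


SAMPLE = b"""<xbrli:xbrl xmlns:xbrli="http://www.xbrl.org/2003/instance"
    xmlns:ex="http://example.com/hypothetical-taxonomy">
  <xbrli:context id="c2011"/>
  <ex:Turnover contextRef="c2011" unitRef="GBP" decimals="-3">518000</ex:Turnover>
  <ex:CostOfSales contextRef="c2011" unitRef="GBP" decimals="-3">321000</ex:CostOfSales>
</xbrli:xbrl>"""

print(collect_facts(SAMPLE))
```

Memory use here grows with the number of facts kept, not with the size of the XML, which is exactly why knowing in advance what you want from the document matters.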
Unfortunately, we don’t live in an ideal world, and there are a few ways in which XBRL clings unhelpfully to its underlying XML representation. The heart of the problem is that there is no well-defined information model for XBRL. There have been various efforts to create one, such as the XBRL Infoset, and more recently the Abstract Model, but none have yet come to fruition. The result of this is that there is no common agreement about which parts of an XML document are “significant” from an XBRL perspective, and which parts are irrelevant syntax-level detail.
A good example of where this creates a practical problem is XBRL Formula’s use of XPath as an expression language. Whilst the primary way of selecting facts for use in a formula is to assign them to variables using the fact selection constructs provided by the specification, XBRL Formula allows formula writers to include arbitrary XPath expressions. In other words, they can work not just with the XBRL, but with the underlying XML. Whilst this makes XBRL Formula very powerful, it means that an XBRL Formula processor is obliged to keep a copy of the XML document in memory in order to support the XML-based navigation required by XPath. In other words, if you want to use XBRL Formula, you’re pretty much stuck with the DOM, or something very much like it.
Another example is in the specification of validation rules. Here at CoreFiling, we’ve got a really nice XBRL model in the form of our True North API, and it makes writing XBRL validation rules really quick and easy. Unfortunately, validation requirements are often specified in terms of XML syntax rather than an XBRL model (this isn’t altogether surprising, given the above-mentioned absence of a commonly agreed XBRL model). A prime example of this is the Edgar Filer Manual, which defines validation criteria for SEC submissions. A quick read of the manual reveals rules specified in terms of XML artefacts such as elements and attributes, and not just XBRL artefacts like facts and concepts. The net result of this is that in order to implement many of these rules accurately, we need to dive behind our nice XBRL model and delve into a lower-level DOM-like model of the XML.
To summarise, in order to work with XBRL more efficiently and allow scaling to much larger instance documents, we need to work with it as XBRL, not XML. We need to introduce the notion of a “Pure XBRL Processor” which is free to discard irrelevant XML syntax.
In order to do this, we first need to define a commonly agreed XBRL model. We can then be clear about which problems can be solved with an efficient Pure XBRL Processor, and which are dependent on a processor with access to the underlying XML.
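To give a flavour of what such a model might contain, here is a hypothetical Python sketch of a fact defined purely by its XBRL aspects. It is not the XBRL Infoset or the Abstract Model, and it flattens entities, periods and units to strings for brevity; the point is that syntax such as namespace prefixes and context ids has no place in it.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Tuple


@dataclass(frozen=True)
class Fact:
    """A fact described by its XBRL aspects only: no prefixes, context ids or element order."""
    concept: str                                           # Clark notation: {namespace}localName
    entity: str                                            # scheme and identifier, flattened
    period: str                                            # e.g. "2011-01-01/2011-12-31"
    dimensions: FrozenSet[Tuple[str, str]] = frozenset()   # (dimension, member) pairs
    unit: Optional[str] = None
    value: Optional[str] = None


# The same fact taken from two instances that used different prefixes and context ids
# (say <a:Turnover contextRef="c1"> and <b:Turnover contextRef="FY2011">) is identical
# once only the XBRL aspects are kept.
a = Fact("{http://example.com/hypothetical}Turnover", "ex:12345", "2011-01-01/2011-12-31",
         unit="iso4217:GBP", value="518000")
b = Fact("{http://example.com/hypothetical}Turnover", "ex:12345", "2011-01-01/2011-12-31",
         unit="iso4217:GBP", value="518000")
print(a == b, len({a, b}))  # prints True 1
```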
We then need to revisit technologies such as XBRL Formula and figure out how we can make them work with a Pure XBRL Processor. One option, of course, is to switch to an entirely different technology such as Sphinx which is already built on top of an XBRL model.
Another option is to restrict the XPath expressions that are allowed in XBRL Formula to a subset that can be implemented on top of a Pure XBRL Processor. In other words, retain the ability to access functions and variables, but remove the ability to do arbitrary navigation of the underlying XML document. This would be no bad thing. I spoke recently to XBRL Formula guru Herm Fischer, and he expressed his concern at the number of Formula rules he’d seen that use XPath expressions to navigate the XML model, rather than treating it as XBRL.
I’ve written previously about the risks of trying to treat XBRL as XML. Restricting XBRL Formula so that it can only work with the XBRL Model should lead to better written, more robust XBRL Formulas, and hopefully will guide rule writers away from concerning themselves with irrelevant syntactic details.
Of course, whilst a pure XBRL approach has the potential to use far less memory than one which must retain an XML model, ultimately any in-memory approach is going to have memory requirements that are proportional to document size, and so will always have an upper limit on the size of document that can reasonably be processed on any given hardware. For extremely large instance documents, more radically different approaches to processing will be necessary. Such approaches may well rule out the possibility of using familiar technologies such as Sphinx and Formula altogether. For such documents, moving to a pure XBRL approach is a necessary first step, but it’s not the whole solution.
I’m sure that these suggestions won’t appeal to everyone, but as XBRL moves into the enterprise, we need to free the information from the syntax used to represent it.
1. From this point on, I use the terms “DOM” and “DOM-like” to refer to any approach that stores an in-memory representation of the full XML model. Whilst it’s certainly possible to create DOM-like implementations that are more efficient than an actual DOM implementation, memory usage is still likely to be some multiple of the original document size and so will still suffer from the same fundamental performance limitations.
Charlie Hoffman has added an interesting post to his blog about using Magnify to verify the integrity of a financial report.
Our Magnify XBRL review tool comes with a range of built-in, generally applicable XBRL quality checks, as well as some jurisdiction-specific filing rules, such as the Edgar Filer Manual and HMRC’s Joint Filing Common Validation Criteria rules, but as Charlie demonstrates, the real power of Magnify comes from the ability to drop in custom rules.
Magnify’s checklist view allows users to build a custom, structured review based on checks that can be implemented in a range of technologies. The fastest way to build rules that operate on the XBRL semantics of a report is Sphinx. We do also support the XBRL International Formula standard, but as Charlie notes, “creating Sphinx rules is much, much easier”.
Charlie’s published the source to the rules that he’s using. Although readable, they look a little bland in this plain text format. Sphinx rules are most easily developed using SpiderMonkey which provides a rules development environment with syntax highlighting, concept drag-and-drop, and on-the-fly syntax validation.
There are a few neat features to note in the rulebase. The first one is these few lines:
transform
namespace "http://xbrl.us/us-gaap/2009-01-31"
to "http://fasb.org/us-gaap/2011-01-31"
transform
namespace "http://xbrl.us/dei/2009-01-31"
to "http://xbrl.sec.gov/dei/2011-01-31"
These two “transform” statements make all of the rules in the rulebase, which are written against the 2011 US GAAP taxonomy, also work with the 2009 US GAAP taxonomy. Once the 2012 taxonomy is published, two more lines will extend them to work with that too. Obviously this depends on the relevant concepts existing in both versions of the taxonomy, but where they don’t you can add some additional, more granular, transform statements to provide the necessary mappings. What’s more, if you happen to have an XBRL Versioning Report, you can easily generate the necessary transform statements.
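The effect of the two namespace-level `transform` statements can be mimicked in Python by rewriting the namespace part of a concept name before it is looked up. This is a hypothetical illustration of the idea, not how Sphinx implements it; concept names are written in Clark notation (`{namespace}localName`), and the second helper simply regenerates statements in the layout shown above from a mapping.

```python
# Namespace mapping copied from the two transform statements above.
TRANSFORMS = {
    "http://xbrl.us/us-gaap/2009-01-31": "http://fasb.org/us-gaap/2011-01-31",
    "http://xbrl.us/dei/2009-01-31": "http://xbrl.sec.gov/dei/2011-01-31",
}


def transform_concept(clark_name: str, transforms: dict = TRANSFORMS) -> str:
    """Rewrite '{old-ns}Local' to '{new-ns}Local' when old-ns has a mapping."""
    if not clark_name.startswith("{"):
        return clark_name
    namespace, local = clark_name[1:].split("}", 1)
    return "{%s}%s" % (transforms.get(namespace, namespace), local)


def transform_statements(transforms: dict = TRANSFORMS) -> str:
    """Emit transform statements in the layout used in the rulebase excerpt."""
    lines = []
    for old, new in transforms.items():
        lines += ["transform", f'namespace "{old}"', f'to "{new}"']
    return "\n".join(lines)


print(transform_concept("{http://xbrl.us/us-gaap/2009-01-31}Revenues"))
print(transform_statements())
```

Concepts whose local names changed between versions would need finer-grained, per-concept mappings on top of this.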
Another thing to note about the rules is that they contain everything needed to generate the checklist that Charlie includes in the screenshot. Our validation platform is about more than just defining and executing validation rules. It’s about building a powerful and intuitive review environment:

For several hours this morning (UK time) the www.xbrl.org website was unavailable. You might think that this was of little consequence, until you realise that, consistent with XBRL best practice, HMRC’s guidance for company accounts requires that UK GAAP filings reference the UK GAAP taxonomy at its canonical location of http://www.xbrl.org/uk/gaap/core/2009-09-01/uk-gaap-full-2009-09-01.xsd using a <schemaRef> element. The XBRL 2.1 specification requires that XBRL processors resolve and discover the taxonomy documents referenced by such <schemaRef> elements. As such, out-of-the-box XBRL software following the rules of the specification couldn’t process UK GAAP instance documents during the outage this morning, and for anyone trying to use such software to create or review the accounts for their Corporation Tax return, this was a problem.
A similar issue existed for other UK taxonomies, such as UK-IFRS, and indeed, any of the many other taxonomies hosted on the xbrl.org website.
As noted in my earlier post, most XBRL software already has some mechanism for configuring local copies of taxonomies so that processing is not dependent on your internet connection or third party websites. Unfortunately, configuring such offline copies isn’t particularly easy. This is where taxonomy packages can help, as they contain all the information necessary to set up an offline copy of a particular taxonomy.
As XBRL becomes an important part of everyday business, ensuring that XBRL processes are implemented in a robust manner becomes essential. Taxonomy Packages can make doing that just a little bit easier.
Following on from my previous post on Taxonomy Packages, Eric Cohen got in touch with an example taxonomy package for XBRL GL.
You can download the sample here: XBRL-GL-PR-2010-04-12-package.zip
This is a sample for testing purposes only, based on the official XBRL Global Ledger Taxonomy. The taxonomy is subject to the standard XBRL International Copyright and Licence.
Many of the standards that we deal with in the XBRL world are fearsomely complicated, take years to develop, and enable new and exciting ways of working.
This post is about a proposed standard that is very simple, took only a few hours to develop and which is just intended to make working with XBRL that little bit easier.
Taxonomies are a key part of XBRL. They typically consist of many files, hosted on a website somewhere, which are then referenced by the instance documents or extension taxonomies that use them. This creates two practical problems for people working with taxonomies.
Problem 1: Finding the Entry Points
Over time taxonomies have become increasingly complicated, and modular taxonomies consisting of tens, if not hundreds, of files have now become the norm. In such a modular taxonomy, only a handful of those files are typically considered to be “entry points”, that is, files from which you would start the DTS discovery process.
For example, the full 2009 UK GAAP, IFRS, Banking and Charities taxonomies ZIP file consists of 603 files, but contains just four primary entry points. These are described in Word documents included in the ZIP file, which means in order to start working with the taxonomy I need to:
Wouldn’t life be just that little bit easier if I could just point my XBRL software at the ZIP, be presented with a list of the four entry points (with sensible, human readable descriptions), and then just open what I wanted? Something more like this:
Problem 2: Offline working
XBRL taxonomies are typically published on publicly available web servers, and then referenced by instance documents using an absolute URL. An XBRL processor consuming such a document will then follow the URL and download the files that make up the taxonomy as required. This creates two potential issues. Firstly, it means that you need an internet connection in order to process the document. Secondly, taxonomies are big (the UK taxonomies are made up of over 50MB of XML files) so you need a fast internet connection.
In order to support offline work, and to improve performance, you really want to be working with offline copies of taxonomies, rather than constantly downloading them from the web. Most XBRL software already provides some mechanism for working with offline copies of taxonomies.
At its simplest, software can just cache copies of taxonomies as it uses them, although that means that you’ve got to use it once before it becomes available for offline use, and the cache may be subject to an expiry policy to limit its size. In many cases it’s desirable to control explicitly which taxonomies are going to be stored locally, but this is often cumbersome to configure as you need to provide not only a copy of the taxonomy but also a “remapping” or “redirection” that specifies what public locations should be remapped to your local copy.
Wouldn’t life be just that little bit easier if I could just give my XBRL software a ZIP of the taxonomy, and it would configure itself for offline use, so that instance documents referencing that taxonomy would then “just work”?
The solution: Taxonomy Packages
Taxonomy Packages are a simple solution to the above problems that requires only a minimal change in the way that taxonomies are currently distributed. Most taxonomies are already made available as ZIP files, containing all the files that make up the taxonomy. A Taxonomy Package is simply a ZIP file with an extra XML file dropped into it.
The XML file, called .taxonomyPackage.xml, provides a list of the entry points within the taxonomy, along with names and descriptions. The .taxonomyPackage.xml file also contains generic name, description and version meta-data about the taxonomy as a whole, enabling taxonomy distributions to be self-documenting. All names and descriptions have support for multi-language alternatives.
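To show how software might use such a file to offer a list of entry points, here is a Python sketch using `xml.etree.ElementTree`. The element and attribute names (`taxonomyPackage`, `entryPoint`, `name`, `href`) are a made-up, simplified layout for illustration only, not the published schema; only the `xml:lang` handling reflects standard XML.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical, simplified package metadata; NOT the published .taxonomyPackage.xml format.
SAMPLE = """<taxonomyPackage>
  <name xml:lang="en">UK taxonomies (sample)</name>
  <entryPoint href="http://www.xbrl.org/uk/gaap/core/2009-09-01/uk-gaap-full-2009-09-01.xsd">
    <name xml:lang="en">UK GAAP (full)</name>
  </entryPoint>
</taxonomyPackage>"""


def entry_points(xml_text: str, lang: str = "en") -> list:
    """Return (human-readable name, URL) pairs, preferring names in the requested language."""
    root = ET.fromstring(xml_text)
    result = []
    for ep in root.iter("entryPoint"):
        names = ep.findall("name")
        # Fall back to any available name if none matches the requested language.
        preferred = [n for n in names if n.get(XML_LANG) == lang] or names
        label = preferred[0].text if preferred else ep.get("href")
        result.append((label, ep.get("href")))
    return result


print(entry_points(SAMPLE))
```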
The other component of the .taxonomyPackage.xml is a set of remappings, that allow the contents of the ZIP file to be treated as if they were hosted at an internet location. At its simplest, a remapping could take the form of remapping the prefix “http://www.xbrl.org/uk/” to a directory within the ZIP file. This tells a processor that every time it encounters a URL starting with “http://www.xbrl.org/uk/” it should try to resolve it to an equivalently named file within the taxonomy package ZIP.
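A prefix remapping of this kind is straightforward to apply. The Python sketch below resolves a public URL to a file inside a package ZIP using the standard `zipfile` module. The `uk/` directory name is hypothetical, and preferring the longest matching prefix when several apply is an assumption of this sketch rather than a stated rule.

```python
import io
import zipfile
from typing import Optional


def resolve(url: str, remappings: dict) -> Optional[str]:
    """Map a public URL to a path inside the package, using the longest matching prefix."""
    for prefix in sorted(remappings, key=len, reverse=True):
        if url.startswith(prefix):
            return remappings[prefix] + url[len(prefix):]
    return None


def read_from_package(package: zipfile.ZipFile, url: str, remappings: dict) -> Optional[bytes]:
    """Return the file's bytes from the ZIP, or None so the caller can fall back to the web."""
    path = resolve(url, remappings)
    if path is None or path not in package.namelist():
        return None
    return package.read(path)


# Build a tiny in-memory package for demonstration.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("uk/gaap/core/2009-09-01/uk-gaap-full-2009-09-01.xsd", "<schema/>")

REMAPPINGS = {"http://www.xbrl.org/uk/": "uk/"}
URL = "http://www.xbrl.org/uk/gaap/core/2009-09-01/uk-gaap-full-2009-09-01.xsd"

with zipfile.ZipFile(buffer) as package:
    print(read_from_package(package, URL, REMAPPINGS))  # prints b'<schema/>'
```

With a package like this in place, the `<schemaRef>` in a UK GAAP instance resolves locally, so an outage of the public website no longer matters.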
The format of the .taxonomyPackage.xml file has been kept as simple as possible. We’ve published some samples, and of course, a schema for the file format:
We will be publishing a simple spec on the details of how these files are to be processed just as soon as we’ve had a chance to write them down.
Working with Taxonomy Packages
As you will see from the comment at the top of the schema, we’re making this available under a Creative Commons licence that allows free use of the format (including for commercial purposes). Our hope is that the XBRL community will agree that this is a simple solution to a simple problem, and if we adopt a common solution then XBRL will become that little bit easier to work with, and just a little bit less intimidating for end users.
We’re actively introducing support for taxonomy packages into our products. The recent Magnify and SpiderMonkey 1.27 releases have support for opening packages, and SpiderMonkey 1.27 also has support for creating them.
If you would like more information on taxonomy packages, or would like to see a taxonomy package sample for your taxonomy, please drop me an email.