Compiling

bseddon edited this page Nov 10, 2016 · 12 revisions

A challenge with using XBRL to present the information in an instance document is the need to parse the taxonomy. Taxonomies can be huge, and often several are linked together in a network. Taxonomies are XML documents structured according to strict XBRL rules, but they can take considerable time to parse, which makes reporting slow. This project addresses the issue by making it possible to 'compile' a taxonomy into a format that contains all the information available in the source taxonomy but can be loaded much more quickly.

**How to compile a taxonomy?**

It’s pretty straightforward. The code below shows how to compile the UK GAAP taxonomy.

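```php
<?php
// XBRL.php defines the XBRL class and auto-loads the related classes
require_once( 'XBRL.php' );
XBRL::compile(
	'/uk-gaap-2009-09-01/uk-gaap-main-2009-09-01.xsd', // taxonomy schema (local path or URL)
	XBRL_UK_GAAP::$uk_GAAP_FULL_NS,                     // namespace of the taxonomy to report against
	'uk-gaap'                                            // base name for the generated files
);
```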

The XBRL class is implemented in XBRL.php, so this file is required. When this file is loaded, it takes care of auto-loading any other related files. 'compile' is a static method of the XBRL class. It takes three parameters: the taxonomy schema file; the namespace of the taxonomy within this taxonomy set to report against; and the base name for the generated files.

In this case the taxonomy is a local file reference, but the parameter can also be a URL. The taxonomy referenced by the first parameter may, in turn, reference other taxonomies, and it may be one of the referenced taxonomies that you want to report against. For example, in the UK GAAP taxonomy the 'main' taxonomy file does not contain the presentation linkbase. The presentation linkbase is contained in the 'full' taxonomy, which is imported by the main taxonomy. In this scenario, the second parameter can be the namespace of the referenced taxonomy that contains the required presentation linkbase.

The compilation process generates a JSON file. Compiling the UK GAAP taxonomy takes approximately 10 seconds on an AWS 't2.medium' instance (roughly a dual-core 1.7GHz Xeon with 4GB of memory). The resulting JSON file is 19.4MB (1.35MB zipped). By comparison, the compiled taxonomy takes only 200-210 milliseconds to load.

Unlike the original taxonomy, which is highly normalized, the in-memory representation of the taxonomy and the generated JSON are consistently structured but highly de-normalized. For example, in the original taxonomy, hypercubes and their related dimensions and members are defined once, in the definition linkbase. In the JSON file, however, the hypercube dimensions and members are assigned to every primary item used in the presentation linkbase to which they apply.
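
To illustrate what de-normalization means here, the sketch below copies a hypercube that is defined once onto every primary item that uses it. It is a minimal sketch under stated assumptions: the array layout and every key (`hc-turnover`, `dim-segment`, `item-turnover-uk` and so on) are hypothetical and are not the keys used in the compiled JSON.

```php
<?php
// Hypothetical, simplified structures; the real compiled arrays use their own keys.

// Normalized form (as in the definition linkbase): each hypercube is defined once.
$hypercubes = [
    'hc-turnover' => [
        'dimensions' => [
            'dim-segment' => ['members' => ['seg-uk', 'seg-eu']],
        ],
    ],
];

// Hypothetical mapping of primary items to the hypercubes that apply to them.
$primaryItemHypercubes = [
    'item-turnover-uk'     => ['hc-turnover'],
    'item-turnover-export' => ['hc-turnover'],
    'item-profit'          => [],
];

/**
 * Copy each hypercube's dimensions and members onto every primary item that
 * uses it, so a consumer can read an item's dimensional details directly.
 */
function denormalize(array $hypercubes, array $primaryItemHypercubes): array
{
    $result = [];
    foreach ($primaryItemHypercubes as $item => $hypercubeIds) {
        $result[$item] = ['hypercubes' => []];
        foreach ($hypercubeIds as $id) {
            // Arrays are assigned by value, so each item holds its own copy.
            $result[$item]['hypercubes'][$id] = $hypercubes[$id];
        }
    }
    return $result;
}

$items = denormalize($hypercubes, $primaryItemHypercubes);
echo json_encode($items['item-turnover-export'], JSON_PRETTY_PRINT), PHP_EOL;
```

The trade-off is size for speed: the same hypercube appears many times in the output, but no lookup from item to hypercube is needed at report time.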

Processing an instance document against the loaded taxonomy and saving a report in HTML takes a further 390-400ms, so loading the taxonomy and creating and saving a report takes 590-610ms in total.

**What is compiling?**

Compiling is the act of processing an XBRL taxonomy set to produce a JSON file. This section reviews that process.
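
The benefit of compiling comes from paying the parsing cost once and then decoding JSON on every later run. The sketch below shows that compile-once, load-many pattern in isolation. It is not this project's code: `loadCompiled` and the `$compile` callable are hypothetical, and the callable simply stands in for the expensive parse of the source taxonomy.

```php
<?php
/**
 * Minimal compile-once / load-many sketch (hypothetical, not the library's API).
 * $compile stands in for the expensive taxonomy parse and must return nested arrays.
 */
function loadCompiled(string $jsonFile, callable $compile): array
{
    if (is_file($jsonFile)) {
        // Fast path: decode the compiled JSON straight into nested arrays.
        return json_decode(file_get_contents($jsonFile), true);
    }
    // Slow path: parse the source taxonomy once and persist the result.
    $taxonomy = $compile();
    file_put_contents($jsonFile, json_encode($taxonomy));
    return $taxonomy;
}

$file = sys_get_temp_dir() . '/example-compiled-taxonomy.json';
@unlink($file); // start from a clean state for this demonstration

$calls = 0;
$compile = function () use (&$calls) {
    $calls++; // count how often the expensive step actually runs
    return ['elements' => ['example_Turnover' => ['type' => 'xbrli:monetaryItemType']]];
};

$first  = loadCompiled($file, $compile); // compiles and writes the JSON file
$second = loadCompiled($file, $compile); // loads the JSON file; no recompile
echo "compile calls: $calls", PHP_EOL;
```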

The core processing in this project is similar to that of any other XBRL processor, except that the focus is not on validation. Most other XBRL processors focus on validating taxonomies or instance documents. This compile process assumes the taxonomy or taxonomies being processed are valid. Issues with the integrity of a taxonomy will be reported if they are encountered, but validation is not the focus of this project.

**Q. Why compile? A. Performance**

The compiled taxonomy details are held in nested, indexed arrays. Except where object-oriented features simplify the code, the code is organized around arrays and nested arrays rather than more granular classes, so that it performs as quickly as possible. The overriding objective is to load a taxonomy quickly and to process instance documents against the presentation linkbase as quickly as possible. PHP handles arrays efficiently and can process them more quickly than classes. Using array references improves on this further, because PHP does not need to copy the contents of a nested array for the executing code to work on it directly. Arrays can also be persisted to and restored from JSON, which in turn can be stored efficiently in a file.
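
The two array properties relied on above can be seen in a few lines of plain PHP: a reference lets code modify a deeply nested array in place, and nested arrays survive a JSON round trip unchanged. The structure and keys below (`role-balance-sheet`, `Assets` and so on) are hypothetical stand-ins for part of a compiled taxonomy.

```php
<?php
// Hypothetical nested structure standing in for part of a compiled taxonomy.
$taxonomy = [
    'presentation' => [
        'role-balance-sheet' => ['hierarchy' => ['Assets' => ['children' => []]]],
    ],
];

// Take a reference to the nested array: changes made through $hierarchy are
// made in place inside $taxonomy, not to a separate copy.
$hierarchy = &$taxonomy['presentation']['role-balance-sheet']['hierarchy'];
$hierarchy['Assets']['children']['FixedAssets'] = ['children' => []];
unset($hierarchy); // break the reference so later assignments cannot alias it

// Without a reference, assignment gives a copy (copy-on-write): modifying
// $copy duplicates the array and leaves $taxonomy unchanged.
$copy = $taxonomy['presentation']['role-balance-sheet']['hierarchy'];
$copy['Liabilities'] = ['children' => []];

// Nested arrays round-trip through JSON unchanged when decoded as arrays.
$restored = json_decode(json_encode($taxonomy), true);
var_dump($restored === $taxonomy);
```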

It turns out that objects are persisted in JSON as plain collections of their public properties, and there is no simple way to restore them: they are deserialized as stdClass instances (or arrays), not as instances of their original classes. Code can be written to rebuild the original objects, but the overriding objective is performance. Using classes can mean tools like Eclipse are able to use reflection to learn about the structure of classes and make life easier for the developer. However, the objective is performance; developer productivity is a less significant concern.
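
The loss of type information is easy to demonstrate with PHP's built-in `json_encode` and `json_decode`. The `Concept` class below is hypothetical and exists only for this illustration.

```php
<?php
// Hypothetical class used only for this illustration.
class Concept
{
    public $name;
    public $periodType;

    public function __construct(string $name, string $periodType)
    {
        $this->name = $name;
        $this->periodType = $periodType;
    }
}

// Only the public properties are written; the class name is not recorded.
$json = json_encode(new Concept('Turnover', 'duration'));
echo $json, PHP_EOL; // {"name":"Turnover","periodType":"duration"}

$asObject = json_decode($json);       // default: JSON objects become stdClass
$asArray  = json_decode($json, true); // associative: JSON objects become arrays

echo get_class($asObject), PHP_EOL;     // stdClass, not Concept
var_dump($asObject instanceof Concept); // bool(false)
```

Decoding with the second argument set to `true` yields arrays directly, which is why an array-based design needs no rehydration step after loading.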

**Compiling: The process**

You can see the steps taken to process a taxonomy in the private function 'loadTaxonomy' of the XBRL class. Any referenced external taxonomies (imported schemas) are processed first if they have not already been processed (they have been processed if they already exist in the list of taxonomies held by the XBRL_Global singleton class). This is so that locator references to other taxonomies can be resolved when processing linkbases. In summary, and using a sort of pseudo-code to describe the high-level steps, here is an overview of the compile process:

```text
XBRL::compile( $xsd, $namespace, $filename )
	XBRL::Load_taxonomy( $xsd, true )
		XBRL::withTaxonomy( $xsd, true )
			Set Context
			Initialize Cache
			Read XSD
			loadTaxonomy()

--> loadTaxonomy (recursive)
|		Import Schemas
|		Index Complex Types
|		Index Elements
|		Create Role Types List
|		Create Linkbase Refs List
|		Process Linkbases
|			Process Presentation Linkbase
|			Process Label Linkbase
|			Process Definition Linkbase
|		Assign hypercubes to Presentation hierarchy elements
|
|	Import Schemas
|		For each schema
--------	loadTaxonomy()
```

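
The recursion in the 'Import Schemas' step can be sketched on its own. This is a minimal sketch, not the library's code: `loadTaxonomySketch`, the `$imports` map and the schema names are hypothetical, and `$registry` merely plays the role of the list of processed taxonomies held by XBRL_Global. Marking a schema before recursing is a design choice of the sketch that also stops import cycles from looping forever.

```php
<?php
/**
 * Hypothetical sketch of the recursive load order (not the library's code).
 * $imports maps each schema to the schemas it imports; $registry records the
 * schemas already processed; $order records when each one finished.
 */
function loadTaxonomySketch(string $schema, array $imports, array &$registry, array &$order): void
{
    if (isset($registry[$schema])) {
        return; // already processed (or in progress), so skip it
    }
    $registry[$schema] = true; // mark before recursing so cycles terminate

    // Imported schemas are processed first so that locators pointing into
    // them can be resolved when this schema's linkbases are processed.
    foreach ($imports[$schema] ?? [] as $imported) {
        loadTaxonomySketch($imported, $imports, $registry, $order);
    }

    // Here the real process would index types and elements, build the role
    // and linkbase lists, and process the linkbases.
    $order[] = $schema;
}

$imports = [
    'main.xsd'  => ['full.xsd', 'types.xsd'],
    'full.xsd'  => ['types.xsd'],
    'types.xsd' => [],
];
$registry = [];
$order = [];
loadTaxonomySketch('main.xsd', $imports, $registry, $order);
echo implode(' -> ', $order), PHP_EOL; // types.xsd -> full.xsd -> main.xsd
```
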
**!!! Caution !!!**

The term label is ambiguous in XBRL when used without context. A label can be text added by the taxonomy author to give the user a description of an element. However, XBRL is built on XML specifications, including the XLink specification, which also uses the term 'label' for a different purpose. In that context it is more like a 'key' or 'identifier'. So when reviewing the code, be clear whether anything called a label is something that could otherwise be called a 'description' or something that could otherwise be called an 'identifier'. Confusingly, xlink labels are used to identify the two ends of an arc that connects an element to an XBRL label.
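
To make the distinction concrete, the sketch below reads a small, hypothetical label linkbase fragment (the element `ex_Turnover`, the file `example.xsd` and the `loc_`/`lbl_` identifiers are invented) using PHP's DOM extension. The `xlink:label` attributes are identifiers; the text content of the `link:label` element is the XBRL label, i.e. the description; and the arc's `xlink:from` and `xlink:to` refer to the identifiers.

```php
<?php
// Hypothetical label linkbase fragment; the names are invented for illustration.
$xml = <<<'XML'
<link:labelLink xmlns:link="http://www.xbrl.org/2003/linkbase"
                xmlns:xlink="http://www.w3.org/1999/xlink"
                xlink:type="extended" xlink:role="http://www.xbrl.org/2003/role/link">
  <link:loc xlink:type="locator" xlink:href="example.xsd#ex_Turnover" xlink:label="loc_Turnover"/>
  <link:label xlink:type="resource" xlink:label="lbl_Turnover"
              xlink:role="http://www.xbrl.org/2003/role/label" xml:lang="en">Turnover</link:label>
  <link:labelArc xlink:type="arc" xlink:arcrole="http://www.xbrl.org/2003/arcrole/concept-label"
                 xlink:from="loc_Turnover" xlink:to="lbl_Turnover"/>
</link:labelLink>
XML;

$xlinkNs = 'http://www.w3.org/1999/xlink';
$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);
$xpath->registerNamespace('link', 'http://www.xbrl.org/2003/linkbase');

// Index label resources by xlink:label (an identifier, not display text).
$resources = [];
foreach ($xpath->query('//link:label') as $label) {
    $key = $label->getAttributeNS($xlinkNs, 'label');
    $resources[$key] = $label->textContent; // the XBRL label: human-readable text
}

// Each arc connects two xlink:label identifiers: locator -> label resource.
$arcs = [];
foreach ($xpath->query('//link:labelArc') as $arc) {
    $from = $arc->getAttributeNS($xlinkNs, 'from');
    $to   = $arc->getAttributeNS($xlinkNs, 'to');
    $arcs[] = [$from, $to];
    echo "$from -> $to: {$resources[$to]}", PHP_EOL; // loc_Turnover -> lbl_Turnover: Turnover
}
```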
