4. XML Import and Export

Date: 2020-05-07

Status

Approved, partially superseded by ADR-0009.

Context

The Tariff Management Tool (TaMaTo) must publish an up-to-date UK tariff for reference by HMRC, Jersey and Guernsey border control agencies from January 1st 2021.

There is a requirement to produce a TARIC3 Full Extraction of the UK tariff and to parse a Full Extraction of the EU tariff, not just Differential Extractions.

HMRC et al currently maintain their own copy of the EU tariff and receive differential updates every day in the form of TARIC3 XML documents.

After January 1st 2021 we will still need to process updates to the EU tariff for reference, especially with regard to tariffs on imports into Northern Ireland.

HMRC also publishes tariff measures (VAT, etc) in the form of TARIC3 XML documents which need to be incorporated into the UK tariff.

EU TARIC3 Differential Extractions are assigned a sequence number in the form YYnnn, where YY is the last two digits of the current year and nnn is a per-year sequence number. This sequence starts from 001 for the first extraction of the year and is incremented by 1 for each subsequent extraction.
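The YYnnn scheme described above can be sketched in a few lines of Python. This is a minimal illustration only: the function names `format_envelope_id` and `next_envelope_id` are hypothetical, and the upper bound of 999 is an assumption drawn from the three-digit `nnn` field rather than from the TARIC3 specification.

```python
from datetime import date
from typing import Optional


def format_envelope_id(year: int, sequence: int) -> str:
    """Build a YYnnn sequence number from a calendar year and a per-year counter."""
    # Assumption: nnn is exactly three digits, so the counter cannot exceed 999.
    if not 1 <= sequence <= 999:
        raise ValueError("sequence must be between 1 and 999")
    return f"{year % 100:02d}{sequence:03d}"


def next_envelope_id(previous: Optional[str], today: date) -> str:
    """Return the number that follows `previous`, restarting at 001 in a new year."""
    if previous is not None and int(previous[:2]) == today.year % 100:
        # Same year: increment the per-year counter by 1.
        return format_envelope_id(today.year, int(previous[2:]) + 1)
    # First extraction of the year (or no previous extraction at all).
    return format_envelope_id(today.year, 1)
```

For example, `next_envelope_id("21001", date(2021, 1, 2))` gives `"21002"`, while the first extraction after a year boundary restarts at `001`.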

Modification records to the tariff are grouped into transactions, of which there may be several in an extraction. Transactions have numeric identifiers, which must be unique within the extraction and in ascending order, but not necessarily contiguous.

Within a transaction, records are grouped into messages, of which there may be several. Messages also have numeric sequence identifiers, which increment by 1 for each message in the extraction (not just within the transaction).

Records themselves have numeric sequence identifiers which increment by 1 for each record in the extraction.
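The three numbering rules above (ascending transaction identifiers, and message and record identifiers that each increment by 1 across the whole extraction) could be checked along these lines. The `Transaction` and `Message` classes and the `check_sequences` function are hypothetical names used for illustration; they are not part of TaMaTo. Because the starting value of each sequence is not stated here, the sketch checks only the step between consecutive identifiers.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Message:
    message_id: int
    record_ids: List[int]


@dataclass
class Transaction:
    transaction_id: int
    messages: List[Message]


def check_sequences(transactions: List[Transaction]) -> List[str]:
    """Return a list of human-readable numbering errors (empty if none)."""
    errors: List[str] = []
    last_txn: Optional[int] = None
    expected_msg: Optional[int] = None
    expected_rec: Optional[int] = None
    for txn in transactions:
        # Strictly ascending implies unique; gaps are allowed.
        if last_txn is not None and txn.transaction_id <= last_txn:
            errors.append(f"transaction {txn.transaction_id} is not greater than {last_txn}")
        last_txn = txn.transaction_id
        for msg in txn.messages:
            # Message counter runs across the extraction, not per transaction.
            if expected_msg is not None and msg.message_id != expected_msg:
                errors.append(f"message {msg.message_id} should be {expected_msg}")
            expected_msg = msg.message_id + 1
            for rec in msg.record_ids:
                # Record counter also runs across the whole extraction.
                if expected_rec is not None and rec != expected_rec:
                    errors.append(f"record {rec} should be {expected_rec}")
                expected_rec = rec + 1
    return errors
```

An extraction with transactions 10 and 15 passes, since gaps between transaction identifiers are permitted, whereas a repeated transaction identifier or a skipped message number is reported.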

Decision

We will build separate modules for ingesting TARIC3 Extractions of the EU tariff and for producing TARIC3 Extractions of the UK tariff.

Extraction

The Extraction app will be a command line script, executed by a cron job (or similar scheduling system) on a daily interval. It will produce TARIC3 XML documents with the Jinja2 templating engine.

Transaction, Message and Record sequence identifiers will be generated as part of the Jinja2 template rendering process. We will record the stored identifiers.

Extraction sequence numbers (envelope IDs) will be generated by the application on each run based on the data being exported.

The output of the app will be validated against the TARIC3 schema, and any validation errors will be reported to the development team through a notification channel yet to be decided. It is assumed that the records in the extraction have already been validated against the business rules by TaMaTo.

If valid, the output of the app will be uploaded to AWS S3 and made available to HMRC and the other agencies via SFTP with AWS Transfer.

Ingestion

The Ingestion app will be a command line script, executed by a cron job (or similar) on a daily interval. It will parse and validate TARIC3 XML documents downloaded from the EU or HMRC using lxml's etree module and will store the parsed data in the database.
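Incremental parsing of a large document might look something like the sketch below. To keep it self-contained it uses the standard library's `xml.etree.ElementTree` as a stand-in; `lxml.etree` offers a similar `iterparse` function. The element names (`envelope`, `transaction`, `message`, `record`) are simplified placeholders, not the real namespaced TARIC3 elements, and `iter_transactions` is a hypothetical helper.

```python
import io
import xml.etree.ElementTree as ET

# Simplified placeholder document; real TARIC3 XML is namespaced and far richer.
SAMPLE = (
    b'<envelope id="21001">'
    b'<transaction id="10"><message><record/><record/></message></transaction>'
    b'<transaction id="15"><message><record/></message></transaction>'
    b"</envelope>"
)


def iter_transactions(source, tag="transaction"):
    """Yield (transaction id, record count) pairs one transaction at a time."""
    # "end" events fire once an element and all of its children have been read.
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == tag:
            yield elem.get("id"), len(elem.findall(".//record"))
            # Drop the element's children so they do not accumulate in memory.
            elem.clear()


if __name__ == "__main__":
    for txn_id, record_count in iter_transactions(io.BytesIO(SAMPLE)):
        print(txn_id, record_count)
```

Handling one transaction at a time maps naturally onto creating one work basket per transaction, without holding the whole extraction in memory.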

Transactions in the parsed TARIC3 documents will be converted to TaMaTo work baskets owned by special user accounts and may bypass the usual approval process.

On unsuccessful ingestion of a document, the app will send a failure notification to the development team.

Consequences

Using the API instead of writing to and reading from the database directly would reduce coupling and increase cohesion, and validation of business rules could be centralised rather than duplicated. However, reading from the API requires handling large data transfers over HTTP (though not over the Internet, provided the API runs in the same VPC) in addition to the API's own reads from the database, so it will add latency and more code. AMQP may be a better choice.

Using templates to render JSON to XML saves the time, effort and resources of building an intermediate object model.

The lxml library provides schema validation and incremental parsing of XML documents, which reduces memory requirements.

Uploading ingested data as work baskets allows the TaMaTo UI to display updates to tariff managers in the same way as their own changes.