Parsers and Namespaces¶
- class importer.parsers.BooleanElement(*args, true_value: str = '1', false_value: str = '0', **kwargs)[source]¶
Represents an element which contains a true or false value.
By default, the value in the XML is assumed to be 1 for True and 0 for False. This can be customised by passing different true_value and false_value arguments.
<msg:some.value>1</msg:some.value>
<msg:some.value>0</msg:some.value>
- native_type¶
alias of bool
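The mapping between the XML text and the Python value can be sketched as below. This is a minimal illustration and not the importer's own implementation: `parse_boolean` is a hypothetical helper, and raising `ValueError` for unrecognised text is an assumption, since the description above does not say how such input is handled.

```python
def parse_boolean(text: str, true_value: str = "1", false_value: str = "0") -> bool:
    """Hypothetical helper mirroring the true_value/false_value defaults."""
    if text == true_value:
        return True
    if text == false_value:
        return False
    # Assumption: text matching neither value is treated as an error.
    raise ValueError(f"unexpected boolean text: {text!r}")


# Default mapping: "1" is True, "0" is False.
print(parse_boolean("1"))
# Customised mapping, e.g. for a feed that uses Y/N.
print(parse_boolean("N", true_value="Y", false_value="N"))
```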
- class importer.parsers.CompoundElement(tag: importer.namespaces.Tag, *extra_fields: str, separator: str = '|')[source]¶
Represents an element in XML that is actually a concatenation of one or more logical values and separators.
The separator is assumed to be a pipe character by default. The parsed data will always contain a tuple whose size is the number of expected fields (the original field and any extras). If fewer than the specified number of separators occur, the rightmost fields will have the value None.
<msg:some.value>one|two|three</msg:some.value>
- native_type¶
alias of tuple
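The fixed-size tuple behaviour described above can be sketched as follows. `split_compound` is a hypothetical stand-in, not the class's real code. Keeping surplus separators inside the last field is an assumption, as the description only covers the case of too few separators.

```python
from typing import Optional, Tuple


def split_compound(
    text: str, field_count: int, separator: str = "|"
) -> Tuple[Optional[str], ...]:
    """Split text into exactly field_count values, padding with None."""
    # Assumption: any surplus separators stay inside the last field.
    parts = text.split(separator, field_count - 1)
    # Pad on the right so the tuple always has field_count entries.
    padding = [None] * (field_count - len(parts))
    return tuple(parts + padding)


print(split_compound("one|two|three", 3))
print(split_compound("one", 3))
```

Returning a tuple of constant length lets callers unpack the result without first checking how many separators were present.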
- class importer.parsers.ConstantElement(tag: importer.namespaces.Tag, value: str)[source]¶
Represents an element that is always a constant value in the XML.
The actual value is ignored and not put into the database. The value specified in the constructor will be put back into the XML.
- class importer.parsers.ElementParser(tag: importer.namespaces.Tag = None, many: bool = False, depth: int = 1)[source]¶
Base class for element specific parsers.
ElementParser classes use introspection to build a lookup table that maps child element parsers to their output JSON field names.
This allows two options for adding child elements to a parent element.
Option 1:
class ChildElement(ElementParser):
    tag = Tag("child", prefix="ns")
    field = TextElement("field")

class ParentElement(ElementParser):
    tag = Tag("parent", prefix="ns")
    child = ChildElement()
Option 2:
class ParentElement(ElementParser):
    tag = Tag("parent", prefix="ns")

@ParentElement.register_child("child")
class ChildElement(ElementParser):
    tag = Tag("child", prefix="ns")
    some_field = TextElement("field")
When handling XML such as:
<ns:parent>
    <ns:child id="2">
        <ns:field>Text</ns:field>
    </ns:child>
</ns:parent>
This class will build a JSON object in self.data with the following structure:
{"child": {"id": 2, "field": "Text"}}
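To make the XML-to-JSON mapping concrete, here is a minimal standalone sketch that uses only `xml.etree.ElementTree`. It does not use `ElementParser` and is not how the importer works internally. The namespace URI `urn:example:ns`, the helper names, and the rule that all-digit attribute values become integers are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# The namespace URI is a placeholder so that the "ns" prefix can be parsed.
XML = """
<ns:parent xmlns:ns="urn:example:ns">
    <ns:child id="2">
        <ns:field>Text</ns:field>
    </ns:child>
</ns:parent>
""".strip()


def local_name(element: ET.Element) -> str:
    # ElementTree stores namespaced tags as "{namespace}name".
    return element.tag.rsplit("}", 1)[-1]


def to_data(element: ET.Element) -> dict:
    data = {}
    for key, value in element.attrib.items():
        # Assumption: all-digit attribute values are converted to int.
        data[key] = int(value) if value.isdigit() else value
    for child in element:
        if len(child) or child.attrib:
            # Elements with children or attributes become nested objects.
            data[local_name(child)] = to_data(child)
        else:
            # Leaf elements contribute their text.
            data[local_name(child)] = child.text
    return data


result = to_data(ET.fromstring(XML))
print(result)
```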
- is_parser_for_element(parser: importer.parsers.ElementParser, element: xml.etree.ElementTree.Element) → bool [source]¶
Check if the parser matches the element.
- record_code: str¶
The type id of this model’s type family in the TARIC specification.
This number groups together a number of different models into ‘records’. Where two models share a record code, they are conceptually expressing different properties of the same logical model.
In theory each Transaction should only contain models with a single record_code (but differing subrecord_code).
- start(element: xml.etree.ElementTree.Element, parent: importer.parsers.ElementParser = None)[source]¶
Handle the start of an XML tag. The tag may not yet have all of its children.
We have a few cases where there are tags nested within a tag of the same name.
Example:
<oub:additional.code>
    <oub:additional.code.sid>00000001</oub:additional.code.sid>
    <oub:additional.code.type.id>A</oub:additional.code.type.id>
    <oub:additional.code>AAA</oub:additional.code>
    <oub:validity.start.date>2021-01-01</oub:validity.start.date>
</oub:additional.code>
In this case matching on tags is not enough and so we also need to keep track of whether this parser is already parsing an element. If it is, we don’t want to select any child parsers. If it is not, we know that this is an element that this parser should be parsing.
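The idea of tracking whether a parser is already inside its own element can be illustrated with a small standalone sketch built on `xml.etree.ElementTree.iterparse`. It is not the importer's `start` method. The namespace URI, the `classify` function, and the "record"/"field" labels are hypothetical.

```python
import io
import xml.etree.ElementTree as ET

# The namespace URI is a placeholder so that the "oub" prefix can be parsed.
XML = b"""<oub:additional.code xmlns:oub="urn:example:oub">
    <oub:additional.code.sid>00000001</oub:additional.code.sid>
    <oub:additional.code.type.id>A</oub:additional.code.type.id>
    <oub:additional.code>AAA</oub:additional.code>
    <oub:validity.start.date>2021-01-01</oub:validity.start.date>
</oub:additional.code>"""

RECORD_TAG = "{urn:example:oub}additional.code"


def classify(xml_bytes: bytes) -> list:
    """Label each additional.code start tag as a record or a nested field."""
    roles = []
    open_count = 0  # how many additional.code elements are currently open
    events = ET.iterparse(io.BytesIO(xml_bytes), events=("start", "end"))
    for event, element in events:
        if element.tag != RECORD_TAG:
            continue
        if event == "start":
            # Already inside the record: the same tag name is a plain field.
            roles.append("record" if open_count == 0 else "field")
            open_count += 1
        else:
            open_count -= 1
    return roles


print(classify(XML))
```

The tag name alone cannot distinguish the two elements, so the sketch relies on parser state, which is the same reasoning given above.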
- subrecord_code: str¶
The type id of this model in the TARIC specification. The subrecord_code when combined with the record_code uniquely identifies the type within the specification.
The subrecord code gives the intended order for models in a transaction, with comparatively smaller subrecord codes needing to come before larger ones.
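The ordering rule can be sketched as a simple sort. `SketchModel`, its example codes, and the choice to compare subrecord codes numerically are assumptions for illustration and are not taken from the TARIC specification or the importer.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SketchModel:
    """Hypothetical stand-in for a model carrying the two codes."""

    name: str
    record_code: str
    subrecord_code: str


def order_for_transaction(models: List[SketchModel]) -> List[SketchModel]:
    # In theory a transaction holds a single record_code.
    if len({m.record_code for m in models}) > 1:
        raise ValueError("models span more than one record_code")
    # Assumption: subrecord codes are numeric strings, compared as integers.
    return sorted(models, key=lambda m: int(m.subrecord_code))


models = [
    SketchModel("third", "430", "10"),
    SketchModel("first", "430", "00"),
    SketchModel("second", "430", "05"),
]
ordered = [m.name for m in order_for_transaction(models)]
print(ordered)
```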
- class importer.parsers.IntElement(*args, format: str = 'FM99999999999999999999')[source]¶
Represents an element which contains an integer value.
<msg:record.code>430</msg:record.code>
- native_type¶
alias of int
- class importer.parsers.RangeLowerElement(tag: importer.namespaces.Tag = None, many: bool = False, depth: int = 1)[source]¶
Represents an element that is the lower part of a range.
- class importer.parsers.RangeUpperElement(tag: importer.namespaces.Tag = None, many: bool = False, depth: int = 1)[source]¶
Represents an element that is the upper part of a range.
- class importer.parsers.TextElement(tag: importer.namespaces.Tag = None, many: bool = False, depth: int = 1)[source]¶
Represents an element which contains a text value.
<msg:record.code>Example Text</msg:record.code>
- native_type¶
alias of str
- class importer.parsers.ValueElementMixin[source]¶
Provides a convenient way to define a parser for elements that contain only a text value and have no attributes or children.
- native_type: type¶
The Python type that most closely matches the type of the XML element.
- class importer.parsers.Writable[source]¶
A parser which implements the Writable interface can write its changes to the database.
Not all TARIC3 elements correspond to database entities (particularly simple text elements, but also envelopes and app.messages).
- create(data: Mapping[str, Any], transaction_id: int)[source]¶
Prepares the given data as a create record and submits it to the nursery for processing.
Provides dataclasses and config classes for XML elements and the TARIC schema.
- class importer.namespaces.SchemaTagsBase[source]¶
Provides a base dataclass for schema element tag definitions.
- class importer.namespaces.Tag(name: str, prefix: str = 'ns2', nsmap: Dict[str, str] = <factory>)[source]¶
A dataclass for xml element tags.
name corresponds to the name attribute of the Element element in the XML Schema.
prefix reflects namespace prefixes defined in the taric3 and envelope XSDs.
nsmap is a prefix-namespace mapping in the format required by xml.etree.ElementTree.
- first(parent: xml.etree.ElementTree.Element) → xml.etree.ElementTree.Element [source]¶
Returns the first descendant of the parent matching this tag’s name.
- property is_pattern: bool¶
Returns true if the tag name is a regex pattern.
- iter(parent: xml.etree.ElementTree.Element) → Iterator[xml.etree.ElementTree.Element] [source]¶
Returns an iterator of descendants of the parent matching this tag’s name.
- property namespace: str¶
Returns the namespace for the tag.
- property pattern¶
Returns a compiled regex pattern.
- property prefixed_name: str¶
Returns the prefixed element tag.
- property qualified_name: str¶
Returns a fully qualified element tag.
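As a rough illustration of how the prefixed and qualified forms relate, the following standalone dataclass imitates the members listed above. `SketchTag` is hypothetical and is not `importer.namespaces.Tag`. The `prefix:name` and `{namespace}name` formats follow `xml.etree.ElementTree` conventions and are assumed, not confirmed, to match the real properties. The namespace URI is a placeholder.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import Dict, Iterator, Optional


@dataclass
class SketchTag:
    name: str
    prefix: str = "ns2"
    # Placeholder mapping; the real nsmap comes from the XSD files.
    nsmap: Dict[str, str] = field(
        default_factory=lambda: {"ns2": "urn:example:ns2"}
    )

    @property
    def namespace(self) -> str:
        return self.nsmap[self.prefix]

    @property
    def prefixed_name(self) -> str:
        return f"{self.prefix}:{self.name}"

    @property
    def qualified_name(self) -> str:
        # ElementTree's "{namespace}name" notation.
        return f"{{{self.namespace}}}{self.name}"

    def first(self, parent: ET.Element) -> Optional[ET.Element]:
        # ".//" searches all descendants, not only direct children.
        return parent.find(f".//{self.prefixed_name}", self.nsmap)

    def iter(self, parent: ET.Element) -> Iterator[ET.Element]:
        return parent.iterfind(f".//{self.prefixed_name}", self.nsmap)


root = ET.fromstring(
    '<ns2:root xmlns:ns2="urn:example:ns2">'
    "<ns2:item>a</ns2:item><ns2:item>b</ns2:item>"
    "</ns2:root>"
)
tag = SketchTag("item")
print(tag.prefixed_name, tag.qualified_name)
print([element.text for element in tag.iter(root)])
```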
- importer.namespaces.make_schema_dataclass(xsd_schema_paths: Dict[str, str]) → importer.namespaces.TTags [source]¶
Returns a dynamic dataclass with taric schema element tag definitions.
- importer.namespaces.xsd_schema_paths: Dict[str, str] = (('env', PosixPath('/home/runner/work/tamato/tamato/common/assets/envelope.xsd')), ('oub', PosixPath('/home/runner/work/tamato/tamato/common/assets/taric3.xsd')))¶
Define additional groups in the below dictionary for use as a record_group argument to importer.chunker.chunk_taric.
Check importer.forms.UploadTaricForm.save for example usage when users check the ‘Commodities Only’ box in /importers/create.
The only group defined at the moment is commodities, which is easily extensible to additional record groups.