Skip to content

Transform

matchbox.common.transform

Functions to transform data between tabular and graph structures.

Classes:

  • DisjointSet

    Disjoint set forest with “path compression” and “union by rank” heuristics.

Functions:

  • hash_cluster_leaves

    Canonical method to convert list of cluster IDs to their combined hash.

DisjointSet

DisjointSet()

Bases: Generic[T]


              flowchart TD
              matchbox.common.transform.DisjointSet[DisjointSet]

              

              click matchbox.common.transform.DisjointSet href "" "matchbox.common.transform.DisjointSet"
            

Disjoint set forest with “path compression” and “union by rank” heuristics.

This follows implementation from Cormen, Thomas H., et al. Introduction to algorithms. MIT press, 2022

Methods:

  • add

    Add a new element to the disjoint set.

  • union

    Merge the sets containing elements x and y.

  • get_components

    Return the connected components of the disjoint set.

Attributes:

parent instance-attribute

parent: dict[T, T] = {}

rank instance-attribute

rank: dict[T, int] = {}

add

add(x: T) -> None

Add a new element to the disjoint set.

union

union(x: T, y: T) -> None

Merge the sets containing elements x and y.

get_components

get_components() -> list[set[T]]

Return the connected components of the disjoint set.

hash_cluster_leaves

hash_cluster_leaves(leaves: list[bytes]) -> bytes

Canonical method to convert list of cluster IDs to their combined hash.