API Reference

A microframework for exploding your datasets.

Attribs = Dict[str, Any] module-attribute

Type alias for a dictionary of attributes.

Tagged = Tuple[T, Attribs] module-attribute

Generic type alias for the tuple of an element with associated attributes.
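As a minimal illustration, the two aliases can be reproduced locally as follows. The definitions mirror the signatures listed above but are local stand-ins rather than imports from the library; the variable name `tagged_word` and the attribute keys are hypothetical.

```python
from typing import Any, Dict, Tuple, TypeVar

T = TypeVar("T")

# Local stand-ins mirroring the documented aliases (not imported from the library).
Attribs = Dict[str, Any]
Tagged = Tuple[T, Attribs]

# A tagged element: the element itself plus a dictionary of attributes.
tagged_word: Tagged[str] = ("hello", {"lang": "en", "position": 0})

# Unpacking gives the element and its attributes separately.
element, attribs = tagged_word
```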

FromTagged

Bases: Generic[T1]

Functions of a tagged element.

ToTaggedIter

Bases: Protocol[T2]

__call__(tagged_parent)

Iterate the tagged children of a tagged parent.

Useful for explode_with.

Parameters:

Name           Type        Description                                               Default
tagged_parent  Tagged[T1]  The tuple of a parent element and associated attributes.  required

Yields:

Type                  Description
Iterable[Tagged[T2]]  Tagged children; i.e., child elements and associated attributes.
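A function matching this protocol might look like the sketch below. It is only an illustration under assumptions: `split_words`, the sentence-splitting logic, and the attribute keys are hypothetical, and the aliases are redefined locally so the snippet runs on its own.

```python
from typing import Any, Dict, Iterable, Tuple

Attribs = Dict[str, Any]


def split_words(tagged_parent: Tuple[str, Attribs]) -> Iterable[Tuple[str, Attribs]]:
    """Yield each word of a sentence together with its own attributes."""
    sentence, _parent_attribs = tagged_parent
    for index, word in enumerate(sentence.split()):
        # Only child-specific attributes are produced here;
        # the parent's attributes are not copied into the children.
        yield word, {"word_index": index}


children = list(split_words(("to be or", {"doc_id": 7})))
```

Each yielded item is itself a tagged element, so the output of one such function could in principle be fed to another.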

ToAttribIter

Bases: Protocol

__call__(tagged_parent)

Iterate the attributes of children of a tagged parent.

Useful for explode_df and explode_spark_df.

Parameters:

Name           Type        Description                                               Default
tagged_parent  Tagged[T1]  The tuple of a parent element and associated attributes.  required

Yields:

Type               Description
Iterable[Attribs]  Children attributes.
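For comparison with ToTaggedIter, a callable of this shape yields only attribute dictionaries, with no child element attached. The sketch below assumes a hypothetical `word_lengths` function and hypothetical attribute keys; it does not use the library itself.

```python
from typing import Any, Dict, Iterable, Tuple

Attribs = Dict[str, Any]


def word_lengths(tagged_parent: Tuple[str, Attribs]) -> Iterable[Attribs]:
    """Yield one attribute dictionary per word of the parent sentence."""
    sentence, _parent_attribs = tagged_parent
    for word in sentence.split():
        # Each child is described purely by its attributes.
        yield {"word": word, "length": len(word)}


child_attribs = list(word_lengths(("or not", {"doc_id": 7})))
```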

explode_with dataclass

Bases: ToTaggedIter[T2]

Decorate a FromTagged.ToTaggedIter to merge parent attributes into children's.

__call__(item)

Iterate tagged children while merging parent attributes into each.
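The merging behaviour described above can be sketched with a small stand-in class. This is not the library's implementation: `merge_parent_attribs` and `split_words` are hypothetical names, and the rule that a child's value wins on a key collision is an assumption, since the reference does not state the precedence.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Iterable, Iterator, Tuple

Attribs = Dict[str, Any]
TaggedAny = Tuple[Any, Attribs]


@dataclass
class merge_parent_attribs:
    """Hypothetical stand-in for explode_with; not the library's code."""

    fn: Callable[[TaggedAny], Iterable[TaggedAny]]

    def __call__(self, item: TaggedAny) -> Iterator[TaggedAny]:
        _parent, parent_attribs = item
        for child, child_attribs in self.fn(item):
            # Assumption: on a key collision the child's value wins.
            yield child, {**parent_attribs, **child_attribs}


def split_words(tagged_parent: TaggedAny) -> Iterator[TaggedAny]:
    sentence, _ = tagged_parent
    for index, word in enumerate(sentence.split()):
        yield word, {"word_index": index}


merged = list(merge_parent_attribs(split_words)(("to be", {"doc_id": 7})))
```

Every child now carries the parent's `doc_id` alongside its own `word_index`.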

pipeline

Bases: _pipeline

Applies a sequence of functions to its input.

Example
>>> p = pipeline(range, for_each(lambda x: x ** 2), sum)
>>> p(4) == sum(x ** 2 for x in range(4))
True
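The left-to-right composition shown in the example can be approximated with `functools.reduce`. The sketch below is an assumption about the behaviour, not the library's implementation; `pipeline_sketch` and `squares_total` are hypothetical names, and a generator expression stands in for for_each.

```python
from functools import reduce


def pipeline_sketch(*functions):
    """Return a callable that feeds its input through each function in order."""

    def run(value):
        # The output of each function becomes the input of the next one.
        return reduce(lambda acc, fn: fn(acc), functions, value)

    return run


squares_total = pipeline_sketch(range, lambda xs: (x ** 2 for x in xs), sum)
result = squares_total(4)
```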

for_each dataclass

Applies a function to each element of the input collection.

Example
>>> list(for_each(lambda x: x ** 2)(range(4)))
[0, 1, 4, 9]
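One plausible way to read this behaviour is as a thin wrapper around `map`. The class below, `for_each_sketch`, is a hypothetical stand-in; it assumes results are produced lazily, which would explain why the example above wraps the call in `list()`.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterable, Iterator


@dataclass
class for_each_sketch:
    """Hypothetical stand-in for for_each; not the library's code."""

    fn: Callable[[Any], Any]

    def __call__(self, items: Iterable[Any]) -> Iterator[Any]:
        # Assumption: elements are transformed lazily, one at a time.
        return map(self.fn, items)


squares = list(for_each_sketch(lambda x: x ** 2)(range(4)))
```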

explode_df(df, key_attrib, explode_fn)

Explode a pandas.DataFrame using a FromTagged[T1].ToAttribIter.

Each row in the input DataFrame represents a tagged parent. The column named by key_attrib supplies the parent element, and all other columns form the parent's attributes. The explode_fn is applied to each parent in the DataFrame, and the attributes of all resulting children are collected into the output DataFrame.

Parent attributes are not automatically merged into the children's. If merging is desired, the caller is responsible for ensuring it, e.g., by decorating the explode_fn with explode_with.

Parameters:

Name        Type          Description                                                      Default
df          DataFrame     The input DataFrame                                              required
key_attrib  str           The name of the column used as the parent; should be of type T1  required
explode_fn  ToAttribIter  The function used to iterate children attributes                 required

Returns:

Type       Description
DataFrame  DataFrame of all children's attributes
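The row-by-row semantics can be sketched without pandas by letting a plain list of dictionaries stand in for the DataFrame. Everything here is illustrative: `explode_rows` and `word_lengths` are hypothetical names, the data is made up, and the real function operates on pandas.DataFrame objects rather than lists.

```python
from typing import Any, Callable, Dict, Iterable, List, Tuple

Attribs = Dict[str, Any]


def explode_rows(
    rows: List[Attribs],
    key_attrib: str,
    explode_fn: Callable[[Tuple[Any, Attribs]], Iterable[Attribs]],
) -> List[Attribs]:
    """Hypothetical stand-in for explode_df working on a list of row dicts."""
    out: List[Attribs] = []
    for row in rows:
        # The key column is the parent element; the rest are its attributes.
        parent = row[key_attrib]
        parent_attribs = {k: v for k, v in row.items() if k != key_attrib}
        out.extend(explode_fn((parent, parent_attribs)))
    return out


def word_lengths(tagged_parent: Tuple[str, Attribs]) -> Iterable[Attribs]:
    sentence, parent_attribs = tagged_parent
    for word in sentence.split():
        # Parent attributes are merged by hand, since nothing does it automatically.
        yield {**parent_attribs, "word": word, "length": len(word)}


rows = [{"text": "to be", "doc_id": 1}, {"text": "or not", "doc_id": 2}]
exploded = explode_rows(rows, "text", word_lengths)
```

Each input row yields one output row per child, so two sentences of two words each produce four rows.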

explode_spark_df(df, key_attrib, explode_fn, *, new_cols_schema=None)

A Spark version of explode_df.

key_only(item)

Example
>>> key_only(("key", "value"))
'key'