API Reference
A microframework for exploding your datasets.
Attribs = Dict[str, Any]
module-attribute
Type alias for a dictionary of attributes.
Tagged = Tuple[T, Attribs]
module-attribute
Generic type alias for the tuple of an element with associated attributes.
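For orientation, a minimal sketch of what a tagged element looks like; the attribute keys here are illustrative, not part of the library:

```python
from typing import Any, Dict, Tuple, TypeVar

T = TypeVar("T")

Attribs = Dict[str, Any]
Tagged = Tuple[T, Attribs]  # an element together with its attributes

# A tagged string element (attribute keys are illustrative):
tagged_doc: Tagged[str] = ("hello world", {"source": "web", "lang": "en"})
```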
FromTagged
Bases: Generic[T1]
Functions of a tagged element.
ToTaggedIter
Bases: Protocol[T2]
__call__(tagged_parent)
Iterate the tagged children of a tagged parent.
Useful with explode_with.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
tagged_parent | Tagged[T1] | The tuple of a parent element and associated attributes. | required |
Yields:
Type | Description |
---|---|
Iterable[Tagged[T2]] | Tagged children; i.e., child elements and associated attributes. |
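A minimal sketch of a callable satisfying this protocol; split_words and its attribute keys are hypothetical, not part of the library:

```python
from typing import Any, Dict, Iterable, Tuple

Attribs = Dict[str, Any]
Tagged = Tuple[str, Attribs]

def split_words(tagged_parent: Tagged) -> Iterable[Tagged]:
    """Yield each word of a tagged sentence as a tagged child."""
    sentence, _attribs = tagged_parent
    for position, word in enumerate(sentence.split()):
        # Child attributes are not merged with the parent's here;
        # wrap the function in explode_with for that.
        yield word, {"position": position}

list(split_words(("a b c", {"doc_id": 1})))
# [('a', {'position': 0}), ('b', {'position': 1}), ('c', {'position': 2})]
```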
ToAttribIter
Bases: Protocol
explode_with
dataclass
Bases: ToTaggedIter[T2]
Decorate a FromTagged.ToTaggedIter to merge parent attributes into children's.
__call__(item)
Iterate tagged children while merging parent attributes into each.
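A usage sketch, reusing the hypothetical split_words iterator from the ToTaggedIter example above; the merge shown (parent attributes folded into each child's) follows the description here, and the exact key precedence is an assumption:

```python
# Assuming explode_with is imported from the package.
merged = explode_with(split_words)

list(merged(("a b", {"doc_id": 1})))
# Expected (illustrative):
# [('a', {'doc_id': 1, 'position': 0}), ('b', {'doc_id': 1, 'position': 1})]
```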
pipeline
Bases: _pipeline
Applies a sequence of functions to its input.
Example
>>> p = pipeline(range, for_each(lambda x: x ** 2), sum)
>>> p(4) == sum(x ** 2 for x in range(4))
True
for_each
dataclass
Applies a function to each element of the input collection.
Example
>>> list(for_each(lambda x: x ** 2)(range(4)))
[0, 1, 4, 9]
explode_df(df, key_attrib, explode_fn)
Explode a pandas.DataFrame using a FromTagged[T1].ToAttribIter.
Each row in the input DataFrame represents a tagged parent. The indicated key_attrib column is used as the parent element, while all other columns form the parent's tags. The explode_fn is applied to each parent in the DataFrame, and the children attributes of all parents are collected into the output DataFrame.
Parent attributes are not automatically merged into the children's. If that is desired, the caller is responsible for ensuring it, e.g., by decorating the explode_fn with explode_with.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
df | DataFrame | The input DataFrame | required |
key_attrib | str | The name of the column used as the parent; should be of type | required |
explode_fn | ToAttribIter | The function used to iterate children attributes | required |
Returns:
Type | Description |
---|---|
DataFrame | |
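A usage sketch under stated assumptions: the column names and the word_attribs helper are hypothetical, and the sketch assumes the explode_fn receives the tagged parent (the key_attrib value paired with the remaining columns as attributes):

```python
import pandas as pd
# Assuming explode_df is imported from the package.

# Hypothetical input: "text" is the parent element, "doc_id" its only attribute.
df = pd.DataFrame({"doc_id": [1, 2], "text": ["a b", "c"]})

def word_attribs(tagged_parent):
    # A ToAttribIter sketch: yield one attribute dict per child.
    sentence, _attribs = tagged_parent
    for position, word in enumerate(sentence.split()):
        yield {"word": word, "position": position}

out = explode_df(df, key_attrib="text", explode_fn=word_attribs)
# Illustrative expectation: one output row per yielded attribute dict,
# with columns "word" and "position" (parent attributes are not merged
# unless word_attribs is decorated accordingly).
```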
explode_spark_df(df, key_attrib, explode_fn, *, new_cols_schema=None)
A Spark version of explode_df.
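A sketch along the same lines; the schema and the word_attribs helper are hypothetical, and the assumption is that new_cols_schema describes the child-attribute columns for cases where Spark cannot infer them:

```python
from pyspark.sql import SparkSession
from pyspark.sql import types as T
# Assuming explode_spark_df is imported from the package.

spark = SparkSession.builder.getOrCreate()
sdf = spark.createDataFrame([(1, "a b"), (2, "c")], ["doc_id", "text"])

def word_attribs(tagged_parent):
    sentence, _attribs = tagged_parent
    for position, word in enumerate(sentence.split()):
        yield {"word": word, "position": position}

# Hypothetical schema for the new child-attribute columns.
schema = T.StructType([
    T.StructField("word", T.StringType()),
    T.StructField("position", T.IntegerType()),
])

out = explode_spark_df(sdf, "text", word_attribs, new_cols_schema=schema)
```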
key_only(item)
Example
>>> key_only(("key", "value"))
'key'