gfw.common.beam.transforms.GroupBy

class GroupBy(*fields, elements='', dict_fields=True, **kwargs)

Wrapper around beam.GroupBy with automatic labeling.

This transform wraps Beam’s native beam.GroupBy and assigns it an automatically generated step label based on the grouping keys. For example, grouping by "user" and "country" with elements="Sessions" results in a label like GroupSessionsByUserAndCountry.

If dict_fields=True (default), string positional fields are interpreted as dictionary keys and wrapped with operator.itemgetter(). If False, strings are treated as attribute names.
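A minimal sketch of the field-to-key rule described above, in plain Python. The helper name to_key_fn is hypothetical and is not part of gfw.common or Beam. It assumes that a string field becomes operator.itemgetter(field) when dict_fields=True and attribute access (modeled here with operator.attrgetter) when dict_fields=False, and that non-string fields such as callables are passed through unchanged.

```python
import operator
from dataclasses import dataclass
from typing import Any, Callable


def to_key_fn(field: Any, dict_fields: bool = True) -> Callable[[Any], Any]:
    """Hypothetical model of how one positional field becomes a key function.

    Assumption: mirrors the documented rule only; this is not the library's code.
    """
    if isinstance(field, str):
        # dict_fields=True: treat the string as a dictionary key.
        # dict_fields=False: treat it as an attribute name (Beam's default).
        return operator.itemgetter(field) if dict_fields else operator.attrgetter(field)
    # Assumption: non-string fields (e.g. callables) are used as-is.
    return field


@dataclass
class Session:
    user: str
    country: str


as_dict = {"user": "alice", "country": "AR"}
as_obj = Session(user="alice", country="AR")

print(to_key_fn("user")(as_dict))                    # alice
print(to_key_fn("user", dict_fields=False)(as_obj))  # alice
```

Under this model, applying a string field with dict_fields=False to a plain dictionary raises AttributeError, which is the situation the dict_fields=True default appears intended to cover.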

Example

from gfw.common.beam.transforms import GroupBy
pcoll | GroupBy("user", "country", elements="Sessions")  # labeled like GroupSessionsByUserAndCountry

Parameters:
  • *fields (Any) – Positional key fields to group by. If these are strings and dict_fields=True, they will be interpreted as dictionary keys.

  • elements (str) – A human-readable label describing the grouped elements (e.g., Messages or Sessions). It is used to generate the step label.

  • dict_fields (bool) – If True (default), string fields are interpreted as dictionary keys and wrapped with operator.itemgetter(). Set to False to use Beam’s default behavior (attribute access).

  • **kwargs (Any) – Additional keyword arguments, passed through to beam.GroupBy (same interface).

Methods

  • annotations

  • create_label – Generate a descriptive label for the GroupBy transform based on keys and elements.

  • default_label

  • default_type_hints

  • display_data – Returns the display data associated with a pipeline component.

  • expand – Applies the wrapped Beam GroupBy transform to the input PCollection.

  • from_runner_api

  • get_resource_hints

  • get_type_hints – Gets and/or initializes type hints for this object.

  • get_windowing – Returns the window function to be associated with the transform's output.

  • infer_output_type

  • register_urn

  • runner_api_requires_keyed_input

  • to_runner_api

  • to_runner_api_parameter

  • to_runner_api_pickled

  • type_check_inputs

  • type_check_inputs_or_outputs

  • type_check_outputs

  • with_input_types – Annotates the input type of a PTransform with a type hint.

  • with_output_types – Annotates the output type of a PTransform with a type hint.

  • with_resource_hints – Adds resource hints to the PTransform.

Attributes

  • label

  • pipeline

  • side_inputs

classmethod create_label(keys, elements)

Generate a descriptive label for the GroupBy transform based on keys and elements.

Constructs a label string that combines the human-readable element description with the grouping keys, formatted in CamelCase with the keys joined by "And".

For example, keys ["user", "country"] and elements "Sessions" result in GroupSessionsByUserAndCountry.

Parameters:
  • keys (Sequence[str]) – A sequence of key field names used for grouping.

  • elements (str) – A human-readable label describing the grouped elements.

Returns:

A formatted string label for use as the PTransform’s step label.

Return type:

str
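The label format described for create_label can be approximated as follows. create_label_sketch is a hypothetical stand-in, not the library's create_label. It assumes each key is capitalized with str.capitalize(), which also lowercases the remaining characters, so keys such as "userId" or "user_id" may be rendered differently by the real method.

```python
from typing import Sequence


def create_label_sketch(keys: Sequence[str], elements: str) -> str:
    """Hypothetical approximation of the documented Group<Elements>By<Keys> format.

    Assumption: keys are capitalized with str.capitalize(); the real
    create_label may normalize keys differently.
    """
    # Capitalize each key and join them with "And".
    by = "And".join(key.capitalize() for key in keys)
    return f"Group{elements}By{by}"


print(create_label_sketch(["user", "country"], "Sessions"))  # GroupSessionsByUserAndCountry
```

Under the same assumption, the default elements='' would produce labels such as GroupByUser; the actual create_label output is the reference if exact step names matter.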

expand(pcoll)

Applies the wrapped Beam GroupBy transform to the input PCollection.

Return type:

PCollection
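To make the expand step concrete, the sketch below is an in-memory model of what the grouping computes: each element is paired with its key, and values sharing a key are collected, which is conceptually how Beam's GroupBy works (key each element, then GroupByKey). group_by_model and session_key are hypothetical names. This is plain Python, not a Beam transform, and it ignores windowing, triggering and distribution. The key is a plain tuple built from one operator.itemgetter() per field, following the dict_fields=True rule; the real GroupBy may wrap keys differently (for example, as a named-tuple-like row).

```python
import operator
from collections import defaultdict
from typing import Any, Callable, Dict, Iterable, List, Tuple


def group_by_model(
    elements: Iterable[Any], key_fn: Callable[[Any], Any]
) -> List[Tuple[Any, List[Any]]]:
    """Hypothetical in-memory model: key each element, then group by key."""
    # Step 1: pair every element with its key, like Map(lambda x: (key_fn(x), x)).
    keyed = [(key_fn(x), x) for x in elements]
    # Step 2: collect all values that share a key, like GroupByKey.
    groups: Dict[Any, List[Any]] = defaultdict(list)
    for key, value in keyed:
        groups[key].append(value)
    return list(groups.items())


# One itemgetter per string field, as dict_fields=True describes.
key_fns = [operator.itemgetter("user"), operator.itemgetter("country")]


def session_key(element: Any) -> Tuple[Any, ...]:
    # Combine the per-field values into a composite tuple key.
    return tuple(fn(element) for fn in key_fns)


sessions = [
    {"user": "alice", "country": "AR", "duration": 30},
    {"user": "bob", "country": "CL", "duration": 12},
    {"user": "alice", "country": "AR", "duration": 5},
]

for key, group in group_by_model(sessions, session_key):
    print(key, [s["duration"] for s in group])
# ('alice', 'AR') [30, 5]
# ('bob', 'CL') [12]
```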