gfw.common.beam.transforms.ReadAndDecodeFromPubSub#

class ReadAndDecodeFromPubSub(subscription_id, project, with_attributes=True, decode=True, decode_method='utf-8', read_from_pubsub_factory=<class 'apache_beam.io.gcp.pubsub.ReadFromPubSub'>, **read_from_pubsub_kwargs)[source]#

Wrapper around ReadFromPubSub with optional decoding.

It supports the following features:
  • Reading from a specific Pub/Sub subscription.

  • Optionally including message attributes.

  • Decoding message data using a specified method (default is UTF-8).

  • Allowing the use of a custom or mocked beam.ReadFromPubSub transform for testing purposes.

Parameters:
  • subscription_id (str) – The Pub/Sub subscription id from which to read messages.

  • with_attributes (bool) – Whether to include attributes in the Pub/Sub message. Default is True.

  • decode (bool) – Whether to decode the data from bytes to dictionary. Default is True.

  • decode_method (str) – The method used to decode the message data. Supported methods include standard encodings like utf-8, ascii, etc. Default is utf-8.

  • read_from_pubsub_factory (Callable[[...], ReadFromPubSub]) – A factory function to create a ReadFromPubSub instance. This is useful for testing when a custom or mocked beam.ReadFromPubSub implementation is needed. Default is the Beam beam.ReadFromPubSub class.

  • **read_from_pubsub_kwargs (Any) – Additional keyword arguments passed to the ReadFromPubSub transform. These can be used to specify custom parameters for the reading operation.

Methods

annotations

default_label

default_type_hints

display_data

Returns the display data associated to a pipeline component.

expand

Applies the transform to the pipeline root and returns a PCollection of messages.

from_runner_api

get_client_factory

Returns a factory for ReadFromPubSub objects.

get_resource_hints

get_type_hints

Gets and/or initializes type hints for this object.

get_windowing

Returns the window function to be associated with transform's output.

infer_output_type

register_urn

runner_api_requires_keyed_input

to_runner_api

to_runner_api_parameter

to_runner_api_pickled

type_check_inputs

type_check_inputs_or_outputs

type_check_outputs

with_input_types

Annotates the input type of a PTransform with a type-hint.

with_output_types

Annotates the output type of a PTransform with a type-hint.

with_resource_hints

Adds resource hints to the PTransform.

Attributes

SUBSCRIPTION

label

pipeline

side_inputs

subscription

Generates the full subscription path from project and subscription id.

classmethod get_client_factory(mocked=False)[source]#

Returns a factory for ReadFromPubSub objects.

Return type:

Callable

property subscription: str#

Generates the full subscription path from project and subscription id.

expand(pcoll)[source]#

Applies the transform to the pipeline root and returns a PCollection of messages.

Parameters:

pcoll (PCollection) – An input PCollection. This is expected to be a PBegin when used with a real or mocked ReadFromPubSub, since Pub/Sub sources begin from the pipeline root.

Returns:

A PCollection of dictionaries where each dictionary contains: - “data”: the decoded message string (if decoding is enabled), - “attributes”: a dictionary of message attributes (if available).

Return type:

beam.PCollection