gfw.common.beam.transforms.ReadAndDecodeFromPubSub#
- class ReadAndDecodeFromPubSub(subscription_id, project, with_attributes=True, decode=True, decode_method='utf-8', read_from_pubsub_factory=<class 'apache_beam.io.gcp.pubsub.ReadFromPubSub'>, **read_from_pubsub_kwargs)[source]#
Wrapper around
ReadFromPubSubwith optional decoding.- It supports the following features:
Reading from a specific Pub/Sub subscription.
Optionally including message attributes.
Decoding message data using a specified method (default is
UTF-8).Allowing the use of a custom or mocked
beam.ReadFromPubSubtransform for testing purposes.
- Parameters:
subscription_id (str) – The Pub/Sub subscription id from which to read messages.
with_attributes (bool) – Whether to include attributes in the Pub/Sub message. Default is True.
decode (bool) – Whether to decode the data from bytes to dictionary. Default is True.
decode_method (str) – The method used to decode the message data. Supported methods include standard encodings like
utf-8,ascii, etc. Default isutf-8.read_from_pubsub_factory (Callable[[...], ReadFromPubSub]) – A factory function to create a
ReadFromPubSubinstance. This is useful for testing when a custom or mockedbeam.ReadFromPubSubimplementation is needed. Default is the Beambeam.ReadFromPubSubclass.**read_from_pubsub_kwargs (Any) – Additional keyword arguments passed to the
ReadFromPubSubtransform. These can be used to specify custom parameters for the reading operation.
Methods
annotationsdefault_labeldefault_type_hintsReturns the display data associated to a pipeline component.
Applies the transform to the pipeline root and returns a PCollection of messages.
from_runner_apiReturns a factory for
ReadFromPubSubobjects.get_resource_hintsGets and/or initializes type hints for this object.
Returns the window function to be associated with transform's output.
infer_output_typeregister_urnrunner_api_requires_keyed_inputto_runner_apito_runner_api_parameterto_runner_api_pickledtype_check_inputstype_check_inputs_or_outputstype_check_outputsAnnotates the input type of a
PTransformwith a type-hint.Annotates the output type of a
PTransformwith a type-hint.Adds resource hints to the
PTransform.Attributes
SUBSCRIPTIONlabelpipelineside_inputsGenerates the full subscription path from project and subscription id.
- classmethod get_client_factory(mocked=False)[source]#
Returns a factory for
ReadFromPubSubobjects.- Return type:
- expand(pcoll)[source]#
Applies the transform to the pipeline root and returns a PCollection of messages.
- Parameters:
pcoll (PCollection) – An input PCollection. This is expected to be a
PBeginwhen used with a real or mockedReadFromPubSub, since Pub/Sub sources begin from the pipeline root.- Returns:
A PCollection of dictionaries where each dictionary contains: - “data”: the decoded message string (if decoding is enabled), - “attributes”: a dictionary of message attributes (if available).
- Return type:
beam.PCollection