gfw.common.beam.pipeline.Pipeline#
- class Pipeline(name='', version='0.1.0', dag=None, pre_hooks=(), post_hooks=(), unparsed_args=(), **options)[source]#
Wrapper around
beam.Pipelinewith extended functionality.- Features:
Merges unparsed, parsed, and default options.
Supports custom DAG definitions.
Enables Google Cloud Profiler integration.
Automatically adds
./setup.pywhensdk_container_imageis not specified.
You can implement your own Dag object to be injected in the constructor, reuse the provided
LinearDag, or just override theapply_dag()method of this class.- Parameters:
name (str) – The name of the pipeline. Defaults to an empty string.
version (str) – The version of the pipeline. Defaults to
0.1.0.dag (Dag | None) – The DAG to be applied to the pipeline. Defaults to an empty LinearDag.
pre_hooks (Sequence[Callable[[...], None]]) – Sequence of callables executed before pipeline run. Each callable receives the pipeline instance as its only argument.
post_hooks (Sequence[Callable[[...], None]]) – Sequence of callables executed after pipeline run completes successfully. Each callable receives the pipeline instance as its only argument.
unparsed_args (Tuple[str, ...]) – A list of unparsed arguments to pass to Beam options. Defaults to an empty tuple.
**options (Any) – Additional options to pass to the Beam pipeline.
- parsed_args#
The parsed arguments from the
unparsed_argslist.
- pipeline_options#
The merged pipeline options, including parsed args, user options, and defaults.
- pipeline#
The initialized beam Pipeline object.
Methods
Applies the provided DAG implementation to the self.pipeline.
Returns the default options for the pipeline.
Executes the Apache Beam pipeline.
Attributes
Returns the
GoogleCloudOptionsview of thePipelineOptions.Returns whether the pipeline is running in streaming mode.
Parses the unparsed arguments using Beam's
PipelineOptions.Returns the initialized
beam.Pipelineobject.Resolves pipeline options.
- property parsed_args: dict[str, Any]#
Parses the unparsed arguments using Beam’s
PipelineOptions.- Returns:
A dictionary of parsed arguments.
- property pipeline_options: PipelineOptions#
Resolves pipeline options.
Combines parsed arguments by beam CLI, constructor parameters and defaults into a single
PipelineOptionsobject.- Returns:
The merged pipeline options.
- property cloud_options: GoogleCloudOptions#
Returns the
GoogleCloudOptionsview of thePipelineOptions.
- property pipeline: Pipeline#
Returns the initialized
beam.Pipelineobject.
- apply_dag()[source]#
Applies the provided DAG implementation to the self.pipeline.
- Return type:
PCollection
- run(wait_until_finish=None)[source]#
Executes the Apache Beam pipeline.
Runs the configured DAG and returns the pipeline result along with its main output(s).
The execution can be either blocking or non-blocking depending on the pipeline type and the
wait_until_finishparameter:If
wait_until_finishisNone(default): - batch pipelines block until completion - streaming pipelines return immediately after submissionIf
wait_until_finishis explicitly set: -Trueblocks until the pipeline finishes -Falsereturns immediately
Note that waiting on a streaming pipeline will block indefinitely unless the job is externally cancelled.
Post-hooks are only executed when the pipeline finishes successfully (i.e., in blocking mode and when the final state is
DONE).- Parameters:
wait_until_finish (bool | None) – Whether to wait for the pipeline execution to complete. If
None, the behavior is inferred from whether the pipeline is running in streaming mode.- Returns:
The
PipelineResultof the executed pipeline.The main output(s) produced by the DAG, which may be a
PCollection, a tuple, a dict, orNone.
- Return type:
A tuple containing