gfw.common.beam.pipeline.LinearDag#

class LinearDag(sources=(), core=None, side_inputs=None, sinks=())[source]#

A linear DAG implementation for Apache Beam pipelines.

This DAG:
  1. Applies multiple sources PTransforms and merges outputs into a single PCollection.

  2. Applies a single core PTransform, with an optional side inputs PTransform.

  3. Applies multiple sinks PTransforms.

Parameters:
  • sources (Tuple[PTransform[Any, Any], ...]) – A list of PTransforms that read input data.

  • core (PTransform[Any, Any] | None) – The core PTransform that processes the data.

  • side_inputs (PTransform[Any, Any] | None) – A PTransform used to read side inputs that will be injected into the core transform.

  • sinks (Tuple[PTransform[Any, Any], ...]) – A list of PTransforms that write the output data.

output_paths#

A list of output paths for each sink, if they contain the path attribute.

Methods

apply

Applies the linear DAG implementation to an Apache Beam pipeline.

Attributes

output_paths

Resolves and returns a list of output paths for each sink in the pipeline.

apply(p)[source]#

Applies the linear DAG implementation to an Apache Beam pipeline.

Return type:

PCollection

property output_paths: List[Path | str]#

Resolves and returns a list of output paths for each sink in the pipeline.