gfw.common.beam.transforms.ReadFromJson
- class ReadFromJson(input_file, coder=<class 'dict'>, lines=False, create_kwargs=None, **kwargs)
Beam transform to read a PCollection from a JSON file.
This transform loads a local JSON or JSONLines file eagerly (outside the pipeline), then injects the resulting records into the pipeline using
beam.Create. Because the whole file is read into memory and embedded in the pipeline, it suits small inputs: testing, prototyping, or controlled ingestion.
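The eager-loading step can be pictured as plain Python that runs before the pipeline is built. The sketch below is a hypothetical stand-in: `load_json_records` is not part of gfw-common, and it assumes that a non-JSONLines file holds a single top-level JSON array of records. It does not reproduce the library's actual implementation.

```python
import json
from pathlib import Path
from typing import Any, Callable, List


def load_json_records(
    input_file: str,
    lines: bool = False,
    coder: Callable[[Any], Any] = dict,
) -> List[Any]:
    """Hypothetical sketch of the eager load that precedes beam.Create.

    Assumes that a non-JSONLines file holds one top-level JSON array.
    """
    path = Path(input_file)
    # Mirror the documented behaviour: fail fast, before any pipeline runs.
    if not path.exists():
        raise ValueError(f"Input file does not exist: {input_file}")

    with path.open("r", encoding="utf-8") as f:
        if lines:
            # JSONLines: one JSON document per non-blank line.
            raw = [json.loads(line) for line in f if line.strip()]
        else:
            # Plain JSON: assumed to be a top-level array of records.
            raw = json.load(f)

    # Apply the coder to every decoded record.
    return [coder(record) for record in raw]
```

The returned list is what would then be handed to `beam.Create` inside the pipeline.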
- Parameters:
  - input_file – Path to the local JSON or JSONLines file to read.
  - coder (Callable) – Callable applied to each decoded record. Defaults to dict.
  - lines (bool) – If True, interprets the input as newline-delimited JSON (JSONLines). Defaults to False.
  - create_kwargs (dict | None) – Optional dictionary of keyword arguments to pass to beam.Create. Use this to control serialization, type hints, etc.
  - **kwargs (Any) – Additional keyword arguments passed to the base PTransform class.
- Raises:
ValueError – If the input file does not exist at pipeline construction time.
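Since `coder` is described as any callable applied to each decoded record, a small function can map plain dicts onto a typed record. The `Vessel` dataclass and `vessel_coder` below are hypothetical, and the decoding loop only imitates what `lines=True` is documented to do; it does not call gfw-common.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Vessel:
    """Hypothetical typed record used only for this illustration."""

    vessel_id: str
    speed: float


def vessel_coder(record: dict) -> Vessel:
    # Normalise field types while converting a decoded JSON object.
    return Vessel(vessel_id=str(record["vessel_id"]), speed=float(record["speed"]))


jsonl_text = '{"vessel_id": "123", "speed": 4.5}\n{"vessel_id": 456, "speed": 10}\n'

# Decode one record per non-blank line, then apply the coder.
vessels = [
    vessel_coder(json.loads(line)) for line in jsonl_text.splitlines() if line.strip()
]
```

With the real transform, the equivalent call would presumably be `ReadFromJson("vessels.jsonl", coder=vessel_coder, lines=True)`. A `create_kwargs` dict such as `{"reshuffle": False}` would be passed on to `beam.Create`; `reshuffle` is a `beam.Create` keyword in recent Beam releases, but check the version you run.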
Example
```python
import apache_beam as beam
from gfw.common.beam.transforms import ReadFromJson

with beam.Pipeline() as p:
    pcoll = p | ReadFromJson("data/input.json", lines=True)
    pcoll | beam.Map(print)
```
Methods
- annotations
- default_label
- default_type_hints
- display_data – Returns the display data associated with a pipeline component.
- expand – Apply transform to pipeline p: create a PCollection from the loaded JSON data.
- from_runner_api
- get_resource_hints
- get_type_hints – Gets and/or initializes type hints for this object.
- get_windowing – Returns the window function to be associated with the transform's output.
- infer_output_type
- register_urn
- runner_api_requires_keyed_input
- to_runner_api
- to_runner_api_parameter
- to_runner_api_pickled
- type_check_inputs
- type_check_inputs_or_outputs
- type_check_outputs
- with_input_types – Annotates the input type of a PTransform with a type-hint.
- with_output_types – Annotates the output type of a PTransform with a type-hint.
- with_resource_hints – Adds resource hints to the PTransform.

Attributes
- label
- pipeline
- side_inputs