gfw.common.beam.transforms.ReadFromJson

class ReadFromJson(input_file, coder=dict, lines=False, create_kwargs=None, **kwargs)

Beam transform that creates a PCollection from a local JSON file.

This transform loads a local JSON or JSONLines file eagerly (outside the pipeline), then injects the resulting records into the pipeline using beam.Create.

Useful for testing, prototyping, or controlled ingestion.
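
The eager-load step described above can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not the library's actual implementation: `load_records` and `create_from_records` are hypothetical helper names, a plain JSON file is assumed to hold either a top-level array of records or a single object, and `coder` is assumed to be called with each decoded record as its only argument.

```python
import json
from pathlib import Path
from typing import Any, Callable, List, Optional


def load_records(
    input_file, coder: Callable[[Any], Any] = dict, lines: bool = False
) -> List[Any]:
    """Eagerly decode a local JSON or JSONLines file into a list of records."""
    text = Path(input_file).read_text(encoding="utf-8")
    if lines:
        # JSONLines: one JSON document per non-empty line.
        decoded = [json.loads(line) for line in text.splitlines() if line.strip()]
    else:
        decoded = json.loads(text)
        # Assumption: a single top-level object is treated as one record.
        if isinstance(decoded, dict):
            decoded = [decoded]
    # Assumption: the coder receives each decoded record as its only argument.
    return [coder(record) for record in decoded]


def create_from_records(pipeline, records, create_kwargs: Optional[dict] = None):
    """Inject already-loaded records into a pipeline with beam.Create."""
    # Imported lazily so that load_records above needs only the standard library.
    import apache_beam as beam

    return pipeline | beam.Create(records, **(create_kwargs or {}))
```

Because the whole file is held in memory before the pipeline runs, this pattern fits small fixtures and prototypes better than large datasets.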

Parameters:
  • input_file (str | Path) – Path to the local file to read.

  • coder (Callable) – Callable to apply to each decoded record. Defaults to dict.

  • lines (bool) – If True, interprets the input as newline-delimited JSON (JSONLines).

  • create_kwargs (dict | None) – Optional dictionary of keyword arguments to pass to beam.Create. Use this to control serialization, type hints, etc.

  • **kwargs (Any) – Additional keyword arguments passed to the base PTransform class.

Raises:

ValueError – If the input file does not exist at pipeline construction time.
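
Since the file is read while the pipeline is being built, a missing path fails before any runner starts executing. The sketch below shows the kind of check that could produce this error. `check_input_file` is a hypothetical helper used only for illustration, and the actual check and error message in the library may differ.

```python
from pathlib import Path


def check_input_file(input_file) -> Path:
    """Return the path if it exists, otherwise raise ValueError."""
    path = Path(input_file)
    if not path.exists():
        # Mirrors the documented behavior: a missing file is a ValueError.
        raise ValueError(f"Input file does not exist: {path}")
    return path


try:
    check_input_file("data/does-not-exist.json")
except ValueError as err:
    # Reached during pipeline construction, before any records are processed.
    print(f"Pipeline construction would fail: {err}")
```

Catching the error at this stage lets tests and scripts report a bad path immediately instead of after a job has been submitted.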

Example

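import apache_beam as beam
from gfw.common.beam.transforms import ReadFromJson

with beam.Pipeline() as p:
    # The file is loaded while the pipeline is built, not by a worker.
    pcoll = p | ReadFromJson("data/input.json", lines=True)
    pcoll | beam.Map(print)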

Methods

annotations

default_label

default_type_hints

display_data

Returns the display data associated with a pipeline component.

expand

Applies the transform to pipeline p, creating a PCollection from the loaded JSON data.

from_runner_api

get_resource_hints

get_type_hints

Gets and/or initializes type hints for this object.

get_windowing

Returns the window function to be associated with transform's output.

infer_output_type

register_urn

runner_api_requires_keyed_input

to_runner_api

to_runner_api_parameter

to_runner_api_pickled

type_check_inputs

type_check_inputs_or_outputs

type_check_outputs

with_input_types

Annotates the input type of a PTransform with a type-hint.

with_output_types

Annotates the output type of a PTransform with a type-hint.

with_resource_hints

Adds resource hints to the PTransform.

Attributes

label

pipeline

side_inputs

expand(p)

Applies the transform to pipeline p, creating a PCollection from the loaded JSON data.

Return type:

PCollection