gfw.common.beam.transforms.ReadFromBigQuery

class ReadFromBigQuery(query, output_type=<class 'dict'>, method='EXPORT', use_standard_sql=True, read_from_bigquery_factory=<class 'apache_beam.io.gcp.bigquery.ReadFromBigQuery'>, read_from_bigquery_kwargs=None, **kwargs)

Wrapper around Apache Beam's ReadFromBigQuery transform, with optional casting of the output to output_type.

Parameters:
  • query (str) – The query to execute.

  • output_type (type) – The Beam type hint for the output (e.g., a NamedTuple). If not provided, defaults to dict.

  • method (str) – The method used to read from BigQuery. Either EXPORT or DIRECT_READ. Defaults to EXPORT.

  • use_standard_sql (bool) – Specifies whether to use BigQuery’s standard SQL dialect for this query. Defaults to True.

  • read_from_bigquery_factory (Callable[..., io.ReadFromBigQuery]) – A factory function used to create a ReadFromBigQuery instance. This is primarily useful for testing, where you may want to inject a custom or fake implementation instead of using the real transform. If not provided, the default class will be used.

  • read_from_bigquery_kwargs (dict) – Additional keyword arguments passed to the underlying ReadFromBigQuery class. See the official Apache Beam documentation for the available options. Defaults to None.

  • **kwargs (Any) – Additional keyword arguments passed to the base PTransform class.
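
As a rough illustration of the parameters above, the following sketch builds the keyword arguments for a typed read. The Message schema, the table name, and the bucket are hypothetical, and the assumption that NamedTuple fields should mirror the query's columns is not confirmed by this page. The import is deferred so the snippet loads even without gfw-common installed.

```python
from typing import NamedTuple


class Message(NamedTuple):
    """Hypothetical row schema, assumed to mirror the columns of QUERY."""

    ssvid: str
    lat: float
    lon: float


# Hypothetical table name, used only for illustration.
QUERY = "SELECT ssvid, lat, lon FROM `my-project.my_dataset.messages`"

# Keyword arguments taken from the constructor signature documented above.
READ_KWARGS = {
    "query": QUERY,
    "output_type": Message,
    "method": "EXPORT",
    "use_standard_sql": True,
    # Forwarded to apache_beam.io.gcp.bigquery.ReadFromBigQuery.
    # The bucket name is hypothetical.
    "read_from_bigquery_kwargs": {"gcs_location": "gs://my-bucket/tmp"},
}


def build_read_transform():
    """Build the transform; requires gfw-common and apache-beam[gcp]."""
    # Deferred import: only needed when the transform is actually built.
    from gfw.common.beam.transforms import ReadFromBigQuery

    return ReadFromBigQuery(**READ_KWARGS)
```

The resulting transform would then be applied at the start of a pipeline, for example `pipeline | build_read_transform()`.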

Methods

  • annotations
  • default_label
  • default_type_hints
  • display_data – Returns the display data associated with a pipeline component.
  • expand – Applies the transform to the input PCollection to read from BigQuery.
  • from_query – Creates a ReadFromBigQuery PTransform from a Query object.
  • from_runner_api
  • get_client_factory – Returns a factory for ReadFromBigQuery objects.
  • get_resource_hints
  • get_type_hints – Gets and/or initializes type hints for this object.
  • get_windowing – Returns the window function to be associated with the transform's output.
  • infer_output_type
  • register_urn
  • runner_api_requires_keyed_input
  • to_runner_api
  • to_runner_api_parameter
  • to_runner_api_pickled
  • type_check_inputs
  • type_check_inputs_or_outputs
  • type_check_outputs
  • with_input_types – Annotates the input type of a PTransform with a type-hint.
  • with_output_types – Annotates the output type of a PTransform with a type-hint.
  • with_resource_hints – Adds resource hints to the PTransform.

Attributes

label

pipeline

side_inputs

classmethod get_client_factory(mocked=False)

Returns a factory for ReadFromBigQuery objects.

Return type:

Callable

classmethod from_query(query, use_type=False, **kwargs)

Creates a ReadFromBigQuery PTransform from a Query object.

Parameters:
  • query (Query) – An instance of a Query subclass. Its render method is used to produce the SQL query string.

  • use_type (bool) – If True, sets the output type of the PTransform to the provided output_type. Defaults to False.

  • **kwargs (Any) – Additional keyword arguments passed to the ReadFromBigQuery constructor.

Returns:

A configured ReadFromBigQuery instance.

Return type:

ReadFromBigQuery
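
A minimal sketch of the from_query flow, assuming only what is stated above: the query object exposes a render method that returns the SQL string. MessagesQuery is a hypothetical duck-typed stand-in, and the table name is invented. Real code would subclass the package's Query class, which may require more than render.

```python
class MessagesQuery:
    """Hypothetical stand-in for a Query subclass (not part of gfw-common)."""

    def __init__(self, table: str, start_date: str) -> None:
        self.table = table
        self.start_date = start_date

    def render(self) -> str:
        # Produces the SQL string that from_query is documented to use.
        return (
            f"SELECT * FROM `{self.table}` "
            f"WHERE DATE(timestamp) >= '{self.start_date}'"
        )


def build_from_query(query):
    """Build the transform; requires gfw-common and apache-beam[gcp]."""
    # Deferred import: only needed when the transform is actually built.
    from gfw.common.beam.transforms import ReadFromBigQuery

    # Extra keyword arguments (here, method) go to the constructor.
    return ReadFromBigQuery.from_query(query, use_type=False, method="DIRECT_READ")


query = MessagesQuery("my-project.my_dataset.messages", "2024-01-01")
sql = query.render()
```

Keeping the SQL rendering on the query object means the rendered string can be unit-tested on its own, without building a pipeline.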

expand(pcoll)

Applies the transform to the input PCollection to read from BigQuery.

Return type:

PCollection[Any]
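
The read_from_bigquery_factory constructor parameter is described as a testing hook, so a test might inject a fake in place of the real Beam transform. The sketch below assumes that expand applies whatever transform the factory returns to the pipeline root. That behavior is not spelled out on this page, and fake_read_factory and FAKE_ROWS are hypothetical names.

```python
# Hypothetical rows the fake source emits instead of querying BigQuery.
FAKE_ROWS = [{"ssvid": "123", "lat": 1.0, "lon": 2.0}]


def fake_read_factory(*args, **kwargs):
    """Stand-in for apache_beam.io.gcp.bigquery.ReadFromBigQuery.

    Ignores whatever arguments the wrapper passes and emits FAKE_ROWS.
    """
    import apache_beam as beam  # deferred: needed only when invoked

    return beam.Create(FAKE_ROWS)


def run_with_fake_source():
    """Run a test pipeline; requires gfw-common and apache-beam."""
    from apache_beam.testing.test_pipeline import TestPipeline
    from apache_beam.testing.util import assert_that, equal_to
    from gfw.common.beam.transforms import ReadFromBigQuery

    with TestPipeline() as pipeline:
        rows = pipeline | ReadFromBigQuery(
            query="SELECT 1",  # never executed by the fake
            read_from_bigquery_factory=fake_read_factory,
        )
        # Assumes the default output_type (dict) leaves rows unchanged.
        assert_that(rows, equal_to(FAKE_ROWS))
```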