gfw.common.bigquery.BigQueryHelper

class BigQueryHelper(client_factory=<class 'google.cloud.bigquery.client.Client'>, dry_run=False, **kwargs)

Wrapper around bigquery.Client with extended functionality.

Parameters:
  • client_factory (Callable[[...], Client]) – A callable to create bigquery client objects. Defaults to the canonical bigquery.Client factory.

  • dry_run (bool) – If True, query jobs are run in dry-run mode. For more information, see the BigQuery documentation.

  • **kwargs (Any) – Extra keyword arguments to be passed to the provided client_factory.
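
The constructor parameters above describe a factory-injection pattern. The following is a minimal sketch of that pattern only, not the library's code: `FakeClient` and `HelperSketch` are hypothetical names, and creating the client lazily on first access is an assumption.

```python
from typing import Any, Callable, Optional


class FakeClient:
    """Hypothetical stand-in for bigquery.Client; records its constructor arguments."""

    def __init__(self, **kwargs: Any) -> None:
        self.kwargs = kwargs


class HelperSketch:
    """Illustrative only: defers client creation to an injected factory."""

    def __init__(
        self, client_factory: Callable[..., Any], dry_run: bool = False, **kwargs: Any
    ) -> None:
        self._client_factory = client_factory
        self._dry_run = dry_run
        self._kwargs = kwargs
        self._client: Optional[Any] = None

    @property
    def client(self) -> Any:
        # Assumption: the client is built on first access and then reused.
        if self._client is None:
            self._client = self._client_factory(**self._kwargs)
        return self._client


# Extra keyword arguments are forwarded untouched to the factory.
helper = HelperSketch(client_factory=FakeClient, project="my-project")
```

Because the factory is just a callable, a test can pass a fake class like the one above without touching any BigQuery API.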

Methods

create_external_table

Creates a BigQuery external table.

create_table

Creates a BigQuery table.

create_view

Creates or replaces a BigQuery view.

end_session

Terminates the session with the given session_id.

format_jinja2

Render a Jinja2 template with the given keyword arguments.

get_client_factory

Returns a factory for bigquery.Client objects.

load_from_json

Loads an iterable of JSON rows into a BigQuery table.

mocked

Returns a BigQueryHelper instance with a mocked client.

run_query

Runs a query.

Attributes

client

Returns the instance of bigquery.Client to be used.

classmethod mocked(**kwargs)

Returns a BigQueryHelper instance with a mocked client.

Return type:

BigQueryHelper

classmethod get_client_factory(mocked=False)

Returns a factory for bigquery.Client objects.

Return type:

Callable[[...], Client]

property client: Client

Returns the instance of bigquery.Client to be used.

end_session(session_id)

Terminates the session with the given session_id.

create_table(table, description='', schema=None, partition_field=None, partition_type='DAY', clustering_fields=None, labels=None, **kwargs)

Creates a BigQuery table.

Parameters:
  • table (str) – Table name like dataset.table.

  • description (str) – Text to include in the table’s description field.

  • schema (List[Dict[str, str]] | None) – Schema of the table.

  • partition_field (str | None) – Name of field to use for time partitioning.

  • partition_type (str) – The type of partitioning to use (e.g., DAY, HOUR). Defaults to DAY.

  • clustering_fields (list[str] | None) – A list of fields to use for clustering the BigQuery table (optional).

  • labels (dict[str, str] | None) – Dictionary of labels to attach to the table, useful for auditing costs.

  • **kwargs (Any) – Extra keyword arguments to be passed to the client.create_table() method.

Returns:

The created table.

Return type:

Table
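
To make the partitioning, clustering, and label parameters concrete, here is a sketch of how they could map onto the BigQuery REST Table resource. The function name `table_options_resource` is hypothetical, and the rule that partitioning is only configured when partition_field is set is an assumption; the helper's actual internals are not shown on this page.

```python
from typing import Any, Dict, List, Optional


def table_options_resource(
    description: str = "",
    schema: Optional[List[Dict[str, str]]] = None,
    partition_field: Optional[str] = None,
    partition_type: str = "DAY",
    clustering_fields: Optional[List[str]] = None,
    labels: Optional[Dict[str, str]] = None,
) -> Dict[str, Any]:
    """Builds the optional parts of a REST Table resource (illustrative only)."""
    resource: Dict[str, Any] = {"description": description}
    if schema is not None:
        resource["schema"] = {"fields": list(schema)}
    if partition_field is not None:
        # Assumption: time partitioning is skipped when no field is given.
        resource["timePartitioning"] = {"type": partition_type, "field": partition_field}
    if clustering_fields:
        resource["clustering"] = {"fields": list(clustering_fields)}
    if labels:
        resource["labels"] = dict(labels)
    return resource


# Field and label names below are made up for the example.
resource = table_options_resource(
    description="Daily events.",
    schema=[{"name": "timestamp", "type": "TIMESTAMP", "mode": "REQUIRED"}],
    partition_field="timestamp",
    clustering_fields=["ssvid"],
    labels={"team": "data"},
)
```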

create_external_table(table, source_uris, description='', schema=None, source_format='PARQUET', hive_partition_uri_prefix=None, require_partition_filter=False, replace=False, **kwargs)

Creates a BigQuery external table.

Parameters:
  • table (str) – Table name like project.dataset.table.

  • source_uris (List[str]) – List of GCS URIs, e.g. ['gs://bucket/*.parquet'].

  • description (str) – Text to include in the table’s description field.

  • schema (List[SchemaField] | None) – Schema of the table. If not provided, autodetect is enabled.

  • source_format (str) – The format of the source files. Defaults to PARQUET.

  • hive_partition_uri_prefix (str | None) – URI prefix for hive partitioning, e.g. 'gs://bucket/'.

  • require_partition_filter (bool) – If True, queries must include a partition filter. Defaults to False.

  • replace (bool) – If True, the table will be deleted and recreated if it already exists. Defaults to False.

  • **kwargs (Any) – Extra keyword arguments passed to client.create_table.

Returns:

The created table.

Return type:

Table
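
The interplay between schema, autodetection, and hive partitioning described above can be sketched against the REST Table resource. This is an illustration under assumptions: `external_table_resource` and its `hive_mode` parameter are hypothetical, the schema is given as plain dictionaries rather than SchemaField objects, and the hive partitioning mode the helper actually uses is not stated on this page.

```python
from typing import Any, Dict, List, Optional


def external_table_resource(
    source_uris: List[str],
    schema: Optional[List[Dict[str, str]]] = None,
    source_format: str = "PARQUET",
    hive_partition_uri_prefix: Optional[str] = None,
    require_partition_filter: bool = False,
    hive_mode: str = "AUTO",  # Assumption: not documented for the helper.
) -> Dict[str, Any]:
    """Builds the external-data part of a REST Table resource (illustrative only)."""
    external: Dict[str, Any] = {
        "sourceUris": list(source_uris),
        "sourceFormat": source_format,
        # As documented above: autodetect is enabled when no schema is provided.
        "autodetect": schema is None,
    }
    if hive_partition_uri_prefix is not None:
        external["hivePartitioningOptions"] = {
            "mode": hive_mode,
            "sourceUriPrefix": hive_partition_uri_prefix,
            "requirePartitionFilter": require_partition_filter,
        }
    resource: Dict[str, Any] = {"externalDataConfiguration": external}
    if schema is not None:
        resource["schema"] = {"fields": list(schema)}
    return resource


external = external_table_resource(
    ["gs://bucket/*.parquet"],
    hive_partition_uri_prefix="gs://bucket/",
    require_partition_filter=True,
)
```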

create_view(view_id, view_query)

Creates or replaces a BigQuery view.

This method is declarative: the provided query becomes the source of truth for the view definition. If the view already exists, it is replaced. If it does not exist, it is created.

Parameters:
  • view_id (str) – The destination view, e.g. project.dataset.view_id.

  • view_query (str) – The SELECT query that defines the view.
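
The create-or-replace semantics match BigQuery's `CREATE OR REPLACE VIEW` DDL statement. The sketch below builds that statement to illustrate the behaviour; `create_or_replace_view_ddl` is a hypothetical function, and the helper may well go through the client API instead of issuing DDL.

```python
def create_or_replace_view_ddl(view_id: str, view_query: str) -> str:
    """Returns a DDL statement with the same declarative semantics (illustrative only)."""
    # Drop surrounding whitespace and a trailing semicolon from the SELECT.
    select = view_query.strip().rstrip(";")
    return f"CREATE OR REPLACE VIEW `{view_id}` AS\n{select}"


# The view name is made up for the example.
ddl = create_or_replace_view_ddl("project.dataset.daily_counts", "SELECT 1 AS n;")
```

Running the same statement twice leaves the view in the same state, which is what makes the operation declarative.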

run_query(query_str, destination=None, write_disposition='WRITE_APPEND', clustering_fields=None, session_id=None, labels=None, **kwargs)

Runs a query.

Parameters:
  • query_str (str) – The query to run.

  • destination (str | None) – The table in which to write the outputs of the query.

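  • write_disposition (str) – The write disposition to apply to the destination table (e.g., WRITE_APPEND, WRITE_TRUNCATE, WRITE_EMPTY). Defaults to WRITE_APPEND.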

  • clustering_fields (list[str] | None) – List of field names to use for clustering.

  • session_id (str | None) – The session_id to use for the query.

  • labels (dict[str, Any] | None) – Labels to apply to the query job.

  • **kwargs (Any) – Extra keyword arguments to be passed to job.QueryJobConfig constructor.

Returns:

An instance wrapping the BigQuery QueryJob, providing convenient access to the query results and metadata.

Return type:

QueryResult
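
As a rough guide to what these parameters control, the sketch below assembles the corresponding REST job configuration. It is an illustration, not the helper's implementation: `query_job_configuration` is a hypothetical function, the destination is assumed to be fully qualified as project.dataset.table, and applying the write disposition only when a destination is given is also an assumption.

```python
from typing import Any, Dict, List, Optional


def query_job_configuration(
    query_str: str,
    destination: Optional[str] = None,
    write_disposition: str = "WRITE_APPEND",
    clustering_fields: Optional[List[str]] = None,
    session_id: Optional[str] = None,
    labels: Optional[Dict[str, Any]] = None,
    dry_run: bool = False,
) -> Dict[str, Any]:
    """Builds a REST job configuration for a query (illustrative only)."""
    query: Dict[str, Any] = {"query": query_str}
    if destination is not None:
        # Assumption: destination looks like project.dataset.table.
        project_id, dataset_id, table_id = destination.split(".")
        query["destinationTable"] = {
            "projectId": project_id,
            "datasetId": dataset_id,
            "tableId": table_id,
        }
        # BigQuery accepts WRITE_APPEND, WRITE_TRUNCATE, and WRITE_EMPTY here.
        query["writeDisposition"] = write_disposition
    if clustering_fields:
        query["clustering"] = {"fields": list(clustering_fields)}
    if session_id is not None:
        # Sessions are selected through a connection property.
        query["connectionProperties"] = [{"key": "session_id", "value": session_id}]
    configuration: Dict[str, Any] = {"query": query, "dryRun": dry_run}
    if labels:
        configuration["labels"] = dict(labels)
    return configuration


# Identifiers below are made up for the example.
config = query_job_configuration(
    "SELECT 1",
    destination="proj.ds.tbl",
    clustering_fields=["ssvid"],
    session_id="abc",
    labels={"env": "dev"},
    dry_run=True,
)
```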

load_from_json(rows, destination, partition_field=None, partition_type='DAY', **kwargs)

Loads an iterable of JSON rows into a BigQuery table.

Parameters:
  • rows (list[dict[str, Any]]) – The iterable of JSON dictionaries containing data to be loaded.

  • destination (str) – The table in which to write the data.

  • partition_field (str | None) – The field to use for partitioning the BigQuery table (optional).

  • partition_type (str) – The type of partitioning to use (e.g., DAY, HOUR). Defaults to DAY.

  • **kwargs (Any) – Extra keyword arguments to be passed to the job.LoadJobConfig constructor.
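
BigQuery load jobs ingest JSON as newline-delimited records, one object per line. The sketch below shows that serialization step for an iterable of rows; `to_ndjson` is a hypothetical function, and whether the helper does this itself or delegates it to the client is not stated on this page.

```python
import io
import json
from typing import Any, Dict, Iterable


def to_ndjson(rows: Iterable[Dict[str, Any]]) -> io.BytesIO:
    """Serializes rows as newline-delimited JSON in an in-memory buffer (illustrative only)."""
    payload = "\n".join(json.dumps(row) for row in rows)
    return io.BytesIO(payload.encode("utf-8"))


# Any iterable works, including a generator that yields rows lazily.
buffer = to_ndjson({"id": i, "name": name} for i, name in enumerate(["a", "b"], start=1))
```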

static format_jinja2(template_path, search_path=PosixPath('.'), **kwargs)

Render a Jinja2 template with the given keyword arguments.

Parameters:
  • template_path (Path) – The path to the Jinja2 template.

  • search_path (list[Path] | Path) – The base directory in which to search for the template path. Can be a list of paths.

  • **kwargs (Any) – Parameters required to render the query. It may contain extra parameters which are not used by the template, but all required parameters must be provided.

Returns:

The rendered query.

Return type:

str