gfw.common.bigquery.BigQueryHelper

class BigQueryHelper(client_factory=<class 'google.cloud.bigquery.client.Client'>, dry_run=False, **kwargs)

Wrapper around bigquery.Client that adds helpers for creating tables and views, running queries, loading JSON rows, and rendering Jinja2 query templates.

Parameters:

client_factory (Callable[[...], Client]) – A callable that creates bigquery.Client objects. Defaults to the bigquery.Client class itself.
dry_run (bool) – If True, query jobs are run in dry-run mode. For more information, see the BigQuery documentation.
**kwargs (Any) – Extra keyword arguments to be passed to the provided client_factory.
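A minimal, self-contained sketch of the factory-injection pattern these parameters describe. HelperSketch and FakeClient are hypothetical stand-ins (not part of gfw.common), chosen so the example runs without google-cloud-bigquery installed; the real class's internals may differ.

```python
from typing import Any, Callable


class FakeClient:
    """Hypothetical stand-in for bigquery.Client that records its kwargs."""

    def __init__(self, **kwargs: Any) -> None:
        self.kwargs = kwargs


class HelperSketch:
    """Hypothetical sketch of the client_factory / **kwargs pattern."""

    def __init__(
        self,
        client_factory: Callable[..., Any] = FakeClient,
        dry_run: bool = False,
        **kwargs: Any,
    ) -> None:
        self.dry_run = dry_run
        # Extra keyword arguments are forwarded untouched to the factory.
        self._client = client_factory(**kwargs)

    @property
    def client(self) -> Any:
        # Expose the client built by the factory.
        return self._client


helper = HelperSketch(dry_run=True, project="my-project")
print(helper.client.kwargs)  # {'project': 'my-project'}
```

Per the parameter list, the equivalent real call BigQueryHelper(project="my-project") would forward project to bigquery.Client, which accepts that keyword. Injecting the factory, rather than constructing bigquery.Client directly, is what makes it possible to swap in a fake or mocked client in tests.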

Methods

`create_external_table`	Creates a BigQuery external table.
`create_table`	Creates a BigQuery table.
`create_view`	Creates or replaces a BigQuery view.
`end_session`	Terminates the session with the given `session_id`.
`fetch_schema`	Fetch the schema of a BigQuery table.
`format_jinja2`	Render a Jinja2 template with the given keyword arguments.
`get_client_factory`	Returns a factory for `bigquery.Client` objects.
`load_from_json`	Loads an iterable of JSON rows into a BigQuery table.
`mocked`	Returns a `BigQueryHelper` instance with a mocked client.
`run_query`	Runs a query.

Attributes

client

Returns the instance of bigquery.Client to be used.

classmethod mocked(**kwargs)

Returns a BigQueryHelper instance with a mocked client.

Return type:

BigQueryHelper

classmethod get_client_factory(mocked=False)

Returns a factory for bigquery.Client objects.

Return type:

Callable[[…], Client]
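A mocked client is useful in unit tests that must not touch BigQuery. The sketch below shows one way such a factory could work using unittest.mock.MagicMock; get_client_factory_sketch is hypothetical, and the library's actual mock may be configured differently (for example, with a spec that restricts it to the real client's methods).

```python
from typing import Any, Callable
from unittest import mock


def get_client_factory_sketch(mocked: bool = False) -> Callable[..., Any]:
    """Hypothetical sketch: return a real or a mocked client factory."""
    if mocked:
        # Each call builds a fresh MagicMock; any method call on it succeeds
        # and is recorded, so tests can assert on it afterwards.
        return lambda **kwargs: mock.MagicMock()
    # Imported lazily so the mocked branch works without the library installed.
    from google.cloud import bigquery

    return bigquery.Client


factory = get_client_factory_sketch(mocked=True)
client = factory(project="test-project")
client.query("SELECT 1")
client.query.assert_called_once_with("SELECT 1")
```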

property client: Client

Returns the instance of bigquery.Client to be used.

end_session(session_id)

Terminates the session with the given session_id.

create_table(table, description='', schema=None, partition_field=None, partition_type='DAY', clustering_fields=None, labels=None, **kwargs)

Creates a BigQuery table.

Parameters:

table (str) – Table name like dataset.table.
description (str) – Text to include in the table’s description field.
schema (List[Dict[str, str]] | None) – Schema of the table.
partition_field (str | None) – Name of field to use for time partitioning.
partition_type (str) – The type of partitioning to use (e.g., DAY, HOUR). Defaults to DAY.
clustering_fields (list[str] | None) – A list of fields to use for clustering the BigQuery table (optional).
labels (dict[str, str] | None) – Dictionary of labels to attach to the table, e.g. for cost auditing.
**kwargs (Any) – Extra keyword arguments to be passed to the client.create_table() method.

Returns:

The created table.

Return type:

Table
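partition_type sets the granularity of time partitioning: each row is stored in the partition obtained by truncating its partition_field value, and BigQuery identifies partitions by IDs such as 20240115 (DAY) or 2024011513 (HOUR). The hypothetical partition_id function below computes that ID for a timestamp, assuming the value is already in UTC; it illustrates the partitioning scheme and is not part of the helper. BigQuery also supports MONTH and YEAR partitioning; whether this helper accepts them is not stated here. The same partition_type semantics apply to load_from_json.

```python
from datetime import datetime

# BigQuery partition ID formats per time-partitioning granularity.
_PARTITION_FORMATS = {
    "HOUR": "%Y%m%d%H",
    "DAY": "%Y%m%d",
    "MONTH": "%Y%m",
    "YEAR": "%Y",
}


def partition_id(value: datetime, partition_type: str = "DAY") -> str:
    """Hypothetical helper: partition ID for a UTC timestamp."""
    try:
        fmt = _PARTITION_FORMATS[partition_type]
    except KeyError:
        raise ValueError(f"unsupported partition_type: {partition_type!r}") from None
    return value.strftime(fmt)


ts = datetime(2024, 1, 15, 13, 45)
print(partition_id(ts))          # 20240115
print(partition_id(ts, "HOUR"))  # 2024011513
```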

create_external_table(table, source_uris, description='', schema=None, source_format='PARQUET', hive_partition_uri_prefix=None, require_partition_filter=False, replace=False, **kwargs)

Creates a BigQuery external table.

Parameters:

table (str) – Table name like project.dataset.table.
source_uris (List[str]) – List of GCS URIs, e.g. ['gs://bucket/*.parquet'].
description (str) – Text to include in the table’s description field.
schema (List[SchemaField] | None) – Schema of the table. If not provided, autodetect is enabled.
source_format (str) – The format of the source files. Defaults to PARQUET.
hive_partition_uri_prefix (str | None) – URI prefix for hive partitioning, e.g. 'gs://bucket/'.
require_partition_filter (bool) – If True, queries must include a partition filter. Defaults to False.
replace (bool) – If True, the table will be deleted and recreated if it already exists. Defaults to False.
**kwargs (Any) – Extra keyword arguments passed to client.create_table.

Returns:

The created table.

Return type:

Table
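With hive partitioning, files are laid out as key=value directories below hive_partition_uri_prefix (the part of the URI that precedes the first key=value segment), and each key becomes a partition column of the external table. The hypothetical hive_partition_keys function below only shows how a URI maps to partition keys under a given prefix; BigQuery performs this detection itself and may also infer non-string column types.

```python
from __future__ import annotations


def hive_partition_keys(uri: str, prefix: str) -> dict[str, str]:
    """Hypothetical helper: map a hive-partitioned URI to its partition keys."""
    if not uri.startswith(prefix):
        raise ValueError(f"{uri!r} does not start with prefix {prefix!r}")
    # Everything between the prefix and the file name is a key=value directory.
    *directories, _filename = uri[len(prefix):].split("/")
    keys: dict[str, str] = {}
    for segment in directories:
        key, sep, value = segment.partition("=")
        if not sep:
            raise ValueError(f"segment {segment!r} is not in key=value form")
        keys[key] = value
    return keys


print(hive_partition_keys(
    "gs://bucket/date=2024-01-15/region=eu/part-000.parquet", "gs://bucket/"
))
# {'date': '2024-01-15', 'region': 'eu'}
```

For such a layout, source_uris could be ['gs://bucket/*.parquet'] with hive_partition_uri_prefix='gs://bucket/', matching the examples in the parameter list; setting require_partition_filter=True then forces queries to filter on date or region.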

create_view(view_id, view_query)

Creates or replaces a BigQuery view.

This method is declarative: the provided query becomes the source of truth for the view definition. If the view already exists, it is replaced. If it does not exist, it is created.

Parameters:

view_id (str) – The destination view, e.g. project.dataset.view_id.
view_query (str) – The SELECT query that defines the view.
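In SQL terms, this declarative behaviour matches BigQuery's CREATE OR REPLACE VIEW DDL statement. The hypothetical function below builds such a statement (the table and column names are made up); this page does not say whether create_view issues DDL or calls the tables API, so treat it as an equivalent, not as the helper's implementation.

```python
def create_or_replace_view_ddl(view_id: str, view_query: str) -> str:
    """Hypothetical helper: build a BigQuery CREATE OR REPLACE VIEW statement."""
    if "`" in view_id:
        raise ValueError("view_id must not contain backticks")
    # A full project.dataset.view path may be quoted as a single backtick identifier.
    return f"CREATE OR REPLACE VIEW `{view_id}` AS\n{view_query.strip()}"


ddl = create_or_replace_view_ddl(
    "my-project.analytics.latest_positions",
    "SELECT ssvid, MAX(timestamp) AS last_seen "
    "FROM `my-project.analytics.positions` GROUP BY ssvid",
)
print(ddl)
```

Running the same statement twice leaves a single view defined by the latest query, which is the idempotence the declarative description promises.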

run_query(query_str, destination=None, write_disposition='WRITE_APPEND', clustering_fields=None, session_id=None, labels=None, **kwargs)

Runs a query.

Parameters:

query_str (str) – The query to run.
destination (str | None) – The table in which to write the outputs of the query.
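write_disposition (str) – What to do with existing data in the destination table: WRITE_APPEND, WRITE_TRUNCATE, or WRITE_EMPTY. Defaults to WRITE_APPEND.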
clustering_fields (list[str] | None) – List of field names to use for clustering.
session_id (str | None) – The session_id to use for the query.
labels (dict[str, Any] | None) – Labels to apply to the query job.
**kwargs (Any) – Extra keyword arguments to be passed to job.QueryJobConfig constructor.

Returns:

An instance wrapping the BigQuery QueryJob, providing convenient access to the query results and metadata.

Return type:

QueryResult
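When destination is set, write_disposition decides what happens to rows already in that table. The sketch below models BigQuery's three standard dispositions on an in-memory list of rows; apply_write_disposition is hypothetical and only illustrates the semantics, not how run_query is implemented.

```python
from __future__ import annotations

from typing import Any


def apply_write_disposition(
    existing: list[dict[str, Any]],
    new_rows: list[dict[str, Any]],
    write_disposition: str = "WRITE_APPEND",
) -> list[dict[str, Any]]:
    """Hypothetical model of BigQuery write dispositions on in-memory rows."""
    if write_disposition == "WRITE_APPEND":
        # Keep existing rows and add the query results after them.
        return existing + new_rows
    if write_disposition == "WRITE_TRUNCATE":
        # Discard existing rows; the table holds only the new results.
        return list(new_rows)
    if write_disposition == "WRITE_EMPTY":
        # Only write if the destination has no data yet.
        if existing:
            raise ValueError("destination table already contains data")
        return list(new_rows)
    raise ValueError(f"unknown write disposition: {write_disposition!r}")


table = [{"ssvid": "1"}]
print(apply_write_disposition(table, [{"ssvid": "2"}]))
# [{'ssvid': '1'}, {'ssvid': '2'}]
print(apply_write_disposition(table, [{"ssvid": "2"}], "WRITE_TRUNCATE"))
# [{'ssvid': '2'}]
```

Because the default is WRITE_APPEND, re-running the same query into the same destination adds its rows a second time; pass WRITE_TRUNCATE to replace them instead.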

load_from_json(rows, destination, partition_field=None, partition_type='DAY', **kwargs)

Loads an iterable of JSON rows into a BigQuery table.

Parameters:

rows (list[dict[str, Any]]) – The iterable of JSON dictionaries containing data to be loaded.
destination (str) – The table in which to write the data.
partition_field (str | None) – The field to use for partitioning the BigQuery table (optional).
partition_type (str) – The type of partitioning to use (e.g., DAY, HOUR). Defaults to DAY.
**kwargs (Any) – Extra keyword arguments to be passed to the job.LoadJobConfig constructor.
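BigQuery ingests JSON as newline-delimited JSON, one object per line. The sketch below serializes rows that way; to_ndjson is hypothetical, and converting date/datetime values to ISO 8601 strings is an assumption of the sketch rather than documented helper behaviour.

```python
from __future__ import annotations

import json
from datetime import date, datetime
from typing import Any, Iterable


def to_ndjson(rows: Iterable[dict[str, Any]]) -> str:
    """Hypothetical helper: serialize rows as newline-delimited JSON."""

    def encode(value: Any) -> str:
        # datetime is a subclass of date, so this branch covers both.
        if isinstance(value, date):
            return value.isoformat()
        raise TypeError(f"cannot serialize {type(value).__name__}")

    return "\n".join(json.dumps(row, default=encode) for row in rows)


rows = [
    {"ssvid": "123", "timestamp": datetime(2024, 1, 15, 13, 0)},
    {"ssvid": "456", "timestamp": datetime(2024, 1, 15, 14, 30)},
]
print(to_ndjson(rows))
# {"ssvid": "123", "timestamp": "2024-01-15T13:00:00"}
# {"ssvid": "456", "timestamp": "2024-01-15T14:30:00"}
```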

fetch_schema(table)

Fetch the schema of a BigQuery table.

Parameters:

table (str) – Fully-qualified table name (project.dataset.table).

Returns:

A Schema wrapping the table’s schema fields.

Return type:

Schema

static format_jinja2(template_path, search_path=PosixPath('.'), **kwargs)

Render a Jinja2 template with the given keyword arguments.

Parameters:

template_path (Path) – The path to the Jinja2 template.
search_path (list[Path] | Path) – The base directory, or list of directories, in which to search for template_path.
**kwargs (Any) – Variables used to render the template. Extra variables that the template does not reference are allowed, but every variable the template requires must be provided.

Returns:

The rendered query.

Return type:

str
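One way to enforce the "all required parameters must be provided" contract with Jinja2 is StrictUndefined, which raises on any variable the template references but was not given, while extra keyword arguments are simply ignored. The sketch below (render_template is hypothetical, not the helper's code; it requires the jinja2 package) also passes a list of search paths to FileSystemLoader, which accepts one.

```python
from __future__ import annotations

import tempfile
from pathlib import Path
from typing import Any

import jinja2


def render_template(
    template_path: Path,
    search_path: list[Path] | Path = Path("."),
    **kwargs: Any,
) -> str:
    """Hypothetical sketch of rendering a Jinja2 query template."""
    paths = search_path if isinstance(search_path, list) else [search_path]
    env = jinja2.Environment(
        loader=jinja2.FileSystemLoader([str(p) for p in paths]),
        # Referencing a variable that was not passed raises UndefinedError.
        undefined=jinja2.StrictUndefined,
    )
    # Template names are relative to a search path and use forward slashes.
    return env.get_template(template_path.as_posix()).render(**kwargs)


with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "query.sql.j2").write_text(
        "SELECT * FROM `{{ table }}` WHERE date = '{{ date }}'"
    )
    sql = render_template(
        Path("query.sql.j2"),
        search_path=Path(tmp),
        table="project.dataset.messages",
        date="2024-01-15",
        unused="extra kwargs are ignored",
    )
print(sql)
# SELECT * FROM `project.dataset.messages` WHERE date = '2024-01-15'
```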