gfw.common.bigquery

BigQuery utilities and configuration classes.

Classes

BigQueryHelper

Wrapper around bigquery.Client with extended functionality.

QueryResult

Wrapper around bigquery.job.QueryJob with access to results.

TableConfig

Abstract base class for BigQuery table configuration.

TableDescription

Generates a structured description for BigQuery table metadata.

class BigQueryHelper(client_factory=<class 'google.cloud.bigquery.client.Client'>, dry_run=False, **kwargs)

Wrapper around bigquery.Client with extended functionality.

Parameters:
  • client_factory (Callable[[...], Client]) – A callable to create bigquery client objects. Defaults to the canonical bigquery.Client factory.

  • dry_run (bool) – If True, query jobs are run in dry-run mode. See the BigQuery documentation for details.

  • **kwargs (Any) – Extra keyword arguments to be passed to the provided client_factory.
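
The client_factory parameter acts as a dependency-injection hook: any callable that returns a client-like object can stand in for bigquery.Client, which is convenient in tests. The sketch below illustrates only the pattern, using a hypothetical stand-in class (FactoryBackedHelper) and unittest.mock. It is not the implementation of BigQueryHelper, and the lazy creation of the client on first access is an assumption.

```python
from typing import Any, Callable
from unittest import mock


class FactoryBackedHelper:
    """Hypothetical stand-in that illustrates the client_factory pattern."""

    def __init__(
        self,
        client_factory: Callable[..., Any],
        dry_run: bool = False,
        **kwargs: Any,
    ) -> None:
        self._client_factory = client_factory
        self._client_kwargs = kwargs
        self._client = None
        self.dry_run = dry_run

    @property
    def client(self) -> Any:
        # Assumption: build the client on first access and reuse it afterwards.
        if self._client is None:
            self._client = self._client_factory(**self._client_kwargs)
        return self._client


# A mock factory replaces the real bigquery.Client in this sketch.
fake_factory = mock.Mock(name="client_factory")
helper = FactoryBackedHelper(client_factory=fake_factory, project="my-project")

client = helper.client       # the factory is called here with the extra kwargs
same_client = helper.client  # the cached client is returned, no second call
```

With the real class, the equivalent call would presumably look like BigQueryHelper(client_factory=fake_factory, project="my-project"), with project forwarded to the factory.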

property client: Client

Returns the instance of bigquery.Client to be used.

create_external_table(table, source_uris, description='', schema=None, source_format='PARQUET', hive_partition_uri_prefix=None, require_partition_filter=False, replace=False, **kwargs)

Creates a BigQuery external table.

Parameters:
  • table (str) – Table name like project.dataset.table.

  • source_uris (List[str]) – List of GCS URIs, e.g. ['gs://bucket/*.parquet'].

  • description (str) – Text to include in the table’s description field.

  • schema (List[SchemaField] | None) – Schema of the table. If not provided, autodetect is enabled.

  • source_format (str) – The format of the source files. Defaults to PARQUET.

  • hive_partition_uri_prefix (str | None) – URI prefix for hive partitioning, e.g. 'gs://bucket/'.

  • require_partition_filter (bool) – If True, queries must include a partition filter. Defaults to False.

  • replace (bool) – If True, the table will be deleted and recreated if it already exists. Defaults to False.

  • **kwargs (Any) – Extra keyword arguments passed to client.create_table.

Returns:

The created table.

Return type:

Table
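
As an illustration of how these parameters fit together, here is a minimal sketch of a call that defines a hive-partitioned external table over Parquet files. The project, dataset, bucket, and function names are hypothetical, and a unittest.mock object stands in for a BigQueryHelper instance so the snippet runs without credentials.

```python
from typing import Any
from unittest import mock


def create_events_external_table(bq: Any) -> Any:
    """Creates an external table; `bq` is expected to be a BigQueryHelper."""
    return bq.create_external_table(
        table="my-project.my_dataset.events_external",  # hypothetical table
        source_uris=["gs://my-bucket/events/*.parquet"],  # hypothetical bucket
        description="External table over raw event files.",
        source_format="PARQUET",
        hive_partition_uri_prefix="gs://my-bucket/events/",
        require_partition_filter=True,
        replace=True,  # delete and recreate the table if it already exists
    )


# A mock stands in for BigQueryHelper in this sketch.
bq = mock.Mock(name="BigQueryHelper")
table = create_events_external_table(bq)
```

Since schema is omitted here, the description above implies that schema autodetection would be enabled.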

create_table(table, description='', schema=None, partition_field=None, partition_type='DAY', clustering_fields=None, labels=None, **kwargs)

Creates a BigQuery table.

Parameters:
  • table (str) – Table name like dataset.table.

  • description (str) – Text to include in the table’s description field.

  • schema (List[Dict[str, str]] | None) – Schema of the table.

  • partition_field (str | None) – Name of field to use for time partitioning.

  • partition_type (str) – The type of partitioning to use (e.g., DAY, HOUR). Defaults to DAY.

  • clustering_fields (list[str] | None) – A list of fields to use for clustering the BigQuery table (optional).

  • labels (dict[str, str] | None) – Labels to attach to the table, e.g. for auditing costs.

  • **kwargs (Any) – Extra keyword arguments to be passed to the client.create_table() method.

Returns:

The created table.

Return type:

Table
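
A minimal sketch of a create_table call with time partitioning, clustering, and labels follows. It assumes that schema entries use the name/type/mode keys of BigQuery's JSON schema format, which this page does not confirm. All table, field, and label names are hypothetical, and a mock stands in for the helper.

```python
from typing import Any
from unittest import mock

# Assumption: each schema entry follows BigQuery's JSON schema field layout.
SCHEMA = [
    {"name": "event_id", "type": "STRING", "mode": "REQUIRED"},
    {"name": "timestamp", "type": "TIMESTAMP", "mode": "REQUIRED"},
    {"name": "vessel_id", "type": "STRING", "mode": "NULLABLE"},
]


def create_events_table(bq: Any) -> Any:
    """Creates a day-partitioned, clustered table via a BigQueryHelper."""
    return bq.create_table(
        table="my_dataset.events",
        description="One row per detected event.",
        schema=SCHEMA,
        partition_field="timestamp",
        partition_type="DAY",
        clustering_fields=["vessel_id"],
        labels={"team": "data-eng"},
    )


# A mock stands in for BigQueryHelper in this sketch.
bq = mock.Mock(name="BigQueryHelper")
table = create_events_table(bq)
```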

create_view(view_id, view_query)

Creates or replaces a BigQuery view.

This method is declarative: the provided query becomes the source of truth for the view definition. If the view already exists, it is replaced. If it does not exist, it is created.

Parameters:
  • view_id (str) – The destination view, e.g. project.dataset.view_id.

  • view_query (str) – The SELECT query that defines the view.
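
Because the method is declarative, calling it again with a changed query is the way to update a view. The sketch below shows that usage with a hypothetical sync_view function and hypothetical view and table names; a mock stands in for the helper.

```python
from typing import Any
from unittest import mock

VIEW_ID = "my-project.my_dataset.events_view"  # hypothetical view


def sync_view(bq: Any, view_query: str) -> None:
    """Makes the view match `view_query`, whether or not it already exists."""
    bq.create_view(view_id=VIEW_ID, view_query=view_query)


# A mock stands in for BigQueryHelper in this sketch.
bq = mock.Mock(name="BigQueryHelper")

# First deployment: the view does not exist yet, so it is created.
sync_view(bq, "SELECT event_id FROM `my_dataset.events`")

# Later deployment: the same call with a new query replaces the definition.
sync_view(bq, "SELECT event_id, vessel_id FROM `my_dataset.events`")
```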

end_session(session_id)

Terminates the session with the given session_id.
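
A session typically spans several queries and should be closed once they finish. The sketch below combines run_query, QueryResult.session_id, and end_session in a try/finally block. It assumes that a session can be opened by forwarding create_session=True to job.QueryJobConfig through **kwargs, which this page does not state. The function name, queries, and session ID are hypothetical, and a mock stands in for the helper.

```python
from typing import Any
from unittest import mock


def run_in_session(bq: Any) -> list:
    """Runs two queries in one session and always terminates the session."""
    # Assumption: create_session is forwarded to job.QueryJobConfig via **kwargs.
    first = bq.run_query(
        "CREATE TEMP TABLE recent AS SELECT 1 AS id",
        create_session=True,
    )
    session_id = first.session_id
    try:
        # The temporary table is only visible inside the same session.
        second = bq.run_query("SELECT id FROM recent", session_id=session_id)
        return second.tolist(as_dicts=True)
    finally:
        bq.end_session(session_id)


# A mock stands in for BigQueryHelper and its QueryResult in this sketch.
bq = mock.Mock(name="BigQueryHelper")
bq.run_query.return_value.session_id = "hypothetical-session-id"
bq.run_query.return_value.tolist.return_value = [{"id": 1}]
rows = run_in_session(bq)
```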

static format_jinja2(template_path, search_path=PosixPath('.'), **kwargs)

Render a Jinja2 template with the given keyword arguments.

Parameters:
  • template_path (Path) – The path to the Jinja2 template.

  • search_path (list[Path] | Path) – The base directory in which to search for the template path. Can be a list of paths.

  • **kwargs (Any) – Parameters required to render the query. It may contain extra parameters which are not used by the template, but all required parameters must be provided.

Returns:

The rendered query.

Return type:

str
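
The following sketch shows what a call might look like, including an extra keyword argument that the template would not use. The template name, search directory, and parameter names are hypothetical, and a mock stands in for the BigQueryHelper class, so no template file is read and the returned string is a placeholder.

```python
from pathlib import Path
from typing import Any
from unittest import mock


def render_events_query(helper_cls: Any) -> str:
    """Renders a query template; `helper_cls` is expected to be BigQueryHelper."""
    return helper_cls.format_jinja2(
        template_path=Path("events.sql.j2"),  # hypothetical template
        search_path=Path("queries"),          # hypothetical directory
        source_table="my_dataset.events",
        start_date="2024-01-01",
        unused_option=True,  # extra parameters are allowed, per the text above
    )


# A mock stands in for the BigQueryHelper class in this sketch.
helper_cls = mock.Mock(name="BigQueryHelper")
helper_cls.format_jinja2.return_value = "SELECT * FROM my_dataset.events"
query = render_events_query(helper_cls)
```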

classmethod get_client_factory(mocked=False)

Returns a factory for bigquery.Client objects.

Return type:

Callable[[...], Client]

load_from_json(rows, destination, partition_field=None, partition_type='DAY', **kwargs)

Loads an iterable of JSON rows into a BigQuery table.

Parameters:
  • rows (list[dict[str, Any]]) – The iterable of JSON dictionaries containing data to be loaded.

  • destination (str) – The table in which to write the data.

  • partition_field (str | None) – The field to use for partitioning the BigQuery table (optional).

  • partition_type (str) – The type of partitioning to use (e.g., DAY, HOUR). Defaults to DAY.

  • **kwargs (Any) – Extra keyword arguments to be passed to the job.LoadJobConfig constructor.
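
A minimal sketch of loading a small batch of rows into a day-partitioned table follows. The rows, table name, and load_events function are hypothetical, and a mock stands in for the helper. The return value of load_from_json is not documented here, so the sketch ignores it.

```python
from typing import Any
from unittest import mock

# Hypothetical rows; each dictionary maps column names to JSON-compatible values.
ROWS = [
    {"event_id": "a1", "timestamp": "2024-01-01T00:00:00Z"},
    {"event_id": "b2", "timestamp": "2024-01-01T06:30:00Z"},
]


def load_events(bq: Any, rows: list) -> None:
    """Loads rows into a table partitioned by day on the timestamp field."""
    bq.load_from_json(
        rows,
        destination="my_dataset.events",
        partition_field="timestamp",
        partition_type="DAY",
    )


# A mock stands in for BigQueryHelper in this sketch.
bq = mock.Mock(name="BigQueryHelper")
load_events(bq, ROWS)
```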

classmethod mocked(**kwargs)

Returns a BigQueryHelper instance with a mocked client.

Return type:

BigQueryHelper

run_query(query_str, destination=None, write_disposition='WRITE_APPEND', clustering_fields=None, session_id=None, labels=None, **kwargs)

Runs a query.

Parameters:
  • query_str (str) – The query to run.

  • destination (str | None) – The table in which to write the outputs of the query.

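  • write_disposition (str) – How to write to the destination table if it already exists, e.g. WRITE_APPEND, WRITE_TRUNCATE or WRITE_EMPTY. Defaults to WRITE_APPEND.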

  • clustering_fields (list[str] | None) – List of field names to use for clustering.

  • session_id (str | None) – The session_id to use for the query.

  • labels (dict[str, Any] | None) – Labels to apply.

  • **kwargs (Any) – Extra keyword arguments to be passed to job.QueryJobConfig constructor.

Returns:

An instance wrapping the BigQuery QueryJob, providing convenient access to the query results and metadata.

Return type:

QueryResult
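
As a sketch of the destination-related parameters, the example below writes an aggregate into a table and overwrites any previous contents instead of appending. The query, table names, labels, and function name are hypothetical, and a mock stands in for the helper and the returned QueryResult.

```python
from typing import Any
from unittest import mock


def rebuild_daily_summary(bq: Any) -> Any:
    """Recomputes a summary table and returns the ID of the query job."""
    result = bq.run_query(
        "SELECT vessel_id, COUNT(*) AS n "
        "FROM `my_dataset.events` GROUP BY vessel_id",
        destination="my-project.my_dataset.daily_summary",
        write_disposition="WRITE_TRUNCATE",  # overwrite instead of appending
        clustering_fields=["vessel_id"],
        labels={"pipeline": "daily-summary"},
    )
    # The wrapped QueryJob exposes job metadata such as the job ID.
    return result.query_job.job_id


# A mock stands in for BigQueryHelper in this sketch.
bq = mock.Mock(name="BigQueryHelper")
bq.run_query.return_value.query_job.job_id = "hypothetical-job-id"
job_id = rebuild_daily_summary(bq)
```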

class QueryResult(query_job, row_iterator)

Wrapper around bigquery.job.QueryJob with access to results.

This class encapsulates query_job and row_iterator instances, exposing rows via iteration and providing convenience methods like iter_as_dicts() and tolist().

Parameters:
  • query_job (QueryJob) – The original QueryJob, which can be used to access job metadata such as session IDs, job statistics, and more.

  • row_iterator (RowIterator) – The RowIterator returned by the query job.

Example

result = bq_client.run_query("SELECT * FROM my_table")

# Iterate raw rows
for row in result:
    print(row)

# Iterate as dicts
for row in result.iter_as_dicts():
    print(row)

# Materialize
rows = result.tolist()
rows_as_dicts = result.tolist(as_dicts=True)

# Access job metadata
print(result.query_job.job_id)
print(result.session_id)

iter_as_dicts()

Iterates over rows as dictionaries.

Return type:

Iterator[Dict[str, Any]]

property session_id: str | None

Returns the session_id of the job, or None if not available.

tolist(as_dicts=False)

Materializes all rows into a list.

Parameters:
  • as_dicts (bool) – If True, rows are converted to dictionaries. Defaults to False.

Returns:

A list of Row objects, or dictionaries if as_dicts=True.

Return type:

List[Row | Dict[str, Any]]

query_job: QueryJob

The encapsulated QueryJob instance.

row_iterator: RowIterator

The RowIterator returned by the query job.

class TableConfig(table_id, schema_file, description=None, partition_type='DAY', partition_field=None, clustering_fields=None, view_suffix='view')

Abstract base class for BigQuery table configuration.

clustering_fields: Tuple[str, ...] | None = None

Optional tuple of fields for clustering.

delete_query(start_date, end_date=None)

Returns the query used to delete records from this table.

Return type:

str

description: TableDescription | None = None

Optional TableDescription instance for the table metadata.

partition_field: str | None = None

Field used for partitioning (optional).

partition_type: str = 'DAY'

Type of partitioning to apply (e.g., DAY, MONTH).

abstract property schema: list[dict[str, str]]

Returns the schema of the table.

to_bigquery_params(include_description=True)

Returns parameters for BigQuery table creation or write operations.

This dictionary is intended to be unpacked as keyword arguments into BigQueryHelper.create_table.

Parameters:
  • include_description (bool) – Whether to include the formatted description string.

Returns:

A dictionary of parameters suitable for BigQuery operations.

Return type:

dict[str, Any]
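
The unpacking described above can be sketched as follows. Mocks stand in for both the helper and a concrete TableConfig subclass, and the keys of the returned dictionary (including table) are assumptions chosen to match the create_table signature rather than documented output.

```python
from typing import Any
from unittest import mock


def create_table_from_config(bq: Any, config: Any) -> Any:
    """Creates the table described by a TableConfig-like object."""
    params = config.to_bigquery_params(include_description=True)
    # The dictionary is unpacked as keyword arguments into create_table.
    return bq.create_table(**params)


# Mocks stand in for BigQueryHelper and a TableConfig subclass in this sketch.
config = mock.Mock(name="TableConfig")
config.to_bigquery_params.return_value = {
    # Assumed keys, mirroring the create_table parameters.
    "table": "my_dataset.events",
    "partition_field": "timestamp",
    "partition_type": "DAY",
}
bq = mock.Mock(name="BigQueryHelper")
table = create_table_from_config(bq, config)
```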

property view_id: str

Returns the ID of the view for the table.

view_query()

Returns the query that defines the view for this table.

Return type:

str

view_suffix: str | None = 'view'

Suffix to use when constructing the view ID.

table_id: str

Fully qualified BigQuery table ID.

schema_file: str

Path to the file defining the schema.

class TableDescription(repo_name, version='', title='', subtitle='', summary='To be completed.', caveats='To be completed.', relevant_params=<factory>)

Generates a structured description for BigQuery table metadata.

caveats: str = 'To be completed.'

Known limitations or notes about the data.

render()

Renders the description for use in BigQuery table metadata.

Returns:

A formatted string including summary, caveats, and relevant parameters.

Return type:

str

subtitle: str = ''

Subtitle or one-line summary.

summary: str = 'To be completed.'

High-level summary of the table’s purpose.

title: str = ''

Title of the table or dataset.

version: str = ''

Version of the project generating this table.

repo_name: str

GitHub repository name (used for URLs and headers).

relevant_params: dict[str, Any]

Key parameters relevant to the table’s content generation.

The keys are parameter names (strings), and the values can be any type convertible to string.

When rendered, the parameters are shown as a bullet list of key-value pairs, for example:

  • param1: value1

  • long_param2: value2

  • x: 42
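
The bullet-list rendering can be approximated with a few lines of Python. The render_params function below is a hypothetical helper written to reproduce the example above; the exact layout produced by TableDescription.render() (indentation, spacing, surrounding sections) may differ.

```python
from typing import Any, Mapping


def render_params(params: Mapping[str, Any]) -> str:
    """Formats parameters as one 'key: value' bullet per line."""
    # str() is applied implicitly by the f-string, so any value type works.
    return "\n".join(f"• {key}: {value}" for key, value in params.items())


rendered = render_params({"param1": "value1", "long_param2": "value2", "x": 42})
print(rendered)
```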