gfw.common.bigquery
BigQuery utilities and configuration classes.
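Classes
BigQueryHelper | Wrapper around bigquery.Client with extended functionality.
QueryResult | Wrapper around bigquery.job.QueryJob with access to results.
TableConfig | Abstract base class for BigQuery table configuration.
TableDescription | Generates a structured description for BigQuery table metadata.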
- class BigQueryHelper(client_factory=<class 'google.cloud.bigquery.client.Client'>, dry_run=False, **kwargs)
Wrapper around bigquery.Client with extended functionality.
- Parameters:
client_factory (Callable[[...], Client]) – A callable that creates bigquery client objects. Defaults to the canonical bigquery.Client factory.
dry_run (bool) – If True, query jobs are run in dry-run mode. For more information, check the BigQuery documentation.
**kwargs (Any) – Extra keyword arguments to be passed to the provided client_factory.
- property client: Client
Returns the instance of bigquery.Client to be used.
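The factory-injection pattern described above can be sketched as follows. This is a minimal stand-in and not the library's implementation: `HelperSketch` and `FakeClient` are hypothetical names, and creating the client lazily on first access is an assumption.

```python
from typing import Any, Callable


class FakeClient:
    """Hypothetical stand-in for bigquery.Client so the sketch runs offline."""

    def __init__(self, project: str = "demo-project") -> None:
        self.project = project


class HelperSketch:
    """Illustrative only: stores a factory and builds the client on demand."""

    def __init__(
        self,
        client_factory: Callable[..., Any] = FakeClient,
        dry_run: bool = False,
        **kwargs: Any,
    ) -> None:
        self._client_factory = client_factory
        self._dry_run = dry_run
        self._kwargs = kwargs  # forwarded unchanged to the factory
        self._client = None

    @property
    def client(self) -> Any:
        # Assumption: the client is created once and then reused.
        if self._client is None:
            self._client = self._client_factory(**self._kwargs)
        return self._client


helper = HelperSketch(client_factory=FakeClient, project="my-project")
print(helper.client.project)
```

Passing the factory instead of a ready-made client lets tests swap in a fake without touching the code that uses the helper.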
- create_external_table(table, source_uris, description='', schema=None, source_format='PARQUET', hive_partition_uri_prefix=None, require_partition_filter=False, replace=False, **kwargs)
Creates a BigQuery external table.
- Parameters:
table (str) – Table name like project.dataset.table.
source_uris (List[str]) – List of GCS URIs, e.g. ['gs://bucket/*.parquet'].
description (str) – Text to include in the table’s description field.
schema (List[SchemaField] | None) – Schema of the table. If not provided, autodetect is enabled.
source_format (str) – The format of the source files. Defaults to PARQUET.
hive_partition_uri_prefix (str | None) – URI prefix for hive partitioning, e.g. 'gs://bucket/'.
require_partition_filter (bool) – If True, queries must include a partition filter. Defaults to False.
replace (bool) – If True, the table will be deleted and recreated if it already exists. Defaults to False.
**kwargs (Any) – Extra keyword arguments passed to client.create_table.
- Returns:
The created table.
- Return type:
Table
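To make the parameters above concrete, the sketch below builds the kind of table resource (in BigQuery REST API field names) that such a call could correspond to. The function name `external_table_resource` is hypothetical, the hive partitioning mode `AUTO` is an assumption, and how the helper actually assembles its request is not stated in this page.

```python
from typing import Any, Optional


def external_table_resource(
    table: str,
    source_uris: list[str],
    description: str = "",
    schema: Optional[list[dict[str, Any]]] = None,
    source_format: str = "PARQUET",
    hive_partition_uri_prefix: Optional[str] = None,
    require_partition_filter: bool = False,
) -> dict[str, Any]:
    """Builds a REST-style table resource for an external table (illustrative)."""
    parts = table.split(".")
    if len(parts) != 3:
        raise ValueError("expected a table name like project.dataset.table")
    project, dataset, table_name = parts

    external: dict[str, Any] = {
        "sourceUris": list(source_uris),
        "sourceFormat": source_format,
        # Autodetect is enabled only when no schema is provided.
        "autodetect": schema is None,
    }
    if hive_partition_uri_prefix is not None:
        external["hivePartitioningOptions"] = {
            "mode": "AUTO",  # assumption: partition keys are inferred
            "sourceUriPrefix": hive_partition_uri_prefix,
            "requirePartitionFilter": require_partition_filter,
        }

    resource: dict[str, Any] = {
        "tableReference": {
            "projectId": project,
            "datasetId": dataset,
            "tableId": table_name,
        },
        "description": description,
        "externalDataConfiguration": external,
    }
    if schema is not None:
        resource["schema"] = {"fields": schema}
    return resource


resource = external_table_resource(
    "my-project.my_dataset.events",
    ["gs://bucket/events/*.parquet"],
    hive_partition_uri_prefix="gs://bucket/events/",
    require_partition_filter=True,
)
```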
- create_table(table, description='', schema=None, partition_field=None, partition_type='DAY', clustering_fields=None, labels=None, **kwargs)
Creates a BigQuery table.
- Parameters:
table (str) – Table name like dataset.table.
description (str) – Text to include in the table’s description field.
partition_field (str | None) – Name of the field to use for time partitioning.
partition_type (str) – The type of partitioning to use (e.g., DAY, HOUR). Defaults to DAY.
clustering_fields (list[str] | None) – A list of fields to use for clustering the BigQuery table (optional).
labels (dict[str, str] | None) – Dictionary of labels to audit costs.
**kwargs (Any) – Extra keyword arguments to be passed to the client.create_table() method.
- Returns:
The created table.
- Return type:
Table
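The partitioning, clustering, and label options above can be illustrated with a small validator. This is a sketch under assumptions: `table_options` is a hypothetical name, the output uses BigQuery REST field names, and applying partitioning only when `partition_field` is given is a simplification that may differ from the helper's behavior.

```python
from typing import Any, Optional

# Granularities accepted by BigQuery time partitioning.
VALID_PARTITION_TYPES = {"HOUR", "DAY", "MONTH", "YEAR"}


def table_options(
    partition_field: Optional[str] = None,
    partition_type: str = "DAY",
    clustering_fields: Optional[list[str]] = None,
    labels: Optional[dict[str, str]] = None,
) -> dict[str, Any]:
    """Translates the keyword options into REST-style table settings."""
    if partition_type not in VALID_PARTITION_TYPES:
        raise ValueError(f"unsupported partition type: {partition_type}")

    options: dict[str, Any] = {}
    if partition_field is not None:
        options["timePartitioning"] = {
            "type": partition_type,
            "field": partition_field,
        }
    if clustering_fields:
        options["clustering"] = {"fields": list(clustering_fields)}
    if labels:
        options["labels"] = dict(labels)
    return options


options = table_options(
    partition_field="timestamp",
    clustering_fields=["vessel_id"],
    labels={"team": "data"},
)
```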
- create_view(view_id, view_query)
Creates or replaces a BigQuery view.
This method is declarative: the provided query becomes the source of truth for the view definition. If the view already exists, it is replaced. If it does not exist, it is created.
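The declarative behavior described above has the same effect as a `CREATE OR REPLACE VIEW` statement in BigQuery SQL. The sketch below only builds that statement as a string. `create_or_replace_view_sql` is a hypothetical name, and whether the helper issues DDL or uses the client API is not stated here.

```python
def create_or_replace_view_sql(view_id: str, view_query: str) -> str:
    """Returns DDL that creates the view, or replaces it if it already exists."""
    return f"CREATE OR REPLACE VIEW `{view_id}` AS\n{view_query.strip()}"


ddl = create_or_replace_view_sql(
    "my_dataset.daily_events_view",
    "SELECT * FROM `my_dataset.daily_events`",
)
print(ddl)
```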
- static format_jinja2(template_path, search_path=PosixPath('.'), **kwargs)
Render a Jinja2 template with the given keyword arguments.
- Parameters:
template_path (Path) – The path to the Jinja2 template.
search_path (list[Path] | Path) – The base directory in which to search for the template path. Can be a list of paths.
**kwargs (Any) – Parameters required to render the query. It may contain extra parameters which are not used by the template, but all required parameters must be provided.
- Returns:
The rendered query.
- Return type:
str
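The rule for **kwargs (extra parameters are tolerated, missing required ones are an error) can be demonstrated without Jinja2. The sketch below uses the standard library's `string.Template` purely as a stand-in with the same contract. It is not how `format_jinja2` is implemented, and the template text and parameter names are made up for illustration.

```python
from string import Template

# Stand-in for a query template; a Jinja2 template would use {{ table }} instead.
query_template = Template("SELECT * FROM $table WHERE date >= '$start_date'")

# Extra parameters that the template does not use are simply ignored.
rendered = query_template.substitute(
    table="my_dataset.events",
    start_date="2024-01-01",
    unused="ignored",
)

# A missing required parameter raises an error instead of rendering a blank.
try:
    query_template.substitute(table="my_dataset.events")
    missing = None
except KeyError as err:
    missing = err.args[0]
```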
- classmethod get_client_factory(mocked=False)
Returns a factory for bigquery.Client objects.
- Return type:
Callable[[…], Client]
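One way such a switchable factory could work is sketched below with `unittest.mock`. `RealClient` and `get_client_factory_sketch` are hypothetical stand-ins. The mocking strategy used by the library itself is not documented on this page.

```python
from typing import Any, Callable
from unittest import mock


class RealClient:
    """Hypothetical stand-in for bigquery.Client."""

    def __init__(self, **kwargs: Any) -> None:
        self.kwargs = kwargs

    def query(self, sql: str) -> str:
        raise RuntimeError("would contact BigQuery")


def get_client_factory_sketch(mocked: bool = False) -> Callable[..., Any]:
    """Returns either the real class or a factory that produces mocks."""
    if mocked:
        # Each call yields a mock limited to RealClient's attributes.
        return lambda **kwargs: mock.MagicMock(spec=RealClient)
    return RealClient


client = get_client_factory_sketch(mocked=True)(project="test-project")
client.query("SELECT 1")  # recorded by the mock, nothing is executed
```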
- load_from_json(rows, destination, partition_field=None, partition_type='DAY', **kwargs)
Loads an iterable of JSON rows into a BigQuery table.
- Parameters:
rows (list[dict[str, Any]]) – The iterable of JSON dictionaries containing data to be loaded.
destination (str) – The table in which to write the data.
partition_field (str | None) – The field to use for partitioning the BigQuery table (optional).
partition_type (str) – The type of partitioning to use (e.g., DAY, HOUR). Defaults to DAY.
**kwargs (Any) – Extra keyword arguments to be passed to the job.LoadJobConfig constructor.
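BigQuery load jobs consume JSON data in newline-delimited form, one object per line. The sketch below shows that serialization for a list of row dictionaries. `to_ndjson` is a hypothetical helper, and whether `load_from_json` serializes rows itself or delegates to the client is an assumption left open here.

```python
import json
from typing import Any, Iterable


def to_ndjson(rows: Iterable[dict[str, Any]]) -> str:
    """Serializes rows as newline-delimited JSON, one object per line."""
    return "\n".join(json.dumps(row, sort_keys=True) for row in rows)


rows = [
    {"id": 1, "timestamp": "2024-01-01T00:00:00Z"},
    {"id": 2, "timestamp": "2024-01-02T00:00:00Z"},
]
payload = to_ndjson(rows)
print(payload)
```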
- classmethod mocked(**kwargs)
Returns a BigQueryHelper instance with a mocked client.
- Return type:
BigQueryHelper
- run_query(query_str, destination=None, write_disposition='WRITE_APPEND', clustering_fields=None, session_id=None, labels=None, **kwargs)
Runs a query.
- Parameters:
query_str (str) – The query to run.
destination (str | None) – The table in which to write the outputs of the query.
write_disposition (str) – What to do if the destination table already exists. Defaults to WRITE_APPEND.
clustering_fields (list[str] | None) – List of field names to use for clustering.
session_id (str | None) – The session_id to use for the query.
**kwargs (Any) – Extra keyword arguments to be passed to the job.QueryJobConfig constructor.
- Returns:
An instance wrapping the BigQuery QueryJob, providing convenient access to the query results and metadata.
- Return type:
QueryResult
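The effect of write_disposition is easiest to see on a toy table. The sketch below simulates three commonly used BigQuery dispositions (WRITE_APPEND, WRITE_TRUNCATE, WRITE_EMPTY) on in-memory lists. `apply_write_disposition` is a hypothetical function that models the semantics only and runs no query.

```python
from typing import Any


def apply_write_disposition(
    existing: list[dict[str, Any]],
    new_rows: list[dict[str, Any]],
    write_disposition: str = "WRITE_APPEND",
) -> list[dict[str, Any]]:
    """Returns the destination contents after writing new_rows (simulation)."""
    if write_disposition == "WRITE_APPEND":
        return existing + new_rows  # keep old rows, add new ones
    if write_disposition == "WRITE_TRUNCATE":
        return list(new_rows)  # overwrite whatever was there
    if write_disposition == "WRITE_EMPTY":
        if existing:
            raise ValueError("destination table is not empty")
        return list(new_rows)
    raise ValueError(f"unknown write disposition: {write_disposition}")


table = [{"id": 1}]
appended = apply_write_disposition(table, [{"id": 2}])
truncated = apply_write_disposition(table, [{"id": 2}], "WRITE_TRUNCATE")
```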
- class QueryResult(query_job, row_iterator)
Wrapper around bigquery.job.QueryJob with access to results.
This class encapsulates query_job and row_iterator instances, exposing rows via iteration and providing convenience methods like iter_as_dicts() and tolist().
- Parameters:
query_job (QueryJob) – The original QueryJob, which can be used to access job metadata such as session IDs, job statistics, and more.
row_iterator (RowIterator) – The RowIterator returned by the query job.
Example
result = bq_client.run_query("SELECT * FROM my_table")

# Iterate raw rows
for row in result:
    print(row)

# Iterate as dicts
for row in result.iter_as_dicts():
    print(row)

# Materialize
rows = result.tolist()
rows_as_dicts = result.tolist(as_dicts=True)

# Access job metadata
print(result.query_job.job_id)
print(result.session_id)
- query_job: QueryJob
The encapsulated QueryJob instance.
- row_iterator: RowIterator
The RowIterator returned by the query job.
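The wrapper pattern described for QueryResult can be sketched offline as follows. `QueryResultSketch` and the namedtuple `Row` are hypothetical stand-ins, not the library's classes. A real RowIterator may only be consumable once, whereas the list used here can be iterated repeatedly.

```python
from collections import namedtuple
from typing import Any, Iterable, Iterator

# Stand-in for a BigQuery row; real rows come from the RowIterator.
Row = namedtuple("Row", ["name", "value"])


class QueryResultSketch:
    """Illustrative wrapper exposing rows by iteration and as dicts."""

    def __init__(self, query_job: Any, row_iterator: Iterable[Any]) -> None:
        self.query_job = query_job
        self.row_iterator = row_iterator

    def __iter__(self) -> Iterator[Any]:
        return iter(self.row_iterator)

    def iter_as_dicts(self) -> Iterator[dict[str, Any]]:
        for row in self:
            yield dict(row._asdict())

    def tolist(self, as_dicts: bool = False) -> list[Any]:
        return list(self.iter_as_dicts()) if as_dicts else list(self)


result = QueryResultSketch(query_job=None, row_iterator=[Row("a", 1), Row("b", 2)])
rows_as_dicts = result.tolist(as_dicts=True)
```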
- class TableConfig(table_id, schema_file, description=None, partition_type='DAY', partition_field=None, clustering_fields=None, view_suffix='view')
Abstract base class for BigQuery table configuration.
- delete_query(start_date, end_date=None)
Returns the query to perform when deleting records from this table.
- Return type:
- description: TableDescription | None = None
Optional TableDescription instance for the table metadata.
- to_bigquery_params(include_description=True)
Returns parameters for BigQuery table creation or write operations.
This dictionary is intended to be unpacked as keyword arguments into
BigQueryHelper.create_table.
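The unpacking pattern mentioned above can be sketched like this. `TableConfigSketch` and `create_table_stub` are hypothetical, and the exact keys returned by the real `to_bigquery_params` are an assumption. Here they mirror the keyword names of `create_table` listed earlier on this page.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class TableConfigSketch:
    """Illustrative configuration object, not the library's TableConfig."""

    table_id: str
    description: Optional[str] = None
    partition_type: str = "DAY"
    partition_field: Optional[str] = None
    clustering_fields: Optional[list[str]] = None

    def to_bigquery_params(self, include_description: bool = True) -> dict[str, Any]:
        params: dict[str, Any] = {
            "table": self.table_id,
            "partition_type": self.partition_type,
            "partition_field": self.partition_field,
            "clustering_fields": self.clustering_fields,
        }
        if include_description and self.description is not None:
            params["description"] = self.description
        return params


def create_table_stub(table: str, description: str = "", **options: Any) -> str:
    """Stand-in for BigQueryHelper.create_table; only reports its inputs."""
    return f"{table}: {description or 'no description'} {sorted(options)}"


config = TableConfigSketch("my_dataset.events", description="Daily events",
                           partition_field="timestamp")
summary = create_table_stub(**config.to_bigquery_params())
```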
- class TableDescription(repo_name, version='', title='', subtitle='', summary='To be completed.', caveats='To be completed.', relevant_params=<factory>)
Generates a structured description for BigQuery table metadata.
- render()
Renders the description for use in BigQuery table metadata.
- Returns:
A formatted string including summary, caveats, and relevant parameters.
- Return type:
str
- relevant_params: dict[str, Any]
Key parameters relevant to the table’s content generation.
The keys are parameter names (strings), and the values can be any type convertible to string.
When rendered, the parameters are shown as a bullet list of key-value pairs, for example:
param1: value1
long_param2: value2
x: 42
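Rendering the parameters as key-value bullets can be sketched in a few lines. `render_params` is a hypothetical function, and the `-` bullet marker is an assumption, since the exact marker used by `render()` is not visible on this page.

```python
from typing import Any


def render_params(relevant_params: dict[str, Any]) -> str:
    """Formats each parameter as a '- key: value' bullet, converting values to str."""
    return "\n".join(f"- {key}: {value}" for key, value in relevant_params.items())


bullets = render_params({"param1": "value1", "long_param2": "value2", "x": 42})
print(bullets)
```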