gfw.common.bigquery.BigQueryHelper
- class BigQueryHelper(client_factory=<class 'google.cloud.bigquery.client.Client'>, dry_run=False, **kwargs)
Wrapper around bigquery.Client with extended functionality.
- Parameters:
client_factory (Callable[[...], Client]) – A callable that creates BigQuery client objects. Defaults to the canonical bigquery.Client factory.
dry_run (bool) – If True, query jobs are run in dry-run mode. For more information, see the BigQuery documentation.
**kwargs (Any) – Extra keyword arguments passed to the provided client_factory.
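
The constructor arguments can be pictured with a small stand-alone sketch. It is not the library's implementation: `HelperSketch` and `FakeClient` are hypothetical names, and building the client lazily on first access is an assumption made only for illustration.

```python
from typing import Any, Callable


class FakeClient:
    """Hypothetical stand-in for bigquery.Client; it records its kwargs."""

    def __init__(self, **kwargs: Any) -> None:
        self.kwargs = kwargs


class HelperSketch:
    """Illustrative wrapper only, NOT the BigQueryHelper implementation."""

    def __init__(
        self,
        client_factory: Callable[..., Any] = FakeClient,
        dry_run: bool = False,
        **kwargs: Any,
    ) -> None:
        self._client_factory = client_factory
        self._dry_run = dry_run  # stored only; query jobs are not modelled here
        self._kwargs = kwargs    # forwarded untouched to the factory
        self._client = None

    @property
    def client(self) -> Any:
        # Assumption: the client is created once, on first access.
        if self._client is None:
            self._client = self._client_factory(**self._kwargs)
        return self._client


helper = HelperSketch(project="my-project", dry_run=True)
first = helper.client
second = helper.client  # same object as `first` in this sketch
```

Accepting a factory rather than a ready-made client makes it easy to swap in a fake for tests, which is the role the `mocked` classmethod appears to play.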
Methods
Creates a BigQuery external table.
Creates a BigQuery table.
Creates or replaces a BigQuery view.
Terminates the session with the given session_id.
Renders a Jinja2 template with the given keyword arguments.
Returns a factory for bigquery.Client objects.
Loads an iterable of JSON rows into a BigQuery table.
Returns a BigQueryHelper instance with a mocked client.
Runs a query.
Attributes
Returns the instance of bigquery.Client to be used.
- classmethod mocked(**kwargs)
Returns a BigQueryHelper instance with a mocked client.
- Return type:
- classmethod get_client_factory(mocked=False)
Returns a factory for bigquery.Client objects.
- Return type:
Callable[[…], Client]
- property client: Client
Returns the instance of bigquery.Client to be used.
- create_table(table, description='', schema=None, partition_field=None, partition_type='DAY', clustering_fields=None, labels=None, **kwargs)
Creates a BigQuery table.
- Parameters:
table (str) – Table name like dataset.table.
description (str) – Text to include in the table’s description field.
partition_field (str | None) – Name of field to use for time partitioning.
partition_type (str) – The type of partitioning to use (e.g., DAY, HOUR). Defaults to DAY.
clustering_fields (list[str] | None) – A list of fields to use for clustering the BigQuery table (optional).
labels (dict[str, str] | None) – Dictionary of labels to audit costs.
**kwargs (Any) – Extra keyword arguments to be passed to the client.create_table() method.
- Returns:
The created table.
- Return type:
Table
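
A minimal usage sketch of the keywords documented above. The table name, column names and label are hypothetical, and a `MagicMock` stands in for a real `BigQueryHelper` so the snippet runs without credentials or the library installed.

```python
from unittest.mock import MagicMock


def create_events_table(helper):
    """Create a day-partitioned, clustered table via the documented keywords."""
    return helper.create_table(
        table="my_dataset.events",        # hypothetical dataset.table name
        description="Raw events, one row per message.",
        partition_field="timestamp",      # hypothetical column name
        partition_type="DAY",
        clustering_fields=["vessel_id"],  # hypothetical column name
        labels={"team": "data-eng"},      # hypothetical cost-audit label
    )


# Stand-in for a BigQueryHelper instance; it only records the call.
helper = MagicMock()
create_events_table(helper)
table_kwargs = helper.create_table.call_args.kwargs
```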
- create_external_table(table, source_uris, description='', schema=None, source_format='PARQUET', hive_partition_uri_prefix=None, require_partition_filter=False, replace=False, **kwargs)
Creates a BigQuery external table.
- Parameters:
table (str) – Table name like project.dataset.table.
source_uris (List[str]) – List of GCS URIs, e.g. ['gs://bucket/*.parquet'].
description (str) – Text to include in the table’s description field.
schema (List[SchemaField] | None) – Schema of the table. If not provided, autodetect is enabled.
source_format (str) – The format of the source files. Defaults to PARQUET.
hive_partition_uri_prefix (str | None) – URI prefix for hive partitioning, e.g. 'gs://bucket/'.
require_partition_filter (bool) – If True, queries must include a partition filter. Defaults to False.
replace (bool) – If True, the table will be deleted and recreated if it already exists. Defaults to False.
**kwargs (Any) – Extra keyword arguments passed to client.create_table.
- Returns:
The created table.
- Return type:
Table
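
A usage sketch for a hive-partitioned external table, using only the parameters listed above. The project, dataset and bucket names are hypothetical, and a `MagicMock` again stands in for the helper so the call can be inspected offline.

```python
from unittest.mock import MagicMock


def create_external_events_table(helper):
    """Expose Parquet files in GCS as an external table."""
    return helper.create_external_table(
        table="my-project.my_dataset.events_ext",          # hypothetical
        source_uris=["gs://my-bucket/events/*.parquet"],   # hypothetical bucket
        source_format="PARQUET",
        hive_partition_uri_prefix="gs://my-bucket/events/",
        require_partition_filter=True,  # queries must filter on a partition
        replace=True,                   # delete and recreate if it exists
    )


# Stand-in for a BigQueryHelper instance; it only records the call.
helper = MagicMock()
create_external_events_table(helper)
external_kwargs = helper.create_external_table.call_args.kwargs
```

No `schema` is passed here, so according to the parameter description autodetect would be enabled.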
- create_view(view_id, view_query)
Creates or replaces a BigQuery view.
This method is declarative: the provided query becomes the source of truth for the view definition. If the view already exists, it is replaced. If it does not exist, it is created.
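
The declarative behaviour described above can be modelled in memory. This is only a toy model with a plain dictionary as a hypothetical view registry; it makes no BigQuery calls and says nothing about how the method is implemented.

```python
def create_or_replace_view(registry, view_id, view_query):
    """Toy model: the most recent query always becomes the view definition."""
    registry[view_id] = view_query


views = {}
# First call: the view does not exist yet, so it is created.
create_or_replace_view(views, "my_dataset.daily_counts", "SELECT 1 AS n")
# Second call: the view exists, so its definition is replaced.
create_or_replace_view(views, "my_dataset.daily_counts", "SELECT 2 AS n")
# Repeating the same call leaves the state unchanged (idempotent).
create_or_replace_view(views, "my_dataset.daily_counts", "SELECT 2 AS n")
```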
- run_query(query_str, destination=None, write_disposition='WRITE_APPEND', clustering_fields=None, session_id=None, labels=None, **kwargs)
Runs a query.
- Parameters:
query_str (str) – The query to run.
destination (str | None) – The table in which to write the outputs of the query.
write_disposition (str) – The write disposition to apply to the destination table. Defaults to WRITE_APPEND.
clustering_fields (list[str] | None) – List of field names to use for clustering.
session_id (str | None) – The session_id to use for the query.
**kwargs (Any) – Extra keyword arguments to be passed to the job.QueryJobConfig constructor.
- Returns:
An instance wrapping the BigQuery QueryJob, providing convenient access to the query results and metadata.
- Return type:
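
A usage sketch that writes query results to a destination table. The query, table names and label are hypothetical. `WRITE_TRUNCATE` is a standard BigQuery write disposition, chosen here to overwrite the destination instead of using the default `WRITE_APPEND`. A `MagicMock` stands in for the helper, so nothing is sent to BigQuery.

```python
from unittest.mock import MagicMock

# Hypothetical query against a hypothetical table.
DAILY_COUNTS_SQL = (
    "SELECT vessel_id, COUNT(*) AS n "
    "FROM my_dataset.events "
    "GROUP BY vessel_id"
)


def rebuild_daily_counts(helper):
    """Run the query and overwrite the destination table with its output."""
    return helper.run_query(
        query_str=DAILY_COUNTS_SQL,
        destination="my_dataset.daily_counts",
        write_disposition="WRITE_TRUNCATE",
        clustering_fields=["vessel_id"],
        labels={"pipeline": "daily-counts"},
    )


# Stand-in for a BigQueryHelper instance; it only records the call.
helper = MagicMock()
rebuild_daily_counts(helper)
query_kwargs = helper.run_query.call_args.kwargs
```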
- load_from_json(rows, destination, partition_field=None, partition_type='DAY', **kwargs)
Loads an iterable of JSON rows into a BigQuery table.
- Parameters:
rows (list[dict[str, Any]]) – The iterable of JSON dictionaries containing data to be loaded.
destination (str) – The table in which to write the data.
partition_field (str | None) – The field to use for partitioning the BigQuery table (optional).
partition_type (str) – The type of partitioning to use (e.g., DAY, HOUR). Defaults to DAY.
**kwargs (Any) – Extra keyword arguments to be passed to the job.LoadJobConfig constructor.
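
A usage sketch for loading a few in-memory rows. The row contents and table name are hypothetical, and a `MagicMock` stands in for the helper so the snippet can run without BigQuery access.

```python
from unittest.mock import MagicMock

# Hypothetical rows: each dictionary becomes one table row.
rows = [
    {"vessel_id": "a1", "timestamp": "2024-01-01T00:00:00Z", "speed": 7.5},
    {"vessel_id": "b2", "timestamp": "2024-01-01T00:05:00Z", "speed": 3.2},
]


def load_events(helper, rows):
    """Load rows into a table partitioned by day on the timestamp field."""
    return helper.load_from_json(
        rows,
        destination="my_dataset.events",
        partition_field="timestamp",
        partition_type="DAY",
    )


# Stand-in for a BigQueryHelper instance; it only records the call.
helper = MagicMock()
load_events(helper, rows)
load_call = helper.load_from_json.call_args
```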
- static format_jinja2(template_path, search_path=PosixPath('.'), **kwargs)
Render a Jinja2 template with the given keyword arguments.
- Parameters:
template_path (Path) – The path to the Jinja2 template.
search_path (list[Path] | Path) – The base directory in which to search for the template path. Can be a list of paths.
**kwargs (Any) – Parameters required to render the query. It may contain extra parameters which are not used by the template, but all required parameters must be provided.
- Returns:
The rendered query.
- Return type:
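
A usage sketch showing a list of search paths and template parameters passed as keyword arguments. The template file name, directories and parameter names are hypothetical, and a `MagicMock` stands in for the `BigQueryHelper` class, so no template is actually rendered.

```python
from pathlib import Path
from unittest.mock import MagicMock


def render_daily_query(helper_cls, day):
    """Render a query template, searching two directories in order."""
    return helper_cls.format_jinja2(
        template_path=Path("daily_counts.sql.j2"),              # hypothetical
        search_path=[Path("queries"), Path("shared/queries")],  # hypothetical
        day=day,               # parameter assumed to be used by the template
        unused_extra="ignored",  # extras are allowed per the description above
    )


# Stand-in for the BigQueryHelper class; it only records the call.
helper_cls = MagicMock()
render_daily_query(helper_cls, "2024-01-01")
render_kwargs = helper_cls.format_jinja2.call_args.kwargs
```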