Table Partition¶
One of BigQuery's key features is its ability to partition a table into small chunks of data. In many cases, table partitions can improve query performance and reduce costs.
Consider a table of sales
To fetch sales on 2022-07-27, BigQuery has to scan the entire sale_date
column. However, if we set up the table with a
daily partition on sale_date
, the records will be partitioned into chunks like this
Now, BigQuery doesn't need to can the entire table to identify sales on a particular date.
Notes¶
- A table can have zero or one partition.
- Partitions can be set on sequential data types (Dates, Timestamps, or Integers).
- You can make a partition with a
PARTITION BY
clause. - A partition table can have up to 4,000 partitions.
Table Cluster¶
Another option to improve performance on a BigQuery table is to assign it a cluster. A cluster improves performance by pre-sorting the data inside a table. For example, clustering the sales
table by sale_id
might order the data like this.
Now, fetching rows WHERE sale_id IN (1,4)
will be fast and efficient.
Notes
- Tables can have a partition and a cluster (even on the same field).
- A cluster can be set on a collection of multiple columns.
- A cluster can be defined with a
CLUSTER BY
clause containing aclustering_column_list
.