BigQuery - Export query results to local file/Google storage


Solution 1

Direct export from BigQuery Standard SQL was added recently: Exporting data to csv format

EXPORT DATA OPTIONS(
  uri='gs://mybucket/myfolder2/*.csv',
  format='CSV',
  overwrite=true,
  header=true,
  field_delimiter=';') AS
SELECT 1 as field1, 2 as field2 
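If you prefer the command line, the same statement can be submitted through the bq tool. A minimal sketch, reusing the bucket and folder from the example above (they are placeholders):

# Run the EXPORT DATA statement from the bq CLI instead of the console.
bq query --use_legacy_sql=false \
"EXPORT DATA OPTIONS(
  uri='gs://mybucket/myfolder2/*.csv',
  format='CSV',
  overwrite=true,
  header=true,
  field_delimiter=';') AS
SELECT 1 AS field1, 2 AS field2"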

Solution 2

As Mikhail Berlyant said,

BigQuery does not provide the ability to directly export/download a query result to GCS or a local file.

You can still export it using the web UI in just three steps:

  1. Configure query to save the results in a BigQuery table and run it.
  2. Export the table to a bucket in GCS.
  3. Download from the bucket.

Step 1

In the BigQuery screen, before running the query, go to More > Query Settings.

(Screenshot: Configure Query)

This opens the following:

(Screenshot: Query Settings)

Here you want to have:

  • Destination: Set a destination table for query results
  • Project name: select the project.
  • Dataset name: select a dataset. If you don't have one, create it and come back.
  • Table name: give whatever name you want (must contain only letters, numbers, or underscores).
  • Result size: Allow large results (no size limit).

Then save it, and the query is configured to write its results to a specific table. Now you can run the query.
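For reference, here is a minimal bq CLI sketch of the same step. The destination dataset and table are placeholders; the example query is borrowed from the question:

# Run the query and write its result into a destination table,
# overwriting the table if it already exists.
bq query \
  --use_legacy_sql=false \
  --destination_table=mydataset.query_results \
  --replace \
  "SELECT dept_id, SUM(sal) AS total_sal FROM temp.employee GROUP BY dept_id"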

Step 2

To export it to GCS, go to the table and click EXPORT > Export to GCS.

(Screenshot: BigQuery export table)

This opens the following screen:

(Screenshot: Export to GCS)

In Select GCS location you define the bucket, the folder and the file.

For instance, suppose you have a bucket named daria_bucket (use only lowercase letters, numbers, hyphens (-), and underscores (_); dots (.) may be used to form a valid domain name) and you want to save the file(s) in the root of the bucket with the name test. Then you write (in Select GCS location):

daria_bucket/test.csv

If the file is too big (more than 1 GB), you'll get an error. To fix it, you have to split the output into multiple files using a wildcard, so you need to add a *, like this:

daria_bucket/test*.csv

(Screenshot: Wildcard export to GCS)

This stores, inside the bucket daria_bucket, all the data extracted from the table, split across multiple files named test000000000000, test000000000001, test000000000002, ... testX.
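The same step can also be done with the bq CLI; a rough sketch, reusing the placeholder names from above:

# Export the destination table to GCS as CSV; the wildcard lets BigQuery
# split output larger than 1 GB across multiple files.
bq extract \
  --destination_format=CSV \
  --print_header=true \
  mydataset.query_results \
  "gs://daria_bucket/test*.csv"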

Step 3

Then go to Storage and you'll see the bucket.

(Screenshot: GCS bucket)

Go inside it and you'll find the file (or files). You can then download them from there.
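If you want to script the download instead of using the browser, gsutil works too; a minimal sketch with the same placeholder bucket:

# Copy the exported file(s) from the bucket to the current directory.
gsutil cp "gs://daria_bucket/test*.csv" .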

Solution 3

BigQuery does not provide the ability to directly export/download a query result to GCS or a local file. First you need to get the query result into a table: either an explicitly set destination table or, if none is set, the temporary (anonymous) table that holds the query result - you can get that table from the job attribute configuration.query.destinationTable (after the job is completed).
Then you can use that table as the source for an export job.
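A rough bq CLI sketch of this approach (the job id is a placeholder, and the query is borrowed from the question):

# 1. Run the query with an explicit job id so it is easy to look up later.
bq query --use_legacy_sql=false --job_id=my_query_job \
  "SELECT dept_id, SUM(sal) AS total_sal FROM temp.employee GROUP BY dept_id"

# 2. Inspect the completed job and read configuration.query.destinationTable,
#    which names the table holding the result.
bq show --format=prettyjson -j my_query_job

# 3. Use that table as the source of an export (extract) job, for example:
#    bq extract <project>:<dataset>.<table> "gs://XXXX/employee*.csv"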

Solution 4

You can always use bash scripts for things that are not supported by the gcloud CLI. For example, you can dump query results into a file:

bq query --format=csv --max_rows=999999 --use_legacy_sql=false \
  "select dept_id, sum(sal) from temp.employee group by dept_id" > employee.csv

And then you can just upload it to cloud storage:

gsutil mv employee.csv 'gs://XXXX/employee.csv'

Solution 5

These days (till they change the console again) you can achieve this by:

  1. Run your query.
  2. When it has completed, use the "Save the result" dropdown to save it to a local or cloud file.

Be advised this option is not supported via the command line or the API at the moment. :-)


Updated on July 05, 2022

Comments

  • Munesh (almost 2 years ago)

    I want to export query results from BigQuery to local file/Google storage.

    I tried the 'bq extract' command, but it doesn't allow a query as input.

    Usage: bq extract <source_table> <destination_uris>
    

    I don't want to extract the whole table, as it contains many columns that are not required, and I need to aggregate the data.

    As of now, the only workaround I could find is to create a table using the 'bq query' command and then use 'bq extract' to extract the data.

    I'm looking for a better way to achieve this by doing something like the following:

    bq extract 'select dept_id, sum(sal) from temp.employee group by dept_id' 'gs://XXXX/employee.csv'