BigQuery - Export query results to local file/Google storage
Solution 1
Direct export from BigQuery Standard SQL was added recently: the EXPORT DATA statement writes query results straight to CSV files in GCS.
EXPORT DATA OPTIONS(
uri='gs://mybucket/myfolder2/*.csv',
format='CSV',
overwrite=true,
header=true,
field_delimiter=';') AS
SELECT 1 as field1, 2 as field2
Solution 2
As Mikhail Berlyant said,
BigQuery does not provide the ability to directly export/download query results to GCS or a local file.
You can still export it using the Web UI in just three steps:
- Configure query to save the results in a BigQuery table and run it.
- Export the table to a bucket in GCS.
- Download from the bucket.
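The three UI steps above map onto the CLI as well. A minimal sketch, assuming a placeholder dataset temp, staging table employee_agg, and bucket mybucket (all made-up names; the command-availability guard is only there so the sketch degrades gracefully on machines without the Cloud SDK):

```shell
#!/usr/bin/env bash
set -euo pipefail

# Placeholder names -- substitute your own dataset, table, and bucket.
DATASET="temp"
TABLE="employee_agg"
URI="gs://mybucket/employee_agg*.csv"

if command -v bq >/dev/null 2>&1; then
  # Step 1: run the query into a destination table.
  bq query --use_legacy_sql=false \
    --destination_table="${DATASET}.${TABLE}" \
    'SELECT dept_id, SUM(sal) AS total_sal FROM temp.employee GROUP BY dept_id'

  # Step 2: export the table to GCS; the * wildcard splits results over 1 GB.
  bq extract --destination_format=CSV "${DATASET}.${TABLE}" "$URI"

  # Step 3: download the exported file(s) locally.
  gsutil cp "$URI" .
else
  echo "bq/gsutil not installed; target URI would be: $URI"
fi
```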
Step 1
When in the BigQuery screen, before running the query, go to More > Query Settings.
This opens the query settings panel. Here you want to set:
- Destination: Set a destination table for query results
- Project name: select the project.
- Dataset name: select a dataset. If you don't have one, create it and come back.
- Table name: give whatever name you want (must contain only letters, numbers, or underscores).
- Result size: Allow large results (no size limit).
Then save it, and the query is configured to be saved in a specific table. Now you can run the query.
Step 2
To export the table to GCS, go to the table and click EXPORT > Export to GCS.
This opens the export screen.
In Select GCS location you define the bucket, the folder and the file.
For instance, say you have a bucket named daria_bucket (use only lowercase letters, numbers, hyphens (-), and underscores (_); dots (.) may be used to form a valid domain name) and want to save the file(s) in the root of the bucket with the name test. Then you write (in Select GCS location)
daria_bucket/test.csv
If the file is too big (more than 1 GB), you'll get an error. To fix it, you'll have to split it across multiple files using a wildcard. So you'll need to add a *, just like that
daria_bucket/test*.csv
This is going to store, inside the bucket daria_bucket, all the data extracted from the table across multiple files named test000000000000, test000000000001, test000000000002, ... testX.
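The shard suffix that replaces the * is a zero-padded 12-digit sequence number; a quick sketch of the resulting names (how many files you actually get depends on the data size):

```shell
#!/usr/bin/env bash
set -eu

# BigQuery replaces the * in the wildcard URI with a zero-padded
# 12-digit shard number. Preview the names of the first few shards:
for i in 0 1 2; do
  printf 'daria_bucket/test%012d.csv\n' "$i"
done
```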
Step 3
Then go to Storage and you'll see the bucket.
Go inside it and you'll find the file(s). You can then download them from there.
Solution 3
BigQuery does not provide the ability to directly export/download query results to GCS or a local file. First you need to get the result of the query either into an explicitly set destination table or, if none is set, into the temporary (anonymous) table that holds the query result. You can get that table from the respective job attribute configuration.query.destinationTable (after the job is completed).
Then you can use that table as the source for an export job.
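A sketch of that flow with the bq CLI; JOB_ID and the table names below are placeholders, and the guard just keeps the sketch harmless where the SDK is absent:

```shell
#!/usr/bin/env bash
set -eu

JOB_ID="bqjob_example_123"   # placeholder: id of your completed query job

if command -v bq >/dev/null 2>&1; then
  # The job metadata includes configuration.query.destinationTable,
  # which names the (possibly anonymous) table holding the result.
  bq show --format=prettyjson -j "$JOB_ID"
  # Once you have projectId:datasetId.tableId from that field, use it
  # as the source of a normal export job, e.g.:
  # bq extract --destination_format=CSV 'myproject:somedataset.sometable' 'gs://mybucket/result*.csv'
else
  echo "bq CLI not installed; job that would be inspected: $JOB_ID"
fi
```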
Solution 4
You can always use bash scripts for things that are not supported by the gcloud CLI. For example, you can dump query results into a file:
bq query --format=csv --max_rows=999999 --use_legacy_sql=false \
"select dept_id,sum(sal) from temp.employee group by dept_id" > employee.csv
And then you can just upload it to cloud storage:
gsutil mv employee.csv 'gs://XXXX/employee.csv'
Solution 5
These days (until they change the console again) you can achieve this as follows:
1. Run your query.
2. When it is completed, use the "Save the result" dropdown to save it to a *local/cloud* file.
Be advised this option is not supported on the command line or via the API at the moment :-)
Munesh
Updated on July 05, 2022

Comments

Munesh, almost 2 years ago:
I want to export query results from BigQuery to local file/Google storage.
I tried the 'bq extract' command but it doesn't allow a query as input.
Usage: bq extract <source_table> <destination_uris>
I don't want to extract the whole table as the table contains many columns which are not required and I need to aggregate the data.
As of now, the only workaround I could find is to create a table using the 'bq query' command and use the 'bq extract' to extract the data.
I'm looking for any better way to achieve this by doing something like below.
bq extract 'select dept_id,sum(sal) from temp.employee group by dept_id' 'gs://XXXX/employee.csv'
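There is no single bq extract that accepts a query, but the table-then-extract workaround can be wrapped in one small bash helper to approximate the one-liner above (a sketch; the function name and the staging table temp.employee_agg are made up):

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: "extract" a query by staging it in a table first.
# Usage: bq_extract_query '<sql>' <staging_table> <gcs_uri>
bq_extract_query() {
  local sql="$1" staging="$2" uri="$3"
  # --replace overwrites the staging table on each run.
  bq query --use_legacy_sql=false --replace \
    --destination_table="$staging" "$sql"
  bq extract --destination_format=CSV "$staging" "$uri"
}

if command -v bq >/dev/null 2>&1; then
  bq_extract_query \
    'select dept_id, sum(sal) from temp.employee group by dept_id' \
    'temp.employee_agg' \
    'gs://XXXX/employee*.csv'
fi
```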