How to use service accounts with gsutil for uploading to Cloud Storage and BigQuery

Solution 1

Google Cloud Storage just released a new version (3.26) of gsutil that supports service accounts (as well as a number of other features and bug fixes). If you already have gsutil installed you can get this version by running:

gsutil update

In brief, you can configure a service account by running:

gsutil config -e

See gsutil help config for more details about using the config command.
See gsutil help creds for information about the different flavors of credentials (and different use cases) that gsutil supports.
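
To confirm that the config step took effect, one could inspect the boto configuration file it writes. The following is a minimal Python sketch, assuming that gsutil config -e stores the key path as gs_service_key_file in the [Credentials] section of ~/.boto. The helper name service_key_path is hypothetical and not part of gsutil.

```python
import configparser


def service_key_path(boto_text):
    """Return the configured service account key path, or None if absent.

    Assumption: the key path is stored at [Credentials] gs_service_key_file.
    """
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(boto_text)
    return parser.get("Credentials", "gs_service_key_file", fallback=None)
```

To use it, read the contents of ~/.boto and pass the text to service_key_path. A return value of None suggests that no service account key has been configured yet.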

Mike Schwartz, Google Cloud Storage Team

Solution 2

To extend @Mike's answer, you'll need to:

  1. Download the service account key file and put it somewhere safe, e.g. /etc/backup-account.json
  2. Run gcloud auth activate-service-account --key-file /etc/backup-account.json

After that, all calls use that service account.
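
For the nightly cron use case from the question, the activation step can be followed by the upload and the BigQuery load. This is only a sketch under stated assumptions: gsutil and bq are the copies bundled with the Cloud SDK (so they pick up the credentials activated by gcloud), the input is a CSV file, and the target table already exists with a matching schema. The function names and all argument values are hypothetical.

```python
import os
import subprocess


def nightly_commands(key_file, local_path, bucket, table):
    """Build the three command lines for one upload-and-load run."""
    object_url = "gs://{}/{}".format(bucket, os.path.basename(local_path))
    return [
        # Switch gcloud (and the bundled tools) to the service account.
        ["gcloud", "auth", "activate-service-account", "--key-file", key_file],
        # Copy the local file into the bucket.
        ["gsutil", "cp", local_path, object_url],
        # Load the uploaded object into an existing BigQuery table.
        ["bq", "load", "--source_format=CSV", table, object_url],
    ]


def run_nightly(key_file, local_path, bucket, table):
    """Run each command in order, stopping at the first failure."""
    for command in nightly_commands(key_file, local_path, bucket, table):
        subprocess.run(command, check=True)
```

A cron entry could then call a small script that invokes run_nightly("/etc/backup-account.json", "/data/events.csv", "my-bucket", "mydataset.events"), where the paths, bucket, and table are placeholders.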

Solution 3

First of all, you should be using the bq command line tool to interact with BigQuery from the command line. (Read about it here and download it here).

I agree with Marc that it's a good idea to use your personal credentials with both gsutil and bq, but the bq command line tool does support service accounts. The command to use service account auth might look something like this:

bq --service_account [email protected] --service_account_credential_store keep_me_safe --service_account_private_key_file myfile.key query 'select count(*) from publicdata:samples.shakespeare' 

Type bq --help for more info.

It's also pretty easy to use service accounts in your code via Python or Java. Here's a quick example using some code from the BigQuery Authorization guide.

import httplib2

from apiclient.discovery import build
# Note: SignedJwtAssertionCredentials exists only in oauth2client releases before 2.0.
from oauth2client.client import SignedJwtAssertionCredentials

# REPLACE WITH YOUR PROJECT ID
PROJECT_NUMBER = 'XXXXXXXXXXX'
# REPLACE WITH THE SERVICE ACCOUNT EMAIL FROM GOOGLE DEV CONSOLE
SERVICE_ACCOUNT_EMAIL = '[email protected]'

# Read the private key (.p12 file) downloaded for the service account.
with open('key.p12', 'rb') as f:
    key = f.read()

# Build signed-JWT credentials scoped to BigQuery.
credentials = SignedJwtAssertionCredentials(
    SERVICE_ACCOUNT_EMAIL,
    key,
    scope='https://www.googleapis.com/auth/bigquery')

# Wrap an HTTP client so that every request carries the access token.
http = httplib2.Http()
http = credentials.authorize(http)

service = build('bigquery', 'v2')
datasets = service.datasets()
response = datasets.list(projectId=PROJECT_NUMBER).execute(http)

print('Dataset list:\n')
# The 'datasets' key may be absent when the project has no datasets.
for dataset in response.get('datasets', []):
    print("%s\n" % dataset['id'])
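
The SignedJwtAssertionCredentials class was removed in oauth2client 2.0, so the snippet above only runs against older releases. A rough equivalent with the google-auth and google-api-python-client packages might look like the sketch below. It assumes a JSON key file rather than a .p12 file, and the function names, project ID, and key path are all hypothetical.

```python
def dataset_ids(response):
    """Extract dataset IDs from a datasets.list response body."""
    return [dataset["id"] for dataset in response.get("datasets", [])]


def list_datasets(project_id, key_path):
    """List dataset IDs in project_id using a JSON service account key."""
    # Imported lazily so dataset_ids() works without these packages installed.
    from google.oauth2 import service_account
    from googleapiclient.discovery import build

    credentials = service_account.Credentials.from_service_account_file(
        key_path,
        scopes=["https://www.googleapis.com/auth/bigquery"],
    )
    service = build("bigquery", "v2", credentials=credentials)
    response = service.datasets().list(projectId=project_id).execute()
    return dataset_ids(response)
```

Calling list_datasets("my-project", "/etc/backup-account.json") would return the dataset IDs as a list of strings, or an empty list if the project has no datasets.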

Solution 4

Service accounts are generally used to identify applications, but when using gsutil you're an interactive user, so it's more natural to use your personal account. You can always associate your Google Cloud Storage resources with your personal account, a service account, or both (via access control lists or the developer console Team tab). My advice would be to use your personal account with gsutil and a service account for your application.
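
As an illustration of the access control list route, the following sketch builds a gsutil command that grants a service account write access to a bucket. It assumes a gsutil release that provides the acl ch subcommand; the function names, email address, and bucket name are hypothetical.

```python
import subprocess


def grant_write_command(service_account_email, bucket):
    """Build a gsutil command granting WRITE on a bucket to one account."""
    return [
        "gsutil", "acl", "ch",
        "-u", "{}:W".format(service_account_email),
        "gs://{}".format(bucket),
    ]


def grant_write(service_account_email, bucket):
    """Run the command with your personal (owner) credentials."""
    subprocess.run(grant_write_command(service_account_email, bucket), check=True)
```

Running grant_write once from an interactive session keeps ownership with the personal account while letting the application's service account upload.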

Solution 5

Posting as an answer instead of a comment, at Jonathan's request.

Yes, an OAuth grant made by an individual user will no longer be valid if the user no longer exists. So, if you use the user-based flow with your personal account, your automated processes will fail if you leave the company.

We should support service accounts with gsutil, but don't yet.

You could do one of:

  1. Add the feature to gsutil/oauth2_plugin/oauth2_helper.py yourself, which is probably quick using the existing Python OAuth client implementation of service accounts.
  2. Retrieve the access token externally via the service account flow and store it in the cache location specified in ~/.boto (slightly hacky).
  3. Create a role account yourself (via gmail.com or Google Apps), grant permission to that account, and use it for the OAuth flow.
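
For option 2, the service account flow starts from a JWT that is signed with the account's private key and exchanged for an access token. The sketch below builds only the unsigned part (the header and claim set) with the standard library. It assumes the current Google token endpoint URL; the function names are hypothetical. Signing with RS256 and posting the result with grant type urn:ietf:params:oauth:grant-type:jwt-bearer are left out because they need a third-party crypto library.

```python
import base64
import json
import time

# Assumption: Google's current OAuth 2.0 token endpoint.
TOKEN_URL = "https://oauth2.googleapis.com/token"


def _b64url(data):
    """Base64url-encode bytes without padding, as JWTs require."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def jwt_signing_input(service_account_email, scope, now=None):
    """Return the 'header.claims' string that must be signed with RS256."""
    issued_at = int(time.time()) if now is None else int(now)
    header = {"alg": "RS256", "typ": "JWT"}
    claims = {
        "iss": service_account_email,
        "scope": scope,
        "aud": TOKEN_URL,
        "iat": issued_at,
        "exp": issued_at + 3600,  # request a token valid for one hour
    }
    return ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
```

The signature over this string, base64url-encoded and appended after another dot, completes the assertion that the token endpoint expects.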

We've filed the feature request to support service accounts in gsutil and have had some initial positive feedback from the team, though we can't give an ETA.

Author by

jonathan

Updated on June 03, 2022

Comments

  • jonathan
    jonathan almost 2 years

    How do I upload data to Google BigQuery with gsutil, by using a Service Account I created in the Google APIs Console?

    First I'm trying to upload data to Cloud Storage using gsutil, as that seems to be the recommended model. Everything works fine with gmail user approval, but it does not allow me to use a service account.

    It seems I can use the Python API to get an access token using signed JWT credentials, but I would prefer using a command-line tool like gsutil with support for resumable uploads etc.

    EDIT: I would like to run gsutil from a cron job to upload files to Cloud Storage every night and then import them into BigQuery.

    Any help or pointers would be appreciated.