Google Cloud Platform (GCP) Connector

Integrate your workflows with Google Cloud Platform and leverage services like Cloud Storage, Cloud Functions, Pub/Sub, Firestore, and BigQuery—all the tools for building scalable cloud applications.

Overview

The GCP connector provides 17 operations across Google's cloud ecosystem. Whether you're storing data in Cloud Storage, running BigQuery analytics, triggering serverless functions, or managing real-time messaging, we've got you covered.

Authentication

GCP uses service accounts—think of them as robot users that can authenticate your workflows securely without needing human login credentials.

Option 1: Service Account (Recommended)

Create a service account in your GCP project and download the JSON key:

auth_type: service_account
project_id: "your-project-id"
credentials_json: "{{ credentials.gcp.service_account }}"

How to set it up:

  1. In the Google Cloud Console, go to IAM & Admin > Service Accounts
  2. Click Create Service Account
  3. Give it a name like "DeepChain Connector"
  4. Grant the roles your workflows need (Cloud Storage Admin, Cloud Functions Developer, etc.)
  5. Create a JSON key and paste it into the credentials_json field

Tip: Use the principle of least privilege—grant only the specific roles your workflows need, not "Owner" or "Editor".

Option 2: OAuth 2.0 (For User Impersonation)

If you need to access user-owned resources, use OAuth 2.0:

auth_type: oauth2
client_id: "your-client-id"
client_secret: "your-client-secret"
project_id: "your-project-id"

Available Operations

Cloud Storage (File Storage)

Store and retrieve files from Google's object storage:

  • listBuckets: List all storage buckets in your project
  • listObjects: List objects (files) in a bucket
  • getObject: Download a file
  • uploadObject: Upload a file
  • deleteObject: Delete a file
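
For example, listing the files in a bucket is a single operation. This is a minimal sketch; the bucket name and the prefix filter are illustrative assumptions:

- id: list_exports
  type: gcp_connector
  config:
    operation: listObjects
    bucket: "company-exports"
    prefix: "reports/"   # hypothetical filter parameter; check the operation docs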

Cloud Functions (Serverless Compute)

Trigger custom code without managing servers:

  • invokeFunction: Call a Cloud Function
  • listFunctions: List functions in your project
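
For instance, enumerating the functions deployed in your project is a one-line operation (a minimal sketch; additional parameters such as a region may be required):

- id: list_functions
  type: gcp_connector
  config:
    operation: listFunctions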

Pub/Sub (Real-Time Messaging)

Build scalable event-driven systems:

  • publish: Send a message to a topic
  • pull: Retrieve messages from a subscription
  • acknowledge: Mark messages as processed

Firestore (NoSQL Database)

Store and query semi-structured data:

  • getDocument: Fetch a document by path
  • createDocument: Create a new document
  • updateDocument: Update a document
  • deleteDocument: Delete a document
  • query: Query a collection with filters
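
A collection query might look like the sketch below; the shape of the filters field is an assumption, so check the operation docs for the exact format:

- id: find_active_users
  type: gcp_connector
  config:
    operation: query
    collection: "users"
    filters:               # hypothetical filter shape
      - field: "status"
        op: "=="
        value: "active"
    limit: 50              # hypothetical parameter, shown for illustration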

BigQuery (Data Analytics)

Run SQL queries against massive datasets:

  • query: Execute a BigQuery SQL query
  • insertRows: Stream rows into a table

Practical Examples

Example 1: Upload Exports to Cloud Storage

Archive daily reports with organized naming:

- id: export_to_gcs
  type: gcp_connector
  config:
    operation: uploadObject
    bucket: "company-exports"
    objectName: "reports/{{ formatDate(now(), 'yyyy/MM/dd') }}/report-{{ formatDate(now(), 'HH-mm-ss') }}.csv"
    content: "{{ json_stringify(input.data) }}"
    contentType: "text/csv"

Access your file later at:

gs://company-exports/reports/2025/02/10/report-14-30-45.csv
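
To read that file back in a later workflow, getObject takes the same bucket and object name (a sketch mirroring the upload example above):

- id: fetch_report
  type: gcp_connector
  config:
    operation: getObject
    bucket: "company-exports"
    objectName: "reports/2025/02/10/report-14-30-45.csv"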

Example 2: Run Complex Analytics with BigQuery

Query your data warehouse for insights:

- id: monthly_analytics
  type: gcp_connector
  config:
    operation: query
    sql: |
      SELECT
        DATE(event_timestamp) as date,
        event_name,
        COUNT(*) as total_events,
        COUNT(DISTINCT user_id) as unique_users
      FROM `{{ input.project_id }}.analytics.events`
      WHERE DATE(event_timestamp) = @query_date
      GROUP BY date, event_name
      ORDER BY total_events DESC
    parameters:
      query_date: "{{ input.date }}"

Example 3: Trigger a Cloud Function

Call a function to do heavy lifting like image processing:

- id: process_image
  type: gcp_connector
  config:
    operation: invokeFunction
    functionName: "image-thumbnails"
    data:
      imageUrl: "{{ input.source_image }}"
      sizes: [120, 240, 480]
      format: "webp"

Example 4: Publish Events to Pub/Sub

Send real-time events for other services to consume:

- id: notify_event
  type: gcp_connector
  config:
    operation: publish
    topic: "projects/{{ input.project_id }}/topics/user-events"
    message:
      data: "{{ base64_encode(json_stringify({
          eventType: 'order.created',
          orderId: input.order_id,
          customerId: input.customer_id,
          timestamp: now()
        })) }}"
      attributes:
        event_type: "order"
        priority: "high"

The Pub/Sub API expects the message data field to be base64-encoded, which is why we use base64_encode().
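
On the consuming side, a workflow can pull messages from a subscription and acknowledge them once processed. This is a sketch only: the subscription name, the maxMessages parameter, and the output reference used for ackIds are all assumptions.

- id: pull_events
  type: gcp_connector
  config:
    operation: pull
    subscription: "projects/{{ input.project_id }}/subscriptions/user-events-sub"
    maxMessages: 10    # hypothetical parameter

- id: ack_events
  type: gcp_connector
  config:
    operation: acknowledge
    subscription: "projects/{{ input.project_id }}/subscriptions/user-events-sub"
    ackIds: "{{ steps.pull_events.output.ackIds }}"    # hypothetical output reference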

Example 5: Store Documents in Firestore

Save structured data in Firestore:

- id: save_user_profile
  type: gcp_connector
  config:
    operation: createDocument
    collection: "users"
    documentId: "{{ input.user_id }}"
    data:
      name: "{{ input.full_name }}"
      email: "{{ input.email }}"
      createdAt: "{{ now() }}"
      metadata:
        source: "{{ input.source }}"
        tags: "{{ input.tags }}"

Later, you can query this document:

- id: fetch_user
  type: gcp_connector
  config:
    operation: getDocument
    collection: "users"
    documentId: "{{ input.user_id }}"
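
Or update it when the profile changes (a sketch; whether fields not listed under data are preserved is an assumption worth verifying in the operation docs):

- id: update_user_profile
  type: gcp_connector
  config:
    operation: updateDocument
    collection: "users"
    documentId: "{{ input.user_id }}"
    data:
      email: "{{ input.email }}"
      updatedAt: "{{ now() }}"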

Rate Limits

Google Cloud services scale automatically, but here are the baseline limits:

  • Cloud Storage: 1 request per second per object (metadata), unlimited data throughput
  • Pub/Sub: 10,000 messages per second per topic (can be increased)
  • Firestore: 10,000 writes per second per database
  • BigQuery: 100 concurrent queries, 1 TB per query

Note: DeepChain handles retries and backoff automatically, so you typically won't hit these limits.

Error Handling

Common GCP Errors

  • PERMISSION_DENIED: Your service account lacks permissions. Fix: check IAM roles; add Cloud Storage Admin, Cloud Functions Developer, etc.
  • NOT_FOUND: The resource doesn't exist (bucket, function, etc.). Fix: verify the resource exists and is in the right project.
  • ALREADY_EXISTS: You're trying to create a duplicate. Fix: use a different name or update the existing resource.
  • INVALID_ARGUMENT: Bad parameters or data format. Fix: check field names and data types in the operation docs.
  • RESOURCE_EXHAUSTED: You've hit a quota limit. Fix: wait a bit or request a quota increase from GCP.

Debugging

Enable debug logging:

Node Configuration:
  debug: true
  logRequest: true
  logResponse: true

Check the execution logs for the full request/response.
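
For example, to trace a single step you can add the flags to that node (a sketch; it assumes the flags sit at the node level, next to config):

- id: gcs_upload_debug
  type: gcp_connector
  debug: true
  logRequest: true
  logResponse: true
  config:
    operation: uploadObject
    bucket: "company-exports"
    objectName: "debug-test.csv"
    content: "{{ json_stringify(input.data) }}"
    contentType: "text/csv"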

Best Practices

1. Organize Cloud Storage Like a Pro

GCS doesn't have real folders, but you can use naming conventions:

# Good organization by date and type
objectName: "exports/reports/{{ formatDate(now(), 'yyyy/MM/dd') }}/report.json"
objectName: "logs/{{ input.service }}/{{ formatDate(now(), 'yyyy-MM-dd') }}.log"
objectName: "backups/database/{{ formatDate(now(), 'yyyy-MM-dd-HHmmss') }}.sql"

2. Use Proper IAM Roles

Don't grant Editor or Owner roles. Use specific roles:

  • Cloud Storage: roles/storage.objectAdmin (specific to your buckets)
  • Cloud Functions: roles/cloudfunctions.developer
  • BigQuery: roles/bigquery.dataEditor + roles/bigquery.jobUser
  • Firestore: roles/datastore.user

3. Set Query Timeouts for BigQuery

BigQuery queries can take time. Set reasonable timeouts:

- id: big_query
  type: gcp_connector
  config:
    operation: query
    sql: "SELECT COUNT(*) FROM `project.dataset.huge_table`"
    timeoutMs: 300000  # 5 minutes

4. Handle Pub/Sub Message Format

Pub/Sub messages require base64 encoding. Always encode:

# Correct
message:
  data: "{{ base64_encode(json_stringify(input)) }}"

# Wrong—will fail
message:
  data: "{{ json_stringify(input) }}"

5. Batch BigQuery Inserts

Instead of inserting rows one by one, batch them:

- id: batch_insert
  type: gcp_connector
  config:
    operation: insertRows
    table: "project.dataset.events"
    rows:
      - event_id: "123"
        user_id: "456"
        timestamp: "{{ now() }}"
      - event_id: "789"
        user_id: "012"
        timestamp: "{{ now() }}"

This is much faster than individual inserts.
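
If the rows come from upstream data instead of being listed inline, you can pass an array expression directly (a sketch, assuming input.events is already an array of objects matching the table schema):

- id: batch_insert_dynamic
  type: gcp_connector
  config:
    operation: insertRows
    table: "project.dataset.events"
    rows: "{{ input.events }}"   # assumes input.events is an array of row objects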


Next Steps