Custom GCS Storage

Configure Slateo to use your own Google Cloud Storage (GCS) buckets for query results and file storage.

Overview

By default, Slateo stores query results and uploaded files in Slateo-managed storage. For organizations with data residency requirements or compliance needs, you can configure Slateo to use GCS buckets in your own Google Cloud project.

Slateo accesses your GCS buckets using a GCP service account that you provide. You can either supply a dedicated service account JSON key for storage, or reuse the service account credentials from your BigQuery datasource configuration.

Key Benefits:

  • Data never stored outside your Google Cloud project
  • Full control over encryption, retention, and access policies
  • Audit trail in your own Cloud Audit Logs
  • Option to reuse existing BigQuery service account credentials

Prerequisites

Before starting, ensure you have:

  • A Google Cloud project with permissions to create GCS buckets and manage IAM
  • A service account with a JSON key (or an existing BigQuery datasource configured in Slateo with service account credentials)

Setup steps

Step 1: Create GCS buckets

Create two GCS buckets in your Google Cloud project:

| Bucket Purpose | Recommended Naming | Description |
|---|---|---|
| Uploads | {company}-slateo-uploads | User file uploads, CSVs, exports |
| Cache | {company}-slateo-cache | Query result caching |

Both buckets should be created in the same project and location as your BigQuery datasets for lowest latency.
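The commands below use {company} and {your-project-id} placeholders throughout. To avoid editing each command by hand, you can capture your values as shell variables once and derive the bucket names from the naming convention above; a minimal sketch (the COMPANY, PROJECT_ID, and LOCATION values are illustrative assumptions you replace with your own):

```shell
# Replace these example values with your own (assumptions for illustration)
COMPANY="acme"
PROJECT_ID="acme-analytics"
LOCATION="us-central1"

# Derive the bucket names from the recommended naming convention
UPLOADS_BUCKET="${COMPANY}-slateo-uploads"
CACHE_BUCKET="${COMPANY}-slateo-cache"

echo "Uploads bucket: gs://${UPLOADS_BUCKET}"
echo "Cache bucket:   gs://${CACHE_BUCKET}"
```

Later commands can then reference gs://${UPLOADS_BUCKET} and gs://${CACHE_BUCKET} instead of repeating the placeholders.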

Using gcloud CLI:

# Create uploads bucket
gcloud storage buckets create gs://{company}-slateo-uploads \
  --project={your-project-id} \
  --location=us-central1 \
  --uniform-bucket-level-access

# Create cache bucket
gcloud storage buckets create gs://{company}-slateo-cache \
  --project={your-project-id} \
  --location=us-central1 \
  --uniform-bucket-level-access

Enforce public access prevention (required for both buckets):

gcloud storage buckets update gs://{company}-slateo-uploads \
  --public-access-prevention=enforced

gcloud storage buckets update gs://{company}-slateo-cache \
  --public-access-prevention=enforced
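If bucket creation fails with an invalid-name error, you can sanity-check a candidate name locally before retrying. This sketch checks the basic GCS naming rules (lowercase letters, digits, hyphens, and underscores; 3 to 63 characters; a letter or digit at both ends — names containing dots have additional rules not covered here):

```shell
# Check a candidate bucket name against basic GCS naming rules:
# lowercase letters, digits, - and _; 3-63 chars; alphanumeric at both ends
valid_bucket_name() {
  local name="$1"
  [[ "$name" =~ ^[a-z0-9][a-z0-9_-]{1,61}[a-z0-9]$ ]]
}

if valid_bucket_name "acme-slateo-uploads"; then
  echo "valid"
else
  echo "invalid"
fi
```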

Step 2: Grant service account permissions

The service account used by Slateo needs access to both buckets.

Option A: Predefined role (simplest)

Grant roles/storage.objectAdmin on each bucket:

gsutil iam ch serviceAccount:SERVICE_ACCOUNT_EMAIL:objectAdmin \
  gs://{company}-slateo-uploads

gsutil iam ch serviceAccount:SERVICE_ACCOUNT_EMAIL:objectAdmin \
  gs://{company}-slateo-cache

Option B: Custom role (least privilege)

Create a custom role with these permissions and bind it to the service account on each bucket:

  • storage.objects.create
  • storage.objects.get
  • storage.objects.delete
  • storage.objects.list
  • storage.buckets.get
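Under this least-privilege option, the role is created once at the project level and then bound to the service account on each bucket. A sketch that assembles the corresponding gcloud commands from the permission list above (the role ID slateoStorage, the service account email, and the other variable values are assumptions; remove the echo prefixes to actually execute the commands):

```shell
PROJECT_ID="acme-analytics"  # assumption: your project ID
SA_EMAIL="slateo-storage@${PROJECT_ID}.iam.gserviceaccount.com"  # assumption
PERMS="storage.objects.create,storage.objects.get,storage.objects.delete,storage.objects.list,storage.buckets.get"

# Create the custom role once at the project level
echo gcloud iam roles create slateoStorage \
  --project="${PROJECT_ID}" \
  --title="Slateo Storage" \
  --permissions="${PERMS}"

# Bind it to the service account on each bucket
for bucket in acme-slateo-uploads acme-slateo-cache; do
  echo gcloud storage buckets add-iam-policy-binding "gs://${bucket}" \
    --member="serviceAccount:${SA_EMAIL}" \
    --role="projects/${PROJECT_ID}/roles/slateoStorage"
done
```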

Step 3: Configure CORS on cache bucket

Add CORS configuration to the cache bucket to allow browser-based downloads of query results.

Save this as cors.json:

[
  {
    "origin": [
      "https://*.slateo.ai",
      "https://slateo.ai"
    ],
    "method": ["GET", "HEAD"],
    "responseHeader": ["ETag"],
    "maxAgeSeconds": 3600
  }
]

Apply it:

gsutil cors set cors.json gs://{company}-slateo-cache

To verify:

gsutil cors get gs://{company}-slateo-cache
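A common failure mode is malformed JSON in cors.json, which gsutil rejects. A small script that writes the file and validates it locally before applying it (assumes python3 is available on your machine):

```shell
# Write the CORS policy and confirm it parses before applying it with gsutil
cat > cors.json <<'EOF'
[
  {
    "origin": ["https://*.slateo.ai", "https://slateo.ai"],
    "method": ["GET", "HEAD"],
    "responseHeader": ["ETag"],
    "maxAgeSeconds": 3600
  }
]
EOF

# Prints a parse error and exits nonzero if the JSON is invalid
python3 -m json.tool cors.json > /dev/null && echo "cors.json is valid JSON"
```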

Step 4: (Optional) Configure CMEK encryption

If your buckets use Customer-Managed Encryption Keys (CMEK), the service account must have roles/cloudkms.cryptoKeyEncrypterDecrypter on the key:

gcloud kms keys add-iam-policy-binding {key-name} \
  --project={your-project-id} \
  --location={location} \
  --keyring={keyring-name} \
  --member=serviceAccount:SERVICE_ACCOUNT_EMAIL \
  --role=roles/cloudkms.cryptoKeyEncrypterDecrypter
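When you set the key as a bucket's default encryption key, it is referenced by its full resource name. A sketch composing that name from its components (the variable values are assumptions; remove the echo prefix to execute the update):

```shell
PROJECT_ID="acme-analytics"  # assumption: your project ID
LOCATION="us-central1"       # assumption: must match the bucket's location
KEYRING="slateo-keyring"     # assumption
KEY="slateo-key"             # assumption

# Full Cloud KMS key resource name
KEY_RESOURCE="projects/${PROJECT_ID}/locations/${LOCATION}/keyRings/${KEYRING}/cryptoKeys/${KEY}"
echo "${KEY_RESOURCE}"

# Set it as the bucket's default encryption key
echo gcloud storage buckets update gs://acme-slateo-cache \
  --default-encryption-key="${KEY_RESOURCE}"
```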

Step 5: Configure buckets in Slateo

After configuring your buckets and permissions, update your database configuration in Slateo:

  1. Go to Admin → Databases
  2. Click the settings icon (gear) on your database connection
  3. Select GCS as the storage provider
  4. Enter the following fields:
    • GCS Upload Bucket: Your uploads bucket name
    • GCS Cache Bucket: Your cache bucket name
    • GCP Project ID: The project containing your buckets
    • Service Account JSON (optional): A service account JSON key with storage access. If omitted, Slateo uses the credentials from your BigQuery datasource configuration.
  5. Click Save to apply the configuration

Security considerations

| Practice | Description |
|---|---|
| Uniform bucket-level access | Enable on both buckets for consistent IAM-based access control |
| Public access prevention | Enforce on both buckets to block any public access |
| Dedicated service account | Use a separate service account for Slateo storage (not the same SA used for BigQuery queries) |
| Cloud Audit Logs | Enable data access logging for monitoring |
| CMEK encryption | Use customer-managed keys for additional control over encryption at rest |

Troubleshooting

Access denied errors

  1. Check GCS storage status - Ensure the status shows Ready in Admin → Databases
  2. Verify service account permissions - The service account must have roles/storage.objectAdmin (or equivalent custom role) on both buckets
  3. Check bucket names match exactly (case-sensitive)
  4. Verify service account JSON key in Slateo matches the account with permissions

Browser download failures (query results)

If you see CORS errors or TypeError: Failed to fetch when viewing query results:

  1. Verify CORS configuration is applied to the cache bucket (not uploads)
  2. Check origin values include https://*.slateo.ai and https://slateo.ai
  3. Verify methods include GET and HEAD
  4. Clear browser cache - The browser may have cached a failed CORS preflight. Try a hard refresh (Cmd+Shift+R / Ctrl+Shift+R) or incognito window.

Run gsutil cors get gs://YOUR-CACHE-BUCKET to verify the configuration.

Signed URL errors

The service account must have the iam.serviceAccounts.signBlob permission, or the JSON key must include the private key. If you use Workload Identity Federation instead of a JSON key, grant roles/iam.serviceAccountTokenCreator (which includes iam.serviceAccounts.signBlob) on the service account.

CMEK errors

  1. Verify the key exists and is enabled (not pending deletion)
  2. Ensure the key is in the same location as the bucket
  3. Check the service account has roles/cloudkms.cryptoKeyEncrypterDecrypter on the key

General issues

  1. Confirm buckets exist in the specified project
  2. Check Cloud Audit Logs for detailed error information
  3. Contact support with your organization slug and error details

Frequently asked questions

Can I use a single bucket for both uploads and cache?

No. Both upload and cache bucket fields must be specified, and they should be separate buckets for proper isolation and different lifecycle policies.

Can I reuse my BigQuery service account credentials?

Yes. If your BigQuery datasource is configured with a service account JSON key, you can leave the storage service account field empty and Slateo will use the BigQuery credentials. The BigQuery service account must have the required storage permissions on both buckets.

How do I verify the integration is working?

After configuration, run a query and check that the results are cached in your bucket. Query cache files are stored with the prefix {org-id}/query-cache/parquet/. You can also upload a CSV file and verify it appears under {org-id}/uploads/.
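To check both prefixes from the command line, the listing commands can be composed from your organization ID (the ORG_ID and bucket name values are assumptions; remove the echo prefixes to execute):

```shell
ORG_ID="your-org-id"                  # assumption: your Slateo organization ID
CACHE_BUCKET="acme-slateo-cache"      # assumption: your cache bucket
UPLOADS_BUCKET="acme-slateo-uploads"  # assumption: your uploads bucket

# List cached query results and uploaded files under the documented prefixes
echo gcloud storage ls "gs://${CACHE_BUCKET}/${ORG_ID}/query-cache/parquet/"
echo gcloud storage ls "gs://${UPLOADS_BUCKET}/${ORG_ID}/uploads/"
```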

Can I migrate existing data to my buckets?

When you configure custom buckets, new data will automatically go to your buckets. Existing cached query results will remain in their previous location and become inaccessible through Slateo (you can re-run queries to regenerate them). Previously uploaded files will also need to be re-uploaded. Contact support if you need to discuss migration of historical data.

What locations are supported?

Any GCS location is supported. For lowest latency, place your buckets in the same location as your BigQuery datasets.

How does this affect my Google Cloud bill?

You will see GCS storage and operation charges in your Google Cloud project for data stored in your buckets. Standard GCS pricing applies.

