Introduction to Analytics Hub Data Sharing
Analytics Hub is the BigQuery-native service that turns a private dataset into something a partner, customer, or another internal team can query without copying a single byte. It sits on top of BigQuery's authorized dataset machinery and adds a publish-and-subscribe layer with exchanges, listings, and linked datasets. For the PDE exam, Analytics Hub data sharing shows up whenever a question asks how to distribute data across project boundaries, organizations, or even to external Google Cloud customers without building a pipeline.
The promise is simple. A publisher curates a view or a table inside BigQuery, wraps it in a listing, and a subscriber clicks subscribe to get a read-only linked dataset that behaves like a local one. Storage stays with the publisher. Queries run in the subscriber's project and are billed to the subscriber, whether as on-demand bytes scanned or as slot consumption. That separation is what makes Analytics Hub data sharing the default answer to almost any "share BigQuery data with another company" question on the test.
Plain English Explanation
Think of Analytics Hub Data Sharing as a Public Library Catalog
Walk into a public library and you do not own any of the books. The librarian curates a catalog, decides which titles are on the open shelf, and stamps a card for each book that leaves the building. You browse the catalog, pick a title, and read it without ever touching the storage room in the back.
Analytics Hub data sharing follows the same rhythm. The publisher is the librarian. The data exchange is the building. A listing is a card in the catalog. The linked dataset is the borrowed book that lives on your desk for as long as the librarian lets you keep it. You do not own the underlying table. You do not pay for storing it. The librarian can pull a title off the shelf at any moment and your card stops working, but until then you can read it as often as you like.
This analogy also explains why Analytics Hub data sharing keeps a single source of truth. The library does not photocopy a book every time someone borrows it. There is one physical copy, and many readers. When the publisher updates a row in BigQuery, every subscriber sees the change immediately. No replication lag. No stale snapshot.
Think of It as a Shopping Mall Food Court
A food court has a leasing office (the data exchange administrator), a set of stalls (the listings), and a stream of hungry customers (the subscribers). The mall does not cook. It rents space and enforces rules: opening hours, hygiene standards, signage. Each stall owner decides their own menu, prices, and portion sizes. Customers walk in, pick a stall, pay if there is a price, and walk out with food.
In Analytics Hub data sharing the mall is the exchange. A private exchange is an invitation-only food court inside an office building. A public exchange listed on Google Cloud Marketplace is the open mall on the high street. The food court operator never owns the recipes; the stalls own them. That is exactly the publisher-subscriber split the service is built on.
Think of It as Subscribing to a Streaming Service
When you sign up for a music streaming service you do not download every album to your hard drive. You get a license that says "for as long as you pay, you can listen to this catalog." The catalog updates. New albums appear. Some get pulled when licensing expires. You do not manage any of that.
A subscriber to an Analytics Hub listing is in the same position. The linked dataset shows up in your BigQuery project. You can SELECT, JOIN, and even build a materialized view on top of it. But you cannot INSERT, UPDATE, or DELETE. The publisher controls the supply. If the publisher revokes the listing tomorrow, your linked dataset becomes unreadable. The subscription model is rental, not ownership.
Linked dataset: a read-only BigQuery dataset created in the subscriber's project that points to a shared dataset in the publisher's project. It appears in the BigQuery UI like any other dataset, but no data is copied. Queries are billed to the subscriber, and storage stays with the publisher. See https://cloud.google.com/bigquery/docs/analytics-hub-introduction
Core Concepts of Analytics Hub Data Sharing
Five primitives carry every Analytics Hub workflow. Knowing how they nest is the fastest path to answering exam scenarios correctly.
A data exchange is the top-level container. It belongs to a Google Cloud project and a region. Exchanges are either private (you grant access by IAM) or public (discoverable on Google Cloud Marketplace). Inside an exchange, publishers create listings.
A listing is the published unit of sharing. It points to one BigQuery shared dataset and carries metadata: title, description, documentation URL, request access settings, primary contact, and categories. A single dataset can back multiple listings if you want to present different versions to different audiences.
A shared dataset is the source-of-truth BigQuery dataset in the publisher's project. It can include tables, views, materialized views, external tables (with caveats), and routines. The publisher decides what to expose by selecting specific resources or sharing the whole dataset.
A subscription is what a subscriber creates when they click "Subscribe" on a listing. It produces a linked dataset in a project and region the subscriber chooses. The subscription tracks the relationship and lets the publisher see who consumes what.
A linked dataset is the read-only mirror that lands in the subscriber's project. It is a pointer, not a copy. Queries against it execute in the subscriber's region and slot reservation, but the underlying bytes never leave the publisher's storage.
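To make the nesting concrete, here is a minimal sketch of the five primitives as plain Python dataclasses. This is an illustrative model only, not the Analytics Hub client library: every class, field, and function name below is hypothetical, chosen to mirror the terms in this section.

```python
from dataclasses import dataclass, field


@dataclass
class SharedDataset:
    """Source-of-truth dataset in the publisher's project."""
    project: str
    dataset_id: str
    region: str


@dataclass
class Listing:
    """Published unit of sharing; points at exactly one shared dataset."""
    title: str
    source: SharedDataset


@dataclass
class DataExchange:
    """Top-level container; belongs to one project and one region."""
    project: str
    region: str
    public: bool = False
    listings: list[Listing] = field(default_factory=list)

    def publish(self, title: str, source: SharedDataset) -> Listing:
        listing = Listing(title=title, source=source)
        self.listings.append(listing)
        return listing


@dataclass(frozen=True)
class LinkedDataset:
    """Read-only pointer in the subscriber's project; holds no data itself."""
    project: str
    dataset_id: str
    source: SharedDataset


@dataclass
class Subscription:
    """Tracks the relationship between a listing and its linked dataset."""
    listing: Listing
    linked: LinkedDataset


def subscribe(listing: Listing, subscriber_project: str, dataset_id: str) -> Subscription:
    # The linked dataset references the publisher's dataset; nothing is copied.
    linked = LinkedDataset(subscriber_project, dataset_id, listing.source)
    return Subscription(listing=listing, linked=linked)
```

The point of the model is the last function: subscribing creates a new object in the subscriber's project, but its `source` field is the very same shared dataset the publisher owns.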
Analytics Hub respects BigQuery region boundaries. A listing can only be subscribed to from the same region or multi-region as the source dataset. Cross-region subscriptions require the publisher to first replicate the dataset using BigQuery dataset replication or scheduled queries. Plan regions early. https://cloud.google.com/bigquery/docs/analytics-hub-introduction#regions
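The region rule can be sketched as a simple location comparison. The helper names below are hypothetical, and the set of multi-regions is limited to the two this guide mentions (US and EU); treat it as a planning aid under those assumptions rather than a check the service exposes.

```python
# Multi-region locations named in this guide; an assumption, not a full list.
MULTI_REGIONS = {"US", "EU"}


def normalize_location(location: str) -> str:
    """Upper-case multi-regions, lower-case single regions, for comparison."""
    loc = location.strip()
    return loc.upper() if loc.upper() in MULTI_REGIONS else loc.lower()


def can_subscribe_in(source_location: str, subscriber_location: str) -> bool:
    """True when the linked dataset would sit in the source's own location."""
    return normalize_location(source_location) == normalize_location(subscriber_location)


def plan_share(source_location: str, subscriber_location: str) -> str:
    """Describe the flow this section recommends for a given pair of locations."""
    if can_subscribe_in(source_location, subscriber_location):
        return "subscribe directly"
    target = normalize_location(subscriber_location)
    return f"replicate the source to {target}, then publish a listing from the replica"
```

For example, `plan_share("us-east1", "europe-west1")` returns the replication path, while `plan_share("US", "us")` returns "subscribe directly". Note that a single region such as us-central1 and the US multi-region are treated as different locations.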
Architecture and Design Patterns
Analytics Hub data sharing tends to land in one of four reference patterns, and the PDE exam tests each of them.
The internal data mesh pattern uses one private exchange per business domain. Marketing publishes its curated tables, finance publishes ledger views, product analytics publishes event aggregates. Every team subscribes to what it needs. Storage stays domain-owned, governance stays domain-owned, but discovery is centralized through the exchange catalog. This is what most large enterprises adopt as they move off the "everyone has access to one giant project" model.
The partner data sharing pattern uses a private exchange shared across organizations. A retailer publishes daily inventory snapshots; an analytics consultancy subscribes from its own organization. Two-way sharing requires two exchanges and two listings: one in each direction. The IAM principal granted access can be a Google Account, a service account, a Google Group, or even an entire Google Workspace domain.
The monetized marketplace pattern uses a public listing on Google Cloud Marketplace with a paid subscription tier handled by the Cloud Commerce platform. The publisher signs the partner agreement, sets the price model, and Google handles billing and tax. The subscriber's BigQuery query bill is independent from the marketplace subscription fee.
The clean room pattern combines Analytics Hub data sharing with BigQuery's data clean rooms feature. Two parties contribute datasets to a shared analytical environment but neither can see the other's row-level data. Only aggregate query results above a configured privacy threshold are returned. This pattern matters for advertising, healthcare research, and any scenario where regulation forbids raw PII exchange.
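The idea of a privacy threshold in the clean room pattern can be illustrated in a few lines. This is a toy sketch of threshold-based aggregation, not how BigQuery enforces it; the function name, the row shape, and the "at least `threshold` distinct users" rule are all assumptions made for the example.

```python
from collections import defaultdict


def aggregate_with_threshold(rows, group_key, user_key, threshold):
    """Count distinct users per group and drop groups below the threshold.

    Row-level data never leaves this function; only aggregates for groups
    with at least `threshold` distinct users are returned.
    """
    users_per_group = defaultdict(set)
    for row in rows:
        users_per_group[row[group_key]].add(row[user_key])
    return {
        group: len(users)
        for group, users in users_per_group.items()
        if len(users) >= threshold
    }


rows = [
    {"region": "north", "user": "u1"},
    {"region": "north", "user": "u2"},
    {"region": "north", "user": "u2"},  # duplicate user, counted once
    {"region": "south", "user": "u3"},
]
result = aggregate_with_threshold(rows, "region", "user", threshold=2)
# "south" has a single distinct user, so it is suppressed from the result.
```

Small groups are suppressed because an aggregate over one or two people is effectively row-level data, which is exactly what the pattern is meant to withhold.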
Publisher project (us-central1)
+-------------------+
| BigQuery dataset  | <-- source of truth
| - sales_summary   |
| - product_dim     |
+--------+----------+
         |
         v
+-------------------+
| Analytics Hub     |
| Data Exchange     | <-- private or public
| - Listing A       |
| - Listing B       |
+--------+----------+
         |
         | subscribe
         v
Subscriber project (us-central1)
+-------------------+
| Linked dataset    | <-- read-only pointer
| - sales_summary   |
| - product_dim     |
+-------------------+
When designing a partner-facing exchange, name your listings with the consumer in mind, not the producer. "Daily Retail Inventory Feed" beats "warehouse_db_v3_curated" every time. The listing title is what subscribers see in the catalog, and discoverability is the whole point of Analytics Hub data sharing. https://cloud.google.com/bigquery/docs/analytics-hub-listings
GCP Service Deep Dive
Data Exchanges in Detail
A data exchange is created through the BigQuery console under the Analytics Hub section, or programmatically through the Analytics Hub API. The exchange has a display name, description, primary contact, documentation URL, and an icon. Region is fixed at creation. The exchange has its own IAM policy with roles like roles/analyticshub.admin (full control), roles/analyticshub.publisher (create and manage listings), and roles/analyticshub.viewer (browse the catalog).
Public exchanges live on Google Cloud Marketplace. To publish one, the project must be enrolled in the Producer Portal and approved by Google. Private exchanges have no such requirement; you create one in your project and immediately start adding listings.
Listings in Detail
Each listing wraps a single shared dataset and exposes either the entire dataset or a curated subset of resources (specific tables, views, or routines). At publication time the publisher specifies request-access behavior: open subscription (anyone with viewer access on the exchange can subscribe), or restricted subscription (subscribers must request and the publisher approves manually).
Listings carry rich metadata: categories (Financial, Healthcare, Public sector), data provider name, data refresh frequency, documentation links, and contact email. Good metadata is what makes a listing findable in the marketplace search.
Subscriptions in Detail
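A subscription has a state, which the Analytics Hub API reports as active, stale, or inactive. Active means the subscriber is reading happily. Stale means the publisher modified the listing in a way that requires the subscriber to refresh the linked dataset. Inactive means the subscription was cancelled or revoked and the data is no longer accessible. Separately from these states, a structural change in the source dataset (a column was dropped) can break downstream queries, so the subscriber should review after any such change.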
Subscribers see subscriptions in their project under the Analytics Hub section. Publishers see all active subscriptions per listing, which is useful for usage analytics and for deciding when to deprecate a listing.
IAM for Analytics Hub Data Sharing
Two IAM surfaces matter. The exchange-level IAM controls who can administer, publish to, or browse the exchange. The listing-level IAM controls who can subscribe to a specific listing. A common mistake is to grant a partner roles/analyticshub.viewer on the exchange thinking that grants subscription rights. It does not. Subscribing requires roles/analyticshub.subscriber on the listing itself, plus roles/bigquery.user (or higher) in the project where the linked dataset will land.
Service accounts can subscribe too. This matters for automated downstream pipelines that consume shared data on a schedule.
A subscriber needs roles/bigquery.user in their own project to create the linked dataset. If the subscribe action fails with a confusing permission error, the missing role is almost always on the subscriber's side, not the publisher's. The error message rarely points at the right project. https://cloud.google.com/bigquery/docs/analytics-hub-grant-roles
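The two-sided check described above can be written down as a small diagnostic helper. This is a sketch over plain sets of role names, not a call to the IAM API: the function name is hypothetical, and treating roles/bigquery.admin as the only "or higher" alternative to roles/bigquery.user is a simplifying assumption.

```python
SUBSCRIBER_ROLE = "roles/analyticshub.subscriber"
# Assumption: roles accepted as "bigquery.user or higher" for this sketch.
PROJECT_ROLES_OK = {"roles/bigquery.user", "roles/bigquery.admin"}


def diagnose_subscribe(listing_roles: set[str], project_roles: set[str]) -> list[str]:
    """Return the reasons a subscribe attempt would be expected to fail.

    listing_roles: roles the principal holds on the listing (publisher side).
    project_roles: roles the principal holds in the project that will host
                   the linked dataset (subscriber side).
    """
    problems = []
    if SUBSCRIBER_ROLE not in listing_roles:
        problems.append(f"publisher side: grant {SUBSCRIBER_ROLE} on the listing")
    if not (project_roles & PROJECT_ROLES_OK):
        problems.append("subscriber side: grant roles/bigquery.user in the target project")
    return problems
```

Calling `diagnose_subscribe({"roles/analyticshub.viewer"}, {"roles/bigquery.user"})` flags only the publisher side, which is the viewer-instead-of-subscriber mistake described above. An empty list means both sides are in place.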
Cross-Organization Sharing
Analytics Hub data sharing supports sharing across Google Cloud organizations natively. The publisher grants roles/analyticshub.subscriber on a listing to a Google Account, group, service account, or domain that lives in a different organization. The subscriber accepts and creates the linked dataset in their own organization's project.
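A cross-organization grant is, in the end, one more member in a role binding on the listing's IAM policy. The sketch below edits a policy held as a plain dictionary in the standard IAM shape (a `bindings` list of `role` and `members`). It does not call any Google Cloud API; the function name and the partner group address are hypothetical.

```python
# Member prefixes for the principal types this section lists.
ALLOWED_PREFIXES = ("user:", "group:", "serviceAccount:", "domain:")


def grant(policy: dict, role: str, member: str) -> dict:
    """Return a copy of `policy` with `member` added to the binding for `role`."""
    if not member.startswith(ALLOWED_PREFIXES):
        raise ValueError(f"unsupported member format: {member!r}")

    # Copy bindings so the caller's policy is left untouched.
    bindings = [
        {**b, "members": list(b.get("members", []))}
        for b in policy.get("bindings", [])
    ]
    for binding in bindings:
        if binding.get("role") == role:
            if member not in binding["members"]:
                binding["members"].append(member)
            break
    else:
        # No binding for this role yet, so create one.
        bindings.append({"role": role, "members": [member]})
    return {**policy, "bindings": bindings}


policy = grant({}, "roles/analyticshub.subscriber", "group:analysts@partner.example")
```

Nothing in the binding records which organization the member belongs to, which is why the same grant works whether the group lives in the publisher's organization or a partner's.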
Exchange-level and listing-level IAM are two distinct surfaces and the PDE exam tests both. roles/analyticshub.admin and roles/analyticshub.publisher on the data exchange govern who can administer the exchange or create listings inside it. roles/analyticshub.subscriber is granted on the individual listing and controls who can produce a linked dataset from it. Granting only roles/analyticshub.viewer on the exchange lets a partner browse the catalog but never subscribe; that is the wrong-role distractor questions love to plant. https://cloud.google.com/bigquery/docs/analytics-hub-grant-roles
VPC Service Controls add a wrinkle. If the publisher project is inside a VPC-SC perimeter, subscribers from outside the perimeter cannot read the linked dataset even if IAM permits subscription. The fix is to add an ingress rule that explicitly allows the subscriber identities to call the BigQuery API on the protected resources.
Monetization and Marketplace Integration
Paid listings on Google Cloud Marketplace use the Cloud Commerce Producer Portal. The publisher defines a price model (flat monthly, tiered, free trial), and Google handles invoicing the subscriber. The publisher receives revenue net of Google's marketplace fee.
There are two cost streams the subscriber pays. The marketplace subscription fee goes to the publisher. The BigQuery query cost (on-demand bytes scanned, or slot consumption) is billed to the subscriber's project regardless of who published the data. This separation is critical to understand: a free listing is not free to query, and a paid listing's price does not cover the subscriber's compute.
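The split of who pays what can be captured in a short calculation. The sketch below is arithmetic only: every price, the listing fee, and the marketplace cut are caller-supplied placeholders, not actual Google Cloud rates, and the names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CostSplit:
    publisher_storage: float           # paid by the publisher
    subscriber_query: float            # paid by the subscriber
    subscriber_marketplace_fee: float  # paid by the subscriber, if monetized
    publisher_revenue: float           # fee minus the marketplace's share


def split_costs(
    storage_gib: float,
    storage_price_per_gib: float,
    tib_scanned: float,
    query_price_per_tib: float,
    listing_fee: float = 0.0,
    marketplace_cut: float = 0.0,
) -> CostSplit:
    """Allocate monthly costs between publisher and subscriber."""
    if not 0.0 <= marketplace_cut <= 1.0:
        raise ValueError("marketplace_cut must be a fraction between 0 and 1")
    return CostSplit(
        publisher_storage=storage_gib * storage_price_per_gib,
        subscriber_query=tib_scanned * query_price_per_tib,
        subscriber_marketplace_fee=listing_fee,
        publisher_revenue=listing_fee * (1.0 - marketplace_cut),
    )


# Placeholder numbers: a free listing still costs the subscriber query spend.
free_listing = split_costs(1000, 0.02, tib_scanned=2, query_price_per_tib=5.0)
paid_listing = split_costs(1000, 0.02, 2, 5.0, listing_fee=100.0, marketplace_cut=0.25)
```

In both cases `subscriber_query` is the same: the listing fee never offsets compute, and a zero fee does not make queries free.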
Routines, External Tables, and Sharing Limits
Routines (UDFs and stored procedures) can be included in shared datasets, which means a publisher can ship logic alongside data. External tables backed by Cloud Storage can also be shared, but the subscriber must have access to the underlying GCS bucket separately for federated queries to succeed. BigLake tables are usually a better fit because they decouple table-level access from the bucket IAM.
Analytics Hub data sharing creates linked datasets that are pointers, not copies. Storage cost stays with the publisher, query cost goes to the subscriber, and the source must be in the same region or multi-region as the subscriber's linked dataset. https://cloud.google.com/bigquery/docs/analytics-hub-introduction
Common Pitfalls and Trade-offs
Region mismatch is the number-one issue I see in the field. A publisher in us-central1 cannot directly serve subscribers in europe-west1. The fix is dataset replication, which adds storage cost and replication lag. Plan multi-region (US, EU) datasets if you know subscribers will be global from day one.
External tables backed by Cloud Storage create silent failures. The publisher shares a dataset that contains an external table; the subscriber's linked dataset lists the table; queries return permission errors because the subscriber's identity has no read access to the bucket. Migrate to BigLake or copy the data into a managed BigQuery table before sharing.
Authorized views in shared datasets behave subtly. If the shared dataset contains a view that selects from a table in a different dataset (the underlying base table), the subscriber needs no access to that base table because the view runs with the publisher's permissions. This is the standard authorized view pattern, and Analytics Hub data sharing inherits it. Get this wrong and you either expose too much (publishing the base dataset directly) or too little (forgetting to authorize the view).
Cost surprises happen on the subscriber side. However many subscribers a popular public listing attracts, the publisher's BigQuery storage bill is unaffected, but each subscriber's on-demand query bill can spike when large shared tables are scanned repeatedly. Subscribers should consider switching to slot reservations or using BI Engine for hot tables.
Schema evolution is a real risk. If the publisher drops a column or renames a table, every subscriber's downstream query breaks. There is no version pinning in Analytics Hub data sharing today. The mitigation is to publish stable views over volatile base tables and treat the view layer as a contract.
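Treating the view layer as a contract implies checking that contract before each release. A minimal sketch of such a check follows, assuming schemas are available as simple column-name-to-type mappings; the function name and the rule that added columns are non-breaking are assumptions for illustration.

```python
def breaking_changes(contract: dict[str, str], current: dict[str, str]) -> list[str]:
    """List the ways `current` violates the published `contract`.

    A dropped or renamed column shows up as missing; a changed type is
    reported too. Columns added to `current` are treated as non-breaking.
    """
    problems = []
    for column, column_type in contract.items():
        if column not in current:
            problems.append(f"missing column: {column}")
        elif current[column] != column_type:
            problems.append(f"type changed: {column} {column_type} -> {current[column]}")
    return problems


contract = {"sku": "STRING", "qty": "INT64"}
proposed = {"sku": "STRING", "qty": "NUMERIC", "updated_at": "TIMESTAMP"}
issues = breaking_changes(contract, proposed)
```

A publisher could run a check like this in CI against the view's schema and block the change whenever the returned list is non-empty.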
Quota limits cap exchange and listing counts per project. A single project can host a finite number of exchanges and listings (check the official quotas page for current numbers). Large enterprises with many internal teams should plan exchange ownership early, or split across multiple publisher projects.
Best Practices
- Publish views, not raw tables. Views give you a stable contract layer, hide internal columns, and let you refactor the storage schema without breaking subscribers.
- Use one exchange per audience class. Internal teams, trusted partners, and public marketplace consumers each deserve their own exchange with appropriate IAM defaults.
- Tag every shared dataset with data_sensitivity and pii_present labels, and pair Analytics Hub data sharing with column-level security or BigQuery data masking when sharing partially sensitive tables.
- Document the refresh cadence in the listing metadata. Subscribers query much more confidently when they know whether the data is real-time, hourly, or daily.
- Monitor subscription analytics and contact subscribers before deprecating a listing. Silent removal causes downstream pipeline outages and damages partner relationships.
- For cross-region needs, design replicated datasets and publish a listing per region from the replicated copy rather than forcing subscribers to deal with replication themselves.
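The labeling practice above is easy to enforce with a pre-publication check. The sketch below works on a plain dictionary of dataset labels; the helper names and the convention that pii_present carries the string "true" or "false" are assumptions, not something the service mandates.

```python
REQUIRED_LABELS = ("data_sensitivity", "pii_present")


def missing_labels(labels: dict[str, str]) -> list[str]:
    """Return the required label keys that are absent or empty."""
    return [key for key in REQUIRED_LABELS if not labels.get(key)]


def needs_masking(labels: dict[str, str]) -> bool:
    """True when the dataset is flagged as containing PII.

    Assumes the hypothetical convention pii_present = "true" / "false".
    """
    return labels.get("pii_present", "").lower() == "true"


labels = {"data_sensitivity": "internal", "pii_present": "true"}
ready_to_publish = not missing_labels(labels)
```

When `needs_masking` returns True, the listing would be a candidate for column-level security or data masking before it is published.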
Real-World Use Case
Consider a mid-sized retail group with 12 brand subsidiaries operating across Europe and North America. Each brand runs its own BigQuery project for analytics, and the group has a central data platform team that maintains shared dimensions: store hierarchy, product catalog, customer loyalty membership, and currency conversion rates.
Before Analytics Hub data sharing, the group used scheduled queries to copy these dimensions into every brand's project nightly. Storage costs were duplicated 12 times. Schema drift was constant: a brand would query a stale snapshot for a week before someone noticed. The data platform team spent two engineers full-time just maintaining the replication fan-out.
The migration to Analytics Hub data sharing took six weeks. The platform team created one private exchange per region (group-shared-eu, group-shared-na), published listings backed by curated views over the master dimension tables, and granted roles/analyticshub.subscriber on each listing to a Google Group containing the brand's analyst service accounts. Each brand subscribed to the listings they needed, ending up with linked datasets that always reflected the current state.
Storage cost dropped to one copy. Schema changes propagated instantly. The replication pipelines were retired. Two engineers redirected to higher-value work. Brand analysts started writing JOINs across their own transactional data and the shared dimensions in a single query, which was previously a multi-step ETL.
The group later opened a public listing for anonymized aggregate sales data on Google Cloud Marketplace, monetized at a flat monthly fee per subscriber, with industry analysts as the target audience. The same Analytics Hub data sharing infrastructure served both internal data mesh and external monetization workloads.
Exam Tips
The PDE exam loves scenario questions where two companies need to share BigQuery data without copying it. If the answer choices include Analytics Hub, that is almost always the right pick. Watch out for distractors that suggest exporting to Cloud Storage and re-importing on the subscriber side, or scheduled queries between projects; both technically work, but both copy the data, so neither is the best answer when Analytics Hub data sharing is on the menu.
Region constraints are tested directly. If a question gives you a publisher in us-east1 and a subscriber in europe-west1, the only correct flow is to replicate the dataset to a region the subscriber can reach, then publish from there. There is no magical cross-region linked dataset.
IAM permissions show up in two flavors. First, who can administer an exchange (roles/analyticshub.admin) versus who can publish listings (roles/analyticshub.publisher). Second, who can subscribe to a listing (roles/analyticshub.subscriber) versus who can browse it (roles/analyticshub.viewer). Memorize which role does what; the exam writes scenarios where the wrong role is granted and asks you to fix it.
Public versus private exchanges trip up exam takers. A private exchange is invitation-only, controlled by IAM grants. A public exchange is published to Google Cloud Marketplace, discoverable by anyone with a Cloud account, and requires Producer Portal enrollment. If the question mentions "discoverable by external customers" or "monetized," think public. If it says "limited to specific partner organizations," think private.
Data clean rooms are an emerging exam topic that builds on Analytics Hub data sharing. Know that clean rooms add privacy-preserving query restrictions on top of the standard sharing flow, and they exist for advertising, healthcare, and finance use cases.
Cost allocation is a frequent question. Storage is paid by the publisher. Query is paid by the subscriber. Marketplace subscription fee, if any, flows from subscriber to publisher net of Google's cut. Get this triangle right and you handle most cost-related Analytics Hub questions.
Frequently Asked Questions
What is the difference between Analytics Hub and BigQuery authorized datasets?
Authorized datasets are a low-level BigQuery feature that grants one dataset read access to another via IAM, scoped to a single project relationship. Analytics Hub data sharing builds on top of this primitive and adds the exchange and listing layer, plus a discovery catalog, subscription tracking, marketplace integration, and cross-organization workflows. For a one-off internal share between two projects, authorized datasets are simpler. For anything involving multiple subscribers, partners, or external organizations, Analytics Hub is the right tool because it scales the governance model with you.
Can subscribers modify the data in a linked dataset?
No. Linked datasets are strictly read-only. The subscriber can SELECT, JOIN, build views, and even create materialized views in their own project that read from the linked dataset. Any write operation (INSERT, UPDATE, DELETE, CREATE TABLE within the linked dataset) is rejected. To enrich shared data, subscribers create derived tables in their own dataset that read from the linked one.
How do I share data across Google Cloud regions?
You cannot directly. Analytics Hub data sharing requires the source dataset and the linked dataset to be in the same region or multi-region. To serve subscribers in a different region, replicate the source dataset using BigQuery cross-region dataset replication or scheduled copy jobs into a dataset in the target region, then publish a listing from the replicated copy. Multi-region datasets (US, EU) are a good shortcut if you know your audience is broadly geographic.
What does it cost to publish or subscribe?
Analytics Hub itself does not charge for exchanges, listings, or subscriptions. The publisher pays standard BigQuery storage prices for the source dataset. The subscriber pays standard BigQuery query prices (on-demand or slot-based) for queries against the linked dataset. If the listing is monetized through Google Cloud Marketplace, the subscriber additionally pays the marketplace subscription fee, which Google bills and remits to the publisher minus a marketplace fee.
How do I revoke access to a subscriber I no longer want to serve?
Two options. Soft revocation: remove the subscriber's principal from the listing's IAM policy. Existing subscriptions stop returning data, and new subscriptions cannot be created. Hard revocation: delete the listing entirely, which immediately invalidates every subscription pointing to it. For monetized listings, also coordinate with Cloud Commerce to handle the subscription cancellation and any billing cleanup.
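The soft-revocation step amounts to removing one member from a role binding in the listing's IAM policy. Here is a minimal sketch over a policy held as a plain dictionary in the standard IAM shape; it edits the data structure only and makes no API call, and the function name and member addresses are hypothetical.

```python
def revoke(policy: dict, role: str, member: str) -> dict:
    """Return a copy of `policy` without `member` in the binding for `role`.

    Bindings left with no members are dropped from the result.
    """
    bindings = []
    for binding in policy.get("bindings", []):
        members = list(binding.get("members", []))
        if binding.get("role") == role:
            members = [m for m in members if m != member]
        if members:
            bindings.append({**binding, "members": members})
    return {**policy, "bindings": bindings}


listing_policy = {
    "bindings": [
        {
            "role": "roles/analyticshub.subscriber",
            "members": ["group:analysts@partner.example", "user:alice@example.com"],
        }
    ]
}
after = revoke(listing_policy, "roles/analyticshub.subscriber", "group:analysts@partner.example")
```

Because the function returns a new policy rather than mutating the input, the before and after versions can be compared or logged before the change is written back.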
Can I share data outside of Google Cloud, for example to AWS or Snowflake?
Not natively through Analytics Hub. The service is a BigQuery-to-BigQuery sharing mechanism. To serve consumers on other clouds, use BigQuery Omni (which lets BigQuery query data living in S3 or Azure Blob Storage, but not the reverse), or export the data to Cloud Storage and let the consumer pull it. Some third-party data sharing platforms bridge BigQuery and Snowflake, but they sit outside the Analytics Hub service.
What happens to subscribers if the publisher deletes the source dataset?
Linked datasets immediately fail to return data. The subscription state moves to an error condition. There is no automatic notification today, so subscribers discover the failure through their own query monitoring. Publishers who plan to delete a source dataset should communicate proactively to every active subscriber and consider deprecating listings before deleting underlying storage.
Related Topics
- BigQuery Data Modeling and Clustering covers the schema design choices that determine what makes a good shared dataset.
- Storage Security and IAM Best Practices goes deeper into the IAM principals, groups, and service accounts that govern who can publish and subscribe.
- Data Sovereignty and Compliance Design explores the regional and regulatory constraints that often dictate where Analytics Hub exchanges can legally operate.