Generative AI Model Garden — GCP PCA Study Notes

Introduction to Generative AI on Google Cloud

For the Professional Cloud Architect, Generative AI represents a shift from "building models" to "orchestrating models." Google Cloud's Generative AI offering is centered around Vertex AI, providing access to powerful Foundation Models via the Model Garden and the Generative AI Studio.

The goal is to enable enterprises to deploy state-of-the-art AI without the massive overhead of training models from scratch.

Model Garden is a curated collection of Google's first-party foundation models (like Gemini and PaLM), open-source models (like Llama and Falcon), and third-party models, all available within Vertex AI for discovery, testing, and deployment. Reference: https://cloud.google.com/vertex-ai/docs/generative-ai/model-garden/explore-models

Plain-Language Explanation: Model Garden & GenAI

Generative AI is like having a team of super-intelligent, multilingual interns who have read the entire internet.

Analogy 1 — The Master Library (Model Garden)

Think of Model Garden as a Grand Master Library. Inside, you have the "Google Classics" (Gemini, PaLM), but you also have "Global Bestsellers" (Llama, Mistral). As an architect, you don't need to write the books; you just need to know which book to pick for the specific job—whether it's writing code, translating a language, or summarizing a long legal contract.

Analogy 2 — The Sculptor's Studio (Generative AI Studio)

Generative AI Studio is like a Sculptor's Studio. The library gives you the "Raw Marble" (the Foundation Model). In the studio, you use "Chisels and Sandpaper" (Prompt Engineering and Tuning) to shape that marble into a specific statue (a specialized Customer Service Bot). You can quickly try different techniques to see which one works best before committing to the final product.

Analogy 3 — The Guard Dog (Safety Settings)

Using GenAI is like keeping a powerful but unpredictable guard dog. Safety Settings are the dog's leash and muzzle. You can adjust how tight the leash is: blocking offensive content, ensuring the AI doesn't give medical advice, or preventing it from leaking sensitive company data.

Key Components of the GenAI Ecosystem

1. Gemini and PaLM Models

Gemini: Google's most capable multimodal model (Text, Image, Video, Audio). Available in Pro, Ultra, and Nano sizes.
PaLM 2: Optimized for text and code tasks, offering high performance and efficiency.
Codey: Specialized for code generation, completion, and chat.
Imagen: For high-quality image generation and editing.

2. Generative AI Studio

Language: Test and tune text-based models with different parameters (Temperature, Top-K, Top-P).
Vision: Generate images from text prompts or edit existing ones.
Speech: Convert text to speech or speech to text with high fidelity.
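To make the Top-K and Top-P parameters concrete, here is a minimal sketch of how each one narrows the pool of candidate tokens before sampling. The token distribution and the function names are made up for illustration; this is not how Generative AI Studio is implemented, only the general idea behind the two settings.

```python
def top_k_filter(probs, k):
    """Keep only the k most likely tokens, then renormalize to sum to 1."""
    kept = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]
    total = sum(p for _, p in kept)
    return {tok: p / total for tok, p in kept}


def top_p_filter(probs, p):
    """Keep the smallest set of most likely tokens whose cumulative
    probability reaches p, then renormalize to sum to 1."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, cumulative = [], 0.0
    for tok, prob in ranked:
        kept.append((tok, prob))
        cumulative += prob
        if cumulative >= p:
            break
    total = sum(prob for _, prob in kept)
    return {tok: prob / total for tok, prob in kept}


# Hypothetical next-token distribution (illustrative values only).
probs = {"cat": 0.5, "dog": 0.3, "fish": 0.15, "rock": 0.05}

print(top_k_filter(probs, 2))    # only "cat" and "dog" survive
print(top_p_filter(probs, 0.9))  # "cat", "dog", and "fish" survive
```

Lower Top-K or Top-P values shrink the candidate pool, so the output becomes more conservative; higher values let less likely tokens through.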

3. Model Tuning Strategies

Sometimes a general model isn't enough, and you need to adapt it. The options below range from changing only the prompt to actually tuning the model:

Prompt Engineering: Designing better inputs to get better outputs (No code required).
Adapter Tuning (Parameter-Efficient): Training a small set of extra parameters (an "adapter" layer) on top of the foundation model while the original weights stay unchanged. Much faster and cheaper than full fine-tuning.
Reinforcement Learning from Human Feedback (RLHF): Tuning the model based on human "thumbs up/down" ratings to align it with specific preferences.
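A rough piece of arithmetic shows why adapter tuning is so much cheaper than full fine-tuning. The sketch below assumes a LoRA-style low-rank adapter, which is one common parameter-efficient technique; the notes do not say which technique Vertex AI uses, and the layer size and rank are illustrative values.

```python
def full_params(d_in, d_out):
    """Trainable weights if the whole d_in x d_out matrix is fine-tuned."""
    return d_in * d_out


def adapter_params(d_in, d_out, rank):
    """Trainable weights for a low-rank adapter: two small matrices of
    shape (d_in x rank) and (rank x d_out); the base matrix stays frozen."""
    return rank * (d_in + d_out)


# Illustrative sizes only, not taken from any specific model.
d_in, d_out, rank = 4096, 4096, 8

full = full_params(d_in, d_out)              # 16,777,216
adapter = adapter_params(d_in, d_out, rank)  # 65,536

print(f"full fine-tuning: {full:,} trainable weights")
print(f"adapter tuning:   {adapter:,} trainable weights")
print(f"adapter share:    {adapter / full:.4%}")
```

Under these assumptions the adapter trains well under 1% of the weights in that layer, which is where the speed and cost savings come from.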

Retrieval-Augmented Generation (RAG)

Architecting for GenAI often requires the model to access "your" data.

The Problem: Foundation models are frozen in time and don't know your internal company secrets.
The Solution (RAG): Instead of retraining the model, you provide it with the relevant data at the time of the request.
Process:
1. Convert your documents into Embeddings (numbers representing meaning).
2. Store them in a Vector Database (e.g., Vertex AI Vector Search or BigQuery Vector Search).
3. When a user asks a question, find the most relevant documents.
4. Feed those documents and the question to the model to generate an answer.
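The four steps above can be sketched in a few lines. This is a toy version under loud assumptions: the "embedding" is a plain bag-of-words count rather than a real embedding model, the "vector database" is an in-memory list rather than Vertex AI Vector Search, and the documents are invented. The final prompt would be sent to a foundation model, which is left out here.

```python
import math
from collections import Counter


def embed(text):
    """Toy embedding: word counts. A real pipeline would call an embedding model."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Steps 1 and 2: embed the documents and store them (hypothetical content).
documents = [
    "refunds are processed within five business days",
    "the office is closed on public holidays",
    "vpn access requires a hardware security key",
]
index = [(doc, embed(doc)) for doc in documents]


def retrieve(question, k=1):
    """Step 3: return the k documents most similar to the question."""
    q = embed(question)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]


def build_prompt(question, k=1):
    """Step 4: combine the retrieved context and the question into one prompt."""
    context = "\n".join(retrieve(question, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


print(build_prompt("how long do refunds take"))
```

The model's weights never change here. Only the prompt changes, which is why RAG suits data that is updated frequently.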

Enterprise Search and Conversation

Google provides "out-of-the-box" solutions for specific GenAI use cases:

Vertex AI Search: Build a Google-quality search engine across your internal documents (PDFs, HTML, etc.).
Vertex AI Conversation: Create sophisticated, multi-turn chat agents with minimal effort.
Architecture Tip: Use these when you need to deploy a solution fast without building the RAG pipeline from scratch.

Governance and Safety

Safety Filters: Built-in categories (Hate Speech, Harassment, Sexually Explicit) that can be adjusted based on your risk tolerance.
Grounding: Linking model responses to verifiable sources (like your own data or Google Search) to reduce "hallucinations."
Data Privacy: Google does not use your customer data or prompt inputs to train its foundation models. This is a critical point for enterprise architects.
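One way to picture adjustable safety filters is as a per-category threshold applied to a score. The sketch below is purely hypothetical: the category keys, scores, and threshold values are invented for illustration and do not reflect the actual Vertex AI safety API or its scale.

```python
# Hypothetical thresholds: a lower value means a stricter filter.
THRESHOLDS = {
    "hate_speech": 0.3,
    "harassment": 0.5,
    "sexually_explicit": 0.2,
}


def blocked_categories(scores, thresholds):
    """Return the categories whose score meets or exceeds its threshold.
    Categories without a configured threshold are blocked only at score 1.0."""
    return sorted(
        cat for cat, score in scores.items()
        if score >= thresholds.get(cat, 1.0)
    )


# Hypothetical scores for one model response.
scores = {"hate_speech": 0.1, "harassment": 0.6, "sexually_explicit": 0.05}

flagged = blocked_categories(scores, THRESHOLDS)
print(flagged)  # ['harassment']
```

Tightening the leash for a risk-averse brand would mean lowering the thresholds; a more permissive internal tool might raise them.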

Cost Management for GenAI

Token-based Pricing: You pay based on the number of characters or tokens processed, counting both the input prompt and the generated output.
Quota Management: Set limits on API calls to prevent runaway costs during experimentation.
Model Selection: Use "smaller" models (for example, Gemini Pro rather than Gemini Ultra) for simpler tasks to save money.
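A quick estimate shows how token-based pricing and model selection interact. All prices below are placeholders chosen for easy arithmetic, not real Google Cloud rates, and the traffic figures are hypothetical; check the current pricing page before using numbers like these.

```python
def estimate_monthly_cost(requests_per_day, input_tokens, output_tokens,
                          price_in_per_1k, price_out_per_1k, days=30):
    """Estimate monthly spend from per-request token counts and per-1K-token prices."""
    per_request = (input_tokens / 1000) * price_in_per_1k \
                + (output_tokens / 1000) * price_out_per_1k
    return requests_per_day * days * per_request


# Hypothetical workload: 10,000 requests/day, 500 input and 200 output tokens each.
workload = dict(requests_per_day=10_000, input_tokens=500, output_tokens=200)

# Placeholder prices per 1K tokens (NOT real rates).
small = estimate_monthly_cost(**workload, price_in_per_1k=0.0005, price_out_per_1k=0.0015)
large = estimate_monthly_cost(**workload, price_in_per_1k=0.005, price_out_per_1k=0.015)

print(f"smaller model: ${small:,.2f} per month")
print(f"larger model:  ${large:,.2f} per month")
```

With these placeholder rates the larger model costs ten times as much for the same traffic, which is the reasoning behind routing simple tasks to a smaller model.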

FAQ — Generative AI Model Garden

Q1. Can I use open-source models like Llama 2 on Google Cloud?

Yes. Model Garden includes popular open-source models that you can deploy to Vertex AI Endpoints with one click.

Q2. What is "Grounding" and why is it important?

Grounding is the process of connecting the AI's response to a specific source of truth (like your company's knowledge base). It is one of the most effective ways to reduce the risk of the AI making things up ("hallucinating").

Q3. Is my data used to train Google's models?

No. Google's enterprise agreement ensures that any data you use for tuning or any prompts you send are kept within your project and are not used to improve Google's global foundation models.

Q4. What is "Temperature" in GenAI Studio?

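Temperature controls how random the model's token sampling is, which is often described as its "creativity." A low temperature (e.g., 0.1) makes the output more predictable and repeatable, which suits factual tasks, although it does not by itself make answers more accurate. A high temperature (e.g., 0.9) makes the output more creative and varied.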
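The effect of temperature can be seen in a small sketch that applies it to a softmax over token scores. The logits are invented for illustration, and the function name is an assumption rather than part of any Google API.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert raw scores to probabilities after dividing them by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scores for three candidate tokens.
logits = [2.0, 1.0, 0.5]

low = softmax_with_temperature(logits, 0.1)
high = softmax_with_temperature(logits, 0.9)

print([round(p, 4) for p in low])   # almost all probability on the top token
print([round(p, 4) for p in high])  # probability spread across all three tokens
```

At 0.1 the top-scoring token is chosen almost every time, while at 0.9 the alternatives keep a real chance of being sampled.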

Q5. When should I use RAG instead of Fine-Tuning?

Use RAG when your data changes frequently (e.g., stock prices, news, daily reports). Use Fine-Tuning when you want the model to learn a specific style or format that doesn't change often.

Final Architect Tip

For the PCA exam, remember the Foundation Model + RAG pattern. It is the most common architectural pattern for enterprise GenAI. If a question asks about "reducing hallucinations" or "using private company data," the answer is almost always Retrieval-Augmented Generation (RAG) or Vertex AI Search. Also, prioritize Safety Settings whenever "compliance" or "brand reputation" is mentioned.
