Model Tuning and Fine-Tuning

What Is Model Tuning and Fine-Tuning?

For the Generative AI Leader exam, model tuning is what you reach for when prompting and grounding have run out of road. As a business leader, you do not need to operate the training pipeline yourself — your job is to recognize when tuning is the right investment, what it costs to set up and maintain, and which business problems genuinely justify it versus the many that do not.

A foundation model like Gemini arrives already trained on an enormous, general corpus. It can write, summarize, translate, and reason about almost anything — but it knows nothing specifically about your company. It does not know your brand voice, your internal jargon, your preferred output format, or the quirks of your specialized domain. Most of the time, you can close that gap simply by prompting (giving good instructions) or grounding (connecting the model to your private data). But sometimes the gap is too wide, too consistent, or too detailed for prompts alone. That is the moment fine-tuning earns its keep.

Fine-tuning means taking a pre-trained foundation model and continuing its training on a curated set of your own examples — pairs of inputs and the ideal outputs you want. The model adjusts its internal behavior so that, going forward, it naturally produces responses in your style, your format, or your domain without you having to spell everything out in every prompt. On Vertex AI, this is a managed service: you supply the example dataset, Google runs the training, and you get back a private tuned version of the model that only your organization can call.

The Generative AI Leader exam frames tuning as a strategic investment decision, not a technical exercise. It tests whether you can place tuning correctly on the decision ladder, weigh its ongoing maintenance cost, and tell the difference between a problem that tuning fixes and a problem where tuning is expensive overkill.

Plain English Explanation

Tuning sounds intimidating because it involves the word "training", which makes people picture data scientists and GPUs. But the business idea behind tuning is something every manager already understands: when general capability is not enough, you invest in specialized capability — and that investment comes with an upfront cost and an ongoing upkeep cost. The following analogies make the trade-offs concrete and ground them in how Vertex AI tuning actually works.

Analogy 1 — A Generalist Employee Sent for Specialist Training

Imagine you hire a bright, capable new graduate. On day one they are a generalist: articulate, quick to learn, good at almost any task you describe clearly. This is the foundation model, Gemini, straight out of the box. For most work, you simply give them clear instructions — that is prompting. When they need company facts, you hand them the internal wiki to read before answering — that is grounding / RAG.

But suppose your firm does highly specialized tax-advisory work with a very particular house style for client memos. You could write a ten-page instruction sheet for every memo, but that is exhausting and inconsistent. Instead you enroll the employee in a multi-week specialist training course, working through hundreds of past memos until the house style becomes second nature. That course is fine-tuning on Vertex AI: you provide the curated dataset of input/output examples, Google runs the supervised training, and you receive an employee who now produces correct-style memos without the ten-page brief every time.

The analogy also captures the catch. The training course costs money and time upfront. And if tax law changes next year, you must send the employee back for refresher training — the tuned model does not update itself. On the exam, this is the key business insight: tuning buys you consistency and reduced per-task effort, but it converts a one-time prompt-writing chore into an ongoing training-and-retraining commitment.

Analogy 2 — Teaching a Fluent Speaker Your Local Accent and Slang

Picture a language interpreter who speaks textbook-perfect Mandarin. They are fluent, grammatical, and professional — that is the foundation model. But your company operates in a specific Taiwanese industry with its own slang, its own product nicknames, and a casual, friendly tone that customers expect. The textbook speaker sounds correct but slightly off — too formal, missing the local flavor.

You cannot fix an accent by handing someone a glossary mid-conversation; that is the limit of prompting. Instead, you have the interpreter spend time immersed in recordings of your actual customer conversations until they naturally pick up the cadence, the slang, and the warmth. That immersion is fine-tuning: feeding Vertex AI hundreds of real example exchanges so the tuned Gemini model absorbs not just facts but manner — tone, phrasing, formatting habits.

This analogy highlights what tuning is uniquely good at: teaching style and form rather than knowledge. If the problem is "the model gives correct answers but in the wrong voice or shape", tuning is a strong fit. If the problem is "the model lacks current facts", tuning is the wrong tool — you want grounding instead, because re-teaching an accent every time a fact changes is absurdly wasteful. The exam loves this distinction: tuning shapes how a model speaks; grounding changes what it knows.

Analogy 3 — A Tailored Suit Versus Off-the-Rack Clothing

Walk into any department store and you can buy an off-the-rack suit. It fits most people acceptably — that is a foundation model with a good prompt: fast, cheap, available immediately, no commitment. For the large majority of occasions, off-the-rack is exactly the right choice, and spending more would be a waste.

But a company executive who appears on stage hundreds of times a year, always needing to look impeccably on-brand, eventually visits a tailor. The tailor measures precisely and produces a bespoke suit that fits perfectly every single time, with no fussing. That bespoke suit is a fine-tuned model on Vertex AI: a higher upfront cost and a wait while it is made, in exchange for a perfect, repeatable fit for a high-volume, high-stakes need.

The trade-off is exactly the tailoring trade-off. Bespoke costs more and takes longer. If your body shape changes — if the requirements shift — the suit must be re-tailored or remade, just as a tuned model must be retrained when the domain evolves. So you only commission a tailored suit when you wear it constantly and the perfect fit genuinely matters. For the exam: tuning is justified by high volume plus high consistency requirements; for occasional or experimental use, off-the-rack (prompting) is the smarter, cheaper call. You can connect this thinking to ROI in measuring GenAI business value.

The Decision Ladder: Try Cheaper Levers First

The exam's expected order of escalation is fixed: prompt engineering → few-shot examples → grounding / RAG → fine-tuning. Tuning is the last lever, not the first — it costs the most, needs a curated training dataset, and creates an artifact you must maintain and re-tune as the base model evolves. When a scenario can be solved with a better prompt or by grounding the model in company data, that is the right answer; "fine-tune the model" is wrong unless the cheaper levers have genuinely been exhausted.

The single most exam-relevant concept in this topic is the decision ladder. When a foundation model is not performing well enough, you should climb the ladder one rung at a time, because each rung is cheaper, faster, and lower-maintenance than the one above it.

Prompt engineering. Rewrite the instruction: be specific, give context, define the output format, assign a role. This needs no training and can be changed in seconds.
Few-shot prompting. Include two to five examples of the input/output you want directly inside the prompt. The model imitates the pattern. Still no training, still instant to change.
Grounding / RAG. Connect the model to your private, current data so its answers are factually anchored. This fixes knowledge and freshness gaps.
Fine-tuning. Train the model on a curated dataset so a behavior becomes built-in. This is the most powerful but also the most expensive and highest-maintenance rung.
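As a minimal sketch of rungs one and two, the snippet below assembles a zero-shot prompt and a few-shot prompt from the same instruction. The `build_prompt` helper, the `Input:`/`Output:` layout, and the sample support replies are illustrative assumptions, not a format that the exam or Vertex AI prescribes.

```python
def build_prompt(instruction: str, user_input: str, examples=()) -> str:
    """Assemble a prompt; `examples` is a sequence of (input, output) pairs."""
    parts = [instruction.strip()]
    # Few-shot prompting: each example shows the model the pattern to imitate.
    for example_input, example_output in examples:
        parts.append(f"Input: {example_input}\nOutput: {example_output}")
    # The real request goes last, with the output left open for the model.
    parts.append(f"Input: {user_input}\nOutput:")
    return "\n\n".join(parts)


INSTRUCTION = "Reply to the customer in a warm, friendly tone."  # hypothetical

# Rung one: instruction only.
zero_shot = build_prompt(INSTRUCTION, "Where is my order?")

# Rung two: the same instruction plus a couple of worked examples.
few_shot = build_prompt(
    INSTRUCTION,
    "Where is my order?",
    examples=[
        ("My parcel is late.", "So sorry for the wait! Let me check on that right away."),
        ("Can I change my address?", "Of course! Happy to update that for you."),
    ],
)

print(few_shot)
```

Both prompts are plain strings, so moving between rung one and rung two is an edit, not a training job.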

The discipline the exam wants to see is starting low. A surprising number of problems that look like they need tuning are actually solved by a better prompt or by adding a few examples. Tuning should be a considered decision made after the cheaper rungs have genuinely been tried and found insufficient — not a reflex.

Tuning is not the first lever — and on the exam it is rarely the right first answer. A very common misread is to jump straight to fine-tuning the moment a model underperforms. Before tuning, you must exhaust the cheaper rungs: improve the prompt, add few-shot examples, and apply grounding / RAG. Tuning has real upfront training cost, requires a curated labeled dataset, and creates an ongoing retraining obligation. If a scenario describes a problem that better instructions or a handful of examples could fix, the correct answer is prompt engineering or few-shot prompting, not tuning. Likewise, if the problem is missing or stale facts, the answer is grounding, not tuning. See https://cloud.google.com/vertex-ai/generative-ai/docs/learn/prompt-best-practices.

What Fine-Tuning Actually Changes

It helps business leaders to know, at a conceptual level, what tuning does and does not do.

What Tuning Is Good At

Tuning excels at teaching a model a consistent behavior — something you want it to do the same way every single time:

Style and voice. A consistent brand tone: friendly, formal, concise, on-message.
Output format. Always returning a specific structure — a particular memo layout, a fixed JSON shape, a standard report template.
Domain language. Fluency in specialized jargon, abbreviations, and conventions of a narrow field such as medical coding, legal drafting, or semiconductor manufacturing.
Task specialization. Performing one narrow task — classifying support tickets into your exact categories, for example — more reliably than a general prompt achieves.

What Tuning Is Not Good At

Tuning is the wrong tool for several common needs:

Adding fresh or changing facts. Tuning bakes behavior in at training time. Today's stock price, this week's inventory, the latest policy — those belong in grounding / RAG, covered in grounding and RAG.
One-off or experimental needs. If you only need a behavior occasionally, a good prompt is far cheaper.
Problems a better prompt already solves. If you have not seriously tried prompt engineering, you have not earned the right to tune.

Tuning and Grounding Are Complementary

A subtle but important point for the exam: tuning and grounding are not rivals — they are partners. A mature production system often fine-tunes a model for the right tone and format and grounds it on a live knowledge base for the right facts. Tuning shapes the manner; grounding supplies the matter. You can read more about combining factual grounding with retrieval in grounding and RAG.
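To make "grounding supplies the matter" concrete, here is a toy sketch of facts being injected at request time. The keyword-overlap `retrieve` function, the knowledge-base entries, and the prompt wording are all hypothetical stand-ins; a production system would use a managed retrieval service instead of this lookup.

```python
def retrieve(question: str, knowledge_base: dict[str, str], top_k: int = 1) -> list[str]:
    """Toy retrieval: rank entries by how many question words appear in the key."""
    words = set(question.lower().replace("?", "").split())
    ranked = sorted(
        knowledge_base.items(),
        key=lambda item: len(words & set(item[0].lower().split())),
        reverse=True,
    )
    return [text for _, text in ranked[:top_k]]


def grounded_prompt(question: str, knowledge_base: dict[str, str]) -> str:
    """Build a prompt whose facts come from the knowledge base, not the model."""
    facts = retrieve(question, knowledge_base)
    context = "\n".join(f"- {fact}" for fact in facts)
    return f"Answer using only these facts:\n{context}\n\nQuestion: {question}"


# Hypothetical company facts.
kb = {
    "return policy": "Returns are accepted within 30 days.",
    "shipping time": "Standard shipping takes 3 to 5 business days.",
}
before = grounded_prompt("What is your return policy?", kb)

# The policy changes: update the data, not the model.
kb["return policy"] = "Returns are accepted within 14 days."
after = grounded_prompt("What is your return policy?", kb)
```

The second prompt carries the new policy without any retraining, which is why changing facts belong in grounding while a tuned model handles tone and format.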

Supervised fine-tuning (SFT) is the process of continuing the training of a pre-trained foundation model on a curated dataset of input/output example pairs, so the model learns to reproduce the desired style, format, or task behavior. The "supervised" part means every training example includes the correct answer, which the model is taught to imitate. On Vertex AI, supervised fine-tuning for Gemini is a fully managed service: you provide the labeled dataset, and Google runs the training and hosts the resulting private tuned model. See https://cloud.google.com/vertex-ai/generative-ai/docs/models/tune_gemini/text_tune.

Supervised Fine-Tuning in Practice

Supervised fine-tuning is the most common and most exam-relevant form of tuning. The recipe, at a leadership level, is straightforward:

Collect examples. Gather a curated dataset of input/output pairs — the prompt a user might send, paired with the ideal response you want the model to produce.
Curate and check quality. The dataset must be clean, consistent, and genuinely representative. A tuned model is only as good as its examples; inconsistent examples teach inconsistent behavior.
Run the tuning job. Submit the dataset to Vertex AI. Google handles the training infrastructure — there is no cluster for you to manage.
Evaluate the tuned model. Compare the tuned model against the base model on held-out examples to confirm it actually improved.
Deploy and use it. The tuned model becomes a private endpoint that only your organization can call.
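Step four (evaluate the tuned model) might look something like the sketch below, which scores a base model and a tuned model on the same held-out examples. The two "models" are hand-written stand-in functions, and exact match is an assumed metric that suits a classification task; style or tone tasks would usually need human or rubric-based review instead.

```python
def exact_match_rate(model, held_out) -> float:
    """Fraction of held-out examples where the model output equals the ideal output."""
    hits = sum(1 for prompt, ideal in held_out if model(prompt).strip() == ideal.strip())
    return hits / len(held_out)


# Hypothetical stand-ins for the base and tuned models.
def base_model(ticket: str) -> str:
    return "billing" if "refund" in ticket.lower() else "other"


def tuned_model(ticket: str) -> str:
    text = ticket.lower()
    if "refund" in text or "invoice" in text:
        return "billing"
    if "late" in text:
        return "shipping"
    return "other"


# Held-out examples: never shown to the model during tuning.
held_out = [
    ("I want a refund", "billing"),
    ("My invoice is wrong", "billing"),
    ("The parcel is late", "shipping"),
    ("How do I reset my password?", "other"),
]

base_score = exact_match_rate(base_model, held_out)
tuned_score = exact_match_rate(tuned_model, held_out)
print(f"base: {base_score:.0%}  tuned: {tuned_score:.0%}")  # base: 50%  tuned: 100%
```

If the tuned score is not clearly better than the base score on held-out data, the tuning job has not earned its maintenance cost.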

The headline numbers a leader should internalize: supervised fine-tuning typically needs somewhere from a few hundred to a few thousand high-quality examples — far more than the handful you would put in a few-shot prompt, but far fewer than the billions used to build the foundation model from scratch. The dataset is the real project. Most of the effort, time, and risk in a tuning initiative lives in assembling and curating that example set, not in clicking "train".
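Because the dataset is the real project, a simple audit before tuning can pay off. The sketch below checks size, empty fields, and conflicting outputs for the same input. The `input`/`output` field names and the 100-example threshold are placeholders chosen for illustration and are not the schema Vertex AI expects; consult the tuning documentation for the actual file format.

```python
import json
from collections import defaultdict


def audit_dataset(pairs, min_examples: int = 100) -> list[str]:
    """Return human-readable problems found in a list of (input, output) pairs."""
    problems = []
    if len(pairs) < min_examples:
        problems.append(f"only {len(pairs)} examples; target is at least {min_examples}")
    outputs_by_input = defaultdict(set)
    for index, (prompt, ideal) in enumerate(pairs):
        if not prompt.strip() or not ideal.strip():
            problems.append(f"example {index} has an empty input or output")
        outputs_by_input[prompt.strip().lower()].add(ideal.strip())
    # Inconsistent examples teach inconsistent behavior.
    for prompt, outputs in outputs_by_input.items():
        if len(outputs) > 1:
            problems.append(f"conflicting outputs for input: {prompt!r}")
    return problems


def to_jsonl(pairs) -> str:
    """Serialize pairs as one JSON object per line (placeholder field names)."""
    return "\n".join(json.dumps({"input": p, "output": o}) for p, o in pairs)


sample = [
    ("Where is my order?", "Let me check that for you!"),
    ("where is my order?", "Please wait."),
    ("Can I get a refund?", ""),
]
problems = audit_dataset(sample, min_examples=3)
for problem in problems:
    print(problem)
```

On this three-row sample the audit reports one empty output and one conflicting pair, the kind of defects that are far cheaper to fix before a tuning job than after.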

Parameter-Efficient Tuning (Adapters and LoRA)

A natural leadership question is: "Doesn't retraining a giant model cost a fortune?" In the early days, full fine-tuning — adjusting every parameter in the model — was indeed slow and expensive. Modern tuning, including the tuning offered on Vertex AI, uses a far smarter approach called parameter-efficient tuning.

The Sticky-Note Idea

The intuition: instead of rewriting the entire reference book, you add a thin stack of sticky notes to it. The original book — the foundation model's billions of parameters — stays frozen and untouched. Training only adjusts a small, lightweight set of new parameters layered on top. Those add-on parameters are commonly called adapters, and a popular technique for producing them is LoRA (Low-Rank Adaptation).

Why This Matters to the Business

Parameter-efficient tuning changes the economics in three concrete ways:

Cheaper and faster. Training touches a tiny fraction of the model, so jobs finish faster and cost dramatically less than full retraining.
Smaller artifacts. A LoRA adapter is small. You can maintain several adapters — one for legal tone, one for marketing tone, one for support triage — all riding on the same shared frozen base model, instead of storing many full model copies.
Lower risk. Because the base model is frozen, parameter-efficient tuning is less likely to damage the model's broad general capabilities while teaching it the new specialty.

For the Generative AI Leader exam you do not need the mathematics of LoRA. You need the business takeaway: modern tuning is parameter-efficient, meaning it is far cheaper and lower-risk than people assume, because it adds a small specialized layer rather than rebuilding the whole model.
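For readers who want one number behind the sticky-note idea, the arithmetic below compares full fine-tuning of a single weight matrix with a LoRA adapter for the same matrix. LoRA represents the update as the product of two small matrices of rank `r`, so it trains `r * (d_out + d_in)` values instead of `d_out * d_in`. The dimensions and rank here are illustrative assumptions, not the sizes of any Gemini model.

```python
def lora_trainable_params(d_out: int, d_in: int, rank: int) -> int:
    """LoRA learns two small matrices: B (d_out x rank) and A (rank x d_in)."""
    return rank * (d_out + d_in)


# Hypothetical layer size and adapter rank.
d_out, d_in, rank = 4096, 4096, 8

full = d_out * d_in                               # every weight is trainable
adapter = lora_trainable_params(d_out, d_in, rank)  # only the add-on is trainable
fraction = adapter / full

print(f"full: {full:,}  adapter: {adapter:,}  fraction: {fraction:.2%}")
# full: 16,777,216  adapter: 65,536  fraction: 0.39%
```

Under these assumptions the adapter trains well under one percent of the layer's weights, which is the source of the cost, storage, and risk advantages listed above.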

Distillation: A Smaller, Cheaper Specialist

Distillation is a second tuning-family technique with a clear business payoff. The idea: use a large, powerful, expensive "teacher" model to train a smaller, faster, cheaper "student" model to perform a specific task almost as well.

The everyday analogy is a master craftsman teaching an apprentice one focused skill. The master knows everything and is in high demand, so consulting the master for every routine job is slow and costly. Instead, the master trains an apprentice intensively on one narrow task. Afterward, the apprentice handles that task quickly and cheaply, and the master is reserved for the genuinely hard cases.

In Vertex AI terms, a large model such as a top-tier Gemini variant acts as the teacher, generating high-quality example outputs, and a smaller model is tuned as the student to imitate them. The business benefits:

Lower inference cost. A smaller model costs less per request — and at high volume, per-request cost dominates the total bill.
Lower latency. Smaller models respond faster, which matters for interactive applications.
Specialized quality. On the narrow task it was distilled for, the student can rival the much larger teacher.

The trade-off: the student is a specialist, not a generalist. It is excellent at the one task it was distilled for and weaker elsewhere. Distillation is the right answer when a scenario describes a high-volume, well-defined, repetitive task where per-request cost and speed are the dominant concerns.
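The teacher-student flow can be illustrated with a deliberately tiny toy. Here the "teacher" is a hand-written rule function standing in for a large model, and the "student" is a word-vote classifier trained only on labels the teacher generates. Every name and rule is hypothetical, and none of this resembles how Vertex AI performs distillation internally; it only shows the shape of the idea.

```python
from collections import Counter, defaultdict


def teacher_label(ticket: str) -> str:
    """Stand-in for an expensive teacher model (hypothetical rules)."""
    text = ticket.lower()
    if "refund" in text or "charge" in text:
        return "billing"
    if "late" in text or "tracking" in text:
        return "shipping"
    return "other"


def train_student(unlabeled_tickets):
    """Train a tiny word-vote classifier on teacher-generated labels."""
    word_votes = defaultdict(Counter)
    for ticket in unlabeled_tickets:
        label = teacher_label(ticket)  # the only place the teacher is called
        for word in ticket.lower().split():
            word_votes[word][label] += 1
    return word_votes


def student_predict(word_votes, ticket: str) -> str:
    """Predict by summing label votes for each known word; no teacher call."""
    totals = Counter()
    for word in ticket.lower().split():
        totals.update(word_votes.get(word, {}))
    return totals.most_common(1)[0][0] if totals else "other"


unlabeled = [
    "please refund my order",
    "wrong charge on card",
    "package is late",
    "tracking number missing",
]
student = train_student(unlabeled)

billing_guess = student_predict(student, "refund please")
shipping_guess = student_predict(student, "tracking late")
print(billing_guess, shipping_guess)  # billing shipping
```

After training, the student answers without consulting the teacher, which mirrors the business case: the expensive model is used once to create examples, and the cheap model handles the routine volume.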

For the Generative AI Leader exam, keep four techniques straight and know what each one is for: three belong to the tuning family, and one does not. Supervised fine-tuning teaches a model a style, format, or domain behavior from curated input/output examples. Parameter-efficient tuning (adapters / LoRA) is the modern, low-cost way to perform that tuning by training a small add-on layer while freezing the base model. Distillation transfers a capability from a large teacher model into a smaller, cheaper, faster student model for high-volume tasks. Grounding / RAG, which is not tuning, supplies fresh facts at request time. Tuning changes the model's behavior; grounding changes the model's inputs. See https://cloud.google.com/vertex-ai/generative-ai/docs/models/tune-models.

Cost, Data, and Maintenance Trade-Offs

Tuning is best understood by a leader as an investment with both a capital cost and an operating cost. There are three trade-off dimensions to weigh.

The Cost Dimension

Tuning has costs that prompting does not:

One-time training cost. Running the tuning job on Vertex AI consumes compute. Parameter-efficient tuning keeps this modest, but it is never zero.
Data preparation cost. Assembling and curating hundreds to thousands of quality examples takes skilled human effort — often the largest line item.
Ongoing inference cost. A tuned model is billed per use, like any model. A distilled smaller model can make this cheaper; a tuned larger model will not be free.

The Data-Requirement Dimension

Prompting needs zero examples or a handful. Grounding needs a maintained knowledge base. Tuning needs a substantial, curated, labeled dataset — and crucially, the data must be high quality. Garbage examples produce a garbage tuned model. If an organization cannot assemble a clean dataset, it is not ready to tune.

The Maintenance Dimension

This is the most overlooked dimension and a favorite exam theme. A prompt can be edited in seconds. A tuned model is frozen at training time — it does not learn new things on its own. When your style guide changes, when your domain evolves, when a new product launches, you must gather fresh examples and re-run the tuning job. Every tuned model is therefore a standing maintenance commitment, not a one-and-done deliverable.
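The three dimensions can be combined into rough break-even arithmetic. Every figure below is invented for illustration (arbitrary currency units, not Vertex AI pricing), and the sketch assumes the tuned model is cheaper per request because its prompt is shorter or the model is smaller.

```python
def yearly_cost(thousands_of_requests: int, cost_per_thousand: int, fixed_costs: int = 0) -> int:
    """Total yearly cost: fixed costs plus usage-based inference cost."""
    return fixed_costs + thousands_of_requests * cost_per_thousand


def break_even_thousands(fixed_costs: int, prompted_per_thousand: int, tuned_per_thousand: int):
    """Thousands of requests needed before per-request savings repay fixed costs."""
    saving = prompted_per_thousand - tuned_per_thousand
    if saving <= 0:
        return None  # tuning never pays back on cost alone
    return fixed_costs / saving


# Hypothetical figures.
UPFRONT = 2000            # data preparation plus the first tuning job
RETRAIN_COST = 500        # each periodic retraining run
RETRAINS_PER_YEAR = 4
PROMPTED_PER_THOUSAND = 4  # long prompt on the base model
TUNED_PER_THOUSAND = 3     # short prompt on the tuned model

fixed = UPFRONT + RETRAINS_PER_YEAR * RETRAIN_COST
break_even = break_even_thousands(fixed, PROMPTED_PER_THOUSAND, TUNED_PER_THOUSAND)

low_volume = 500       # thousand requests per year
high_volume = 10_000   # thousand requests per year
for volume in (low_volume, high_volume):
    prompted = yearly_cost(volume, PROMPTED_PER_THOUSAND)
    tuned = yearly_cost(volume, TUNED_PER_THOUSAND, fixed)
    print(f"{volume:>6}k requests  prompted: {prompted:>6}  tuned: {tuned:>6}")
```

With these made-up numbers the break-even point is 4,000 thousand requests per year: below it, prompting is cheaper; above it, the fixed tuning and retraining costs are amortized.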

When evaluating whether to tune, lead the conversation with a simple framing: tuning is like hiring and training a specialist employee. There is recruitment effort (curating the dataset), a training course (the Vertex AI tuning job), and ongoing professional development (periodic retraining as the domain shifts). If a business need is large, durable, and stable enough to justify a "permanent specialist hire", tuning is worth it. If the need is small, occasional, or fast-changing, stay with prompting and grounding — the equivalent of briefing a capable generalist for each task. This framing maps the technical decision onto a budgeting decision every executive already understands. See https://cloud.google.com/vertex-ai/generative-ai/docs/overview.

When Tuning Is Worth It — and When It Is Overkill

The exam will present scenarios and ask you to judge whether tuning is justified. Here are the patterns.

Tuning Is Worth It When...

You need a consistent brand voice at scale. A company generating thousands of customer messages a day, all of which must sound unmistakably on-brand, gets real value from baking the voice into the model rather than re-specifying it in every prompt.
You operate in a specialized domain. A field with dense jargon and conventions — clinical notes, legal contracts, insurance underwriting — where a general model is competent but not fluent, benefits from domain fine-tuning.
You need a strict, repeatable output format. When downstream systems depend on a precise structure every time, a tuned model is more reliable than prompt instructions alone.
The volume is high and the task is stable. High request volume amortizes the upfront cost; a stable task keeps the retraining bill low. Distillation is especially attractive here.
Prompts have become unwieldy. When your prompt has grown into a multi-page instruction document just to get acceptable behavior, tuning can move that burden into the model and simplify operations.

Tuning Is Overkill When...

A better prompt would solve it. If you have not seriously iterated on prompt design, tuning is premature. See prompt optimization techniques.
The need is about facts, not behavior. Fresh, changing, or company-specific facts belong in grounding, not tuning.
The use case is experimental or low-volume. A pilot, a proof of concept, or an occasional task does not justify the upfront and maintenance cost.
The requirements change frequently. A fast-moving target means constant retraining — expensive and fragile. Prompting and grounding adapt instantly.
You lack a quality dataset. No clean, curated examples means no good tuned model. The right first step is building the dataset, not launching a tuning job.
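The patterns above can be condensed into a small rule-of-thumb function. This is one possible encoding of the decision ladder, with a hypothetical `Scenario` type and field names chosen for illustration; real decisions involve judgment that a handful of booleans cannot capture.

```python
from dataclasses import dataclass


@dataclass
class Scenario:
    prompt_iterated: bool       # has prompt design been seriously tried?
    few_shot_tried: bool        # have in-prompt examples been tried?
    needs_fresh_facts: bool     # is the gap about changing or company facts?
    grounding_in_place: bool    # is the model already grounded?
    high_volume: bool
    stable_requirements: bool
    has_quality_dataset: bool


def next_lever(s: Scenario) -> str:
    """Recommend the cheapest lever that has not yet been exhausted."""
    if not s.prompt_iterated:
        return "prompt engineering"
    if not s.few_shot_tried:
        return "few-shot prompting"
    if s.needs_fresh_facts and not s.grounding_in_place:
        return "grounding / RAG"
    if not (s.high_volume and s.stable_requirements):
        return "stay with prompting and grounding"
    if not s.has_quality_dataset:
        return "build the dataset first"
    return "fine-tuning"


pilot = Scenario(
    prompt_iterated=False, few_shot_tried=False, needs_fresh_facts=False,
    grounding_in_place=False, high_volume=False, stable_requirements=False,
    has_quality_dataset=False,
)
brand_voice_at_scale = Scenario(
    prompt_iterated=True, few_shot_tried=True, needs_fresh_facts=True,
    grounding_in_place=True, high_volume=True, stable_requirements=True,
    has_quality_dataset=True,
)

print(next_lever(pilot))                 # prompt engineering
print(next_lever(brand_voice_at_scale))  # fine-tuning
```

The order of the checks is the point: fine-tuning is only reachable after every cheaper lever and every precondition has been cleared.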

Memorize the decision ladder in order — it is the backbone of this topic on the Generative AI Leader exam: (1) Prompt engineering → (2) Few-shot prompting → (3) Grounding / RAG → (4) Fine-tuning. Always climb from the bottom: each rung is cheaper, faster, and lower-maintenance than the one above. Tuning is the last resort, justified only by a need that is high-volume, stable, and consistency-critical — such as a permanent brand voice or a specialized domain. Also memorize the split of responsibilities: tuning changes the model's behavior (style, format, domain); grounding changes the model's knowledge (facts). See https://cloud.google.com/vertex-ai/generative-ai/docs/models/tune-models.

Vertex AI: Where Tuning Happens

On Google Cloud, Vertex AI is the platform where model tuning takes place. For a Generative AI Leader, the points worth knowing are about capability and responsibility, not configuration.

Tuning Is a Managed Service

Vertex AI runs tuning as a fully managed service. You provide the curated example dataset; Google provisions the training infrastructure, runs the job, and hosts the result. There is no GPU cluster for your team to build or babysit. This lowers the barrier dramatically — an organization does not need a deep ML-infrastructure team to tune a model.

What You Get Back

A tuning job produces a private, tuned version of the model, accessible only to your organization through its own endpoint. Your tuned Gemini does not leak to other Vertex AI customers, and Google does not use your tuning data to train its base models. This data-governance assurance matters to regulated industries and is a frequent exam talking point.

Tuning Sits Alongside the Other Levers

Vertex AI is also where you do prompt design, manage grounding and retrieval, evaluate models, and monitor them in production. Because all four rungs of the decision ladder live on one platform, a team can start with prompting, add grounding, and graduate to tuning without changing vendors or rebuilding their stack — a continuity worth highlighting to executives.

The Leader's Role

You will not run the tuning pipeline. Your role is to decide whether tuning is justified, budget for both the upfront and the ongoing maintenance cost, ensure a quality dataset can be assembled, and set expectations that a tuned model is a living asset requiring periodic retraining. Tuning is an organizational commitment as much as a technical one.

A Worked Business Example

Consider a Taiwanese e-commerce company running customer support with Gemini. The team starts at rung one: a good prompt. Replies are accurate but sound too generic and corporate — not the warm, friendly tone the brand is known for. They climb to rung two and add few-shot examples; tone improves, but the prompt is now long and replies are still inconsistent across thousands of daily messages.

They consider rung three, grounding, and indeed adopt it — connecting the model to the live product catalog and policy base so answers stay factually current. That fixes accuracy, but not voice.

Now the decision is genuinely earned. The need is high-volume (thousands of messages daily), stable (the brand voice rarely changes), and consistency-critical (every reply must sound on-brand). So the team curates roughly a thousand exemplary past support replies and runs a supervised fine-tuning job on Vertex AI. The tuned model now produces on-brand replies naturally, the prompt shrinks back to a short instruction, and consistency rises. To control cost at this volume they also explore distillation into a smaller student model.

Critically, leadership budgets for quarterly retraining to absorb new products and policy shifts. The tuned model is treated as a maintained asset, and its value is tracked using the methods in measuring GenAI business value. This is the full ladder, climbed in order, with tuning chosen deliberately — exactly the judgment the exam rewards.

Frequently Asked Questions

Q: What is the difference between fine-tuning and prompt engineering?

A: Prompt engineering changes the instructions you send to a model at request time: it takes effect immediately and requires no training job or dataset. Fine-tuning changes the model itself by continuing its training on a curated dataset of examples, so a behavior becomes built-in. Prompting is the first rung of the decision ladder; tuning is the last. You should always exhaust prompting (and few-shot prompting and grounding) before considering tuning, because tuning carries upfront training cost, needs a labeled dataset, and creates an ongoing retraining obligation.

Q: When should I fine-tune instead of using grounding / RAG?

A: Use the test "behavior versus knowledge". Fine-tune when you need to change how the model behaves — its style, tone, output format, or domain fluency. Use grounding / RAG when you need to change what the model knows — supplying fresh, changing, or company-specific facts. Tuning bakes behavior in at training time and cannot keep up with changing facts; grounding injects current data at request time but does not reshape voice or format. Mature systems often use both together: tuning for manner, grounding for matter.

Q: How much data do I need to fine-tune a model?

A: Far less than building a foundation model, far more than a few-shot prompt. Supervised fine-tuning on Vertex AI typically needs a few hundred to a few thousand high-quality input/output example pairs. The exact number depends on how complex the target behavior is, but quality matters more than quantity — inconsistent or low-quality examples teach inconsistent behavior. Assembling and curating that dataset is usually the largest single effort in a tuning project.

Q: What is parameter-efficient tuning, and why does it matter to a business leader?

A: Older "full" fine-tuning adjusted every parameter in a model, which was slow and expensive. Parameter-efficient tuning — using techniques such as LoRA (Low-Rank Adaptation) and adapters — freezes the large base model and trains only a small add-on layer. The business takeaway: modern tuning is far cheaper, faster, and lower-risk than people assume, because it adds a thin specialized layer rather than rebuilding the whole model. You can also keep several small adapters for different purposes on top of one shared base model.

Q: What is model distillation?

A: Distillation uses a large, powerful "teacher" model to train a smaller, faster, cheaper "student" model to perform a specific task almost as well. The student is a specialist — excellent at its one task, weaker elsewhere. The payoff is lower per-request cost and lower latency, which matters most for high-volume, well-defined, repetitive tasks. On Vertex AI a large Gemini variant can act as the teacher and a smaller model as the distilled student.

Q: Is a fine-tuned model a one-time project or an ongoing commitment?

A: It is an ongoing commitment. A tuned model is frozen at training time and does not learn new things on its own. When your brand voice, domain conventions, or product line change, you must gather fresh examples and re-run the tuning job. Treat a tuned model like a specialist employee who needs periodic professional development — budget for retraining from the start, and track its value continuously rather than assuming it stays optimal forever.

Q: Do I need a data science team to tune a model on Vertex AI?

A: Not a large infrastructure team. Vertex AI runs tuning as a fully managed service — you supply the curated dataset and Google handles the training compute and hosting. What you do need is the ability to assemble a clean, high-quality, representative example dataset, plus the organizational discipline to maintain and periodically retrain the model. The hard part of tuning is the data and the maintenance commitment, not the infrastructure.

Summary: Model Tuning for the Generative AI Leader

For the Generative AI Leader exam, treat tuning as a strategic investment decision. Internalize the decision ladder — prompt engineering, few-shot prompting, grounding / RAG, then fine-tuning — and always climb from the cheapest rung up. Know that supervised fine-tuning teaches a model style, format, and domain behavior; that parameter-efficient tuning (LoRA / adapters) makes that affordable and low-risk; and that distillation produces a smaller, cheaper specialist for high-volume tasks. Remember the clean split: tuning changes the model's behavior, grounding changes the model's knowledge. Tuning is worth it for high-volume, stable, consistency-critical needs like a permanent brand voice or a specialized domain — and it is overkill for problems a better prompt solves, for fast-changing requirements, or where no quality dataset exists. Finally, remember that Vertex AI is where tuning happens as a managed service, and that every tuned model is a living asset carrying both an upfront cost and an ongoing maintenance commitment.