Gemini•Dec 26, 2025•8 min read

What I Figured Out About Writing Prompts for Google Gemini

Understanding Gemini's thinking levels, multimodality, and Search Grounding for better prompt results.

Gemini is a different beast. I spent some time assuming it worked like ChatGPT and kept getting weird results. It doesn't work that way. It has its own logic, and once you get it, things click.

The latest version as of December 2025, Gemini 3 Pro Preview, introduced a "thinking levels" system. Basically, you tell the model how hard to think. Sounds gimmicky, but it's actually useful.

Here's what I learned works.

What Makes Gemini Different

A few things that threw me off at first:

Thinking Levels instead of thinking budget - you pick a level (minimal, low, medium, high) and the model adjusts how deeply it reasons. More on this below.

Native multimodality - Gemini takes images, video, and audio as first-class input. It goes beyond "describe this image" and can reason over the content itself.

Search Grounding - it can check facts through Google Search before answering. Useful for anything with current data.

Media resolution settings - you can optimize how many tokens different media types consume.

The Thinking Levels Thing

This is actually pretty clever. Instead of some abstract "thinking budget" number, you just pick a level:

Level	When I Use It
minimal	Quick chat stuff, simple questions
low	Basic formatting, simple instructions
medium	Most everyday tasks
high	Complex analysis, math, reasoning

The default is "high", which is often overkill. For simple stuff, use "minimal" or "low" to get faster responses and burn fewer tokens. Which levels are available can vary by model, so check what yours supports.
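Here's a rough sketch of how that choice could look in code. The task categories and the helper name `thinking_config_for` are made up for illustration. The `thinkingConfig` / `thinkingLevel` field names reflect my understanding of the Gemini REST API, so treat them as an assumption and check the current API reference before relying on them.

```python
# Hypothetical task categories mapped to the levels from the table above.
LEVEL_BY_TASK = {
    "chat": "minimal",
    "formatting": "low",
    "everyday": "medium",
    "analysis": "high",
}


def thinking_config_for(task_kind: str) -> dict:
    """Return a generation-config fragment for the given task category.

    Unknown categories fall back to "high", the default described above.
    """
    level = LEVEL_BY_TASK.get(task_kind, "high")
    # Assumed field names (camelCase, REST style); verify against the API docs.
    return {"thinkingConfig": {"thinkingLevel": level}}


if __name__ == "__main__":
    print(thinking_config_for("chat"))
```

Keeping the mapping in one dictionary makes it easy to adjust once you've seen how each level behaves on your own tasks.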

How I Structure Prompts for Gemini

Google says both XML and Markdown work. I've found XML works better for complex prompts:

<role>
You are a helpful assistant specializing in [domain].
</role>

<constraints>
1. Be objective
2. Cite sources
3. [Other rules]
</constraints>

<context>
[User data goes here]
</context>

<task>
[What you actually want]
</task>

The key insight: wrapping user content in <context> signals to Gemini that it is data to analyze, not instructions to follow. That helps against prompt injection (it is not a guarantee) and tends to give better results when you're feeding it user content.
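If you build these prompts in code, a small helper keeps the layout above consistent. This is a minimal sketch: the function name `build_prompt` is my own, and escaping the user data is my addition rather than something Google documents. It only stops user text from closing the <context> tag early; it does not make injection impossible.

```python
from xml.sax.saxutils import escape


def build_prompt(role: str, constraints: list[str], context: str, task: str) -> str:
    """Assemble a prompt using the role/constraints/context/task layout."""
    numbered = "\n".join(f"{i}. {rule}" for i, rule in enumerate(constraints, start=1))
    # Escape <, > and & in user data so it cannot close the <context> tag early.
    safe_context = escape(context)
    return (
        f"<role>\n{role}\n</role>\n\n"
        f"<constraints>\n{numbered}\n</constraints>\n\n"
        f"<context>\n{safe_context}\n</context>\n\n"
        f"<task>\n{task}\n</task>"
    )


if __name__ == "__main__":
    print(build_prompt(
        role="You are a helpful assistant specializing in contracts.",
        constraints=["Be objective", "Cite sources"],
        context="User text with a sneaky </context> tag inside.",
        task="Summarize the key obligations.",
    ))
```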

Settings That Tripped Me Up

Don't Touch the Temperature

Google recommends keeping temperature at 1.0. I tried lowering it once thinking it would make responses more consistent. Bad idea. The model started looping and math tasks got worse. Just leave it at 1.0.
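To stop myself (or a teammate) from quietly lowering it again, a tiny guard like this can sit in front of whatever builds the request. It's a sketch under my own assumptions: the config is a plain dict with an optional `temperature` key, and the name `check_temperature` is hypothetical.

```python
import warnings

RECOMMENDED_TEMPERATURE = 1.0  # the value recommended above


def check_temperature(config: dict) -> dict:
    """Warn if a config overrides temperature away from 1.0; return it unchanged."""
    temperature = config.get("temperature")
    if temperature is not None and temperature != RECOMMENDED_TEMPERATURE:
        warnings.warn(
            f"temperature={temperature} differs from the recommended "
            f"{RECOMMENDED_TEMPERATURE}; looping or weaker reasoning is possible",
            stacklevel=2,
        )
    return config
```

A config with no `temperature` key passes silently, which matches the advice: the simplest way to stay at 1.0 is to not set it at all.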

Media Resolution Matters

Content Type	Best Setting	Tokens Used
Images	media_resolution_high	~1120 tokens
PDFs	media_resolution_medium	~560 tokens
Video	media_resolution_low	~70 tokens/frame

For PDFs, don't use high; medium is the sweet spot. Video adds up quickly even at ~70 tokens per frame, because every sampled frame counts, so plan accordingly.
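To get a feel for the numbers, here's a back-of-the-envelope estimator built on the table above. Two things are assumptions on my part: that the PDF figure applies per page, and that video is sampled at one frame per second (`fps=1.0`). Adjust both to match what you actually observe in your token counts.

```python
# Approximate per-unit token costs, taken from the table above.
TOKENS_PER_IMAGE_HIGH = 1120
TOKENS_PER_PDF_PAGE_MEDIUM = 560  # assumption: the figure is per page
TOKENS_PER_VIDEO_FRAME_LOW = 70


def estimate_media_tokens(images: int = 0, pdf_pages: int = 0,
                          video_seconds: float = 0.0, fps: float = 1.0) -> int:
    """Rough token estimate for the media in one multimodal request."""
    frames = int(video_seconds * fps)  # assumed sampling rate, see lead-in
    return (images * TOKENS_PER_IMAGE_HIGH
            + pdf_pages * TOKENS_PER_PDF_PAGE_MEDIUM
            + frames * TOKENS_PER_VIDEO_FRAME_LOW)


if __name__ == "__main__":
    # A 10-minute video alone: 600 frames * 70 tokens
    print(estimate_media_tokens(video_seconds=600))
```

Under these assumptions a 10-minute video comes to 42,000 tokens before you've written a word of prompt, which is why the low setting matters there.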

Examples That Work for Me

Analyzing an Image

<role>
You are a professional e-commerce product photographer and marketing expert.
</role>

<task>
Analyze this product photo and provide:
1. Three specific improvements for the lighting
2. Composition suggestions for better conversion
3. Background recommendations
</task>

<output_format>
Structure your response with clear headers for each section.
Be specific and actionable.
</output_format>

Working with Documents

<role>
You are a legal document analyst with expertise in contract review.
</role>

<constraints>
- Extract only factual information from the document
- Do not add interpretations or legal advice
- Quote relevant passages when possible
</constraints>

<context>
[Attached: contract.pdf]
</context>

<task>
Extract and summarize:
1. Key terms and conditions
2. Important dates and deadlines
3. Financial obligations
4. Termination clauses
</task>

Video Analysis

<role>
You are a video content analyst.
</role>

<task>
Watch this video and provide:
1. Summary of main points (with timestamps)
2. Key quotes from speakers
3. Visual elements worth noting
</task>

<context>
[Attached: presentation.mp4]
</context>

Search Grounding Is Actually Cool

With Search Grounding enabled, Gemini Pro can check facts against Google Search before answering. Useful for:

Content that needs current data
Checking statistics before including them
Infographics with real numbers

Example:

<task>
Generate an infographic about the current GDP of G7 countries
with accurate and up-to-date data visualization.
</task>

The model checks actual current data before generating. Pretty neat when you need accurate numbers.
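If you call the API directly instead of using a chat UI, grounding is something you switch on per request. The sketch below only builds the request body as a dict. The `google_search` tool name and the `contents` / `parts` layout reflect my understanding of the REST API, so treat them as assumptions and confirm against the current reference. The function name `grounded_request` is my own.

```python
import json


def grounded_request(prompt: str, use_search: bool = True) -> dict:
    """Build a generateContent-style request body, optionally with Search Grounding."""
    body = {"contents": [{"parts": [{"text": prompt}]}]}
    if use_search:
        # Assumed tool name; the empty object means "no extra search settings".
        body["tools"] = [{"google_search": {}}]
    return body


if __name__ == "__main__":
    request = grounded_request("What is the current GDP of each G7 country?")
    print(json.dumps(request, indent=2))
```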

How Gemini Compares

Aspect	Gemini 3	ChatGPT	Claude
Prompt format	XML or Markdown	Markdown + XML	XML (recommended)
Reasoning	Thinking levels	reasoning_effort	Extended Thinking
Chain-of-Thought	Built-in	Needs explicit instruction	Built-in
Multimodality	Native, advanced	Good	Basic
Fact checking	Search Grounding	Web Search	Web Search

Mistakes to Avoid

Changing temperature - just leave it at 1.0

Ignoring thinking levels - use minimal for simple stuff, high for complex

Wrong media resolution - for PDFs use medium, not high

Mixing data and instructions - always wrap user data in <context>

Long prompts for video - remember video consumes lots of tokens

What I Do Now

Use XML for anything complex
Separate role, constraints, context, and task clearly
For multimodal tasks, specify what to focus on: "Focus on the text in the image" or "Analyze the speaker's body language"
Use Search Grounding for current data
Test different thinking levels - find the balance between quality and speed
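For that last point, a small harness makes the comparison less hand-wavy. This is a sketch, not a benchmark: `compare_levels` and `fake_generate` are hypothetical names, and `fake_generate` stands in for whatever function actually calls the model so the example runs offline. Swap in your own callable that takes a prompt and a level and returns the response text.

```python
import time
from typing import Callable

LEVELS = ("minimal", "low", "medium", "high")


def compare_levels(generate: Callable[[str, str], str], prompt: str,
                   levels: tuple[str, ...] = LEVELS) -> list[dict]:
    """Run one prompt at each thinking level and record latency and output size."""
    results = []
    for level in levels:
        start = time.perf_counter()
        answer = generate(prompt, level)
        elapsed = time.perf_counter() - start
        results.append({"level": level, "seconds": elapsed, "chars": len(answer)})
    return results


def fake_generate(prompt: str, level: str) -> str:
    """Stand-in for a real API call so the sketch runs without network access."""
    return f"[{level}] answer to: {prompt}"


if __name__ == "__main__":
    for row in compare_levels(fake_generate, "Summarize this paragraph."):
        print(row)
```

Latency and length are only proxies. You still have to read the answers to judge quality, but seeing the timings side by side makes it obvious when "high" is costing you time on a task that "low" handles fine.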

The Takeaway

Gemini is powerful, especially for multimodal stuff. Key things:

Pick the right thinking level
Separate instructions from data with XML tags
Keep temperature at 1.0
Optimize media resolution for different content types