Gemini · Dec 26, 2025 · 8 min read

What I Figured Out About Writing Prompts for Google Gemini

Understanding Gemini's thinking levels, multimodality, and Search Grounding for better prompt results.


Gemini is a different beast. I spent a while assuming it worked like ChatGPT and kept getting weird results. It doesn't. It has its own logic, and once you get it, things click.

The latest version - Gemini 3 Pro Preview from December 2025 - introduced this "thinking levels" system. Basically, you can tell the model how hard to think. Sounds gimmicky, but it's actually useful.

Here's what I learned works.


What Makes Gemini Different

A few things that threw me off at first:

Thinking Levels instead of a thinking budget - you pick a level (minimal, low, medium, high) and the model adjusts how deeply it reasons. More on this below.

Native multimodality - Gemini actually understands images, video, and audio at a deep level. Not just "describe this image" but actually working with the content.

Search Grounding - it can check facts through Google Search before answering. Useful for anything with current data.

Media resolution settings - you can optimize how many tokens different media types consume.


The Thinking Levels Thing

This is actually pretty clever. Instead of some abstract "thinking budget" number, you just pick a level:

Level   | When I Use It
minimal | Quick chat stuff, simple questions
low     | Basic formatting, simple instructions
medium  | Most everyday tasks
high    | Complex analysis, math, reasoning

The default is "high", which is often overkill. For simple stuff, use "minimal" or "low" - faster responses, fewer tokens burned.
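
If you're calling the API rather than typing into the app, the level is just a config field. Here's a minimal sketch with the google-genai Python SDK - the thinking_level field and the model name ("gemini-3-pro-preview") are assumptions based on the Gemini 3 preview docs, so check the current reference before copying:

from google import genai
from google.genai import types

client = genai.Client()  # picks up GEMINI_API_KEY from the environment

# Simple task, so ask for a lighter reasoning pass instead of the default "high".
response = client.models.generate_content(
    model="gemini-3-pro-preview",  # assumed preview model name
    contents="Rewrite this in a friendlier tone: 'Submit the form before Friday.'",
    config=types.GenerateContentConfig(
        thinking_config=types.ThinkingConfig(thinking_level="low"),  # assumed field name
    ),
)
print(response.text)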


How I Structure Prompts for Gemini

Google says XML or Markdown both work. I've found XML works better for complex stuff:

<role>
You are a helpful assistant specializing in [domain].
</role>

<constraints>
1. Be objective
2. Cite sources
3. [Other rules]
</constraints>

<context>
[User data goes here]
</context>

<task>
[What you actually want]
</task>

The key insight: Gemini treats stuff inside <context> as data to analyze, not instructions to follow. This matters for security and for getting better results when you're feeding it user content.
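
To make that split concrete, here's a sketch of how I'd assemble the template in code so untrusted user text only ever lands inside <context>. The build_prompt helper and the support-bot details are made up for illustration, and the model name is assumed:

from google import genai

# Hypothetical template; the instruction sections stay fixed,
# only the <context> slot ever receives outside text.
TEMPLATE = """<role>
You are a helpful assistant specializing in customer support.
</role>

<constraints>
1. Be objective
2. Cite sources
</constraints>

<context>
{user_data}
</context>

<task>
Summarize the customer's complaint in two sentences.
</task>"""

def build_prompt(user_data: str) -> str:
    # Untrusted input is only ever interpolated into <context>,
    # never concatenated next to the instructions.
    return TEMPLATE.format(user_data=user_data)

client = genai.Client()
response = client.models.generate_content(
    model="gemini-3-pro-preview",  # assumed model name
    contents=build_prompt("The package arrived late and the box was crushed."),
)
print(response.text)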


Settings That Tripped Me Up

Don't Touch the Temperature

Google recommends keeping temperature at 1.0. I tried lowering it once thinking it would make responses more consistent. Bad idea. The model started looping and math tasks got worse. Just leave it at 1.0.

Media Resolution Matters

Content Type | Best Setting            | Tokens Used
Images       | media_resolution_high   | ~1120 tokens
PDFs         | media_resolution_medium | ~560 tokens
Video        | media_resolution_low    | ~70 tokens/frame

For PDFs, don't use high - medium is the sweet spot. Video eats a lot of tokens so plan accordingly.
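
In the API this is a per-request config setting. A sketch with the Python SDK, assuming the media_resolution field and MediaResolution enum are exposed in your SDK version (the naming has shifted between releases, so verify against the current docs); it also leaves temperature at 1.0, per the note above:

from google import genai
from google.genai import types

client = genai.Client()

# Upload the PDF once and reference it in the request.
contract = client.files.upload(file="contract.pdf")  # hypothetical file

response = client.models.generate_content(
    model="gemini-3-pro-preview",  # assumed model name
    contents=[contract, "List every payment deadline in this contract."],
    config=types.GenerateContentConfig(
        temperature=1.0,  # leave at the default
        media_resolution=types.MediaResolution.MEDIA_RESOLUTION_MEDIUM,  # assumed enum; ~560 tokens for PDFs
    ),
)
print(response.text)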


Examples That Work for Me

Analyzing an Image

<role>
You are a professional e-commerce product photographer and marketing expert.
</role>

<task>
Analyze this product photo and provide:
1. Three specific improvements for the lighting
2. Composition suggestions for better conversion
3. Background recommendations
</task>

<output_format>
Structure your response with clear headers for each section.
Be specific and actionable.
</output_format>
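
To actually run this against a photo, the image goes into the request alongside the prompt. A sketch assuming a local product.jpg, passed as inline bytes (fine for small images; bigger files are better uploaded via the Files API, as in the PDF sketch above):

from google import genai
from google.genai import types

client = genai.Client()

with open("product.jpg", "rb") as f:  # hypothetical local file
    image_bytes = f.read()

prompt = """<role>
You are a professional e-commerce product photographer and marketing expert.
</role>

<task>
Analyze this product photo and provide:
1. Three specific improvements for the lighting
2. Composition suggestions for better conversion
3. Background recommendations
</task>"""

response = client.models.generate_content(
    model="gemini-3-pro-preview",  # assumed model name
    contents=[
        types.Part.from_bytes(data=image_bytes, mime_type="image/jpeg"),
        prompt,
    ],
)
print(response.text)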

Working with Documents

<role>
You are a legal document analyst with expertise in contract review.
</role>

<constraints>
- Extract only factual information from the document
- Do not add interpretations or legal advice
- Quote relevant passages when possible
</constraints>

<context>
[Attached: contract.pdf]
</context>

<task>
Extract and summarize:
1. Key terms and conditions
2. Important dates and deadlines
3. Financial obligations
4. Termination clauses
</task>

Video Analysis

<role>
You are a video content analyst.
</role>

<task>
Watch this video and provide:
1. Summary of main points (with timestamps)
2. Key quotes from speakers
3. Visual elements worth noting
</task>

<context>
[Attached: presentation.mp4]
</context>

Search Grounding Is Actually Cool

Gemini Pro can automatically verify facts through Google Search. Useful for:

  • Content that needs current data
  • Checking statistics before including them
  • Infographics with real numbers

Example:

<task>
Generate an infographic about the current GDP of G7 countries
with accurate and up-to-date data visualization.
</task>

The model checks actual current data before generating. Pretty neat when you need accurate numbers.
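
Through the API, grounding is switched on as a tool on the request. A minimal sketch, assuming the GoogleSearch tool type in the Python SDK (availability depends on model and plan):

from google import genai
from google.genai import types

client = genai.Client()

response = client.models.generate_content(
    model="gemini-3-pro-preview",  # assumed model name
    contents="What is the current GDP of each G7 country? Cite your sources.",
    config=types.GenerateContentConfig(
        tools=[types.Tool(google_search=types.GoogleSearch())],
    ),
)
print(response.text)
# When grounding fires, source links come back in the response's grounding metadata.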


How Gemini Compares

Aspect           | Gemini 3          | ChatGPT                    | Claude
Prompt format    | XML or Markdown   | Markdown + XML             | XML (recommended)
Reasoning        | Thinking levels   | reasoning_effort           | Extended Thinking
Chain-of-Thought | Built-in          | Needs explicit instruction | Built-in
Multimodality    | Native, advanced  | Good                       | Basic
Fact checking    | Search Grounding  | Web Search                 | Web Search

Mistakes to Avoid

Changing temperature - just leave it at 1.0

Ignoring thinking levels - use minimal for simple stuff, high for complex

Wrong media resolution - for PDFs use medium, not high

Mixing data and instructions - always wrap user data in <context>

Long prompts for video - remember video consumes lots of tokens


What I Do Now

  1. Use XML for anything complex
  2. Separate role, constraints, context, and task clearly
  3. For multimodal tasks, specify what to focus on: "Focus on the text in the image" or "Analyze the speaker's body language"
  4. Use Search Grounding for current data
  5. Test different thinking levels - find the balance between quality and speed

The Takeaway

Gemini is powerful, especially for multimodal stuff. Key things:

  • Pick the right thinking level
  • Separate instructions from data with XML tags
  • Keep temperature at 1.0
  • Optimize media resolution for different content types
