What I Figured Out About Writing Prompts for Google Gemini
Understanding Gemini's thinking levels, multimodality, and Search Grounding for better prompt results.
-0031.png&w=3840&q=75)
Gemini is a different beast. I spent some time thinking it works like ChatGPT, and kept getting weird results. It doesn't. It has its own logic, and once you get it, things click.
The latest version - Gemini 3 Pro Preview from December 2025 - introduced this "thinking levels" system. Basically, you can tell the model how hard to think. Sounds gimmicky, but it's actually useful.
Here's what I learned works.
What Makes Gemini Different
A few things that threw me off at first:
Thinking Levels instead of thinking budget - you pick a level (minimal, low, medium, high) and the model adjusts how deeply it reasons. More on this below.
Native multimodality - Gemini actually understands images, video, and audio at a deep level. Not just "describe this image" but actually working with the content.
Search Grounding - it can check facts through Google Search before answering. Useful for anything with current data.
Media resolution settings - you can optimize how many tokens different media types consume.
The Thinking Levels Thing
This is actually pretty clever. Instead of some abstract "thinking budget" number, you just pick a level:
| Level | When I Use It |
|---|---|
| minimal | Quick chat stuff, simple questions |
| low | Basic formatting, simple instructions |
| medium | Most everyday tasks |
| high | Complex analysis, math, reasoning |
The default is "high" which is often overkill. For simple stuff, use "minimal" or "low" - faster responses, fewer tokens burned.
How I Structure Prompts for Gemini
Google says XML or Markdown both work. I've found XML works better for complex stuff:
<role>
You are a helpful assistant specializing in [domain].
</role>
<constraints>
1. Be objective
2. Cite sources
3. [Other rules]
</constraints>
<context>
[User data goes here]
</context>
<task>
[What you actually want]
</task>
The key insight: Gemini treats stuff inside <context> as data to analyze, not instructions to follow. This matters for security and for getting better results when you're feeding it user content.
Settings That Tripped Me Up
Don't Touch the Temperature
Google recommends keeping temperature at 1.0. I tried lowering it once thinking it would make responses more consistent. Bad idea. The model started looping and math tasks got worse. Just leave it at 1.0.
Media Resolution Matters
| Content Type | Best Setting | Tokens Used |
|---|---|---|
| Images | media_resolution_high | ~1120 tokens |
| PDFs | media_resolution_medium | ~560 tokens |
| Video | media_resolution_low | ~70 tokens/frame |
For PDFs, don't use high - medium is the sweet spot. Video eats a lot of tokens so plan accordingly.
Examples That Work for Me
Analyzing an Image
<role>
You are a professional e-commerce product photographer and marketing expert.
</role>
<task>
Analyze this product photo and provide:
1. Three specific improvements for the lighting
2. Composition suggestions for better conversion
3. Background recommendations
</task>
<output_format>
Structure your response with clear headers for each section.
Be specific and actionable.
</output_format>
Working with Documents
<role>
You are a legal document analyst with expertise in contract review.
</role>
<constraints>
- Extract only factual information from the document
- Do not add interpretations or legal advice
- Quote relevant passages when possible
</constraints>
<context>
[Attached: contract.pdf]
</context>
<task>
Extract and summarize:
1. Key terms and conditions
2. Important dates and deadlines
3. Financial obligations
4. Termination clauses
</task>
Video Analysis
<role>
You are a video content analyst.
</role>
<task>
Watch this video and provide:
1. Summary of main points (with timestamps)
2. Key quotes from speakers
3. Visual elements worth noting
</task>
<context>
[Attached: presentation.mp4]
</context>
Search Grounding Is Actually Cool
Gemini Pro can automatically verify facts through Google Search. Useful for:
- Content that needs current data
- Checking statistics before including them
- Infographics with real numbers
Example:
<task>
Generate an infographic about the current GDP of G7 countries
with accurate and up-to-date data visualization.
</task>
The model checks actual current data before generating. Pretty neat when you need accurate numbers.
How Gemini Compares
| Aspect | Gemini 3 | ChatGPT | Claude |
|---|---|---|---|
| Prompt format | XML or Markdown | Markdown + XML | XML (recommended) |
| Reasoning | Thinking levels | reasoning_effort | Extended Thinking |
| Chain-of-Thought | Built-in | Needs explicit instruction | Built-in |
| Multimodality | Native, advanced | Good | Basic |
| Fact checking | Search Grounding | Web Search | Web Search |
Mistakes to Avoid
Changing temperature - just leave it at 1.0
Ignoring thinking levels - use minimal for simple stuff, high for complex
Wrong media resolution - for PDFs use medium, not high
Mixing data and instructions - always wrap user data in <context>
Long prompts for video - remember video consumes lots of tokens
What I Do Now
- Use XML for anything complex
- Separate role, constraints, context, and task clearly
- For multimodal tasks, specify what to focus on: "Focus on the text in the image" or "Analyze the speaker's body language"
- Use Search Grounding for current data
- Test different thinking levels - find the balance between quality and speed
The Takeaway
Gemini is powerful, especially for multimodal stuff. Key things:
- Pick the right thinking level
- Separate instructions from data with XML tags
- Keep temperature at 1.0
- Optimize media resolution for different content types