Update chat_template.jinja
#2
by qgallouedec (HF Staff) · opened
Making the Gemma4 chat template prefix-preserving
Problem
The new template inlines tool responses into the model turn via a forward scan. As a result, appending a role:tool message changes tokens before the turn-end marker, which breaks prefix preservation:
# Without tool response → turn closes immediately:
...model\n<|tool_call>call:func{}<tool_call|><turn|>\n
# With tool response → new content inserted before <turn|>:
...model\n<|tool_call>call:func{}<tool_call|><|tool_response>...<tool_response|>...
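The failure mode can be reproduced with two toy Jinja templates (these are minimal sketches, not the real Gemma4 template; only the token names are taken from the examples above):

```python
from jinja2 import Template

# Inlining strategy (toy stand-in): tool responses are forward-scanned
# into the model turn, so they land *before* the <turn|> close marker.
INLINE = Template(
    "{% for m in messages %}{% if m.role == 'assistant' %}"
    "<|turn>model\n<|tool_call>{{ m.call }}<tool_call|>"
    "{% for t in messages %}{% if t.role == 'tool' %}"
    "<|tool_response>{{ t.content }}<tool_response|>"
    "{% endif %}{% endfor %}"
    "<turn|>\n"
    "{% endif %}{% endfor %}"
)

# Standalone strategy (toy stand-in): the model turn closes first,
# and tool responses render after it as separate blocks.
STANDALONE = Template(
    "{% for m in messages %}"
    "{% if m.role == 'assistant' %}"
    "<|turn>model\n<|tool_call>{{ m.call }}<tool_call|><turn|>\n"
    "{% elif m.role == 'tool' %}"
    "<|tool_response>{{ m.content }}<tool_response|>"
    "{% endif %}{% endfor %}"
)

history = [{"role": "assistant", "call": "call:func{}"}]
with_tool = history + [{"role": "tool", "content": "response:func{}"}]

# Inlining inserts tokens before <turn|>: the old render is no longer a prefix.
assert not INLINE.render(messages=with_tool).startswith(INLINE.render(messages=history))
# Standalone rendering only appends tokens: prefix-preserving.
assert STANDALONE.render(messages=with_tool).startswith(STANDALONE.render(messages=history))
```
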
Fix (3 changes)
1. Handle role:tool messages in the main loop — render them as standalone <|tool_response> blocks after the model turn closes, instead of inlining them:
{#- Loop through messages -#}
{%- for message in loop_messages -%}
- {%- if message['role'] != 'tool' -%}
+ {%- if message['role'] == 'tool' -%}
+ {#- Render tool responses as standalone blocks (outside model turn) for prefix-preservation -#}
+ {%- set tool_name = message.get('name') | default('unknown', true) -%}
+ {%- set tool_body = message.get('content') -%}
+ {%- if tool_body is string -%}
+ {{- format_tool_response_block(tool_name, tool_body) -}}
+ {%- elif tool_body is sequence and tool_body is not string -%}
+ {%- set ns_txt = namespace(s='') -%}
+ {%- for part in tool_body -%}
+ {%- if part.get('type') == 'text' -%}
+ {%- set ns_txt.s = ns_txt.s + (part.get('text') | default('', true)) -%}
+ {%- endif -%}
+ {%- endfor -%}
+ {{- format_tool_response_block(tool_name, ns_txt.s) -}}
+ {%- else -%}
+ {{- format_tool_response_block(tool_name, tool_body) -}}
+ {%- endif -%}
+ {%- else -%}
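The string-vs-parts normalization above can be mirrored in plain Python (hypothetical helper name; the list-of-parts shape follows the OpenAI-style `{"type": "text", "text": ...}` convention the diff assumes):

```python
def tool_response_text(content):
    """Flatten a tool message's content to a single string.

    Mirrors the template logic: strings pass through, lists of
    {"type": "text", "text": ...} parts are concatenated, and anything
    else is stringified as-is.
    """
    if isinstance(content, str):
        return content
    if isinstance(content, (list, tuple)):
        return "".join(
            part.get("text") or ""
            for part in content
            if isinstance(part, dict) and part.get("type") == "text"
        )
    return str(content)
```
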
2. Include tool messages in the previous-message scan — so an assistant message after a tool opens a new <|turn>model instead of continuing the previous model turn:
{%- if loop.index0 > 0 -%}
{%- for j in range(loop.index0 - 1, -1, -1) -%}
{%- if not prev_nt.found -%}
- {%- if loop_messages[j]['role'] != 'tool' -%}
- {%- set prev_nt.role = loop_messages[j]['role'] -%}
- {%- set prev_nt.found = true -%}
- {%- endif -%}
+ {%- set prev_nt.role = loop_messages[j]['role'] -%}
+ {%- set prev_nt.found = true -%}
{%- endif -%}
{%- endfor -%}
{%- endif -%}
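With tool messages no longer skipped, the backward scan reduces to "look at the immediately preceding message". A Python sketch of the resulting decision (hypothetical helper, assuming a turn header is only reused between consecutive assistant messages):

```python
def opens_new_model_turn(messages, i):
    """Whether the assistant message at index i starts a fresh <|turn>model.

    Because tool messages are now included in the scan, a role:tool message
    directly before an assistant message makes prev_role == "tool", so a new
    turn is opened instead of continuing the previous model turn.
    """
    prev_role = messages[i - 1]["role"] if i > 0 else None
    return prev_role != "assistant"
```
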
3. Remove the forward-scan that inlined tool responses into the model turn:
{%- if message.get('tool_responses') -%}
{#- Legacy: tool_responses embedded on the assistant message (Google/Gemma native) -#}
...
- {%- elif message.get('tool_calls') -%}
- {#- OpenAI Chat Completions: forward-scan consecutive role:tool messages -#}
- {%- set ns_tool_scan = namespace(stopped=false) -%}
- {%- for k in range(loop.index0 + 1, loop_messages | length) -%}
- ... (35 lines removed)
- {%- endfor -%}
{%- endif -%}
Result
# Before (inlined, not prefix-preserving):
<|turn>model
<|tool_call>call:multiply{a:3,b:4}<tool_call|><|tool_response>response:multiply{value:<|"|>12<|"|>}<tool_response|><|turn>model
<|channel>thought
<channel|>
# After (standalone, prefix-preserving):
<|turn>model
<|tool_call>call:multiply{a:3,b:4}<tool_call|><turn|>
<|tool_response>response:multiply{value:<|"|>12<|"|>}<tool_response|><|turn>model
<|channel>thought
<channel|>
The model turn now closes with <turn|> before the tool response, so appending tool messages only adds tokens after the existing output.
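This property is what makes incremental decoding and KV-cache reuse safe. A small sketch of how a caller could rely on it (`render_fn` is a hypothetical stand-in for any function mapping a message list to a rendered string):

```python
def incremental_suffix(render_fn, history, new_message):
    """Return only the newly appended text after adding `new_message`.

    Relies on prefix preservation: the old render must be an exact prefix
    of the new one, so work already done on `before` (e.g. a KV cache)
    can be reused verbatim and only the suffix needs processing.
    """
    before = render_fn(history)
    after = render_fn(history + [new_message])
    if not after.startswith(before):
        raise ValueError("template is not prefix-preserving")
    return after[len(before):]
```
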
qgallouedec changed pull request status to closed