Update chat_template.jinja

#2 opened by qgallouedec (HF Staff)

Making the Gemma4 chat template prefix-preserving

Problem

The new template inlines tool responses into the model turn via a forward scan over subsequent role:tool messages. As a result, appending a role:tool message changes tokens before the turn-end marker, breaking prefix preservation (which incremental decoding and KV-cache reuse rely on):

# Without tool response → turn closes immediately:
...model\n<|tool_call>call:func{}<tool_call|><turn|>\n

# With tool response → new content inserted before <turn|>:
...model\n<|tool_call>call:func{}<tool_call|><|tool_response>...<tool_response|>...
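The breakage can be reproduced with a minimal Python stand-in for the old rendering logic (the `render_old` helper is a hypothetical sketch, not the real template):

```python
# Hypothetical stand-in for the old template behavior: tool responses are
# inlined into the model turn, so the closing <turn|> marker moves whenever
# a role:tool message is appended.
def render_old(messages):
    out = "...model\n<|tool_call>call:func{}<tool_call|>"
    for m in messages:
        if m["role"] == "tool":
            out += f"<|tool_response>{m['content']}<tool_response|>"
    out += "<turn|>\n"
    return out

base = [{"role": "assistant", "tool_calls": [{}]}]
extended = base + [{"role": "tool", "content": "..."}]

# The tool response is inserted *before* <turn|>, so the earlier rendering
# is not a prefix of the new one.
print(render_old(extended).startswith(render_old(base)))  # False
```

Any tokens cached for `render_old(base)` are invalidated as soon as the tool message arrives, which is exactly the problem the fix addresses.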

Fix (3 changes)

1. Handle role:tool messages in the main loop — render them as standalone <|tool_response> blocks after the model turn closes, instead of inlining them:

 {#- Loop through messages -#}
 {%- for message in loop_messages -%}
-    {%- if message['role'] != 'tool' -%}
+    {%- if message['role'] == 'tool' -%}
+        {#- Render tool responses as standalone blocks (outside model turn) for prefix-preservation -#}
+        {%- set tool_name = message.get('name') | default('unknown') -%}
+        {%- set tool_body = message.get('content') -%}
+        {%- if tool_body is string -%}
+            {{- format_tool_response_block(tool_name, tool_body) -}}
+        {%- elif tool_body is sequence and tool_body is not string -%}
+            {%- set ns_txt = namespace(s='') -%}
+            {%- for part in tool_body -%}
+                {%- if part.get('type') == 'text' -%}
+                    {%- set ns_txt.s = ns_txt.s + (part.get('text') | default('')) -%}
+                {%- endif -%}
+            {%- endfor -%}
+            {{- format_tool_response_block(tool_name, ns_txt.s) -}}
+        {%- else -%}
+            {{- format_tool_response_block(tool_name, tool_body) -}}
+        {%- endif -%}
+    {%- else -%}
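The string/sequence branching above can be mirrored in Python; the helper name and structure here are illustrative only, not part of the template:

```python
def extract_tool_text(content):
    """Mirror of the template's content handling: pass strings through,
    concatenate the text parts of an OpenAI-style content list, and fall
    back to the raw value otherwise."""
    if isinstance(content, str):
        return content
    if isinstance(content, (list, tuple)):
        return "".join(
            part.get("text", "") for part in content if part.get("type") == "text"
        )
    return content

# A plain string and a list of typed parts yield the same text.
extract_tool_text("12")                             # "12"
extract_tool_text([{"type": "text", "text": "1"},
                   {"type": "image"},
                   {"type": "text", "text": "2"}])  # "12"
```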

2. Include tool messages in the previous-message scan, so that an assistant message that follows a tool message opens a new <|turn>model instead of continuing the previous model turn:

     {%- if loop.index0 > 0 -%}
         {%- for j in range(loop.index0 - 1, -1, -1) -%}
             {%- if not prev_nt.found -%}
-                {%- if loop_messages[j]['role'] != 'tool' -%}
-                    {%- set prev_nt.role = loop_messages[j]['role'] -%}
-                    {%- set prev_nt.found = true -%}
-                {%- endif -%}
+                {%- set prev_nt.role = loop_messages[j]['role'] -%}
+                {%- set prev_nt.found = true -%}
             {%- endif -%}
         {%- endfor -%}
     {%- endif -%}
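The effect of dropping the `!= 'tool'` filter can be sketched in Python (function names are illustrative):

```python
def prev_role_old(messages, i):
    # Old scan: walk backwards, skipping tool messages, so an assistant
    # message right after a tool response "sees" the earlier assistant
    # turn and continues it instead of opening a new one.
    for j in range(i - 1, -1, -1):
        if messages[j]["role"] != "tool":
            return messages[j]["role"]
    return None

def prev_role_new(messages, i):
    # New scan: the immediately preceding message counts, tool or not.
    return messages[i - 1]["role"] if i > 0 else None

msgs = [{"role": "assistant"}, {"role": "tool"}, {"role": "assistant"}]
prev_role_old(msgs, 2)  # "assistant" -> old template continued the turn
prev_role_new(msgs, 2)  # "tool"      -> new template opens <|turn>model
```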

3. Remove the forward-scan that inlined tool responses into the model turn:

             {%- if message.get('tool_responses') -%}
                 {#- Legacy: tool_responses embedded on the assistant message (Google/Gemma native) -#}
                 ...
-            {%- elif message.get('tool_calls') -%}
-                {#- OpenAI Chat Completions: forward-scan consecutive role:tool messages -#}
-                {%- set ns_tool_scan = namespace(stopped=false) -%}
-                {%- for k in range(loop.index0 + 1, loop_messages | length) -%}
-                    ...  (35 lines removed)
-                {%- endfor -%}
             {%- endif -%}

Result

# Before (inlined, not prefix-preserving):
<|turn>model
<|tool_call>call:multiply{a:3,b:4}<tool_call|><|tool_response>response:multiply{value:<|"|>12<|"|>}<tool_response|><|turn>model
<|channel>thought
<channel|>

# After (standalone, prefix-preserving):
<|turn>model
<|tool_call>call:multiply{a:3,b:4}<tool_call|><turn|>
<|tool_response>response:multiply{value:<|"|>12<|"|>}<tool_response|><|turn>model
<|channel>thought
<channel|>

The model turn now closes with <turn|> before the tool response, so appending tool messages only adds tokens after the existing output.
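The prefix property can be checked directly on the renderings above (string literals copied from the example; trailing content abbreviated):

```python
# Rendering before the tool message is appended: the model turn is closed.
closed = ("<|turn>model\n"
          "<|tool_call>call:multiply{a:3,b:4}<tool_call|><turn|>\n")

# Rendering after appending the role:tool message: the tool response is a
# standalone block, so the new output only *extends* the old one.
extended = (closed
            + '<|tool_response>response:multiply{value:<|"|>12<|"|>}'
              "<tool_response|><|turn>model\n"
              "<|channel>thought\n"
              "<channel|>\n")

extended.startswith(closed)  # True: previously generated tokens stay valid
```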

qgallouedec changed pull request status to closed
