Access is provided for non-commercial research and evaluation use. Commercial use requires written approval from Simple Machine Mind.

EvaluatorDPT: Auditable Decision Models with Learned Abstention and Real-Time Steering

Model ID: pcsankar73s/EvaluatorModel
Author: Sankaranarayanan Palamadai Chandrasekaran, Simple Machine Mind
License: CC BY-NC 4.0 for non-commercial research and evaluation use
Access: Gated model access

Summary

EvaluatorDPT is a bounded operational decision-control model for AI workflows that need explicit, auditable handling of uncertainty. It emits a probability distribution over three decisions: YES, NO, and TBD. TBD is a deferral class the model learns during training, rather than only a rejection threshold applied after inference.

The model is intended as a control layer between probabilistic model inference and downstream execution policy. A deployment domain supplies evidence, risk tolerance, and threshold policy; the model supplies a bounded distribution that can be governed at inference time through recorded thresholds, fallback rules, and review paths. This makes the model useful for auditable decision routing rather than free-form answer generation.
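One way to picture the "recorded thresholds" idea is an audit record that stores the model's distribution next to the policy version that produced the routed label. The sketch below is illustrative only: `ThresholdPolicy`, `audit_record`, the field names, and the threshold value are hypothetical and are not part of the published model or its repository.

```python
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class ThresholdPolicy:
    # Hypothetical policy fields; a real deployment defines its own.
    policy_version: str
    min_confidence: float
    fallback_label: str = "TBD"


def audit_record(input_id, probabilities, policy):
    """Return a JSON-serializable record of one routed decision."""
    # The label with the highest probability is the model's own decision.
    model_label = max(probabilities, key=probabilities.get)
    confident = probabilities[model_label] >= policy.min_confidence
    # Below the recorded threshold, fall back to the policy's deferral label.
    routed_label = model_label if confident else policy.fallback_label
    return {
        "input_id": input_id,
        "probabilities": probabilities,
        "model_label": model_label,
        "routed_label": routed_label,
        "policy": asdict(policy),
    }


# Illustrative values only, not outputs of the real model.
policy = ThresholdPolicy(policy_version="example-v1", min_confidence=0.85)
record = audit_record("req-001", {"YES": 0.62, "NO": 0.10, "TBD": 0.28}, policy)
print(json.dumps(record, sort_keys=True))
```

Storing the policy alongside the distribution means a later reviewer can tell whether a routed label came from the model or from the fallback rule.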

Model Scope

| Field | Value |
| --- | --- |
| Base model | bert-base-uncased |
| Max sequence length | 128 |
| Primary output | YES / NO / TBD |
| Evaluated splits | Fixed held-out validation and test splits |
| Training data redistribution | Not included with this model |

Evaluation Results

| Split | N | Accuracy | Macro F1 |
| --- | --- | --- | --- |
| Validation | 44,404 | 0.8224 | 0.8213 |
| Test | 44,597 | 0.8260 | 0.8252 |

Test Per-Class Performance

| Class | Precision | Recall | F1 | Support |
| --- | --- | --- | --- | --- |
| YES | 0.8205 | 0.8425 | 0.8314 | 14,883 |
| NO | 0.8598 | 0.8376 | 0.8486 | 15,650 |
| TBD | 0.7955 | 0.7958 | 0.7956 | 14,064 |
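The per-class figures can be cross-checked against the headline test metrics. The sketch below recomputes F1 as the harmonic mean of precision and recall, macro F1 as the unweighted mean of the three class F1 scores, and accuracy as support-weighted recall. It uses only the rounded numbers from the table above, so agreement is expected to about four decimal places, not exactly.

```python
# Precision, recall, and support copied from the test per-class table.
per_class = {
    "YES": {"precision": 0.8205, "recall": 0.8425, "support": 14883},
    "NO": {"precision": 0.8598, "recall": 0.8376, "support": 15650},
    "TBD": {"precision": 0.7955, "recall": 0.7958, "support": 14064},
}


def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


f1_scores = {
    label: f1(m["precision"], m["recall"]) for label, m in per_class.items()
}

# Macro F1 weights every class equally, regardless of support.
macro_f1 = sum(f1_scores.values()) / len(f1_scores)

# Accuracy equals the support-weighted average of per-class recall.
total = sum(m["support"] for m in per_class.values())
accuracy = sum(m["recall"] * m["support"] for m in per_class.values()) / total

print(f"total support: {total}")
print(f"macro F1:      {macro_f1:.4f}")
print(f"accuracy:      {accuracy:.4f}")
```

The supports sum to the reported test size of 44,597, and the recomputed macro F1 and accuracy land within rounding distance of the reported 0.8252 and 0.8260.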

Additional Evidence

| Evidence | Result | Interpretation |
| --- | --- | --- |
| Validation ECE | 0.0338 | Confidence is suitable for threshold review under the evaluated validation split |
| Multi-seed validation stability | std=0.0 | Deterministic evaluation behavior under recorded seeds 42, 0, and 7 |
| Forced binary YES/NO view | Macro F1=0.4945 | Removing deferral collapses the TBD class |
| Targeted boundary refinement | HC error @0.85: 0.0558 → 0.0485 | Improves high-confidence error behavior more than raw F1 |
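For readers who want to reproduce this kind of evidence on their own data, the sketch below shows one common way to compute expected calibration error (ECE) and a high-confidence error rate. Two assumptions are made here that the card does not confirm: ECE is computed with 10 equal-width confidence bins, and "HC error @0.85" is read as the error rate among predictions whose confidence is at least 0.85. The function names are hypothetical.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE with equal-width bins: weighted mean of |accuracy - confidence|."""
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # Clamp so that a confidence of exactly 1.0 falls in the last bin.
        index = min(int(conf * n_bins), n_bins - 1)
        bins[index].append((conf, ok))

    n = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_conf = sum(conf for conf, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / n) * abs(accuracy - mean_conf)
    return ece


def high_confidence_error(confidences, correct, threshold=0.85):
    """Error rate among predictions at or above the confidence threshold."""
    selected = [ok for conf, ok in zip(confidences, correct) if conf >= threshold]
    if not selected:
        return None  # No prediction reached the threshold.
    return 1 - sum(1 for ok in selected if ok) / len(selected)


# Toy data for illustration only, not drawn from the evaluated corpus.
toy_conf = [0.9, 0.9, 0.6, 0.6]
toy_correct = [True, True, True, False]
toy_ece = expected_calibration_error(toy_conf, toy_correct)
toy_hc_error = high_confidence_error(toy_conf, toy_correct)
print(toy_ece, toy_hc_error)
```

A different bin count or an adaptive binning scheme would give a different ECE for the same predictions, so the binning choice should be recorded with the result.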

Certification and evaluation evidence are stored in the publication repository under certification/runs/S12B_20260526/.

Intended Use

EvaluatorDPT is intended for research and evaluation of auditable decision-routing systems, including:

  • Policy-governed approval or rejection routing
  • Compliance and risk triage
  • Moderation escalation
  • Enterprise workflow gating
  • Human-in-the-loop review queues

The model should be used with a deployment-specific threshold policy. A deployment can route low-confidence or low-margin predictions to TBD without retraining the model.
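A minimal sketch of such a policy, applied outside the model, is shown below. The function name `route` and both threshold defaults are hypothetical; a deployment would choose and record its own values after calibration review.

```python
def route(probabilities, min_confidence=0.85, min_margin=0.20):
    """Route a YES/NO/TBD distribution to a final label.

    Defers to TBD when the top probability is below min_confidence or when
    the gap between the top two probabilities is below min_margin.
    """
    ranked = sorted(probabilities.items(), key=lambda item: item[1], reverse=True)
    (top_label, top_p), (_, second_p) = ranked[0], ranked[1]
    if top_p < min_confidence or (top_p - second_p) < min_margin:
        return "TBD"
    return top_label


# Illustrative distributions, not outputs of the real model.
clear_case = {"YES": 0.90, "NO": 0.06, "TBD": 0.04}
close_case = {"YES": 0.55, "NO": 0.40, "TBD": 0.05}

print(route(clear_case))  # confident and well separated
print(route(close_case))  # low confidence, so deferred
```

Because the thresholds are plain arguments, tightening or loosening the policy changes only the routing step and leaves the model weights untouched.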

Outputs

| Output | Status | Description |
| --- | --- | --- |
| Decision distribution | Validated | Probability distribution over YES, NO, and TBD |
| Decision confidence | Validated | Confidence used for threshold and fallback review |
| Value auxiliary channel | Architectural / staged validation | Decision-relative signal requiring separate validation before use as a deployment claim |
| Emotion/sentiment auxiliary channel | Disabled for the evaluated model version | No emotion-head performance claim is made |

Auxiliary channels are retained as part of the architecture because future validated versions can use them as policy-control signals. They should not be treated as validated outputs unless separately evaluated.

Example Usage

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_id = "pcsankar73s/EvaluatorModel"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

inputs = tokenizer(
    "The available evidence is incomplete, but the request may qualify for approval.",
    return_tensors="pt",
    max_length=128,
    truncation=True,
)

with torch.no_grad():
    logits = model(**inputs).logits
    probabilities = torch.softmax(logits, dim=-1)

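# Read the label order from the model config instead of assuming it.
id2label = model.config.id2label
distribution = {id2label[i]: p.item() for i, p in enumerate(probabilities[0])}

# Apply a recorded threshold policy outside the model to produce the routed label.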

Limitations

  • Results are specific to the evaluated corpus and setup.
  • The interface is domain-agnostic, but the reported performance does not imply transfer to other domains.
  • New domains require calibration review, threshold selection, policy-version review, and error audit.
  • Inputs longer than 128 tokens require chunking or preprocessing.
  • TBD is a governed deferral output, not a substitute for human or policy review.
  • Auxiliary value and emotion/sentiment claims require separate validation before deployment use.
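The 128-token limit can be handled by splitting long inputs into overlapping windows before inference. The sketch below works on a plain list of token IDs and assumes two positions per window are reserved for the BERT [CLS] and [SEP] tokens, which leaves 126 content tokens. The function name and the overlap value are hypothetical, and the card does not specify how chunk-level distributions should be combined, so that step is left to the deployment policy.

```python
def chunk_token_ids(token_ids, max_length=128, overlap=26, num_special_tokens=2):
    """Split token IDs into overlapping windows that fit the model limit."""
    window = max_length - num_special_tokens  # room left for content tokens
    if not 0 <= overlap < window:
        raise ValueError("overlap must be non-negative and smaller than the window")
    if not token_ids:
        return []

    step = window - overlap
    chunks = []
    start = 0
    while True:
        chunks.append(token_ids[start:start + window])
        if start + window >= len(token_ids):
            break  # the last window reached the end of the input
        start += step
    return chunks


# Stand-in IDs for illustration; real IDs come from the tokenizer.
example_ids = list(range(300))
example_chunks = chunk_token_ids(example_ids)
print([len(chunk) for chunk in example_chunks])
```

Each chunk still needs the special tokens added before it is passed to the model, and disagreement between chunks is itself a signal a threshold policy may want to route to review.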

Data and Licensing

The training data is not redistributed with this model. The evaluated corpus is built primarily from public natural-language inference and stance-style source families together with targeted hard-case construction. Public source datasets remain subject to their original licenses and terms. Users must verify source-data licensing before reuse.
