import numpy as np
from typing import Dict, List, TYPE_CHECKING

import torch
from sklearn.cluster import KMeans

if TYPE_CHECKING:  # pragma: no cover
    from .model import BitTransformerLM


class TelemetrySynthesizer:
    """Analyze telemetry batches and cluster activation patterns."""

    def __init__(self, n_clusters: int = 2) -> None:
        self.n_clusters = n_clusters

    def _summary(self, telemetry: Dict[str, List[torch.Tensor]]) -> np.ndarray:
        """Compute activation/attention summaries for a single telemetry dict."""
        acts = telemetry["activations"]
        attn = telemetry["attention_maps"]
        summaries = []
        # One [mean, variance, attention entropy] triple per (activation, attention map) pair.
        for a, m in zip(acts, attn):
            mean = a.mean().item()
            var = a.var(unbiased=False).item()
            prob = m.softmax(-1)
            # Clamp before log so zero probabilities do not produce -inf.
            entropy = -(prob * prob.clamp_min(1e-9).log()).sum(-1).mean().item()
            summaries.append([mean, var, entropy])
        # Flatten into a single feature vector for clustering.
        return np.array(summaries).ravel()

    def synthesize(
        self, telemetries: List[Dict[str, List[torch.Tensor]]], bit_seqs: torch.Tensor
    ) -> Dict[str, List]:
        """Cluster telemetry summaries and return cluster info."""
        data = np.stack([self._summary(t) for t in telemetries])
        km = KMeans(n_clusters=self.n_clusters, n_init=1)
        labels = km.fit_predict(data)
        representatives: List[List[int]] = []
        for c in range(self.n_clusters):
            idx = np.where(labels == c)[0]
            if len(idx) > 0:
                # The first member of each cluster serves as its representative.
                representatives.append(bit_seqs[int(idx[0])].tolist())
            else:
                representatives.append([])
        return {"cluster_assignments": labels.tolist(), "representatives": representatives}

    def cluster_sequences(
        self, model: "BitTransformerLM", bit_seqs: torch.Tensor
    ) -> List[List[int]]:
        """Run the model to gather telemetry and return representative sequences.

        Parameters
        ----------
        model: BitTransformerLM
            Model used to compute telemetry for each sequence.
        bit_seqs: torch.Tensor
            Tensor containing one bit sequence per row.

        Returns
        -------
        list[list[int]]
            Representative sequences chosen from KMeans clusters.
        """
        telemetries: List[Dict[str, List[torch.Tensor]]] = []
        with torch.no_grad():
            for seq in bit_seqs:
                # Run each sequence as a batch of one and keep only its telemetry.
                _, tele = model(seq.unsqueeze(0))
                telemetries.append(tele)
        info = self.synthesize(telemetries, bit_seqs)
        return info["representatives"]


def detect_metric_drift(
    metrics_log: Dict[str, List[float]],
    window: int = 10,
    threshold: float = 0.2,
) -> Dict[str, bool]:
    """Detect metric drift between consecutive windows.

    Args:
        metrics_log: History of scalar metrics keyed by name.
        window: Size of each of the two consecutive windows being compared.
        threshold: Absolute difference between window means above which
            drift is flagged.

    Returns:
        Dictionary mapping metric keys to a boolean drift indicator.
    """
    drift: Dict[str, bool] = {}
    for key, values in metrics_log.items():
        # Two full windows of history are needed; otherwise report no drift.
        if len(values) < window * 2:
            drift[key] = False
            continue
        recent = np.mean(values[-window:])
        prev = np.mean(values[-2 * window : -window])
        # Cast so the result is a plain bool rather than numpy.bool_.
        drift[key] = bool(abs(recent - prev) > threshold)
    return drift
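
The windowed comparison in `detect_metric_drift` can be tried without NumPy or the model. Below is a minimal sketch using only the standard library. The name `drift_flags` and the metric histories are hypothetical and exist only for illustration. It is meant to mirror the logic above, not replace it.

```python
from statistics import fmean
from typing import Dict, List


def drift_flags(
    metrics_log: Dict[str, List[float]],
    window: int = 10,
    threshold: float = 0.2,
) -> Dict[str, bool]:
    """Hypothetical stdlib-only stand-in for detect_metric_drift."""
    flags: Dict[str, bool] = {}
    for key, values in metrics_log.items():
        if len(values) < 2 * window:
            # Not enough history for two full windows: report no drift.
            flags[key] = False
            continue
        recent = fmean(values[-window:])
        prev = fmean(values[-2 * window:-window])
        # Drift is flagged when the window means differ by more than threshold.
        flags[key] = abs(recent - prev) > threshold
    return flags


# Made-up metric histories, for illustration only.
log = {
    "stable": [0.5] * 6,
    "shifted": [0.2, 0.2, 0.2, 0.8, 0.8, 0.8],
    "too_short": [0.1, 0.9],
}
result = drift_flags(log, window=3, threshold=0.2)
print(result)  # {'stable': False, 'shifted': True, 'too_short': False}
```

A metric with fewer than `2 * window` entries is never flagged, so with the default `window=10` no drift is reported for a metric until it has at least 20 logged values.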