
Libraries

How to use DevQuasar/vintage-nextstep_os_systemadmin-ft-phi2 with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="DevQuasar/vintage-nextstep_os_systemadmin-ft-phi2", trust_remote_code=True)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("DevQuasar/vintage-nextstep_os_systemadmin-ft-phi2", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("DevQuasar/vintage-nextstep_os_systemadmin-ft-phi2", trust_remote_code=True)

Local Apps

vLLM

How to use DevQuasar/vintage-nextstep_os_systemadmin-ft-phi2 with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "DevQuasar/vintage-nextstep_os_systemadmin-ft-phi2"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "DevQuasar/vintage-nextstep_os_systemadmin-ft-phi2",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
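
The same request can be sent from Python. The sketch below uses only the standard library and assumes the vLLM server started above is listening on localhost:8000. The helper names `build_completion_request` and `complete` are illustrative and not part of vLLM.

```python
import json
import urllib.request

MODEL_ID = "DevQuasar/vintage-nextstep_os_systemadmin-ft-phi2"


def build_completion_request(prompt, base_url="http://localhost:8000",
                             max_tokens=512, temperature=0.5):
    """Build a POST request for the OpenAI-compatible /v1/completions endpoint."""
    payload = {
        "model": MODEL_ID,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, **kwargs):
    """Send the request and return the text of the first completion."""
    request = build_completion_request(prompt, **kwargs)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["text"]
```

With the server running, `complete("Once upon a time,")` should return the generated text. Passing `base_url="http://localhost:30000"` targets the SGLang server described below instead.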

Use Docker

docker model run hf.co/DevQuasar/vintage-nextstep_os_systemadmin-ft-phi2

SGLang

How to use DevQuasar/vintage-nextstep_os_systemadmin-ft-phi2 with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "DevQuasar/vintage-nextstep_os_systemadmin-ft-phi2" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "DevQuasar/vintage-nextstep_os_systemadmin-ft-phi2",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "DevQuasar/vintage-nextstep_os_systemadmin-ft-phi2" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "DevQuasar/vintage-nextstep_os_systemadmin-ft-phi2",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Docker Model Runner
How to use DevQuasar/vintage-nextstep_os_systemadmin-ft-phi2 with Docker Model Runner:
```
docker model run hf.co/DevQuasar/vintage-nextstep_os_systemadmin-ft-phi2
```

'Make knowledge free for everyone'

The goal

The goal of this model is to provide a fine-tuned Phi-2 (https://huggingface.co/microsoft/phi-2) model that has knowledge of the vintage NEXTSTEP operating system and can answer questions on the topic.

Details

The model was trained on 35,439 question-answer pairs automatically generated from the NEXTSTEP 3.3 System Administrator documentation. A locally running Q8-quantized Orca2 13B (https://huggingface.co/TheBloke/Orca-2-13B-GGUF) model was used to generate the training data. The generation was completely unsupervised, with only some sanity checks (such as ignoring data chunks that contain fewer than 100 tokens). The maximum context size for Orca2 is 4096 tokens, so a simple rule was applied: chunks over 3500 tokens were split (leaving room for the prompt instructions). Chunking did not consider context, so text may have been split in the middle of a topic. The evaluation set was generated with a similar method on 1% of the raw data using Llama 2 chat (https://huggingface.co/TheBloke/Llama-2-13B-chat-GGUF).
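
The chunking rule described above can be sketched roughly as follows. This is a minimal illustration, not the author's script: it approximates tokens by whitespace-separated words (the original counts were presumably taken with a real tokenizer), and the function name `chunk_document` is hypothetical.

```python
MAX_CHUNK_TOKENS = 3500  # leaves room for prompt instructions in a 4096-token window
MIN_CHUNK_TOKENS = 100   # sanity check: shorter chunks are ignored


def chunk_document(text, max_tokens=MAX_CHUNK_TOKENS, min_tokens=MIN_CHUNK_TOKENS):
    """Split text into fixed-size token windows and drop windows that are too short.

    Tokens are approximated by whitespace splitting (an assumption). The split
    ignores sentence and section boundaries, so a topic may be cut in half.
    """
    tokens = text.split()
    chunks = []
    for start in range(0, len(tokens), max_tokens):
        window = tokens[start:start + max_tokens]
        if len(window) >= min_tokens:
            chunks.append(" ".join(window))
    return chunks
```

Each returned chunk would then be sent to the generator model together with the prompt instructions to produce question-answer pairs.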

Trained locally on 2x 3090 GPUs using vanilla DDP with Hugging Face Accelerate for 50 epochs. Because I wanted to add new knowledge to the base model, r=128 and lora_alpha=128 were used, so the LoRA weights amount to 3.5% of the base model.
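
To make the LoRA numbers more concrete: in the standard LoRA formulation, a rank-r adapter on a linear layer of shape d_in x d_out adds r * (d_in + d_out) trainable weights, and its update is scaled by lora_alpha / r, which is 1.0 for the settings above. The sketch below only shows that arithmetic. The 2560 x 2560 layer shape is an assumption chosen for illustration (Phi-2's hidden size); the card does not state which modules were adapted, so the snippet does not attempt to reproduce the 3.5% figure.

```python
R = 128           # LoRA rank from the model card
LORA_ALPHA = 128  # LoRA alpha from the model card


def lora_scaling(alpha, r):
    """Scaling factor applied to the low-rank update B @ A in standard LoRA."""
    return alpha / r


def lora_param_count(d_in, d_out, r):
    """Trainable weights added by one adapter: A is (r x d_in), B is (d_out x r)."""
    return r * d_in + d_out * r


print(lora_scaling(LORA_ALPHA, R))      # 1.0
print(lora_param_count(2560, 2560, R))  # 655360 (assumed layer shape)
```

A high rank such as 128 gives the adapter far more capacity than the small ranks often used for style tuning, which fits the stated aim of adding new knowledge.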

Sample code

Chat with model sample code: https://github.com/csabakecskemeti/ai_utils/blob/main/generate.py

For the best results, instruct the model not to refer to other chapters but to collect all the information itself, for example: "Give me a complete answer do not refer to other chapters but collect the information from them. How to setup a local network in Openstep OS?"
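
A small helper can prepend that instruction to every question. This is only an illustrative sketch: the function name `build_prompt` is hypothetical, and the instruction text is copied from the example above.

```python
INSTRUCTION = (
    "Give me a complete answer do not refer to other chapters "
    "but collect the information from them."
)


def build_prompt(question):
    """Prefix a question with the instruction recommended in this model card."""
    return f"{INSTRUCTION} {question.strip()}"


prompt = build_prompt("How to setup a local network in Openstep OS?")
print(prompt)
```

The resulting string can be passed to the Transformers pipeline or used as the `prompt` field of a completion request.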

I'm doing this to 'Make knowledge free for everyone', using my personal time and resources.

If you want to support my efforts please visit my ko-fi page: https://ko-fi.com/devquasar

Also feel free to visit my website https://devquasar.com/

Safetensors

Model size: 3B params

Tensor type: F16

Model tree for DevQuasar/vintage-nextstep_os_systemadmin-ft-phi2

Base model: microsoft/phi-2

Finetuned: this model
