Could you please share the initial weights of one of the experts from Jamba?

by danielpark - opened Apr 7, 2024

Discussion

danielpark

Apr 7, 2024

I'm unable to load Jamba's large weights. It's almost impossible to get an A100 with 80GB in Google Colab, or to get multiple GPUs. That is how I came across your fantastic repo: I was looking for someone who could split out one of Jamba's experts and share it as initial weights.

Pretraining is not necessary. Would you be able to share the initial weights of one of Jamba's experts? Thank you.
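
For readers following the memory concern above, a rough back-of-the-envelope estimate may help. The sketch below only counts the bytes needed to hold the weights. The parameter count of roughly 52 billion is an assumption taken from the figure publicly quoted for Jamba-v0.1, not something stated in this thread, and the helper name `weight_memory_gib` is made up for illustration.

```python
def weight_memory_gib(num_params: float, bytes_per_param: float) -> float:
    """Memory needed to hold the weights alone, in GiB (no activations or cache)."""
    return num_params * bytes_per_param / 1024**3


# Assumption: about 52e9 total parameters, the figure quoted for Jamba-v0.1.
TOTAL_PARAMS = 52e9

for dtype_name, nbytes in [("float32", 4), ("bfloat16", 2), ("int8", 1)]:
    gib = weight_memory_gib(TOTAL_PARAMS, nbytes)
    print(f"{dtype_name}: about {gib:.0f} GiB for the weights alone")
```

Under that assumption, the 16-bit weights alone come to well over 80 GB, which is consistent with a single 80GB A100 not being enough without quantization or offloading.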

TechxGenus

Owner Apr 8, 2024

Hi @danielpark, it is not easy to split out an expert and convert it into a dense model, as with Mixtral. Jamba's MoE implementation is slightly different, and simply splitting out an expert will only lead to garbled output.
I'll try to see if there are any other merging strategies.
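
As a general illustration of why a single expert is not a drop-in replacement for a mixture-of-experts layer, here is a toy sketch in pure Python. It uses a generic top-k router with softmax gates, which is an assumption made for illustration only: it is not Jamba's actual implementation (which the reply above says differs), and all names, sizes, and random weights are hypothetical.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def linear(weights, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def moe_forward(x, router, experts, top_k=2):
    """Toy MoE layer: route x to the top_k experts and mix their outputs."""
    logits = linear(router, x)
    top = sorted(range(len(experts)), key=lambda i: logits[i], reverse=True)[:top_k]
    gates = softmax([logits[i] for i in top])
    out = [0.0] * len(x)
    for gate, idx in zip(gates, top):
        y = linear(experts[idx], x)
        out = [o + gate * v for o, v in zip(out, y)]
    return out


def dense_forward(x, experts, expert_idx=0):
    """'Split out' one expert and use it alone, ignoring the router."""
    return linear(experts[expert_idx], x)


random.seed(0)
dim, num_experts = 4, 4  # hypothetical toy sizes
router = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(num_experts)]
experts = [
    [[random.gauss(0, 1) for _ in range(dim)] for _ in range(dim)]
    for _ in range(num_experts)
]

# Count inputs where the single-expert output differs from the routed mixture.
mismatches = 0
for _ in range(100):
    x = [random.gauss(0, 1) for _ in range(dim)]
    moe_out = moe_forward(x, router, experts)
    dense_out = dense_forward(x, experts)
    if max(abs(a - b) for a, b in zip(moe_out, dense_out)) > 1e-6:
        mismatches += 1

print(f"{mismatches} of 100 inputs give a different output with one expert only")
```

In this toy setting the downstream layers would see activations they were never trained on whenever the single expert disagrees with the routed mixture, which is one plausible reading of the "garbled output" described above.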

TechxGenus

Owner Apr 8, 2024

https://huggingface.co/TechxGenus/Jamba-v0.1-9B

danielpark

Apr 15, 2024

Thank you sincerely for the swift and impressive work as well as providing an open script. Although not many people are aware of this fantastic work yet, it will undoubtedly be very useful. Thank you!
