Could you please share the initial weights of one of the experts from Jamba?
I'm unable to load the full Jamba weights. It's almost impossible to get an 80GB A100 in Google Colab, or to get multiple GPUs. I came across your fantastic repo while looking for someone who could split out one of Jamba's experts and share it as initial weights.
Pretraining is not necessary. Would you be able to share the initial weights of one of Jamba's experts? Thank you.
Hi @danielpark, it is not easy to split out an expert and convert it into a dense model, as one can with Mixtral. Jamba's MoE implementation is slightly different, so simply splitting out an expert will only lead to garbled output.
I'll try to see if there are any other merging strategies.
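For readers wondering what "simply splitting an expert" would mean mechanically, here is a minimal sketch. It assumes a hypothetical parameter naming scheme (`...moe.experts.<i>...` and `...moe.router...`); these key names, the `mlp` target name, and the `extract_expert` helper are illustrative assumptions, not Jamba's actual checkpoint layout. It uses a plain dict with placeholder strings standing in for tensors.

```python
import re

# Hypothetical key layout, for illustration only; a real checkpoint's
# parameter names may differ.
EXPERT_KEY = re.compile(r"^(?P<prefix>.*\.)moe\.experts\.(?P<idx>\d+)\.(?P<rest>.+)$")
ROUTER_KEY = re.compile(r"^.*\.moe\.router\..+$")


def extract_expert(state_dict, expert_idx):
    """Keep one expert per MoE layer and rename it as a dense MLP.

    Router weights are dropped, the other experts are discarded, and every
    non-MoE weight is copied through unchanged.
    """
    dense = {}
    for name, tensor in state_dict.items():
        if ROUTER_KEY.match(name):
            continue  # a dense model has no router
        match = EXPERT_KEY.match(name)
        if match is None:
            dense[name] = tensor  # shared (non-MoE) weight
        elif int(match.group("idx")) == expert_idx:
            # Rename "...moe.experts.<i>.X" to "...mlp.X" (assumed target name).
            dense[f"{match.group('prefix')}mlp.{match.group('rest')}"] = tensor
    return dense


# Toy state dict; strings stand in for weight tensors.
toy = {
    "layers.0.attn.weight": "A",
    "layers.1.moe.router.weight": "R",
    "layers.1.moe.experts.0.up_proj.weight": "E0",
    "layers.1.moe.experts.1.up_proj.weight": "E1",
}
dense = extract_expert(toy, 0)
print(sorted(dense))
```

This only illustrates the renaming step. As noted in the reply above, applying this kind of naive split to Jamba produces garbled output, so it should not be expected to yield a usable model on its own.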
Thank you sincerely for the swift and impressive work, and for providing an open script. Although not many people are aware of this fantastic work yet, it will undoubtedly be very useful. Thank you!