Deploy any HuggingFace model with vLLM on Modal in minutes. A production-ready template for running custom LLM inference endpoints on serverless GPUs. Just change the MODEL_NAME environment variable ...
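As a rough sketch of the pattern described above — a single `MODEL_NAME` environment variable selects which HuggingFace model the vLLM engine loads — the configuration step might look like the following. The helper name and the fallback model are illustrative assumptions, not the template's actual code:

```python
import os

def resolve_model_name(default: str = "facebook/opt-125m") -> str:
    """Pick the HuggingFace model to serve from the MODEL_NAME env var.

    The default here is a placeholder; the real template may use a
    different fallback model.
    """
    return os.environ.get("MODEL_NAME", default)

# Inside the serverless function, this name would be handed to vLLM,
# e.g. (not executed here):
#   from vllm import LLM
#   llm = LLM(model=resolve_model_name())

os.environ["MODEL_NAME"] = "mistralai/Mistral-7B-Instruct-v0.2"
print(resolve_model_name())  # → mistralai/Mistral-7B-Instruct-v0.2
```

Redeploying with a different `MODEL_NAME` value then swaps the served model without touching the code.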