How to Run OpenAI GPT-OSS 20B and 120B Locally on Laptop or Mobile (Step-by-Step Guide)
OpenAI has surprised the AI community with a late-night announcement: for the first time since GPT-2, it is open-sourcing large language models again. This time, we get two reasoning models, GPT-OSS-20B and GPT-OSS-120B, capable of delivering performance close to o4-mini while running locally on high-end laptops or even smartphones. The global developer community is buzzing with excitement.
Highlights of the Release
- Two open-source reasoning models: GPT-OSS-20B (lightweight version) and GPT-OSS-120B (flagship version).
- Performance close to o4-mini, outperforming many other open-source models in coding, mathematics, and medical benchmarks.
- Low hardware requirements:
  - GPT-OSS-20B: runs on devices with as little as 16GB of memory, ideal for local or on-device inference.
  - GPT-OSS-120B: runs on a single 80GB GPU (e.g., NVIDIA H100).
- Apache 2.0 license: Free for commercial use and customization, with no copyright or patent risks.
- Fine-tunable and adjustable reasoning levels, with full chain-of-thought output and agentic capabilities like function calling and tool usage.
Official links:
- GitHub: https://github.com/openai/gpt-oss
- Hugging Face 20B: https://huggingface.co/openai/gpt-oss-20b
- Hugging Face 120B: https://huggingface.co/openai/gpt-oss-120b
- OpenAI Blog: Introducing GPT-OSS
- Playground: https://www.gpt-oss.com/
Quickstart Tutorial: Running GPT-OSS Locally
If you want to try these models right away, you can either test them online via the Playground or download them from Hugging Face for local deployment. Below is a simple setup guide.
1. Set Up Your Environment
Recommended: Linux or macOS (Windows via WSL2).
# Create a Python environment
conda create -n gptoss python=3.10
conda activate gptoss
# Install dependencies
pip install torch transformers accelerate
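Before downloading any weights, it is worth confirming that PyTorch can see your accelerator (a quick sanity check; a CPU-only setup prints False and falls back to much slower inference):
import torch

# True means a CUDA GPU is visible; on Apple Silicon, check
# torch.backends.mps.is_available() instead
print(torch.__version__)
print(torch.cuda.is_available())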
2. Download the Model
Example for the 20B model:
git lfs install
git clone https://huggingface.co/openai/gpt-oss-20b
For faster downloads:
pip install hf_transfer
export HF_HUB_ENABLE_HF_TRANSFER=1
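If you prefer Python over git, the huggingface_hub package (installed alongside transformers) can fetch the same files; a minimal sketch:
from huggingface_hub import snapshot_download

# Downloads every file in the repo into ./gpt-oss-20b; resumable,
# and it honors HF_HUB_ENABLE_HF_TRANSFER=1 for faster transfers
snapshot_download(repo_id="openai/gpt-oss-20b", local_dir="gpt-oss-20b")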
3. Run a Simple Test
Create a file demo.py:
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "./gpt-oss-20b"
tokenizer = AutoTokenizer.from_pretrained(model_name)
# device_map="auto" spreads the weights across available devices
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto", torch_dtype="auto")

prompt = "Explain quantum computing in simple terms."
# Send the inputs to the device the model was loaded onto
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
Run:
python demo.py
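GPT-OSS is a chat-tuned model, so raw text prompts can underperform the chat format. Here is a variant of demo.py that goes through the tokenizer's bundled chat template (a sketch, assuming the repo ships one, as Hugging Face chat models typically do):
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "./gpt-oss-20b"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto", torch_dtype="auto")

messages = [{"role": "user", "content": "Explain quantum computing in simple terms."}]
# apply_chat_template wraps the conversation in the model's expected format
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=200)
# Decode only the newly generated tokens, not the prompt
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))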
4. Adjust Reasoning Strength
You can control reasoning depth (low, medium, or high) through the prompt. The tag below is illustrative; check the model card for the exact syntax your release expects:
prompt = "<reasoning:high>\nSolve this math problem: 2*(3+5)^2 = ?"
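With the chat template from the previous step, the same idea can be expressed as a system message. This assumes a "Reasoning: high" directive in the system prompt; verify the exact wording for your release against the model card:
# Drop-in replacement for the `messages` list in the chat-template sketch above.
# Assumption: the reasoning level is requested via the system message
# ("Reasoning: low|medium|high"); confirm against the model card.
messages = [
    {"role": "system", "content": "Reasoning: high"},
    {"role": "user", "content": "Solve this math problem: 2*(3+5)^2 = ?"},
]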
5. Deploy as an API
If you want to expose the model via a local API:
pip install fastapi uvicorn
# app.py
from fastapi import FastAPI
from transformers import AutoTokenizer, AutoModelForCausalLM

app = FastAPI()

# Load the model once at startup so every request reuses it
tokenizer = AutoTokenizer.from_pretrained("./gpt-oss-20b")
model = AutoModelForCausalLM.from_pretrained(
    "./gpt-oss-20b", device_map="auto", torch_dtype="auto"
)

@app.post("/chat")
async def chat(prompt: str):
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=200)
    return {"response": tokenizer.decode(outputs[0], skip_special_tokens=True)}
Run:
uvicorn app:app --host 0.0.0.0 --port 8000
Send a POST request to http://localhost:8000/chat to test your API.
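A quick way to exercise the endpoint from Python (a sketch using the requests package; install it with pip install requests). Since chat() declares prompt as a plain string, FastAPI reads it from the query string:
import requests

# `prompt` is a query parameter because of how the endpoint is declared
resp = requests.post(
    "http://localhost:8000/chat",
    params={"prompt": "Explain quantum computing in simple terms."},
)
print(resp.json()["response"])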
Summary
OpenAI is back in the open-source game with two powerful reasoning models that can run locally on consumer hardware.
Apache 2.0 licensing makes them ideal for research, startups, and commercial use cases.
Hugging Face downloads are already surging, so you may want to try the Playground first before committing to a local setup.
With GPT-5 still under wraps, GPT-OSS is already shaping up to be one of the most exciting open-source AI developments of the year. Expect a wave of new projects and applications built on these models in the coming days.
FAQ
1. What is GPT-OSS?
GPT-OSS is OpenAI's newly open-sourced reasoning model series, including GPT-OSS-20B and GPT-OSS-120B, designed to deliver high-level reasoning performance similar to o4-mini while running locally on consumer hardware.
2. Can GPT-OSS run on a laptop or smartphone?
Yes. GPT-OSS-20B can run on devices with as little as 16GB of RAM, making it possible to run on high-end laptops or even smartphones. GPT-OSS-120B requires a single 80GB GPU for optimal performance.
3. Is GPT-OSS free to use commercially?
Yes. The models are released under the Apache 2.0 license, allowing free usage, modification, and commercial deployment without copyright or patent concerns.
4. How can I try GPT-OSS online without downloading?
OpenAI provides a Playground where you can test the models directly in your browser before deciding to download and run them locally.
5. What makes GPT-OSS different from other open-source LLMs?
GPT-OSS offers adjustable reasoning strength, full chain-of-thought transparency, agentic function-calling capabilities, and better performance in coding, mathematics, and medical benchmarks compared to similar-sized models.