How to Run OpenAI GPT-OSS 20B and 120B Locally on Laptop or Mobile (Step-by-Step Guide)
OpenAI has surprised the AI community with a late-night announcement: for the first time since GPT-2, it is open-sourcing large language models again. This time, we get two reasoning models, GPT-OSS-20B and GPT-OSS-120B, capable of delivering performance close to o4-mini while running locally on high-end laptops or even smartphones. The global developer community is buzzing with excitement.
Highlights of the Release
- Two open-source reasoning models: GPT-OSS-20B (lightweight version) and GPT-OSS-120B (flagship version).
- Performance close to o4-mini, outperforming many other open-source models in coding, mathematics, and medical benchmarks.
- Low hardware requirements:
  - GPT-OSS-20B: runs on devices with as little as 16GB of memory, ideal for local or on-device inference.
  - GPT-OSS-120B: runs on a single 80GB GPU (e.g., NVIDIA H100).
- Apache 2.0 license: Free for commercial use and customization, with no copyright or patent risks.
- Fine-tunable and adjustable reasoning levels, with full chain-of-thought output and agentic capabilities like function calling and tool usage.
Official links:
- GitHub: https://github.com/openai/gpt-oss
- Hugging Face 20B: https://huggingface.co/openai/gpt-oss-20b
- Hugging Face 120B: https://huggingface.co/openai/gpt-oss-120b
- OpenAI Blog: Introducing GPT-OSS
- Playground: https://www.gpt-oss.com/
Quickstart Tutorial: Running GPT-OSS Locally
If you want to try these models right away, you can either test them online via the Playground or download them from Hugging Face for local deployment. Below is a simple setup guide.
1. Set Up Your Environment
Recommended: Linux or macOS (Windows via WSL2).
# Create a Python environment
conda create -n gptoss python=3.10
conda activate gptoss
# Install dependencies
pip install torch transformers accelerate
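Before downloading any weights, it is worth confirming that PyTorch can see your accelerator (a quick sanity check; a CPU-only setup prints False and falls back to much slower inference):
import torch

# True means a CUDA GPU is visible; on Apple Silicon, check
# torch.backends.mps.is_available() instead
print(torch.__version__)
print(torch.cuda.is_available())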
2. Download the Model
Example for the 20B model:
git lfs install
git clone https://huggingface.co/openai/gpt-oss-20b
For faster downloads:
pip install hf_transfer
export HF_HUB_ENABLE_HF_TRANSFER=1
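If you prefer Python over git, the huggingface_hub package (installed alongside transformers) can fetch the same files; a minimal sketch:
from huggingface_hub import snapshot_download

# Downloads every file in the repo into ./gpt-oss-20b; resumable,
# and it honors HF_HUB_ENABLE_HF_TRANSFER=1 for faster transfers
snapshot_download(repo_id="openai/gpt-oss-20b", local_dir="gpt-oss-20b")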
3. Run a Simple Test
Create a file demo.py:
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "./gpt-oss-20b"
tokenizer = AutoTokenizer.from_pretrained(model_name)
# device_map="auto" spreads the weights across available devices
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto", torch_dtype="auto")

prompt = "Explain quantum computing in simple terms."
# Send the inputs to the device the model was loaded onto
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
Run:
python demo.py
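GPT-OSS is a chat-tuned model, so raw text prompts can underperform the chat format. Here is a variant of demo.py that goes through the tokenizer's bundled chat template (a sketch, assuming the repo ships one, as Hugging Face chat models typically do):
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "./gpt-oss-20b"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto", torch_dtype="auto")

messages = [{"role": "user", "content": "Explain quantum computing in simple terms."}]
# apply_chat_template wraps the conversation in the model's expected format
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=200)
# Decode only the newly generated tokens, not the prompt
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))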
4. Adjust Reasoning Strength
You can control reasoning depth (low, medium, or high) through the prompt. The tag below is illustrative; check the model card for the exact syntax your release expects:
prompt = "<reasoning:high>\nSolve this math problem: 2*(3+5)^2 = ?"
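With the chat template from the previous step, the same idea can be expressed as a system message. This assumes a "Reasoning: high" directive in the system prompt; verify the exact wording for your release against the model card:
# Drop-in replacement for the `messages` list in the chat-template sketch above.
# Assumption: the reasoning level is requested via the system message
# ("Reasoning: low|medium|high"); confirm against the model card.
messages = [
    {"role": "system", "content": "Reasoning: high"},
    {"role": "user", "content": "Solve this math problem: 2*(3+5)^2 = ?"},
]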
5. Deploy as an API
If you want to expose the model via a local API:
pip install fastapi uvicorn
# app.py
from fastapi import FastAPI
from transformers import AutoTokenizer, AutoModelForCausalLM

app = FastAPI()

# Load the model once at startup so every request reuses it
tokenizer = AutoTokenizer.from_pretrained("./gpt-oss-20b")
model = AutoModelForCausalLM.from_pretrained(
    "./gpt-oss-20b", device_map="auto", torch_dtype="auto"
)

@app.post("/chat")
async def chat(prompt: str):
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=200)
    return {"response": tokenizer.decode(outputs[0], skip_special_tokens=True)}
Run:
uvicorn app:app --host 0.0.0.0 --port 8000
Send a POST request to http://localhost:8000/chat to test your API.
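A quick way to exercise the endpoint from Python (a sketch using the requests package; install it with pip install requests). Since chat() declares prompt as a plain string, FastAPI reads it from the query string:
import requests

# `prompt` is a query parameter because of how the endpoint is declared
resp = requests.post(
    "http://localhost:8000/chat",
    params={"prompt": "Explain quantum computing in simple terms."},
)
print(resp.json()["response"])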
Summary
OpenAI is back in the open-source game with two powerful reasoning models that can run locally on consumer hardware.
Apache 2.0 licensing makes them ideal for research, startups, and commercial use cases.
Hugging Face downloads are already surging, so you may want to try the Playground first before committing to a local setup.
With GPT-5 still under wraps, GPT-OSS is already shaping up to be one of the most exciting open-source AI developments of the year. Expect a wave of new projects and applications built on these models in the coming days.
FAQ
1. What is GPT-OSS?
GPT-OSS is OpenAI's newly open-sourced reasoning model series, including GPT-OSS-20B and GPT-OSS-120B, designed to deliver high-level reasoning performance similar to o4-mini while running locally on consumer hardware.
2. Can GPT-OSS run on a laptop or smartphone?
Yes. GPT-OSS-20B can run on devices with as little as 16GB of RAM, making it possible to run on high-end laptops or even smartphones. GPT-OSS-120B requires a single 80GB GPU for optimal performance.
3. Is GPT-OSS free to use commercially?
Yes. The models are released under the Apache 2.0 license, allowing free usage, modification, and commercial deployment without copyright or patent concerns.
4. How can I try GPT-OSS online without downloading?
OpenAI provides a Playground where you can test the models directly in your browser before deciding to download and run them locally.
5. What makes GPT-OSS different from other open-source LLMs?
GPT-OSS offers adjustable reasoning strength, full chain-of-thought transparency, agentic function-calling capabilities, and better performance in coding, mathematics, and medical benchmarks compared to similar-sized models.