DigitalOcean Incident History
Serverless Inference: High error rates for open source models (Qwen 3 32B)
Status: Resolved
Duration: 3h 5m
Updates: 2
INCIDENT TIMELINE
✓ Resolved: Tue, Apr 7, 2026, 03:55 PM
This incident has been resolved.
Identified: Tue, Apr 7, 2026, 12:55 PM
We are currently investigating reports of elevated latency and error rates affecting requests to this model through Serverless Inference and Agents.
Earlier observations indicated increased error rates for the open-source Qwen 3 32B model. The Ray dashboard also showed multiple workers in a pending state, suggesting capacity constraints.
Our analysis determined that the model was experiencing higher-than-expected request volume without sufficient resources to scale accordingly. To address this, the node pool size has been increased to improve available capacity. However, there are still insufficient nodes to fully support the desired number of model replicas.
Following the node pool expansion, a new pod-related error has been identified. Our Engineering team is actively working to resolve this issue and restore full service performance.
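The capacity shortfall described above comes down to simple arithmetic: the node pool cannot host the desired number of model replicas. A minimal sketch of that check follows; all numbers (node counts, GPUs per node, GPUs per replica) are illustrative assumptions, not actual figures from this incident.

```python
# Hypothetical capacity check. The figures below are illustrative
# assumptions, not values reported for this incident.
def max_replicas(nodes: int, gpus_per_node: int, gpus_per_replica: int) -> int:
    """Return how many model replicas a node pool can host,
    assuming each replica's GPUs must fit on a single node."""
    replicas_per_node = gpus_per_node // gpus_per_replica
    return nodes * replicas_per_node

# Example: a pool of 6 nodes with 8 GPUs each, serving a model that
# needs 4 GPUs per replica, tops out at 12 replicas. If autoscaling
# asks for more, the extra workers sit in a pending state.
print(max_replicas(nodes=6, gpus_per_node=8, gpus_per_replica=4))  # prints 12
```

If the desired replica count exceeds this ceiling, the surplus pods stay pending, which matches the Ray dashboard observation of multiple workers waiting on capacity.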
Investigating: Tue, Apr 7, 2026, 12:49 PM
Serverless inference for alibaba-qwen3-32b (Qwen 3 32B) in tor1 has been experiencing high error rates since 10:46 UTC.
📊 TECHNICAL DETAILS
Internal ID: 0e306cf3-7a4e-42af-a408-78ce1639a50b
External ID: bx60kdvsvtvb
🕐 Started: Tue, Apr 7, 2026, 12:49 PM
🔄 Last Updated: Tue, Apr 7, 2026, 03:55 PM
✓ Resolved: Tue, Apr 7, 2026, 03:55 PM
Impact Level: Minor
Total Duration: 3h 5m
Status Updates: 2
Affected Components: Gradient AI