nbdeepseek is a Network and generation performance measurement plug-in for the DeepSeek / OpenAI compatible streaming Chat Completion API. It measures the full link indicators from the Network layer (DNS, TCP, SSL, TTFB) to the LLM generation layer (first token delay, inference speed, content speed) by sending the real streaming conversation Request.| Parameter name | Alias | type | Is it required? | Default value | Explanation and Impact |
|---|---|---|---|---|---|
base_url | url | string | yes | — | API base URL, such as https://api.deepseek.com/v1. The plugin will POST request to {base_url}/{endpoint}. If filled in incompletely, it may result in Request 404 or connection failure. |
endpoint | — | string | no | chat/completions | API endpoint path, spliced after base_url. If the target service uses non-standard endpoints (such as custom gateways), this parameter can be modified. |
api_key | key | string | yes | — | Bearer Token, used for Authorization: Bearer <api_key> Request header. If missing or invalid, the API will return 401 Unauthorized Error. |
| Parameter name | Type | Required | Default value | Explanation and impact |
|---|---|---|---|---|
model | string | no | "" | The Data Model name used by Request is filled in the model field of JSON. If it is empty, some gateways may report an error, and some may use the default Data Model. |
prompt | string | no | "" | User message content, fill in the messages[0].content field of JSON. Directly affects the LLM's generated content length and Theme, thereby significantly affecting the number of Tokens and the generation time. It is recommended to fix prompt to obtain comparable baseline data. |
| Parameter name | Type | Required | Default value | Explanation and impact |
|---|---|---|---|---|
timeout | integer | no | 300000 | Overall download timeout in milliseconds (5 minutes). The total time limit from sending Request to fully receiving the streaming response. If the content generated by LLM is long or Network is slow, it needs to be increased appropriately, otherwise it will be forcibly interrupted and a timeout of Error will be reported. |
readtimeut | integer | no | 30000 | Socket single read timeout, unit milliseconds. Note the spelling is readtimeut (not readtimeout). If the interval between two SSE data blocks exceeds this value, the read timeout Error will be triggered. Used to detect long pauses in streaming responses. |
| Parameter name | Type | Required | Default value | Explanation and impact |
|---|---|---|---|---|
check | string | no | "" | Response content validation rules. Used to ensure that LLM returns the expected content and not garbled characters or Error information. • Plain Text: Perform substring inclusion matching (case sensitive). • Regular Expression: Use the /pattern/ package, such as /hello.+/, to perform regular matching. If fails to be verified, the plug-in will report 699002 Error code. |
cspm | integer | no | 100 | Content Speed Percentage Multiplier (Content Speed Percentage Multiplier**). Quality control threshold for detecting Exception responses. calculation logic: If content speed / reasoning speed * 100 - 100 >= cspm, that is, the content generation speed is faster than the inference speed by more than cspm%, then the judgment result is Exception and reported to 612280 (client Error). The larger the value, the greater the speed difference tolerated; if set to 0, any content speed > reasoning speed will report an error. |
| Stage | Indicator name | Unit | Description |
|---|---|---|---|
0 | Total time spent | ms | The overall time taken from initiating HTTP request to fully receiving the streaming response. It is the core metric for measuring the end-to-end performance of APIs. |
1 | DNS Lookup Time | ms | The time taken to resolve the base_url domain name to IP. If using IP direct connection, it is close to 0. |
2 | Connection Establishment Time | ms | The time taken to establish the TCP three-way handshake. Reflects the direct connection quality from layer Network to the target server. |
3 | SSL/TLS handshake time | ms | The HTTPS encrypted handshake takes time. Including certificate exchange, key negotiation, etc. |
4 | Request sending time | ms | The time taken to send the HTTP POST request header and body. Usually very small, but if the upstream bandwidth is limited or the Body is very large (such as an image URL), it may increase significantly. |
5 | Time to First Byte (TTFB) | ms | The time from sending Request to receiving the first response byte. Reflects the response speed of the server in processing the first packet and is a key indicator of API latency. |
6 | Remaining time to receive | ms | The time from the first byte until the entire streaming response is fully received. Mainly affected by LLM generation speed and generation length. |
7 | First Token Delay | ms | The time from Request to the receipt of the first SSE data block with actual content (delta.content or delta.reasoning_content). This is a core measure of LLM response sensitivity. |
8 | Reasoning Token Speed | tokens/s × 1000 | The speed of generation of inference content (reasoning_content). The value needs to be divided by 1000 to get the true tokens/s. Throughput reflecting the thought process of Data Model. |
9 | Content Token Speed | tokens/s × 1000 | The speed of generating formal reply content (content). The value needs to be divided by 1000. Throughput reflecting the text output of Data Model. |
10 | average overall speed | tokens/s × 1000 | The speed of the total number of tokens (inference + content) divided by the total elapsed time. The value needs to be divided by 1000. Used to horizontally compare the comprehensive generation efficiency of different Data Model/services. |
11 | Reasoning Token quantity | tokens | The accumulated number of reasoning_content Tokens. Split estimates by whitespace characters from the content in the SSE stream. |
12 | Content Token quantity | tokens | The accumulated number of content Tokens. Same as above, splitting estimates by whitespace characters. |
13 | Inference generation time | ms | The time window from the start of output of reasoning_content to the end of reasoning_content. |
14 | Content generation time | ms | The time window from the start of outputting content to the end of content. |
15 | Completion speed | tokens/s × 1000 | The completion speed (completion_tokens / total elapsed time × 1000) calculated based on the usage.completion_tokens returned by the API. The difference from stage 10 is the use of official statistics rather than streaming estimates. |
| Stage | Description |
|---|---|
0 | Log/Debug Information. General Log, debugging output, response content when debug=true, etc. |
1 | Server IP. base_url Actual IP address after domain name resolution. |
2 | Target Host name. The domain name part of base_url. |
| info value | meaning | Trigger condition |
|---|---|---|
612280 | Client Error | Common client issues such as settings missing (no URL or key), content speed Exception (cspm verification failed), etc. |
699001 | Reasoning Error | After the streaming response ends, the total number of Tokens is < 1 or the number of content Tokens is < 1. Usually it means that Data Model returns empty, timeout interrupt or stream Exception ends. |
699002 | Content check failed | The text or regular expression specified by the check parameter is not matched in the final generated content. |
| Standard Error | Network/HTTP Error | DNS failure (such as 612007), connection failure, SSL failure, HTTP 4xx/5xx, read timeout, etc. |
delta.reasoning_content and delta.content of each SSE chunk according to the regular \s+ (blank characters). This may differ from the tokenizer count in the official usage, so:usage.completion_tokens, more accurate but may only be received after the stream ends.cspm=100:| reasoning speed | content speed | calculate | result |
|---|---|---|---|
| 1000 (1.0 t/s) | 1500 (1.5 t/s) | 1500/1000*100-100 = 50 | 50 < 100, passed |
| 1000 (1.0 t/s) | 2500 (2.5 t/s) | 2500/1000*100-100 = 150 | 150 >= 100, failed, reported 12280 |
base_url=https://api.deepseek.com/v1
api_key=sk-xxxxxxxxxxxxxxxx
model=deepseek-chat
prompt=Please explain in one sentence what artificial intelligence is
timeout=120000
readtimeut=30000base_url=https://api.deepseek.com/v1
api_key=sk-xxxxxxxxxxxxxxxx
model=deepseek-chat
What is prompt=1+1 equal to? Please answer numbers only
check=2
cspm=100base_url=https://api.deepseek.com/v1
api_key=sk-xxxxxxxxxxxxxxxx
model=deepseek-chat
prompt=Please describe the sun
check=/sun.+star/base_url=https://api.deepseek.com/v1
api_key=sk-xxxxxxxxxxxxxxxx
model=deepseek-chat
prompt=Please write a 500-word essay about cloud computing
timeout=300000
readtimeut=60000stream=true. If the target API does not support SSE (Server-Sent Events) streaming return, the content will not be parsed correctly, which may result in 699001 (the number of Tokens is 0).usage counting method. When comparing, please distinguish between stage 11/12 (estimated) and stage 15 (based on official completion_tokens).readtimeut (missing the letters o), please strictly follow this spelling, otherwise the Default value 30000 will be used. This value detects the quiet time between SSE data blocks, not the overall timeout.cspm=100 may be a false positive. It is recommended to observe the normal ratio through debugging mode before setting the threshold.