DeepSeek Plugin Configuration Guide

1. Overview

nbdeepseek is a Network and generation performance measurement plug-in for the DeepSeek / OpenAI compatible streaming Chat Completion API. It measures the full link indicators from the Network layer (DNS, TCP, SSL, TTFB) to the LLM generation layer (first token delay, inference speed, content speed) by sending the real streaming conversation Request.

2. Detailed explanation of input parameters

2.1 API connection parameters

Parameter name	Alias	type	Is it required?	Default value	Explanation and Impact
`base_url`	`url`	string	yes	—	API base URL, such as `https://api.deepseek.com/v1`. The plugin will POST request to `{base_url}/{endpoint}`. If filled in incompletely, it may result in Request 404 or connection failure.
`endpoint`	—	string	no	`chat/completions`	API endpoint path, spliced after `base_url`. If the target service uses non-standard endpoints (such as custom gateways), this parameter can be modified.
`api_key`	`key`	string	yes	—	Bearer Token, used for `Authorization: Bearer <api_key>` Request header. If missing or invalid, the API will return 401 Unauthorized Error.

2.2 Request content parameters

Parameter name	Type	Required	Default value	Explanation and impact
`model`	string	no	`""`	The Data Model name used by Request is filled in the `model` field of JSON. If it is empty, some gateways may report an error, and some may use the default Data Model.
`prompt`	string	no	`""`	User message content, fill in the `messages[0].content` field of JSON. Directly affects the LLM's generated content length and Theme, thereby significantly affecting the number of Tokens and the generation time. It is recommended to fix prompt to obtain comparable baseline data.

2.3 Timeout and Network parameters

Parameter name	Type	Required	Default value	Explanation and impact
`timeout`	integer	no	`300000`	Overall download timeout in milliseconds (5 minutes). The total time limit from sending Request to fully receiving the streaming response. If the content generated by LLM is long or Network is slow, it needs to be increased appropriately, otherwise it will be forcibly interrupted and a timeout of Error will be reported.
`readtimeut`	integer	no	`30000`	Socket single read timeout, unit milliseconds. Note the spelling is `readtimeut` (not `readtimeout`). If the interval between two SSE data blocks exceeds this value, the read timeout Error will be triggered. Used to detect long pauses in streaming responses.

2.4 Validation and Quality Control Parameters

Parameter name	Type	Required	Default value	Explanation and impact
`check`	string	no	`""`	Response content validation rules. Used to ensure that LLM returns the expected content and not garbled characters or Error information. • Plain Text: Perform substring inclusion matching (case sensitive). • Regular Expression: Use the `/pattern/` package, such as `/hello.+/`, to perform regular matching. If fails to be verified, the plug-in will report `699002` Error code.
`cspm`	integer	no	`100`	Content Speed Percentage Multiplier (Content Speed Percentage Multiplier*). Quality control threshold for detecting Exception responses. calculation logic: If `content speed / reasoning speed 100 - 100 >= cspm`, that is, the content generation speed is faster than the inference speed by more than `cspm`%, then the judgment result is Exception and reported to `612280` (client Error). The larger the value, the greater the speed difference tolerated; if set to `0`, any content speed > reasoning speed will report an error.

3. Detailed explanation of output indicators

3.1 Numerical performance indicators

Stage	Indicator name	Unit	Description
`0`	Total time spent	ms	The overall time taken from initiating HTTP request to fully receiving the streaming response. It is the core metric for measuring the end-to-end performance of APIs.
`1`	DNS Lookup Time	ms	The time taken to resolve the `base_url` domain name to IP. If using IP direct connection, it is close to 0.
`2`	Connection Establishment Time	ms	The time taken to establish the TCP three-way handshake. Reflects the direct connection quality from layer Network to the target server.
`3`	SSL/TLS handshake time	ms	The HTTPS encrypted handshake takes time. Including certificate exchange, key negotiation, etc.
`4`	Request sending time	ms	The time taken to send the HTTP POST request header and body. Usually very small, but if the upstream bandwidth is limited or the Body is very large (such as an image URL), it may increase significantly.
`5`	Time to First Byte (TTFB)	ms	The time from sending Request to receiving the first response byte. Reflects the response speed of the server in processing the first packet and is a key indicator of API latency.
`6`	Remaining time to receive	ms	The time from the first byte until the entire streaming response is fully received. Mainly affected by LLM generation speed and generation length.
`7`	First Token Delay	ms	The time from Request to the receipt of the first SSE data block with actual content (`delta.content` or `delta.reasoning_content`). This is a core measure of LLM response sensitivity.
`8`	Reasoning Token Speed	tokens/s × 1000	The speed of generation of inference content (`reasoning_content`). The value needs to be divided by 1000 to get the true tokens/s. Throughput reflecting the thought process of Data Model.
`9`	Content Token Speed	tokens/s × 1000	The speed of generating formal reply content (`content`). The value needs to be divided by 1000. Throughput reflecting the text output of Data Model.
`10`	average overall speed	tokens/s × 1000	The speed of the total number of tokens (inference + content) divided by the total elapsed time. The value needs to be divided by 1000. Used to horizontally compare the comprehensive generation efficiency of different Data Model/services.
`11`	Reasoning Token quantity	tokens	The accumulated number of `reasoning_content` Tokens. Split estimates by whitespace characters from the content in the SSE stream.
`12`	Content Token quantity	tokens	The accumulated number of `content` Tokens. Same as above, splitting estimates by whitespace characters.
`13`	Inference generation time	ms	The time window from the start of output of reasoning_content to the end of reasoning_content.
`14`	Content generation time	ms	The time window from the start of outputting content to the end of content.
`15`	Completion speed	tokens/s × 1000	The completion speed (`completion_tokens / total elapsed time × 1000`) calculated based on the `usage.completion_tokens` returned by the API. The difference from stage 10 is the use of official statistics rather than streaming estimates.

3.2 Text and Profiling information

Stage	Description
`0`	Log/Debug Information. General Log, debugging output, response content when `debug=true`, etc.
`1`	Server IP. `base_url` Actual IP address after domain name resolution.
`2`	Target Host name. The domain name part of `base_url`.

3.3 Error code

info value	meaning	Trigger condition
`612280`	Client Error	Common client issues such as settings missing (no URL or key), content speed Exception (`cspm` verification failed), etc.
`699001`	Reasoning Error	After the streaming response ends, the total number of Tokens is `< 1` or the number of content Tokens is `< 1`. Usually it means that Data Model returns empty, timeout interrupt or stream Exception ends.
`699002`	Content check failed	The text or regular expression specified by the `check` parameter is not matched in the final generated `content`.
Standard Error	Network/HTTP Error	DNS failure (such as `612007`), connection failure, SSL failure, HTTP 4xx/5xx, read timeout, etc.

4. Speed calculation and verification logic

4.1 Token counting method

During the streaming reception process, the plug-in splits and counts delta.reasoning_content and delta.content of each SSE chunk according to the regular \s+ (blank characters). This may differ from the tokenizer count in the official usage, so:

Stage 11/12 (number of reasoning/content tokens) is streaming estimate.

stage 15 (Completion speed) is based on the official usage.completion_tokens, more accurate but may only be received after the stream ends.

4.2 cspm verification example

Assume cspm=100:

reasoning speed	content speed	calculate	result
1000 (1.0 t/s)	1500 (1.5 t/s)	`1500/1000*100-100 = 50`	`50 < 100`, passed
1000 (1.0 t/s)	2500 (2.5 t/s)	`2500/1000*100-100 = 150`	`150 >= 100`, failed, reported `12280`

5. Typical Configuration Example

5.1 Basic DeepSeek API test

base_url=https://api.deepseek.com/v1
api_key=sk-xxxxxxxxxxxxxxxx
model=deepseek-chat
prompt=Please explain in one sentence what artificial intelligence is
timeout=120000
readtimeut=30000

5.2 Testing with content verification

base_url=https://api.deepseek.com/v1
api_key=sk-xxxxxxxxxxxxxxxx
model=deepseek-chat
What is prompt=1+1 equal to? Please answer numbers only
check=2
cspm=100

5.3 Regular content verification

base_url=https://api.deepseek.com/v1
api_key=sk-xxxxxxxxxxxxxxxx
model=deepseek-chat
prompt=Please describe the sun
check=/sun.+star/

5.4 Long text generation test

base_url=https://api.deepseek.com/v1
api_key=sk-xxxxxxxxxxxxxxxx
model=deepseek-chat
prompt=Please write a 500-word essay about cloud computing
timeout=300000
readtimeut=60000

9. Precautions

Streaming response dependency: The plug-in forces the use of stream=true. If the target API does not support SSE (Server-Sent Events) streaming return, the content will not be parsed correctly, which may result in 699001 (the number of Tokens is 0).

Token counting difference: The streaming estimated Token number is different from the official usage counting method. When comparing, please distinguish between stage 11/12 (estimated) and stage 15 (based on official completion_tokens).

readtimeut spelling: The settings key is readtimeut (missing the letters o), please strictly follow this spelling, otherwise the Default value 30000 will be used. This value detects the quiet time between SSE data blocks, not the overall timeout.

Reasonable value of cspm: If the content of Data Model itself is generated much faster than inference (such as some lightweight Data Model), cspm=100 may be a false positive. It is recommended to observe the normal ratio through debugging mode before setting the threshold.

Callback Threads safety: All callbacks are triggered in the internal Network/parsing Threads, and the host program needs to ensure the Threads safety of the callback function.

DeepSeek Plugin Configuration Guide

1. Overview#

2. Detailed explanation of input parameters#

2.1 API connection parameters#

2.2 Request content parameters#

2.3 Timeout and Network parameters#

2.4 Validation and Quality Control Parameters#

3. Detailed explanation of output indicators#

3.1 Numerical performance indicators#

3.2 Text and Profiling information#

3.3 Error code#

4. Speed ​​calculation and verification logic#

4.1 Token counting method#

4.2 cspm verification example#

5. Typical Configuration Example#

5.1 Basic DeepSeek API test#

5.2 Testing with content verification#

5.3 Regular content verification#

5.4 Long text generation test#

9. Precautions#