
Commit 6e16438

[Feature] implement log channel separation and request log level system (#7190)

* feat: implement log channel separation and request log level system
* fix: log system improvements based on review
* add request_id to error logs, use RequestLogLevel enum, and unify logger implementation from utils to logger module

1 parent 29495b2 · commit 6e16438

52 files changed: +1955 −638 lines

docs/usage/environment_variables.md (1 addition, 1 deletion)

@@ -25,7 +25,7 @@ environment_variables: dict[str, Callable[[], Any]] = {
     "FD_LOG_REQUESTS": lambda: int(os.getenv("FD_LOG_REQUESTS", "1")),

     # Request logging detail level (0-3). Higher level means more verbose output.
-    "FD_LOG_REQUESTS_LEVEL": lambda: int(os.getenv("FD_LOG_REQUESTS_LEVEL", "0")),
+    "FD_LOG_REQUESTS_LEVEL": lambda: int(os.getenv("FD_LOG_REQUESTS_LEVEL", "2")),

     # Max field length for request logging truncation.
     "FD_LOG_MAX_LEN": lambda: int(os.getenv("FD_LOG_MAX_LEN", "2048")),

docs/usage/log.md (40 additions, 7 deletions)

@@ -5,22 +5,55 @@
 FastDeploy generates the following log files during deployment. Below is an explanation of each log's purpose.
 By default, logs are stored in the `log` directory under the execution path. To specify a custom directory, set the environment variable `FD_LOG_DIR`.

+## Log Channel Separation
+
+FastDeploy separates logs into three channels:
+
+| Channel | Logger Name | Output Files | Description |
+|---------|-------------|--------------|-------------|
+| main | `fastdeploy.main.*` | `fastdeploy.log`, `console.log` | Main logs for system configuration, startup info, etc. |
+| request | `fastdeploy.request.*` | `request.log` | Request logs for request lifecycle and processing details |
+| console | `fastdeploy.console.*` | `console.log` | Console logs, output to the terminal and `console.log` |
+
+## Request Log Levels
+
+Request logs (`request.log`) support 4 levels, controlled by the environment variable `FD_LOG_REQUESTS_LEVEL`:
+
+| Level | Enum Name | Description | Example Content |
+|-------|-----------|-------------|-----------------|
+| 0 | LIFECYCLE | Lifecycle start/end | Request creation/initialization, completion stats (InputToken/OutputToken/latency), first and last streaming response, request abort |
+| 1 | STAGES | Processing stages | Semaphore acquire/release, first-token time recording, signal handling (preemption/abortion/recovery), cache task, preprocess time, parameter-adjustment warnings |
+| 2 | CONTENT | Content and scheduling | Request parameters, processed request, scheduling info (enqueue/pull/finish), response content (long content is truncated) |
+| 3 | FULL | Complete raw data | Complete request and response data, raw received request |
+
+The default level is 2 (CONTENT), which logs request parameters, scheduling info, and response content. Lower levels (0-1) only log critical events, while level 3 includes complete raw data.
+
+## Log-Related Environment Variables
+
+| Variable | Default | Description |
+|----------|---------|-------------|
+| `FD_LOG_DIR` | `log` | Log file storage directory |
+| `FD_LOG_LEVEL` | `INFO` | Log level, supports `INFO` or `DEBUG` |
+| `FD_LOG_REQUESTS` | `1` | Enable request logging; `0` to disable, `1` to enable |
+| `FD_LOG_REQUESTS_LEVEL` | `2` | Request log level, range 0-3 |
+| `FD_LOG_MAX_LEN` | `2048` | Maximum length for level-2 log content (excess is truncated) |
+| `FD_LOG_BACKUP_COUNT` | `7` | Number of log files to retain |
+| `FD_DEBUG` | `0` | Debug mode; `1` enables the DEBUG log level |

 ## Inference Service Logs
+* `fastdeploy.log` : Main log file, records system configuration, startup information, runtime status, etc.
+* `request.log` : Request log file, records the user request lifecycle and processing details
+* `console.log` : Console log, records model startup time and other information. This log is also printed to the console.
+* `error.log` : Error log file, records all logs at ERROR level and above
 * `backup_env.*.json` : Records environment variables set during instance startup. The number of files matches the number of GPU cards.
-* `envlog.*` : Logs environment variables set during instance startup. The number of files matches the number of GPU cards.
-* `console.log` : Records model startup time and other information. This log is also printed to the console.
-* `data_processor.log` : Logs input/output data encoding and decoding details.
-* `fastdeploy.log` : Records configuration information during instance startup, as well as request and response details during runtime.
 * `workerlog.*` : Tracks model loading progress and inference operator errors. Each GPU card has a corresponding file.
 * `worker_process.log` : Logs engine inference data for each iteration.
 * `cache_manager.log` : Records KV Cache logical index allocation for each request and cache hit status.
 * `launch_worker.log` : Logs model startup information and error messages.
 * `gpu_worker.log` : Records KV Cache block count information during profiling.
 * `gpu_model_runner.log` : Contains model details and loading time.

-## Online Inference Client Logs
-* `api_server.log` : Logs startup parameters and received request information.
-
 ## Scheduler Logs
 * `scheduler.log` : Records scheduler information, including node status and request allocation details.
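The level table implies a simple gate: a message tagged with a level is emitted only when the configured `FD_LOG_REQUESTS_LEVEL` is at least that level. A minimal sketch of that behavior — the enum values come from the table above, but the function body and the truncation detail are assumptions, not FastDeploy's actual implementation:

```python
import os
from enum import IntEnum

class RequestLogLevel(IntEnum):
    LIFECYCLE = 0  # lifecycle start/end
    STAGES = 1     # processing stages
    CONTENT = 2    # content and scheduling
    FULL = 3       # complete raw data

# Read the knobs the same way environment_variables.md does.
CONFIGURED = int(os.getenv("FD_LOG_REQUESTS_LEVEL", "2"))
MAX_LEN = int(os.getenv("FD_LOG_MAX_LEN", "2048"))

def log_request(level: RequestLogLevel, message: str, **fields):
    """Format and return the record, or None when the level is filtered out."""
    if level > CONFIGURED:
        return None
    # The docs say long content is truncated at the CONTENT level (FD_LOG_MAX_LEN).
    if level == RequestLogLevel.CONTENT:
        fields = {k: str(v)[:MAX_LEN] for k, v in fields.items()}
    return message.format(**fields)
```

With the default level of 2, a `FULL`-tagged message is dropped while a `LIFECYCLE` one is formatted and emitted.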

docs/zh/usage/environment_variables.md (1 addition, 1 deletion)

@@ -25,7 +25,7 @@ environment_variables: dict[str, Callable[[], Any]] = {
     "FD_LOG_REQUESTS": lambda: int(os.getenv("FD_LOG_REQUESTS", "1")),

     # Request logging detail level (0-3). Higher level means more verbose output.
-    "FD_LOG_REQUESTS_LEVEL": lambda: int(os.getenv("FD_LOG_REQUESTS_LEVEL", "0")),
+    "FD_LOG_REQUESTS_LEVEL": lambda: int(os.getenv("FD_LOG_REQUESTS_LEVEL", "2")),

     # Max field length for request logging truncation.
     "FD_LOG_MAX_LEN": lambda: int(os.getenv("FD_LOG_MAX_LEN", "2048")),

docs/zh/usage/log.md (42 additions, 10 deletions)

@@ -2,25 +2,58 @@
 # Log Description

 FastDeploy generates the following log files during deployment; each log's purpose is explained below.
 The default log directory is the `log` folder under the execution directory; to specify another, set the environment variable `FD_LOG_DIR`.

+## Log Channel Separation
+
+FastDeploy separates logs into three channels:
+
+| Channel | Logger Name | Output Files | Description |
+|---------|-------------|--------------|-------------|
+| main | `fastdeploy.main.*` | `fastdeploy.log`, `console.log` | Main logs: system configuration, startup information, etc. |
+| request | `fastdeploy.request.*` | `request.log` | Request logs: request lifecycle and processing details |
+| console | `fastdeploy.console.*` | `console.log` | Console logs, written to the terminal and `console.log` |
+
+## Request Log Levels
+
+Request logs (`request.log`) support 4 levels, controlled by the environment variable `FD_LOG_REQUESTS_LEVEL`:
+
+| Level | Enum Name | Description | Example Content |
+|-------|-----------|-------------|-----------------|
+| 0 | LIFECYCLE | Lifecycle start/end | Request creation/initialization, completion stats (InputToken/OutputToken/latency), first and last streaming responses, request abort |
+| 1 | STAGES | Processing stages | Semaphore acquire/release, first-token time recording, signal handling (preemption/abortion/recovery), cache task, preprocess time, parameter-adjustment warnings |
+| 2 | CONTENT | Content and scheduling | Request parameters, processed request, scheduling info (enqueue/pull/finish), response content (overlong content is truncated) |
+| 3 | FULL | Complete raw data | Complete request and response data, raw received request |
+
+The default level is 2 (CONTENT), which records request parameters, scheduling info, and response content. Lower levels (0-1) record only key events, while level 3 includes complete raw data.
+
+## Log-Related Environment Variables
+
+| Variable | Default | Description |
+|----------|---------|-------------|
+| `FD_LOG_DIR` | `log` | Log file storage directory |
+| `FD_LOG_LEVEL` | `INFO` | Log level, supports `INFO` or `DEBUG` |
+| `FD_LOG_REQUESTS` | `1` | Enable request logging; `0` to disable, `1` to enable |
+| `FD_LOG_REQUESTS_LEVEL` | `2` | Request log level, range 0-3 |
+| `FD_LOG_MAX_LEN` | `2048` | Maximum length of level-2 log content (excess is truncated) |
+| `FD_LOG_BACKUP_COUNT` | `7` | Number of log files to retain |
+| `FD_DEBUG` | `0` | Debug mode; `1` sets the log level to `DEBUG` |

 ## Inference Service Logs
+* `fastdeploy.log` : Main log file, records system configuration, startup information, runtime status, etc.
+* `request.log` : Request log file, records the lifecycle and processing details of user requests
+* `console.log` : Console log, records model startup time and other information; also printed to the console
+* `error.log` : Error log file, records all logs at ERROR level and above
 * `backup_env.*.json` : Records the environment variables set when the instance started; the number of files matches the number of GPU cards
-* `envlog.*` : Records the environment variables set when the instance started; the number of files matches the number of GPU cards
-* `console.log` : Records model startup time and other information; also printed to the console
-* `data_processor.log` : Records encoding/decoding details of input and output data
-* `fastdeploy.log` : Records the configuration of the instance at startup, and request/response details at runtime
 * `workerlog.*` : Records model loading progress and inference operator errors; one file per GPU card
 * `worker_process.log` : Records engine inference data for each iteration
 * `cache_manager.log` : Records the KV Cache logical index allocated to each request and its cache hit status
 * `launch_worker.log` : Records model startup information and error messages
 * `gpu_worker.log` : Records KV Cache block count information during profiling
 * `gpu_model_runner.log` : Records current model details and loading time

-## Online Inference Client Logs
-* `api_server.log` : Records startup parameters and received request information
-
 ## Scheduler Logs
 * `scheduler.log` : Records scheduler information, including node status and per-request allocation details

@@ -31,12 +64,11 @@
 * `cache_queue_manager.log` : Records startup parameters and received request information
 * `cache_transfer_manager.log` : Records startup parameters and received request information
-* `cache_queue_manager.log` : Records startup parameters and received request information
 * `launch_cache_manager.log` : Records cache transfer startup parameters and error information

 ## PD Disaggregation Logs

 * `cache_messager.log` : Records the transfer protocol and transfer information used by the P instance
 * `splitwise_connector.log` : Records data received from P/D instances and connection-establishment information

 ## CudaGraph Logs

fastdeploy/engine/async_llm.py (3 additions, 2 deletions)

@@ -37,6 +37,7 @@
 from fastdeploy.input.preprocess import InputPreprocessor
 from fastdeploy.inter_communicator import IPCSignal
 from fastdeploy.inter_communicator.zmq_client import ZmqIpcClient
+from fastdeploy.logger.request_logger import log_request_error
 from fastdeploy.metrics.metrics import main_process_metrics
 from fastdeploy.utils import EngineError, envs, llm_logger

@@ -562,7 +563,7 @@ async def generate(
             llm_logger.info(f"Request {conn_request_id} generator exit (outer)")
             return
         except Exception as e:
-            llm_logger.error(f"Request {conn_request_id} failed: {e}")
+            log_request_error(message="Request {request_id} failed: {error}", request_id=conn_request_id, error=e)
             raise EngineError(str(e), error_code=500) from e
         finally:
             # Ensure request_map/request_num are cleaned up

@@ -584,7 +585,7 @@ async def abort_request(self, request_id: str) -> None:
             await self.connection_manager.cleanup_request(request_id)
             llm_logger.info(f"Aborted request {request_id}")
         except Exception as e:
-            llm_logger.error(f"Failed to abort request {request_id}: {e}")
+            log_request_error(message="Failed to abort request {request_id}: {error}", request_id=request_id, error=e)

     async def shutdown(self):
         """

fastdeploy/engine/engine.py (45 additions, 11 deletions)

@@ -44,6 +44,11 @@
 from fastdeploy.engine.expert_service import start_data_parallel_service
 from fastdeploy.engine.request import Request
 from fastdeploy.inter_communicator import EngineWorkerQueue, IPCSignal
+from fastdeploy.logger.request_logger import (
+    RequestLogLevel,
+    log_request,
+    log_request_error,
+)
 from fastdeploy.metrics.metrics import main_process_metrics
 from fastdeploy.platforms import current_platform
 from fastdeploy.utils import EngineError, console_logger, envs, llm_logger

@@ -285,7 +290,7 @@ def add_requests(self, task, sampling_params=None, **kwargs):
         # Create Request struct after processing
         request = Request.from_dict(task)
         request.metrics.scheduler_recv_req_time = time.time()
-        llm_logger.info(f"Receive request {request}")
+        log_request(RequestLogLevel.CONTENT, message="Receive request {request}", request=request)
         request.metrics.preprocess_start_time = time.time()

         request.prompt_token_ids_len = len(request.prompt_token_ids)

@@ -304,12 +309,20 @@ def add_requests(self, task, sampling_params=None, **kwargs):
                 f"Input text is too long, length of prompt token({input_ids_len}) "
                 f"+ min_dec_len ({min_tokens}) >= max_model_len "
             )
-            llm_logger.error(error_msg)
+            log_request_error(
+                message="request[{request_id}] error: {error}",
+                request_id=request.get("request_id"),
+                error=error_msg,
+            )
             raise EngineError(error_msg, error_code=400)

         if input_ids_len > self.cfg.model_config.max_model_len:
             error_msg = f"Length of input token({input_ids_len}) exceeds the limit max_model_len({self.cfg.model_config.max_model_len})."
-            llm_logger.error(error_msg)
+            log_request_error(
+                message="request[{request_id}] error: {error}",
+                request_id=request.get("request_id"),
+                error=error_msg,
+            )
             raise EngineError(error_msg, error_code=400)

         if request.get("stop_seqs_len") is not None:

@@ -320,7 +333,11 @@ def add_requests(self, task, sampling_params=None, **kwargs):
                     f"Length of stop ({stop_seqs_len}) exceeds the limit max_stop_seqs_num({max_stop_seqs_num})."
                     "Please reduce the number of stop or set a larger max_stop_seqs_num by `FD_MAX_STOP_SEQS_NUM`"
                 )
-                llm_logger.error(error_msg)
+                log_request_error(
+                    message="request[{request_id}] error: {error}",
+                    request_id=request.get("request_id"),
+                    error=error_msg,
+                )
                 raise EngineError(error_msg, error_code=400)
             stop_seqs_max_len = envs.FD_STOP_SEQS_MAX_LEN
             for single_stop_seq_len in stop_seqs_len:

@@ -329,7 +346,11 @@ def add_requests(self, task, sampling_params=None, **kwargs):
                     f"Length of stop_seqs({single_stop_seq_len}) exceeds the limit stop_seqs_max_len({stop_seqs_max_len})."
                     "Please reduce the length of stop sequences or set a larger stop_seqs_max_len by `FD_STOP_SEQS_MAX_LEN`"
                 )
-                llm_logger.error(error_msg)
+                log_request_error(
+                    message="request[{request_id}] error: {error}",
+                    request_id=request.get("request_id"),
+                    error=error_msg,
+                )
                 raise EngineError(error_msg, error_code=400)

         if self._has_guided_input(request):

@@ -342,14 +363,22 @@ def add_requests(self, task, sampling_params=None, **kwargs):
             request, err_msg = self.guided_decoding_checker.schema_format(request)

             if err_msg is not None:
-                llm_logger.error(err_msg)
+                log_request_error(
+                    message="request[{request_id}] error: {error}",
+                    request_id=request.get("request_id"),
+                    error=err_msg,
+                )
                 raise EngineError(err_msg, error_code=400)

         request.metrics.preprocess_end_time = time.time()
         request.metrics.scheduler_recv_req_time = time.time()
         self.engine.scheduler.put_requests([request])
-        llm_logger.info(f"Cache task with request_id ({request.get('request_id')})")
-        llm_logger.debug(f"cache task: {request}")
+        log_request(
+            RequestLogLevel.STAGES,
+            message="Cache task with request_id ({request_id})",
+            request_id=request.get("request_id"),
+        )
+        log_request(RequestLogLevel.FULL, message="cache task: {request}", request=request)

     def _worker_processes_ready(self):
         """

@@ -717,11 +746,16 @@ def generate(self, prompts, stream):
         Yields:
             dict: The generated response.
         """
-        llm_logger.info(f"Starting generation for prompt: {prompts}")
+        log_request(RequestLogLevel.CONTENT, message="Starting generation for prompt: {prompts}", prompts=prompts)
         try:
             req_id = self._format_and_add_data(prompts)
         except Exception as e:
-            llm_logger.error(f"Error happened while adding request, details={e}, {str(traceback.format_exc())}")
+            log_request_error(
+                message="request[{request_id}] error while adding request: {error}, {traceback}",
+                request_id=prompts.get("request_id"),
+                error=str(e),
+                traceback=traceback.format_exc(),
+            )
             raise EngineError(str(e), error_code=400)

         # Get the result of the current request

@@ -740,7 +774,7 @@ def generate(self, prompts, stream):
             output = self.engine.data_processor.process_response_dict(
                 result.to_dict(), stream=False, include_stop_str_in_output=False, direct_decode=not stream
             )
-            llm_logger.debug(f"Generate result: {output}")
+            log_request(RequestLogLevel.FULL, message="Generate result: {output}", output=output)
             if not stream:
                 yield output
             else:
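The validation pattern in `add_requests` is uniform: build an error message, log it with the request id, then raise `EngineError` with a 400 code. A condensed, self-contained version of the first guard — the names mirror the diff, but `EngineError` here is a stand-in class, not the FastDeploy one:

```python
class EngineError(Exception):
    """Stand-in for fastdeploy.utils.EngineError: an exception carrying an HTTP-style code."""
    def __init__(self, message: str, error_code: int = 500):
        super().__init__(message)
        self.error_code = error_code

def check_prompt_length(input_ids_len: int, min_tokens: int, max_model_len: int) -> None:
    # Mirrors the guard: prompt tokens + min_dec_len must stay below max_model_len.
    if input_ids_len + min_tokens >= max_model_len:
        raise EngineError(
            f"Input text is too long, length of prompt token({input_ids_len}) "
            f"+ min_dec_len ({min_tokens}) >= max_model_len ",
            error_code=400,
        )
```

The 400 code distinguishes client-side input problems from the 500 used for internal generation failures in `async_llm.py`.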

fastdeploy/engine/request.py (12 additions, 7 deletions)

@@ -39,7 +39,11 @@
     StructuralTagResponseFormat,
     ToolCall,
 )
-from fastdeploy.utils import data_processor_logger
+from fastdeploy.logger.request_logger import (
+    RequestLogLevel,
+    log_request,
+    log_request_error,
+)
 from fastdeploy.worker.output import (
     LogprobsLists,
     PromptLogprobs,

@@ -313,15 +317,13 @@ def from_generic_request(
         ), "The parameter `raw_request` is not supported now, please use completion api instead."
         for key, value in req.metadata.items():
             setattr(request, key, value)
-        from fastdeploy.utils import api_server_logger
-
-        api_server_logger.warning("The parameter metadata is obsolete.")
+        log_request(RequestLogLevel.STAGES, message="The parameter metadata is obsolete.")

         return request

     @classmethod
     def from_dict(cls, d: dict):
-        data_processor_logger.debug(f"{d}")
+        log_request(RequestLogLevel.FULL, message="{request}", request=d)
         sampling_params: SamplingParams = None
         pooling_params: PoolingParams = None
         metrics: RequestMetrics = None

@@ -352,8 +354,11 @@ def from_dict(cls, d: dict):
                     ImagePosition(**mm_pos) if not isinstance(mm_pos, ImagePosition) else mm_pos
                 )
             except Exception as e:
-                data_processor_logger.error(
-                    f"Convert mm_positions to ImagePosition error: {e}, {str(traceback.format_exc())}"
+                log_request_error(
+                    message="request[{request_id}] Convert mm_positions to ImagePosition error: {error}, {traceback}",
+                    request_id=d.get("request_id"),
+                    error=str(e),
+                    traceback=traceback.format_exc(),
                 )
         return cls(
             request_id=d["request_id"],
