| # "Task scheduling" count in Guance
[2024-03-06 20:58:05.084] [+0ms] 【使用额度】 数据查询范围为 1 分钟,尚未超过 15 分钟,不需要额外计量
[2024-03-06 20:58:05.084] [+0ms] 【使用额度】 存在`workspace_uuid`参数,值为 wksp_xxxxx ,需要计量 1 次
# Current workspace information
[2024-03-06 20:58:05.086] [+1ms] 【Studio】 工作空间信息(从缓存获取):{"declaration":{"test":["value1","value2"],"test2":"value3","test3":"value4"},"isJobDisabled":false,"isSMSDisabled":false,"language":"zh","name":"【Doris】开发测试一起用_","token":"tkn_xxxxx"}
# Function ID executed by this task and its parameter list
[2024-03-06 20:58:05.088] [+1ms] 【函数】 调用函数:guance__api_impl.custom_check
[2024-03-06 20:58:05.088] [+0ms] 【函数】 --> 参数:`checker`=`"custom_metric"`
[2024-03-06 20:58:05.088] [+0ms] 【函数】 --> 参数:`kwargs`=`{"version":"v2"}`
[2024-03-06 20:58:05.088] [+0ms] 【函数】 --> 参数:`targets`=`[{"alias":"Result","dql":"M::`fake_data_for_test`:(avg(`field_int`)) { `tag` = 'fake-data-1' } BY `tag`","queryType":"dql","range":60}]`
[2024-03-06 20:58:05.089] [+0ms] 【函数】 --> 参数:`channels`=`["chan_xxxxx"]`
[2024-03-06 20:58:05.089] [+0ms] 【函数】 --> 参数:`extra_data`=`{"type":"simpleCheck"}`
[2024-03-06 20:58:05.089] [+0ms] 【函数】 --> 参数:`checker_opt`=`{"id":"rul_xxxxx","infoEvent":false,"label":["xxxxx_test"],"message":"内容:xxxxx-监控器(单个){{df_dimension_tags}}\n第2行\n第3行","name":"标题:xxxxx-监控器(单个){{df_dimension_tags}}","noDataAction":"noData","noDataInterval":120,"noDataMessage":"","noDataTitle":"","recoverInterval":120,"rules":[{"conditionLogic":"and","conditions":[{"alias":"Result","operands":["0"],"operator":">="}],"status":"critical"}],"title":"标题:xxxxx-监控器(单个){{df_dimension_tags}}"}`
[2024-03-06 20:58:05.089] [+0ms] 【函数】 --> 参数:`monitor_opt`=`{"id":"monitor_xxxxx","name":"default"}`
[2024-03-06 20:58:05.089] [+0ms] 【函数】 --> 参数:`workspace_uuid`=`"wksp_xxxxx"`
[2024-03-06 20:58:05.089] [+0ms] 【函数】 --> 参数:`workspace_token`=`"tkn_xxxxx"`
[2024-03-06 20:58:05.089] [+0ms] 【函数】 --> 参数:`disable_check_end_time`=`false`
[2024-03-06 20:58:05.089] [+0ms] 【函数】 --> 参数:`at_accounts`=`null`
[2024-03-06 20:58:05.089] [+0ms] 【函数】 --> 参数:`at_accounts_nodata`=`null`
# Monitor frequency configuration
[2024-03-06 20:58:05.092] [+2ms] 【监控器】 根据实际 Crontab(*/1 * * * *)计算检测间隔
[2024-03-06 20:58:05.092] [+0ms] 【监控器】 --> 本次触发时间:2024-03-06 20:57:00
[2024-03-06 20:58:05.092] [+0ms] 【监控器】 --> 上次触发时间:2024-03-06 20:56:00
[2024-03-06 20:58:05.092] [+0ms] 【监控器】 --> 检测间隔:60 秒
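# A minimal sketch of the interval calculation logged above, assuming the detection
# interval is simply the difference between the two trigger times. The function name
# `detection_interval_seconds` is hypothetical and is not taken from the monitor's code.

```python
from datetime import datetime

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def detection_interval_seconds(this_trigger: str, last_trigger: str) -> int:
    """Return the number of seconds between two consecutive trigger times."""
    delta = datetime.strptime(this_trigger, TIME_FORMAT) - datetime.strptime(
        last_trigger, TIME_FORMAT
    )
    return int(delta.total_seconds())


# Trigger times copied from the log lines above.
interval = detection_interval_seconds("2024-03-06 20:57:00", "2024-03-06 20:56:00")
print(interval)
```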
# Based on the user-configured no-data range, query two time ranges: the most recent no-data range and the range before it (2 DQL queries)
[2024-03-06 20:58:05.092] [+0ms] 【监控器】 ----------------- 加载断档 / 新增对象信息 ------------------
[2024-03-06 20:58:05.092] [+0ms] 【监控器】 已配置 120 秒无数据范围
[2024-03-06 20:58:05.092] [+0ms] 【监控器】 查询上轮数据
[2024-03-06 20:58:05.092] [+0ms] 【KODO】 执行DQL查询
[2024-03-06 20:58:05.092] [+0ms] 【KODO】 --> 最多翻页:20 页
[2024-03-06 20:58:05.093] [+0ms] 【KODO】 --> 时间范围:2024-03-06 20:51:00 ~ 2024-03-06 20:55:00
[2024-03-06 20:58:05.093] [+0ms] 【KODO】 --> 第1页(soffset = 0 ~ 500)
[2024-03-06 20:58:05.093] [+0ms] 【KODO】 调用KODO API
[2024-03-06 20:58:05.093] [+0ms] 【KODO】 >> 请求:POST /v1/query
[2024-03-06 20:58:05.093] [+0ms] 【KODO】 >>>> Body:{"echo_explain":false,"queries":[{"mask_visible":true,"qtype":"dql","query":"M::`fake_data_for_test`:(avg(`field_int`)) { `tag` = 'fake-data-1' } BY `tag`","slimit":500,"soffset":0,"time_range":[1709729460000,1709729700000]}],"workspace_uuid":"wksp_xxxxx"}
[2024-03-06 20:58:05.093] [+0ms] 【KODO】 >> 首次请求
[2024-03-06 20:58:05.111] [+18ms] 【KODO】 >> 响应:`200 OK` => `{"content":[{"async_id":"","complete":false,"cost":"4.172766ms","group_by":["tag"],"index_name":"","index_names":"","index_store_type":"","interval":0,"is_running":false,"next_cursor_time":-1,"points":null,"query_parse":{"fields":{"avg(field_int)":"field_int"},"funcs":{"avg(field_int)":["avg"]},"namespace":"metric","sources":{"fake_data_for_test":"exact"}},"query_type":"guancedb","sample":1,"scan_completed":false,"scan_index":"","series":[{"columns":["time","avg(field_int)"],"name":"fake_data_for_test","tags":{"tag":"fake-data-1"},"values":[[1709729699000,56.4206008583691]]}],"window":0}]}`
[2024-03-06 20:58:05.113] [+1ms] 【Studio】 调用 Studio Inner API
[2024-03-06 20:58:05.113] [+0ms] 【Studio】 >> 请求:GET /api/v1/inner/metrics_units
[2024-03-06 20:58:05.113] [+0ms] 【Studio】 >>>> Query:{"metrics":"fake_data_for_test","workspaceUUID":"wksp_xxxxx"}`
[2024-03-06 20:58:05.113] [+0ms] 【Studio】 >> 首次请求
[2024-03-06 20:58:05.123] [+9ms] 【Studio】 >> 响应:`200 OK` => `{"code":200,"content":{},"errorCode":"","message":"","success":true,"traceId":"TRACE-3645139A-BC9D-48E7-A17F-3CC93C51E650"}`
[2024-03-06 20:58:05.124] [+1ms] 【Studio】 指标单位(从 API 获取):wksp_xxxxx/fake_data_for_test => `{"_DFF_CACHE_EXPIRE_TIME":1709730065}`
[2024-03-06 20:58:05.125] [+0ms] 【KODO】 --> DQL 结果数据拆包:{"metric_units":{"field_int":null},"query_time_range":[1709729460000,1709729700000],"series":[{"columns":["time","avg(field_int)"],"name":"fake_data_for_test","tags":{"tag":"fake-data-1"},"values":[["2024-03-06T12:54:59Z",56.4206008583691]]}]}
[2024-03-06 20:58:05.125] [+0ms] 【监控器】 查询本轮数据
[2024-03-06 20:58:05.125] [+0ms] 【KODO】 执行DQL查询
[2024-03-06 20:58:05.125] [+0ms] 【KODO】 --> 最多翻页:20 页
[2024-03-06 20:58:05.125] [+0ms] 【KODO】 --> 时间范围:2024-03-06 20:55:00 ~ 2024-03-06 20:57:00
[2024-03-06 20:58:05.125] [+0ms] 【KODO】 --> 第1页(soffset = 0 ~ 500)
[2024-03-06 20:58:05.125] [+0ms] 【KODO】 调用KODO API
[2024-03-06 20:58:05.125] [+0ms] 【KODO】 >> 请求:POST /v1/query
[2024-03-06 20:58:05.125] [+0ms] 【KODO】 >>>> Body:{"echo_explain":false,"queries":[{"mask_visible":true,"qtype":"dql","query":"M::`fake_data_for_test`:(avg(`field_int`)) { `tag` = 'fake-data-1' } BY `tag`","slimit":500,"soffset":0,"time_range":[1709729700000,1709729820000]}],"workspace_uuid":"wksp_xxxxx"}
[2024-03-06 20:58:05.125] [+0ms] 【KODO】 >> 首次请求
[2024-03-06 20:58:05.144] [+18ms] 【KODO】 >> 响应:`200 OK` => `{"content":[{"async_id":"","complete":false,"cost":"4.405232ms","group_by":["tag"],"index_name":"","index_names":"","index_store_type":"","interval":0,"is_running":false,"next_cursor_time":-1,"points":null,"query_parse":{"fields":{"avg(field_int)":"field_int"},"funcs":{"avg(field_int)":["avg"]},"namespace":"metric","sources":{"fake_data_for_test":"exact"}},"query_type":"guancedb","sample":1,"scan_completed":false,"scan_index":"","series":[{"columns":["time","avg(field_int)"],"name":"fake_data_for_test","tags":{"tag":"fake-data-1"},"values":[[1709729819000,53.91111111111111]]}],"window":0}]}`
[2024-03-06 20:58:05.146] [+1ms] 【Studio】 指标单位(从缓存获取):wksp_xxxxx/fake_data_for_test => {"_DFF_CACHE_EXPIRE_TIME":1709730065}
[2024-03-06 20:58:05.146] [+0ms] 【KODO】 --> DQL 结果数据拆包:{"metric_units":{"field_int":null},"query_time_range":[1709729700000,1709729820000],"series":[{"columns":["time","avg(field_int)"],"name":"fake_data_for_test","tags":{"tag":"fake-data-1"},"values":[["2024-03-06T12:56:59Z",53.91111111111111]]}]}
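# The two `time_range` values in the requests above can be reproduced with the sketch
# below. It assumes the current window is one no-data interval wide and the previous
# window is two intervals wide; that ratio is inferred from this single log only, and
# the function name `no_data_windows` is hypothetical.

```python
def no_data_windows(trigger_ts: int, no_data_interval: int):
    """Return (previous, current) windows as [start_ms, end_ms] lists.

    trigger_ts is the trigger time in epoch seconds; no_data_interval is in seconds.
    """
    current_start = trigger_ts - no_data_interval
    # Assumption: the previous window spans two no-data intervals.
    previous_start = trigger_ts - 3 * no_data_interval
    previous = [previous_start * 1000, current_start * 1000]
    current = [current_start * 1000, trigger_ts * 1000]
    return previous, current


# 1709729820 is the trigger time 2024-03-06 20:57:00 (UTC+8) from the log.
previous, current = no_data_windows(1709729820, 120)
print(previous, current)
```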
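# Based on the data queried in the two time ranges, determine whether data has a gap or has resumed reporting, and generate the corresponding [no-data event] or [no-data recovery event]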
[2024-03-06 20:58:05.146] [+0ms] 【监控器】 ----------------- 断档 / 新增对象加载结果 ------------------
[2024-03-06 20:58:05.147] [+0ms] 【监控器】 --> 上轮存在对象:{"tag":"fake-data-1"}
[2024-03-06 20:58:05.147] [+0ms] 【监控器】 --> 本轮存在对象:{"tag":"fake-data-1"}
[2024-03-06 20:58:05.147] [+0ms] 【监控器】 ----> 数据断档对象(上轮存在 -> 本轮不存在):无
[2024-03-06 20:58:05.147] [+0ms] 【监控器】 --------------------- 判断数据断档 ---------------------
[2024-03-06 20:58:05.147] [+0ms] 【监控器】 --> 没有数据断档对象
[2024-03-06 20:58:05.147] [+0ms] 【监控器】 ------------------- 判断数据从断档恢复 --------------------
[2024-03-06 20:58:05.147] [+0ms] 【监控器】 --> 对象:{"tag":"fake-data-1"}
[2024-03-06 20:58:05.151] [+4ms] 【监控器】 对象 {"tag":"fake-data-1"} 的故障周期信息(fault_info):{"date":1709729160,"faultDuration":720,"faultId":"event-xxxxx","faultStartTime":1709728440,"status":"ok"}
[2024-03-06 20:58:05.151] [+0ms] 【监控器】 ----> 上次无数据事件为无数据恢复,不存在活跃无数据事件
[2024-03-06 20:58:05.151] [+0ms] 【监控器】 ----> 没有活跃无数据事件,不需要产生无数据恢复事件
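# The gap check above compares the objects seen in the previous round with those seen
# in the current round. A minimal sketch, assuming objects are identified by their tag
# dictionaries; `dimension_key` and `find_gaps` are hypothetical names.

```python
import json


def dimension_key(tags: dict) -> str:
    """Serialize a tag dictionary into a stable, comparable key."""
    return json.dumps(tags, sort_keys=True, separators=(",", ":"))


def find_gaps(previous_objects, current_objects):
    """Return (gap_objects, new_objects) as sorted lists of keys.

    gap_objects existed in the previous round but not in the current one;
    new_objects exist in the current round but not in the previous one.
    """
    prev = {dimension_key(t) for t in previous_objects}
    curr = {dimension_key(t) for t in current_objects}
    return sorted(prev - curr), sorted(curr - prev)


# The case from the log: the same object appears in both rounds, so there is no gap.
gaps, new_objects = find_gaps([{"tag": "fake-data-1"}], [{"tag": "fake-data-1"}])
print(gaps, new_objects)
```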
# Based on the user-configured detection rules, decide whether to generate an [alert event]
[2024-03-06 20:58:05.151] [+0ms] 【监控器】 -------------------- 执行数据数值检测 --------------------
[2024-03-06 20:58:05.151] [+0ms] 【监控器】 查询待检测数据
[2024-03-06 20:58:05.151] [+0ms] 【KODO】 执行DQL查询
[2024-03-06 20:58:05.151] [+0ms] 【KODO】 --> 最多翻页:20 页
[2024-03-06 20:58:05.152] [+0ms] 【KODO】 --> 时间范围:2024-03-06 20:56:00 ~ 2024-03-06 20:57:00
[2024-03-06 20:58:05.152] [+0ms] 【KODO】 --> 第1页(soffset = 0 ~ 500)
[2024-03-06 20:58:05.152] [+0ms] 【KODO】 调用KODO API
[2024-03-06 20:58:05.152] [+0ms] 【KODO】 >> 请求:POST /v1/query
[2024-03-06 20:58:05.152] [+0ms] 【KODO】 >>>> Body:{"echo_explain":false,"queries":[{"mask_visible":true,"qtype":"dql","query":"M::`fake_data_for_test`:(avg(`field_int`)) { `tag` = 'fake-data-1' } BY `tag`","slimit":500,"soffset":0,"time_range":[1709729760000,1709729820000]}],"workspace_uuid":"wksp_xxxxx"}
[2024-03-06 20:58:05.152] [+0ms] 【KODO】 >> 首次请求
[2024-03-06 20:58:05.162] [+9ms] 【KODO】 >> 响应:`200 OK` => `{"content":[{"async_id":"","complete":false,"cost":"3.042875ms","group_by":["tag"],"index_name":"","index_names":"","index_store_type":"","interval":0,"is_running":false,"next_cursor_time":-1,"points":null,"query_parse":{"fields":{"avg(field_int)":"field_int"},"funcs":{"avg(field_int)":["avg"]},"namespace":"metric","sources":{"fake_data_for_test":"exact"}},"query_type":"guancedb","sample":1,"scan_completed":false,"scan_index":"","series":[{"columns":["time","avg(field_int)"],"name":"fake_data_for_test","tags":{"tag":"fake-data-1"},"values":[[1709729819000,54.90555555555556]]}],"window":0}]}`
[2024-03-06 20:58:05.163] [+1ms] 【Studio】 指标单位(从缓存获取):wksp_xxxxx/fake_data_for_test => {"_DFF_CACHE_EXPIRE_TIME":1709730065}
[2024-03-06 20:58:05.163] [+0ms] 【KODO】 --> DQL 结果数据拆包:{"metric_units":{"field_int":null},"query_time_range":[1709729760000,1709729820000],"series":[{"columns":["time","avg(field_int)"],"name":"fake_data_for_test","tags":{"tag":"fake-data-1"},"values":[["2024-03-06T12:56:59Z",54.90555555555556]]}]}
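# The unpacked result above differs from the raw response mainly in that millisecond
# timestamps become ISO 8601 UTC strings. A minimal sketch of that conversion; the
# function name `unpack_series` is hypothetical, and any other processing the real
# unpacking step performs is not modeled here.

```python
from datetime import datetime, timezone


def unpack_series(series):
    """Convert the leading millisecond timestamp of each row to an ISO 8601 UTC string."""
    unpacked = []
    for item in series:
        values = []
        for row in item["values"]:
            ts_ms, *rest = row
            moment = datetime.fromtimestamp(ts_ms / 1000, tz=timezone.utc)
            values.append([moment.strftime("%Y-%m-%dT%H:%M:%SZ"), *rest])
        unpacked.append({**item, "values": values})
    return unpacked


# Series copied from the response above.
raw = [{
    "columns": ["time", "avg(field_int)"],
    "name": "fake_data_for_test",
    "tags": {"tag": "fake-data-1"},
    "values": [[1709729819000, 54.90555555555556]],
}]
result = unpack_series(raw)
print(result[0]["values"])
```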
# Iterate over all detection objects and run the check on each in turn
[2024-03-06 20:58:05.164] [+0ms] 【监控器】 检测对象:共 1 个
[2024-03-06 20:58:05.164] [+0ms] 【监控器】 [检测对象 1/1] {"tag":"fake-data-1"}
[2024-03-06 20:58:05.165] [+1ms] 【通用阈值检测】 待检测数据:{'Result': [54.90555555555556]}
# Iterate over all configured rules and determine which detection rule is hit
[2024-03-06 20:58:05.166] [+0ms] 【通用阈值检测】 阈值规则:共 1 条
[2024-03-06 20:58:05.166] [+0ms] 【通用阈值检测】 [阈值规则 1/1] critical:Result >= ['0']
[2024-03-06 20:58:05.166] [+0ms] 【条件判断】 [条件 1/1] IF Result (ANY[54.90555555555556]) >= ["0"]
[2024-03-06 20:58:05.166] [+0ms] 【条件判断】 --> 中间结果为 True ,条件关系为 AND ,继续
[2024-03-06 20:58:05.166] [+0ms] 【通用阈值检测】 --> 匹配成功,结束判断
[2024-03-06 20:58:05.166] [+0ms] 【通用阈值检测】 阈值规则匹配结果:{"check_data":{"Result":54.90555555555556},"conditions":[{"alias":"Result","operands":["0"],"operator":">="}],"status":"critical"}
[2024-03-06 20:58:05.166] [+0ms] 【监控器】 --> 检测对象:{"tag":"fake-data-1"}:已达到故障条件
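# The rule matching above can be sketched as follows. Assumptions: a condition is hit
# when ANY value satisfies it (as the `ANY[...]` in the log suggests), operands are
# numeric strings, and the first matching rule wins. The operator table lists only a
# few comparisons for illustration; the real checker may support other operators.

```python
import operator

# Illustrative subset; only ">=" appears in the log.
OPERATORS = {
    ">=": operator.ge,
    ">": operator.gt,
    "<=": operator.le,
    "<": operator.lt,
}


def condition_hit(values, op, operands):
    """Return True if any value satisfies `value <op> operands[0]`."""
    threshold = float(operands[0])
    compare = OPERATORS[op]
    return any(compare(value, threshold) for value in values)


def match_rules(check_data, rules):
    """Return the status of the first rule whose conditions are met, else None."""
    for rule in rules:
        results = [
            condition_hit(check_data[c["alias"]], c["operator"], c["operands"])
            for c in rule["conditions"]
        ]
        combine = all if rule["conditionLogic"] == "and" else any
        if combine(results):
            return rule["status"]
    return None


# Rule and data copied from the log.
rules = [{
    "conditionLogic": "and",
    "conditions": [{"alias": "Result", "operands": ["0"], "operator": ">="}],
    "status": "critical",
}]
status = match_rules({"Result": [54.90555555555556]}, rules)
print(status)
```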
# Call Guance Studio to get the alert policies configured for this monitor
[2024-03-06 20:58:05.169] [+2ms] 【Studio】 告警信息缓存已禁用
[2024-03-06 20:58:05.169] [+0ms] 【Studio】 调用 Studio Inner API
[2024-03-06 20:58:05.169] [+0ms] 【Studio】 >> 请求:GET /api/v1/inner/alert_opt/get
[2024-03-06 20:58:05.169] [+0ms] 【Studio】 >>>> Query:{"checkerUUID":"rul_xxxxx","workspaceUUID":"wksp_xxxxx"}`
[2024-03-06 20:58:05.169] [+0ms] 【Studio】 >> 首次请求
[2024-03-06 20:58:05.177] [+7ms] 【Studio】 >> 响应:`200 OK` => `{"code":200,"content":{"data":{"alertPolicies":[{"aggClusterFields":[],"aggFields":[],"aggInterval":0,"aggLabels":[],"id":8222,"minInterval":900,"name":"xxxxx-告警策略1","ruleTimezone":"Asia/Shanghai","rules":[{"crontab":"00 09 * * *","crontabDuration":39600,"name":"自定义通知配置1","targets":[{"name":"xxxxx-微信","status":"critical","type":"wechatRobot","webhook":"https://qyapi.weixin.qq.com/cgi-bin/webhook/send?key=xxxxx"},{"name":"xxxxx-告警策略1-规则1","status":"critical","type":"dingTalkRobot","webhook":"https://oapi.dingtalk.com/robot/send?access_token=xxxxx"}]},{"targets":[{"name":"xxxxx-微信","status":"critical","type":"wechatRobot","webhook":"https://qyapi.weixin.qq.com/cgi-bin/webhook/send?key=xxxxx"},{"name":"xxxxx-告警策略1-规则2","status":"critical","type":"dingTalkRobot","webhook":"https://oapi.dingtalk.com/robot/send?access_token=xxxxx"}]}],"status":0,"uuid":"altpl_xxxxx","workspaceUUID":"wksp_xxxxx"},{"aggClusterFields":["df_title"],"aggFields":["CLUSTER"],"aggInterval":60,"aggLabels":[],"id":8223,"minInterval":900,"name":"xxxxx-告警策略2","ruleTimezone":"Asia/Shanghai","rules":[{"crontab":"00 09 * * *","crontabDuration":39600,"name":"自定义通知配置1","targets":[{"name":"xxxxx-告警策略2-规则1","status":"critical","type":"dingTalkRobot","webhook":"https://oapi.dingtalk.com/robot/send?access_token=xxxxx"},{"name":"xxxxx-微信","status":"critical","type":"wechatRobot","webhook":"https://qyapi.weixin.qq.com/cgi-bin/webhook/send?key=xxxxx"}]},{"targets":[{"name":"xxxxx-告警策略2-规则2","status":"critical","type":"dingTalkRobot","webhook":"https://oapi.dingtalk.com/robot/send?access_token=xxxxx"},{"name":"xxxxx-微信","status":"critical","type":"wechatRobot","webhook":"https://qyapi.weixin.qq.com/cgi-bin/webhook/send?key=xxxxx"}]}],"status":0,"uuid":"altpl_xxxxx","workspaceUUID":"wksp_xxxxx"}],"silent":[]}},"errorCode":"","message":"","success":true,"traceId":"TRACE-XXXXX"}`
[2024-03-06 20:58:05.178] [+1ms] 【Studio】 告警配置(从 API 获取):rul_xxxxx => `{"_DFF_CACHE_EXPIRE_TIME":1709730065,"alertPolicies":[{"aggClusterFields":[],"aggFields":[],"aggInterval":0,"aggLabels":[],"id":8222,"minInterval":900,"name":"xxxxx-告警策略1","ruleTimezone":"Asia/Shanghai","rules":[{"crontab":"00 09 * * *","crontabDuration":39600,"name":"自定义通知配置1","targets":[{"name":"xxxxx-微信","status":"critical","type":"wechatRobot","webhook":"https://qyapi.weixin.qq.com/cgi-bin/webhook/send?key=xxxxx"},{"name":"xxxxx-告警策略1-规则1","status":"critical","type":"dingTalkRobot","webhook":"https://oapi.dingtalk.com/robot/send?access_token=xxxxx"}]},{"targets":[{"name":"xxxxx-微信","status":"critical","type":"wechatRobot","webhook":"https://qyapi.weixin.qq.com/cgi-bin/webhook/send?key=xxxxx"},{"name":"xxxxx-告警策略1-规则2","status":"critical","type":"dingTalkRobot","webhook":"https://oapi.dingtalk.com/robot/send?access_token=xxxxx"}]}],"status":0,"uuid":"altpl_xxxxx","workspaceUUID":"wksp_xxxxx"},{"aggClusterFields":["df_title"],"aggFields":["CLUSTER"],"aggInterval":60,"aggLabels":[],"id":8223,"minInterval":900,"name":"xxxxx-告警策略2","ruleTimezone":"Asia/Shanghai","rules":[{"crontab":"00 09 * * *","crontabDuration":39600,"name":"自定义通知配置1","targets":[{"name":"xxxxx-告警策略2-规则1","status":"critical","type":"dingTalkRobot","webhook":"https://oapi.dingtalk.com/robot/send?access_token=xxxxx"},{"name":"xxxxx-微信","status":"critical","type":"wechatRobot","webhook":"https://qyapi.weixin.qq.com/cgi-bin/webhook/send?key=xxxxx"}]},{"targets":[{"name":"xxxxx-告警策略2-规则2","status":"critical","type":"dingTalkRobot","webhook":"https://oapi.dingtalk.com/robot/send?access_token=xxxxx"},{"name":"xxxxx-微信","status":"critical","type":"wechatRobot","webhook":"https://qyapi.weixin.qq.com/cgi-bin/webhook/send?key=xxxxx"}]}],"status":0,"uuid":"altpl_xxxxx","workspaceUUID":"wksp_xxxxx"}],"silent":[]}`
[2024-03-06 20:58:05.180] [+1ms] 【Studio】 常量配置(从缓存获取):envName => {"_DFF_CACHE_EXPIRE_TIME":1709730060,"value":"测试环境"}
[2024-03-06 20:58:05.183] [+2ms] 【Studio】 常量配置(从缓存获取):UsePublicAlertLink => {"_DFF_CACHE_EXPIRE_TIME":1709730060,"value":false}
[2024-03-06 20:58:05.184] [+1ms] 【Studio】 常量配置(从缓存获取):consoleBaseURL => {"_DFF_CACHE_EXPIRE_TIME":1709730060,"value":"http://testing.domain.com"}
# Render the event title / content from the user-configured alert template and the event data
[2024-03-06 20:58:05.186] [+1ms] 【文本渲染器】 渲染模板:
内容:xxxxx-监控器(单个){{df_dimension_tags}}
第2行
第3行
[2024-03-06 20:58:05.187] [+1ms] 【文本渲染器】 --> 渲染成功。输出:
内容:xxxxx-监控器(单个){"tag":"fake-data-1"}
第2行
第3行
[2024-03-06 20:58:05.191] [+4ms] 【文本渲染器】 渲染模板:
标题:xxxxx-监控器(单个){{df_dimension_tags}}
[2024-03-06 20:58:05.192] [+0ms] 【文本渲染器】 --> 渲染成功。输出:
标题:xxxxx-监控器(单个){"tag":"fake-data-1"}
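# The rendering step above replaces `{{df_dimension_tags}}` with the compact JSON of
# the dimension tags. A minimal sketch using a regular expression; the real renderer
# may be a full template engine, and the function name `render` is hypothetical.

```python
import json
import re

PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render(template: str, variables: dict) -> str:
    """Replace {{name}} placeholders; unknown names are left untouched."""

    def replace(match):
        name = match.group(1)
        if name not in variables:
            return match.group(0)
        value = variables[name]
        if isinstance(value, str):
            return value
        # Non-string values are emitted as compact JSON, as in the log output.
        return json.dumps(value, ensure_ascii=False, separators=(",", ":"))

    return PLACEHOLDER.sub(replace, template)


title = render(
    "Title: monitor {{df_dimension_tags}}",
    {"df_dimension_tags": {"tag": "fake-data-1"}},
)
print(title)
```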
# Iterate over the events / mute rules and determine whether each event needs to be muted
[2024-03-06 20:58:05.192] [+0ms] 【事件告警器】 [事件 1/1] <critical事件@monitor:{"tag":"fake-data-1"}:标题:xxxxx-监控器(单个){"tag":"fake-data-1"}>
[2024-03-06 20:58:05.192] [+0ms] 【事件告警器】 没有静默规则,不需要静默
# Iterate over the alert policies and determine which alert policy / alert rule the event hits
[2024-03-06 20:58:05.192] [+0ms] 【事件告警器】 告警策略:共 2 条
[2024-03-06 20:58:05.192] [+0ms] 【事件告警器】 [告警策略 1/2] xxxxx-告警策略1(8222)
[2024-03-06 20:58:05.192] [+0ms] 【事件告警器】 --------------------- 发送事件告警 ---------------------
[2024-03-06 20:58:05.192] [+0ms] 【事件告警器】 告警规则:共 2 条
[2024-03-06 20:58:05.192] [+0ms] 【事件告警器】 [告警规则 1/2] 按 Crontab `00 09 * * *` 循环,每轮循环持续 39600 秒
[2024-03-06 20:58:05.193] [+0ms] 【事件告警器】 --> 已配置重复时间段,但不在重复时间段范围内
[2024-03-06 20:58:05.193] [+0ms] 【事件告警器】 --> 不满足重复时间段告警,跳过
[2024-03-06 20:58:05.193] [+0ms] 【事件告警器】 [告警规则 2/2] 剩余其他时间段
[2024-03-06 20:58:05.193] [+0ms] 【事件告警器】 成功匹配告警规则,需要告警
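# Rule 1 above is skipped because 20:57 falls outside the window that starts at 09:00
# and lasts 39600 seconds (until 20:00). A minimal sketch of that check, assuming a
# daily Crontab of the form `MM HH * * *`, naive datetimes already expressed in the
# rule timezone, and a half-open window. `in_daily_window` is a hypothetical name.

```python
from datetime import datetime, timedelta


def in_daily_window(now: datetime, crontab: str, duration_seconds: int) -> bool:
    """Return True if `now` lies inside a window opened by a daily crontab."""
    minute, hour, day_of_month, month, day_of_week = crontab.split()
    if (day_of_month, month, day_of_week) != ("*", "*", "*"):
        raise ValueError("this sketch only handles daily crontab expressions")
    start_today = now.replace(
        hour=int(hour), minute=int(minute), second=0, microsecond=0
    )
    duration = timedelta(seconds=duration_seconds)
    # Also test yesterday's window, in case it extends past midnight.
    for start in (start_today, start_today - timedelta(days=1)):
        if start <= now < start + duration:
            return True
    return False


# Trigger time from the log: outside the 09:00 to 20:00 window.
matched = in_daily_window(datetime(2024, 3, 6, 20, 57), "00 09 * * *", 39600)
print(matched)
```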
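# Iterate over the notification targets and determine whether each is within its silence period
# (all alert notification targets under the same alert policy / rule have their silence periods aligned)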
[2024-03-06 20:58:05.193] [+0ms] 【事件告警器】 告警通知对象:共 2 条
[2024-03-06 20:58:05.193] [+0ms] 【事件告警器】 [告警通知对象 1/2] dingTalkRobot/xxxxx-告警策略1-规则2 (critical)
[2024-03-06 20:58:05.193] [+0ms] 【事件告警器】 检查事件 status 匹配动作:`critical` => critical
[2024-03-06 20:58:05.195] [+1ms] 【事件告警器】 --> 上次告警于 2024-03-06 20:46:00,沉默 900 秒。沉默期于 2024-03-06 21:01:00(240 秒以后)解除
[2024-03-06 20:58:05.195] [+0ms] 【事件告警器】 ----> 当前处于沉默期,跳过
[2024-03-06 20:58:05.195] [+0ms] 【事件告警器】 [告警通知对象 2/2] wechatRobot/xxxxx-微信 (critical)
[2024-03-06 20:58:05.195] [+0ms] 【事件告警器】 检查事件 status 匹配动作:`critical` => critical
[2024-03-06 20:58:05.197] [+1ms] 【事件告警器】 --> 上次告警于 2024-03-06 20:46:00,沉默 900 秒。沉默期于 2024-03-06 21:01:00(240 秒以后)解除
[2024-03-06 20:58:05.197] [+0ms] 【事件告警器】 ----> 当前处于沉默期,跳过
[2024-03-06 20:58:05.197] [+0ms] 【事件告警器】 [告警策略 2/2] xxxxx-告警策略2(8223)
[2024-03-06 20:58:05.197] [+0ms] 【事件告警器】 --------------------- 发送事件告警 ---------------------
[2024-03-06 20:58:05.197] [+0ms] 【事件告警器】 告警规则:共 2 条
[2024-03-06 20:58:05.197] [+0ms] 【事件告警器】 [告警规则 1/2] 按 Crontab `00 09 * * *` 循环,每轮循环持续 39600 秒
[2024-03-06 20:58:05.197] [+0ms] 【事件告警器】 --> 已配置重复时间段,但不在重复时间段范围内
[2024-03-06 20:58:05.197] [+0ms] 【事件告警器】 --> 不满足重复时间段告警,跳过
[2024-03-06 20:58:05.198] [+0ms] 【事件告警器】 [告警规则 2/2] 剩余其他时间段
[2024-03-06 20:58:05.198] [+0ms] 【事件告警器】 成功匹配告警规则,需要告警
[2024-03-06 20:58:05.198] [+0ms] 【事件告警器】 告警通知对象:共 2 条
[2024-03-06 20:58:05.198] [+0ms] 【事件告警器】 [告警通知对象 1/2] dingTalkRobot/xxxxx-告警策略2-规则2 (critical)
[2024-03-06 20:58:05.198] [+0ms] 【事件告警器】 检查事件 status 匹配动作:`critical` => critical
[2024-03-06 20:58:05.199] [+1ms] 【事件告警器】 --> 上次告警于 2024-03-06 20:46:00,沉默 900 秒。沉默期于 2024-03-06 21:01:00(240 秒以后)解除
[2024-03-06 20:58:05.199] [+0ms] 【事件告警器】 ----> 当前处于沉默期,跳过
[2024-03-06 20:58:05.200] [+0ms] 【事件告警器】 [告警通知对象 2/2] wechatRobot/xxxxx-微信 (critical)
[2024-03-06 20:58:05.200] [+0ms] 【事件告警器】 检查事件 status 匹配动作:`critical` => critical
[2024-03-06 20:58:05.201] [+1ms] 【事件告警器】 --> 上次告警于 2024-03-06 20:46:00,沉默 900 秒。沉默期于 2024-03-06 21:01:00(240 秒以后)解除
[2024-03-06 20:58:05.201] [+0ms] 【事件告警器】 ----> 当前处于沉默期,跳过
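# The silence-period figures above follow from simple arithmetic: last alert time plus
# `minInterval` gives the end of the silence period. The sketch below assumes the
# remaining time is measured from the trigger time (20:57:00), which is what yields
# the 240 seconds in the log. `silence_status` is a hypothetical name.

```python
from datetime import datetime, timedelta


def silence_status(last_alert: datetime, min_interval: int, now: datetime):
    """Return (silence_end, remaining_seconds); remaining is 0 once silence is over."""
    silence_end = last_alert + timedelta(seconds=min_interval)
    remaining = int((silence_end - now).total_seconds())
    return silence_end, max(remaining, 0)


silence_end, remaining = silence_status(
    datetime(2024, 3, 6, 20, 46, 0), 900, datetime(2024, 3, 6, 20, 57, 0)
)
print(silence_end, remaining)
```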
# Cache the generated events for use by the next monitor task
[2024-03-06 20:58:05.204] [+2ms] 【内部DataWay】 缓存事件
[2024-03-06 20:58:05.204] [+0ms] 【内部DataWay】 缓存故障信息
[2024-03-06 20:58:05.204] [+0ms] 【内部DataWay】 --> 建立缓存:key=`rul_xxxxx-check`, field=`{"tag":"fake-data-1"}`
# Write the events to Guance
[2024-03-06 20:58:05.206] [+2ms] 【内部DataWay】 写入事件
[2024-03-06 20:58:05.208] [+1ms] 【内部DataWay】 行协议方式写入数据
[2024-03-06 20:58:05.208] [+0ms] 【内部DataWay】 --> 工作空间TOKEN:`tkn_xxxxx`
[2024-03-06 20:58:05.208] [+0ms] 【内部DataWay】 --> 请求:POST /v1/write/keyevent
[2024-03-06 20:58:05.208] [+0ms] 【内部DataWay】 --> 前 1/1 条数据示例:[{"fields":{"df_alert_policy_ids":["altpl_xxxxx","altpl_xxxxx"],"df_alert_policy_names":["xxxxx-告警策略1","xxxxx-告警策略2"],"df_at_accounts":"[]","df_at_accounts_nodata":"[]","df_channels":"[\"chan_xxxxx\"]","df_check_range_end":1709729820,"df_check_range_start":1709729760,"df_date_range":60,"df_dimension_tags":"{\"tag\":\"fake-data-1\"}","df_event_reason":"满足监控器中故障的认定条件,产生故障事件","df_fault_duration":2880,"df_fault_start_time":1709726940,"df_issue_duration":2880,"df_issue_start_time":1709726940,"df_matched_alert_policy_rules":["xxxxx-告警策略1 / -","xxxxx-告警策略2 / -"],"df_message":"内容:xxxxx-监控器(单个){\"tag\":\"fake-data-1\"} \n第2行 \n第3行","df_meta":"{\"alert_info\":{\"matchedAlertPolicyRules\":[{\"aggClusterFields\":[],\"aggFields\":[],\"aggInterval\":0,\"aggLabels\":[],\"id\":8222,\"minInterval\":900,\"name\":\"xxxxx-告警策略1\",\"rule\":{\"md5\":\"xxxxx\",\"seq\":2,\"targets\":[{\"hasSecret\":false,\"ignoreReason\":\"当前处于沉默期。上次告警于 2024-03-06 20:46:00(沉默 900 秒),沉默期将于 2024-03-06 21:01:00(240 秒以后)结束\",\"isIgnored\":true,\"name\":\"xxxxx-告警策略1-规则2\",\"status\":\"critical\",\"type\":\"dingTalkRobot\",\"webhook\":\"https://oapi.dingtalk.com/robot/send?access_token=xxxxx\"},{\"hasSecret\":false,\"ignoreReason\":\"当前处于沉默期。上次告警于 2024-03-06 20:46:00(沉默 900 秒),沉默期将于 2024-03-06 21:01:00(240 秒以后)结束\",\"isIgnored\":true,\"name\":\"xxxxx-微信\",\"status\":\"critical\",\"type\":\"wechatRobot\",\"webhook\":\"https://qyapi.weixin.qq.com/cgi-bin/webhook/send?key=xxxxx\"}]},\"ruleTimezone\":\"Asia/Shanghai\",\"status\":0,\"uuid\":\"altpl_xxxxx\",\"workspaceUUID\":\"wksp_xxxxx\"},{\"aggClusterFields\":[\"df_title\"],\"aggFields\":[\"CLUSTER\"],\"aggInterval\":60,\"aggLabels\":[],\"id\":8223,\"minInterval\":900,\"name\":\"xxxxx-告警策略2\",\"rule\":{\"md5\":\"xxxxx\",\"seq\":2,\"targets\":[{\"hasSecret\":false,\"ignoreReason\":\"当前处于沉默期。上次告警于 2024-03-06 20:46:00(沉默 900 秒),沉默期将于 2024-03-06 21:01:00(240 
秒以后)结束\",\"isIgnored\":true,\"name\":\"xxxxx-告警策略2-规则2\",\"status\":\"critical\",\"type\":\"dingTalkRobot\",\"webhook\":\"https://oapi.dingtalk.com/robot/send?access_token=xxxxx\"},{\"hasSecret\":false,\"ignoreReason\":\"当前处于沉默期。上次告警于 2024-03-06 20:46:00(沉默 900 秒),沉默期将于 2024-03-06 21:01:00(240 秒以后)结束\",\"isIgnored\":true,\"name\":\"xxxxx-微信\",\"status\":\"critical\",\"type\":\"wechatRobot\",\"webhook\":\"https://qyapi.weixin.qq.com/cgi-bin/webhook/send?key=xxxxx\"}]},\"ruleTimezone\":\"Asia/Shanghai\",\"status\":0,\"uuid\":\"altpl_xxxxx\",\"workspaceUUID\":\"wksp_xxxxx\"}],\"matchedSilentRule\":null,\"targets\":[{\"hasSecret\":false,\"i... <Length: 6784>
[2024-03-06 20:58:05.209] [+0ms] 【内部DataWay】 --> 首条数据行协议示例:`keyevent,df_crontab_exec_mode=crontab,df_event_id=event-xxxxx,df_fault_id=event-xxxxx,df_fault_status=fault,df_label=["xxxxx_test"],df_language=zh,df_monitor_checker=custom_metric,df_monitor_checker_event_ref=xxxxx,df_monitor_checker_id=rul_xxxxx,df_monitor_checker_ref=xxxxx,df_monitor_checker_sub=check,df_monitor_checker_type=monitor,df_monitor_id=altpl_xxxxx;altpl_xxxxx,df_monitor_type=custom,df_site_name=测试环境,df_source=monitor,df_status=critical,df_sub_status=critical,df_workspace_name=【Doris】开发测试一起用_,df_workspace_uuid=wksp_xxxxx,tag=fake-data-1 df_alert_policy_ids=["altpl_xxxxx","altpl_xxxxx"],df_alert_policy_names=["xxxxx-告警策略1","xxxxx-告警策略2"],df_at_accounts="[]",df_at_accounts_nodata="[]",df_channels="[\"chan_xxxxx\"]",df_check_range_end=1709729820i,df_check_range_start=1709729760i,df_date_range=60i,df_dimension_tags="{\"tag\":\"fake-data-1\"}",df_event_reason="满足监控器中故障的认定条件,产生故障事件",df_fault_duration=2880i,df_fault_start_time=1709726940i,df_issue_duration=2880i,df_issue_start_time=1709726940i,df_matched_alert_policy_rules=["xxxxx-告警策略1 / -","xxxxx-告警策略2 / -"],df_message="内容:xxxxx-监控器(单个){\"tag\":\"fake-data-1\"}
第2行
第3行",df_meta="{\"alert_info\":{\"matchedAlertPolicyRules\":[{\"aggClusterFields\":[],\"aggFields\":[],\"aggInterval\":0,\"aggLabels\":[],\"id\":8222,\"minInterval\":900,\"name\":\"xxxxx-告警策略1\",\"rule\":{\"md5\":\"xxxxx\",\"seq\":2,\"targets\":[{\"hasSecret\":false,\"ignoreReason\":\"当前处于沉默期。上次告警于 2024-03-06 20:46:00(沉默 900 秒),沉默期将于 2024-03-06 21:01:00(240 秒以后)结束\",\"isIgnored\":true,\"name\":\"xxxxx-告警策略1-规则2\",\"status\":\"critical\",\"type\":\"dingTalkRobot\",\"webhook\":\"https://oapi.dingtalk.com/robot/send?access_token=xxxxx\"},{\"hasSecret\":false,\"ignoreReason\":\"当前处于沉默期。上次告警于 2024-03-06 20:46:00(沉默 900 秒),沉默期将于 2024-03-06 21:01:00(240 秒以后)结束\",\"isIgnored\":true,\"name\":\"xxxxx-微信\",\"status\":\"critical\",\"type\":\"wechatRobot\",\"webhook\":\"https://qyapi.weixin.qq.com/cgi-bin/webhook/send?key=xxxxx\"}]},\"ruleTimezone\":\"Asia/Shanghai\",\"status\":0,\"uuid\":\"altpl_xxxxx\",\"workspaceUUID\":\"wksp_xxxxx\"},{\"aggClusterFields\":[\"df_title\"],\"aggFields\":[\"CLUSTER\"],\"aggInterval\":60,\"aggLabels\":[],\"id\":8223,\"minInterval\":900,\"name\":\"xxxxx-告警策略2\",\"rule\":{\"md5\":\"xxxxx\",\"seq\":2,\"targets\":[{\"hasSecret\":false,\"ignoreReason\":\"当前处于沉默期。上次告警于 2024-03-06 20:46:00(沉默 900 秒),沉默期将于 2024-03-06 21:01:00(240 秒以后)结束\",\"isIgnored\":true,\"name\":\"xxxxx-告警策略2-规则2\",\"... <Length: 6616>
[2024-03-06 20:58:05.216] [+6ms] 【内部DataWay】 --> 响应结果:`200 OK`
[2024-03-06 20:58:05.216] [+0ms] 【内部DataWay】 --> 响应内容:""
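# According to the user configuration, notify Guance Studio of the generated events for tracking
# (here the monitor only sends the notification; the actual incident-tracking logic is implemented by Guance Studio)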
[2024-03-06 20:58:05.216] [+0ms] 【Studio】 缓冲需要通知 Studio 的事件
[2024-03-06 20:58:05.220] [+4ms] 【Studio】 --> 事件:`{"df_at_accounts":[],"df_at_accounts_nodata":[],"df_channels":["chan_xxxxx"],"df_check_range_end":1709729820,"df_check_range_start":1709729760,"df_crontab_exec_mode":"crontab","df_date_range":60,"df_dimension_tags":"{\"tag\":\"fake-data-1\"}","df_event_id":"event-xxxxx","df_fault_duration":2880,"df_fault_id":"event-xxxxx","df_fault_start_time":1709726940,"df_fault_status":"fault","df_label":"[\"xxxxx_test\"]","df_message":"内容:xxxxx-监控器(单个){\"tag\":\"fake-data-1\"} \n第2行 \n第3行","df_monitor_checker":"custom_metric","df_monitor_checker_event_ref":"xxxxx","df_monitor_checker_id":"rul_xxxxx","df_monitor_checker_name":"标题:xxxxx-监控器(单个){{df_dimension_tags}}","df_monitor_checker_ref":"xxxxx","df_monitor_checker_sub":"check","df_monitor_checker_type":"monitor","df_monitor_checker_value":"54.90555555555556","df_monitor_id":"altpl_xxxxx;altpl_xxxxx","df_monitor_name":"xxxxx-告警策略1;xxxxx-告警策略2","df_monitor_type":"custom","df_site_name":"测试环境","df_source":"monitor","df_status":"critical","df_sub_status":"critical","df_title":"标题:xxxxx-监控器(单个){\"tag\":\"fake-data-1\"}","df_workspace_uuid":"wksp_xxxxx","timestamp":1709729820}`
# Number of monitor events generated by this check
[2024-03-06 20:58:05.221] [+1ms] 本次检测共产生 1 个监控器事件
|