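Monitoring Series: Deploying Alertmanager with WeCom (Enterprise WeChat) Alerts
I. Installation Preparation
The steps below build on an existing Prometheus environment.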
1. Download the release package
wget https://github.com/prometheus/alertmanager/releases/download/v0.19.0/alertmanager-0.19.0.linux-amd64.tar.gz
2. Unpack and install the package
tar -zvxf alertmanager-0.19.0.linux-amd64.tar.gz
cp -a alertmanager-0.19.0.linux-amd64/ /usr/local/alertmanager
II. Deployment
1. Edit the alertmanager.yml configuration
cat /usr/local/alertmanager/alertmanager.yml
global:
  resolve_timeout: 5m
templates: # alert message templates
  - './template/test.tmpl'
route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1m
  receiver: 'wechat'
receivers:
  - name: 'wechat'
    wechat_configs:
      - send_resolved: true
        agent_id: '1000002' # AgentId of the self-built application
        to_user: 'LuZhanXing' # user ID of the alert recipient
        api_secret: '' # Secret of the self-built application
        corp_id: 'ww4bcc83412351e94e' # enterprise (corp) ID
#inhibit_rules:
#  - source_match:
#      severity: 'critical'
#    target_match:
#      severity: 'warning'
#    equal: ['alertname', 'dev', 'instance']
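Because a single indentation mistake keeps Alertmanager from starting, it can help to validate the file first. The following is a minimal sketch, assuming the `amtool` binary from the same release tarball was copied to /usr/local/alertmanager together with `alertmanager`; the function name `check_alertmanager_config` is only illustrative.

```shell
# Validate alertmanager.yml with amtool before (re)starting Alertmanager.
# Assumption: amtool sits next to the alertmanager binary in the install dir.
check_alertmanager_config() {
    am_dir="${1:-/usr/local/alertmanager}"
    if [ ! -x "$am_dir/amtool" ]; then
        echo "amtool not found in $am_dir" >&2
        return 1
    fi
    # check-config parses the file and also loads the templates it references
    "$am_dir/amtool" check-config "$am_dir/alertmanager.yml"
}
```

Calling `check_alertmanager_config` without arguments checks the default install directory; a non-zero exit status indicates that the file should be fixed before the service is started.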
2. Create a template
mkdir -p /usr/local/alertmanager/template # create the template directory
cat /usr/local/alertmanager/template/test.tmpl
{{ define "wechat.default.message" }}
{{ range .Alerts }}
========Monitoring Alert==========
Alert status: {{ .Status }}
Alert severity: {{ .Labels.severity }}
Alert type: {{ .Labels.alertname }}
Alert application: {{ .Annotations.summary }}
Alert host: {{ .Labels.instance }}
Alert details: {{ .Annotations.description }}
Trigger value: {{ .Annotations.value }}
Alert time: {{ .StartsAt.Format "2006-01-02 15:04:05" }}
========end=============
{{ end }}
{{ end }}
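3. Start the service
cd /usr/local/alertmanager # run from the install directory so alertmanager.yml is found
nohup ./alertmanager &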
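Once the process is running, one way to test the whole path is to post a hand-made alert and watch for it in WeCom. This is a sketch under assumptions: Alertmanager listens on its default port 9093 on localhost, the v2 API path `/api/v2/alerts` is used, and the function names and the `TestAlert` labels are invented for illustration.

```shell
# Build a one-alert JSON payload for Alertmanager's v2 API.
# The label and annotation names mirror the ones the template above reads.
build_test_alert() {
    instance="${1:-localhost:9100}"
    printf '[{"labels":{"alertname":"TestAlert","severity":"warning","instance":"%s"},' "$instance"
    printf '"annotations":{"summary":"manual test","description":"sent with curl","value":"0"}}]\n'
}

# Post the payload to Alertmanager (first argument: base URL, second: instance label).
send_test_alert() {
    am_url="${1:-http://localhost:9093}"
    build_test_alert "$2" | curl -sS -X POST -H 'Content-Type: application/json' \
        --data @- "$am_url/api/v2/alerts"
}
```

A quick `curl -s http://localhost:9093/-/healthy` confirms the process is up before running `send_test_alert`. With `group_wait: 10s` as configured above, the notification should go out roughly ten seconds after the alert is posted.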
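III. Configuring Prometheus
Note: Prometheus was originally deployed in Docker with its configuration directory mounted, so the container has to be removed and recreated.
1. Stop and remove the container
docker stop prometheus && docker rm prometheus
2. docker-compose yml for deploying Prometheus on its own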
version: '2'
networks:
  monitor:
    driver: bridge
services:
  prometheus:
    image: prom/prometheus
    container_name: prometheus
    hostname: prometheus
    restart: always
    volumes:
      - /root/promethus/prometheus.yml:/etc/prometheus/prometheus.yml
      - /root/promethus/rule.yml:/etc/prometheus/rule.yml
    ports:
      - "9090:9090"
    networks:
      - monitor
3. Add the following to prometheus.yml
# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets: ['192.168.1.81:9093']
# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  - "rule.yml"
  # - "first_rules.yml"
  # - "second_rules.yml"
4. Define the alert rules
cat rule.yml
groups:
  - name: server-rule
    rules:
      - alert: "Memory alert"
        expr: (node_memory_MemTotal_bytes - (node_memory_MemFree_bytes + node_memory_Buffers_bytes + node_memory_Cached_bytes)) / node_memory_MemTotal_bytes * 100 > 90
        for: 30s
        labels:
          severity: warning
        annotations:
          summary: "Service: {{ $labels.alertname }} memory alert"
          description: "{{ $labels.alertname }} memory utilization is above 90%"
          value: "{{ $value }}"
      - alert: "CPU alert"
        expr: 100 * (1 - avg(irate(node_cpu_seconds_total{mode="idle"}[2m])) by(instance)) > 50
        for: 30s
        labels:
          severity: warning
        annotations:
          summary: "Service: {{ $labels.alertname }} CPU alert"
          description: "{{ $labels.alertname }} CPU utilization is above 50%"
          value: "{{ $value }}"
      - alert: "Disk alert"
        expr: 100 * (node_filesystem_size_bytes{fstype=~"xfs|ext4"} - node_filesystem_avail_bytes) / node_filesystem_size_bytes > 90
        for: 30s
        labels:
          severity: warning
        annotations:
          summary: "Service: {{ $labels.alertname }} disk alert"
          description: "{{ $labels.alertname }} disk utilization is above 90%"
          value: "{{ $value }}"
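To get a feel for how close a host is to the 90% memory threshold, the arithmetic of the memory rule can be reproduced from /proc/meminfo, whose MemTotal, MemFree, Buffers and Cached fields correspond to the node_memory_*_bytes metrics used in the expression. This is only a rough sketch; both function names are hypothetical.

```shell
# Same formula as the memory rule: (total - (free + buffers + cached)) / total * 100
mem_used_percent() {
    awk -v t="$1" -v f="$2" -v b="$3" -v c="$4" \
        'BEGIN { printf "%.1f\n", (t - (f + b + c)) / t * 100 }'
}

# Read the four fields (reported in kB) from /proc/meminfo and apply the formula.
host_mem_used_percent() {
    set -- $(awk '/^MemTotal:/ {t=$2} /^MemFree:/ {f=$2} /^Buffers:/ {b=$2} /^Cached:/ {c=$2}
                  END {print t, f, b, c}' /proc/meminfo)
    mem_used_percent "$1" "$2" "$3" "$4"
}
```

For example, `mem_used_percent 1000 100 50 50` prints `80.0`, which is below the threshold, so the rule would not fire for those values. The units cancel out, so kB from /proc/meminfo and bytes from node_exporter give the same percentage.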
5. Recreate the container
docker-compose up -d
6. Check the logs
docker logs prometheus
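If the log reports a configuration or rule error, the files can be checked outside the running container with `promtool`, which is bundled in the prom/prometheus image. The sketch below assumes the host-side files live in /root/promethus, as in the compose volumes above; the function name is illustrative.

```shell
# Run promtool from the prom/prometheus image against the host-side files.
# Assumption: prometheus.yml and rule.yml both live in the given directory.
check_prometheus_config() {
    conf_dir="${1:-/root/promethus}"
    if [ ! -f "$conf_dir/prometheus.yml" ]; then
        echo "prometheus.yml not found in $conf_dir" >&2
        return 1
    fi
    # Mount the directory at /etc/prometheus so the relative "rule.yml" entry
    # in rule_files resolves the same way it does inside the real container.
    docker run --rm -v "$conf_dir:/etc/prometheus:ro" --entrypoint promtool \
        prom/prometheus check config /etc/prometheus/prometheus.yml
}
```

`promtool check config` also validates the rule files listed under `rule_files`, so one call covers both prometheus.yml and rule.yml.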
IV. Settings in the WeCom Admin Console
1. Open the WeCom admin console
2. See the screenshot below



The AgentID and Secret shown here must match the agent_id and api_secret values in alertmanager.yml.
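A mismatch between these values and alertmanager.yml usually only shows up when an alert fails to arrive, so it may be worth checking the credentials directly. The sketch below calls the WeCom `gettoken` endpoint, which takes `corpid` and `corpsecret` as query parameters; the function names are made up for illustration.

```shell
# Build the gettoken URL from the corp ID and the application's Secret.
wecom_token_url() {
    printf 'https://qyapi.weixin.qq.com/cgi-bin/gettoken?corpid=%s&corpsecret=%s\n' "$1" "$2"
}

# Request a token; a JSON reply with "errcode":0 means the pair was accepted.
wecom_check_credentials() {
    curl -sS "$(wecom_token_url "$1" "$2")"
}
```

Run it as `wecom_check_credentials <corp_id> <secret>` with the same values that go into `corp_id` and `api_secret`; any non-zero `errcode` in the reply points to a wrong ID or Secret.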
V. Alert Result

Original article by admin. When reposting, please credit the source: https://www.starz.top/2022/07/05/%e9%83%a8%e7%bd%b2alertmanager%e4%bc%81%e4%b8%9a%e5%be%ae%e4%bf%a1%e5%91%8a%e8%ad%a6/