跳到主要内容
版本:1.x

monitoring

#Guardian

Guardian is a sub-project of seatunnel. It is a monitoring and alarming tool that can provide monitoring of seatunnel's survival and scheduling delay. Guardian is capable of dynamically loading configuration files at runtime and provides an HTTP API to support real-time modification of configuration. Currently only seatunnel on Yarn is supported.

run Guardian

Download Guardian, take guardian_1.0.0 as an example

wget https://github.com/InterestingLab/guardian/releases/download/v1.0.0/guardian_1.0.0.tar.gz
tar -xvf guardian_1.0.0
cd guardian_1.0.0
./bin/guardian check config.json

config file

Guardian configuration files are written in JSON format, a valid example, click here

The entire configuration file consists of the following parts:

  • port: the port to which the interface API is bound
  • node_name: node information
  • check_interval: the time interval for checking the application
  • yarn: the detected YARN cluster address
  • apps: specific apps that need to be detected
  • alert_manager: alert management

The following is a detailed description of each part:

yarn

# Yarn resourcemanager
api_hosts: <list>

Example

"yarn": {
"api_hosts": [
"10.11.10.21:8088",
"10.11.10.22:8088"
]
}

apps

[{
# Spark application name
"app_name": <string>,
# Restart command when application fails
"start_cmd": <string>,
# The number of applications running under the same app_name
"app_num": <number>,
# Application type, default 'spark'
"check_type": <string>,
# mark whether the application is valid or not
"active": <boolean>
"check_options": {
# Alarm level, support WARNNING, ERROR, etc.
"alert_level": <string>,
"max_delayed_batch_num": <number>,
"max_delayed_time": <number>
}
}]

Example

"apps": [
{
"app_name": "seatunnel-app",
"start_cmd": "test_cmd",
"app_num": 1,
"check_type": "spark",
"check_options": {
"alert_level": "WARNING",
"max_delayed_batch_num": 10,
"max_delayed_time": 600
}
}
]

alert_manager

routes

Alarm routing, currently only supports alarm levels

Trigger an alarm when the alarm level is WARNNING or ERROR

"routes": {
"match": {
"level": ["WARNING", "ERROR"]
}
}

emails

Send alarm information by email

# Email verification username
"auth_username": <string>,
# Email verification password
"auth_password": <string>,
# Mailbox stmp server
"smtp_server": <string>,
# sender
"sender": <string>,
# recipient list
"receivers": <list>

Example

"emails": {
"auth_username": "username",
"auth_password": "password",
"smtp_server": "smtp.163.com",
"sender": "huochen1994@163.com",
"receivers": ["garygaowork@gmail.com"],
"routes": {
"match": {
"level": ["WARNING", "ERROR"]
}
}
}

webhooks

Implement custom alarm mode through interface

# webhook interface address
"url": <string>

Example

"webhook": {
"url": "http://api.webhook.interestinglab.org/alert",
"routes": {
"match": {
"level": ["ERROR"]
}
}
}

When Gaurdian calls the interface, it will send an HTTP POST request to the configured interface address in the following JSON format:

{
"subject": "Guardian",
"objects": "seatunnel_app",
"content": "App is not running or less than expected number of running instance, will restart"
}

Guardian interface usage guide

GET

Overview

  • Function description

    Get the configuration information of Guardian corresponding to app_name

  • Basic interface

    http://localhost:5000/config/[app_name]

  • Request method

    get

Interface parameter definition

N/A

return result

curl 'http://localhost:5000/config/seatunnel-app2'

{
"content": {
"app_name": "seatunnel-app2",
"app_num": 1,
"check_options": {},
"check_type": "spark",
"start_cmd": "test_cmd_not_exist"
},
"status": 0
}

POST

Overview

  • Function description

    Update or add application configuration information in Guardian. When app_name exists, update the corresponding configuration information. When app_name does not exist, add an application monitoring configuration

  • Basic interface

    http://localhost:5000/config/[app_name]

  • Request method

    post

Interface parameter definition

FieldTypeCommentInstance
start_cmdstringrestart command
app_numnumExisting number2
check_typestringApplication typespark
check_optionsdict
activebooleanis activetrue

return result

`
curl 'http://localhost:5000/config/seatunnel-app2' -d '
{
'active': false
}'

{
"status": 0
}

DELETE

Overview

  • Function description

    Delete the configuration information of Guardian corresponding to app_name

  • Basic interface

    http://localhost:5000/config/[app_name]

  • Request method

    delete

Interface parameter definition

N/A

return result

curl -XDELETE 10.212.81.56:5000/config/seatunnel-app2

{
"status": 0
}

Return status code description

statusDescription
0Success
1Parameter error
2Internal error