Merge pull request #32 from shunfei/prometheus
Prometheus
flowbehappy committed Sep 29, 2015
2 parents c89a82f + 6914227 commit fa0bcb6
Showing 27 changed files with 630 additions and 692 deletions.
80 changes: 27 additions & 53 deletions README.md
@@ -3,11 +3,12 @@ DCMonitor

A simple, lightweight Data Center monitor, currently includes Zookeeper, Kafka, [Druid](http://druid.io/)(in progress). Motivated by [KafkaOffsetMonitor](https://github.com/quantifind/KafkaOffsetMonitor), but faster and more stable.

It is written in Java and uses [InfluxDB v0.9](https://github.com/influxdb/influxdb) for historical metrics storage.
It is written in Java and uses [Prometheus](http://prometheus.io/) for historical metrics storage.

### NOTE

##License

Everyone should check this [issue](https://github.com/shunfei/DCMonitor/issues/27) before deploying DCMonitor.
[The MIT License (MIT)](https://opensource.org/licenses/MIT)

###Zookeeper monitor

@@ -27,60 +28,35 @@ Everyone should check this [issue](https://github.com/shunfei/DCMonitor/issues/2
##Dependences

* Run
* java(1.6 or later)
* java(1.7 or later)
* [Prometheus](http://prometheus.io/)
* Compile
* maven
* java(1.7 or later)

##Installation

* Set up your Zookeeper, Kafka, Druid(If you have) for monitoring.
* Set up [InfluxDB 0.9 Stable](https://influxdb.com/docs/v0.9/introduction/installation.html).

* Download and Install InfluxDB.
* Set up [Prometheus](http://prometheus.io/).
* Download a Prometheus release from [https://github.com/prometheus/prometheus/releases](https://github.com/prometheus/prometheus/releases) and set it up following [http://prometheus.io/docs/introduction/getting_started/](http://prometheus.io/docs/introduction/getting_started/). You can stop before [here](http://prometheus.io/docs/introduction/getting_started/#using-the-graphing-interface) if you don't want to go deep into Prometheus. Don't worry, it is extremely easy.
* Add a job to scrape DCMonitor's metrics. The job config should look like:

* Configure InfluxDB
Two ways to create a database for metrics storing.
* You can choose to create a database for DCMonitor by youself, can be done by sending HTTP POST requests like this:

```
curl -G 'http://192.168.10.51:8086/query?u=root&p=root' --data-urlencode "q=CREATE database dcmonitor"
curl -G 'http://192.168.10.51:8086/query?u=root&p=root&db=dcmonitor' --data-urlencode "q=CREATE RETENTION POLICY seven_days ON dcmonitor DURATION 168h REPLICATION 1 DEFAULT"
```

here `192.168.10.51:8086` is where InfluxDB installed, `dcmonitor` is the database you configured in config.json, and `168h` shows we only keep the last 7 days historical metrics. Check [here](https://influxdb.com/docs/v0.9/query_language/database_administration.html) for detail.
* Or you don't have to do anyting, leave DCMonitor do this for you. DCMonitor will automatically create a database with seven days retention policy if it doesn't exits. Node that you still can change the retention policy later by
```
ALTER RETENTION POLICY seven_days ON dcmonitor DURATION 2d
```
or create another new default one to replace the old default:
```
CREATE RETENTION POLICY two_days ON dcmonitor DURATION 2d REPLICATION 1 DEFAULT
```

DCMonitor will choose the default policy to ingest metrics.
```
- job_name: 'dcmonitor'
  scrape_interval: 5s
  scrape_timeout: 10s
  target_groups:
    - targets: ['localhost:8075']
```
Here `localhost:8075` is the host:port that the DCMonitor web service listens on (configured in `application.properties`). A complete example is [here](https://github.com/shunfei/DCMonitor/blob/master/config/prometheus.yml).

After that, go to `http://<hostname>:9090/status`; you should see the dcmonitor endpoint in the targets section. It is in the `UNHEALTHY` state because we haven't set up the DCMonitor web service yet!

![](img/prometheus_status.png)

* Compile & deploy DCMonitor

* Currently [influxdb-java](https://github.com/influxdb/influxdb-java) haven't been published to maven host yet, you have to compile & install it to your maven local repository.

If you are using java 1.7, you probably have to remove the test code, otherwise may cause [issue](https://github.com/influxdb/influxdb-java/issues/37)
```
git@github.com:influxdb/influxdb-java.git
cd influxdb-java
rm -rf src/test
mvn clean install
```
* Compile DCMonitor
* Compile

```
git clone git@github.com:shunfei/DCMonitor.git
@@ -89,12 +65,10 @@ Everyone should check this [issue](https://github.com/shunfei/DCMonitor/issues/2
```
Then a `target` folder will be generated under root folder.

* deploy
* Deploy

You only need to deploy `target`, `run.sh`, `config` to target machine.
Modify configurations in `config/config.json`.
Modify configurations in `config/config.json` and `application.properties`.
Run `run.sh`, if every thing is fine, visit `http://hostname:8075` to enjoy!


Run `run.sh` to start the DCMonitor web service. If everything is fine, visit `http://<hostname>:8075` to enjoy!
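
The scrape job in the README change boils down to Prometheus issuing a periodic HTTP GET against the DCMonitor web service. A minimal sketch of that probe follows, useful for checking why a target shows up as `UNHEALTHY`. It assumes the metrics are served on Prometheus' default `metrics_path`, `/metrics`, which this diff does not state; `ScrapeTargetSketch`, `metricsUrl` and `probe` are hypothetical names, not part of DCMonitor.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

final class ScrapeTargetSketch {

    // Assumption: DCMonitor exposes metrics on Prometheus' default path, /metrics.
    static String metricsUrl(String hostPort) {
        return "http://" + hostPort + "/metrics";
    }

    // Returns the HTTP status code, or -1 if the target cannot be reached in time.
    static int probe(String url, int timeoutMillis) {
        HttpURLConnection conn = null;
        try {
            conn = (HttpURLConnection) new URL(url).openConnection();
            conn.setConnectTimeout(timeoutMillis);
            conn.setReadTimeout(timeoutMillis);
            conn.setRequestMethod("GET");
            return conn.getResponseCode();
        } catch (IOException e) {
            return -1;
        } finally {
            if (conn != null) {
                conn.disconnect();
            }
        }
    }

    public static void main(String[] args) {
        // Default matches the target in the job config: localhost:8075.
        String url = metricsUrl(args.length > 0 ? args[0] : "localhost:8075");
        int status = probe(url, 2000);
        System.out.println(url + " -> "
            + (status == 200 ? "reachable" : "not reachable (status " + status + ")"));
    }
}
```

Before `run.sh` has been started the probe should report the target as not reachable, which is consistent with the `UNHEALTHY` state described above.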
4 changes: 2 additions & 2 deletions config/application.properties
@@ -1,6 +1,6 @@
server.port:8075
server.tomcat.access-log-enabled: true
server.address: localhost
#server.address: localhost The host name which the server listens on.
server.tomcat.max-threads: 10

spring.velocity.charset: UTF-8
@@ -9,7 +9,7 @@ spring.velocity.charset: UTF-8
kafka.query.flushInterval: 10000
kafka.query.time.offset: PT15m


druid.query.time.offset: PT15m

#time
#offsetHour, eg beijing is 8 hours offset
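
The `kafka.query.time.offset` and `druid.query.time.offset` values are written in ISO-8601 duration form, so `PT15m` stands for 15 minutes. How DCMonitor itself parses these values is not shown in this diff; the sketch below uses `java.time.Duration` (Java 8+, whereas the project targets Java 1.7) purely to illustrate what the value means. `QueryOffsetSketch` and `offsetMillis` are hypothetical names.

```java
import java.time.Duration;

final class QueryOffsetSketch {

    // Converts an ISO-8601 duration such as "PT15m" to milliseconds.
    // Duration.parse accepts the designator letters in upper or lower case.
    static long offsetMillis(String isoDuration) {
        return Duration.parse(isoDuration).toMillis();
    }

    public static void main(String[] args) {
        String configured = "PT15m"; // value taken from application.properties
        System.out.println(configured + " = " + offsetMillis(configured) + " ms");
    }
}
```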
11 changes: 5 additions & 6 deletions config/config.json
@@ -1,9 +1,8 @@
{
"influxdb": {
"influxdbUrl": "http://192.168.10.51:8086",
"influxdbDatabase": "dcmonitor",
"influxdbUser": "root",
"influxdbPassword": "root"
"prometheus":{
"namespace": "dcmonitor",
"@serverUrl": "The host:port of the prometheus server",
"serverUrl": "http://localhost:9090"
},
"zookeeper": {
"addrs": "192.168.10.51:2181,192.168.10.41:2181,192.168.10.42:2181",
@@ -33,7 +32,7 @@
"warnLagSpec": {
"test|dsp_druid_ingester_0": 200
},
"@comment": "set ignoreConsumerRegex to ignore sending warning on those test consumers",
"@ignoreConsumerRegex": "set ignoreConsumerRegex to ignore sending warning on those test consumers",
"ignoreConsumerRegex": "^console-consumer-.+$",
"stormKafkaRoot": "/storm_kafka"
},
13 changes: 13 additions & 0 deletions config/prometheus.yml
@@ -0,0 +1,13 @@
# This is an example config of prometheus.

global:
  scrape_interval: 15s # By default, scrape targets every 15 seconds.
  evaluation_interval: 15s # By default, evaluate rules every 15 seconds.
  labels:
    monitor: 'codelab-monitor'
scrape_configs:
  - job_name: 'dcmonitor'
    scrape_interval: 5s
    scrape_timeout: 10s
    target_groups:
      - targets: ['localhost:8075']
Binary file added img/prometheus_status.png
65 changes: 43 additions & 22 deletions pom.xml
@@ -8,7 +8,6 @@
<artifactId>dcmonitor</artifactId>
<version>1.0</version>


<dependencies>
<dependency>
<groupId>org.apache.curator</groupId>
@@ -64,7 +63,6 @@
<artifactId>retrofit</artifactId>
<version>1.8.0</version>
</dependency>
<!-- If we use okhttp instead of java urlconnection we achieve server failover of the influxdb server address resolves to all influxdb server ips.-->
<dependency>
<groupId>com.squareup.okhttp</groupId>
<artifactId>okhttp</artifactId>
@@ -110,39 +108,33 @@
</exclusions>
</dependency>

<dependency>
<groupId>org.influxdb</groupId>
<artifactId>influxdb-java</artifactId>
<version>2.0-SNAPSHOT</version>
</dependency>

<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-web</artifactId>
<version>1.0.2.RELEASE</version>

<exclusions>
<exclusion>
<groupId>org.slf4j</groupId>
<artifactId>jcl-over-slf4j</artifactId>
</exclusion>

<!--<exclusion>-->
<!--<groupId>org.slf4j</groupId>-->
<!--<artifactId>jcl-over-slf4j</artifactId>-->
<!--</exclusion>-->
<exclusion>
<groupId>org.slf4j</groupId>
<artifactId>jul-to-slf4j</artifactId>
</exclusion>

<exclusion>
<groupId>org.slf4j</groupId>
<artifactId>log4j-over-slf4j</artifactId>
</exclusion>

<exclusion>
<groupId>ch.qos.logback</groupId>
<artifactId>logback-classic</artifactId>
</exclusion>
</exclusions>

</dependency>
<dependency>
<groupId>org.springframework</groupId>
<artifactId>spring-context-support</artifactId>
<version>4.0.5.RELEASE</version>
</dependency>

<dependency>
@@ -151,23 +143,41 @@
<version>1.2</version>
</dependency>


<dependency>
<groupId>org.apache.velocity</groupId>
<artifactId>velocity</artifactId>
<version>1.7</version>
</dependency>

<!-- The client -->
<dependency>
<groupId>org.springframework</groupId>
<artifactId>spring-context-support</artifactId>
<version>4.0.5.RELEASE</version>
<groupId>io.prometheus</groupId>
<artifactId>client</artifactId>
<version>0.0.10</version>
</dependency>
<!-- Hotspot 'jvmstat/perfdata' metrics -->
<dependency>
<groupId>io.prometheus.client.utility</groupId>
<artifactId>jvmstat</artifactId>
<version>0.0.10</version>
</dependency>
<!-- Exposition servlet -->
<dependency>
<groupId>io.prometheus.client.utility</groupId>
<artifactId>servlet</artifactId>
<version>0.0.10</version>
<exclusions>
<exclusion>
<groupId>org.eclipse.jetty</groupId>
<artifactId>jetty-servlet</artifactId>
</exclusion>
</exclusions>
</dependency>

</dependencies>

<build>
<plugins>

<plugin>
<groupId>com.jolira</groupId>
<artifactId>onejar-maven-plugin</artifactId>
@@ -181,6 +191,17 @@
</executions>
</plugin>

<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-compiler-plugin</artifactId>
<configuration>
<source>1.7</source>
<target>1.7</target>
</configuration>
</plugin>

</plugins>


</build>
</project>
8 changes: 4 additions & 4 deletions src/main/java/com/sf/monitor/CommonFetcher.java
@@ -1,8 +1,8 @@
package com.sf.monitor;

import com.fasterxml.jackson.annotation.JsonProperty;
import com.sf.monitor.influxdb.Event;
import com.sf.monitor.influxdb.InfluxDBUtils;
import com.sf.monitor.kafka.KafkaStats;
import com.sf.monitor.utils.PrometheusUtils;

import java.util.List;

@@ -11,8 +11,8 @@ public abstract class CommonFetcher implements InfoFetcher {
public Boolean saveMetrics;

public void saveMetrics(List<Event> events) {
if (saveMetrics == null || saveMetrics){
InfluxDBUtils.saveEvents(events);
if (saveMetrics == null || saveMetrics) {
PrometheusUtils.saveEvents(KafkaStats.tableName, events);
}
}
}
16 changes: 6 additions & 10 deletions src/main/java/com/sf/monitor/Config.java
@@ -21,7 +21,7 @@ public class Config {
private static final Logger log = new Logger(Config.class);

@JsonProperty
public InfluxdbConfig influxdb;
public Prometheus prometheus;
@JsonProperty
public CuratorConfig zookeeper;
@JsonProperty("druid")
@@ -37,15 +37,11 @@ public List<InfoFetcher> fetcherList() {
return ImmutableList.<InfoFetcher>of(fetchers.druidFetcher, fetchers.kafkaFetcher, fetchers.zookeeperFetcher);
}

public static class InfluxdbConfig {
public static class Prometheus {
@JsonProperty
public String influxdbUrl;
public String namespace;
@JsonProperty
public String influxdbDatabase;
@JsonProperty
public String influxdbUser;
@JsonProperty
public String influxdbPassword;
public String serverUrl;
}

public static class CuratorConfig {
@@ -97,7 +93,7 @@ public KafkaConfig(
this.warning = warning;
this.warnDefaultLag = warnDefaultLag;
this.warnLagSpec = warnLagSpec;
if (ignoreConsumerRegex != null){
if (ignoreConsumerRegex != null) {
this.ignoreConsumerRegex = Pattern.compile(ignoreConsumerRegex);
}
this.stormKafkaRoot = stormKafkaRoot;
@@ -121,7 +117,7 @@ public boolean shouldAlarm(String topic, String consumer, long lag) {
return false;
}
}
return lag > getWarnLag(topic, consumer);
return lag > getWarnLag(topic, consumer) || lag < 0;
}
}

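The `shouldAlarm` change above adds `|| lag < 0`, so a negative lag now raises an alarm in addition to a lag above the warning threshold. The following is a minimal, self-contained sketch of that rule, not the project's actual class: it leaves out the `warning` flag and `ignoreConsumerRegex` checks, and it assumes that `warnLagSpec` keys have the form `topic|consumer`, which is inferred from the `"test|dsp_druid_ingester_0": 200` entry in `config.json`. `LagAlarmSketch` is a hypothetical name.

```java
import java.util.HashMap;
import java.util.Map;

final class LagAlarmSketch {
    private final long warnDefaultLag;
    private final Map<String, Long> warnLagSpec;

    LagAlarmSketch(long warnDefaultLag, Map<String, Long> warnLagSpec) {
        this.warnDefaultLag = warnDefaultLag;
        this.warnLagSpec = new HashMap<String, Long>(warnLagSpec);
    }

    // Assumption: per-consumer thresholds are keyed as "topic|consumer".
    long getWarnLag(String topic, String consumer) {
        Long specific = warnLagSpec.get(topic + "|" + consumer);
        return specific != null ? specific : warnDefaultLag;
    }

    // Alarm when the lag exceeds its threshold, or when it is negative.
    boolean shouldAlarm(String topic, String consumer, long lag) {
        return lag > getWarnLag(topic, consumer) || lag < 0;
    }

    public static void main(String[] args) {
        Map<String, Long> spec = new HashMap<String, Long>();
        spec.put("test|dsp_druid_ingester_0", 200L);
        LagAlarmSketch alarm = new LagAlarmSketch(100L, spec);

        System.out.println(alarm.shouldAlarm("test", "dsp_druid_ingester_0", 150L));
        System.out.println(alarm.shouldAlarm("other", "consumer", 150L));
        System.out.println(alarm.shouldAlarm("other", "consumer", -1L));
    }
}
```

Treating a negative lag as alarm-worthy makes sense if it signals an inconsistent offset reading rather than a healthy consumer, though the commit itself does not state the motivation.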
4 changes: 2 additions & 2 deletions src/main/java/com/sf/monitor/DCMonitor.java
@@ -27,12 +27,12 @@ protected SpringApplicationBuilder configure(SpringApplicationBuilder applicatio
return application.sources(DCMonitor.class);
}

public static void main(String[] args) {
public static void main(String[] args) throws Exception{
preare();
SpringApplication.run(DCMonitor.class);
}

private static void preare() {
private static void preare() throws Exception{
Config.init("config");
Resources.init();
