1

So I have to come up with an approach to make a large amount of data "readable" for the user, and I was wondering if someone could point out the difference between using something like Elasticsearch + Kibana and using something like MRTG. Which would be more suitable for data analysis that focuses on trends?

2 Answers

0

The two approaches you mention are for radically different types of data.

If your data consists of a series of regular timestamped metric values, such as 5-min samples of traffic rates from a router interface, or 1-min samples from a temperature sensor, then MRTG (or rather, RRDTool, which is the backend database) is excellent for doing this. If the data are irregular it is still possible, though you need to customise the RRDTool database settings somewhat to avoid large 'unknown' areas. RRDTool is capable of trending analysis for the metrics you are logging, though this is not done via MRTG -- you'd need to call the RRDTool functions directly.
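As a rough illustration of what "trending" over regularly sampled metrics means, the sketch below fits a least-squares line through evenly spaced samples. It is plain Python, not RRDTool's own implementation; the function name `linear_trend` and the sample temperatures are made up for the example.

```python
from typing import Sequence, Tuple


def linear_trend(values: Sequence[float], step_seconds: float) -> Tuple[float, float]:
    """Fit a least-squares line through evenly spaced samples.

    Returns (slope in units per second, intercept at the first sample time).
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two samples")
    # Sample i is assumed to have been taken at i * step_seconds.
    times = [i * step_seconds for i in range(n)]
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    cov = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    var = sum((t - mean_t) ** 2 for t in times)
    slope = cov / var
    intercept = mean_v - slope * mean_t
    return slope, intercept


# Hypothetical 5-minute temperature samples (illustrative values only).
samples = [20.0, 20.5, 21.0, 21.5, 22.0]
slope, intercept = linear_trend(samples, step_seconds=300)
print(f"trend: {slope * 3600:.2f} units/hour, starting near {intercept:.1f}")
```

A positive slope means the metric is rising over the window; extrapolating the line gives a crude forecast of when a threshold might be crossed.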

If your data are an irregular sequence of text log entries (events), possibly with parseable positional data, and you're more interested in the number or rate of events before drilling down to view individual events, Logstash/Kibana is the way to go. They will give you graphs of event rate over time, but I do not think they can provide trending analysis. Also, they do not provide graphing analysis of parsed data embedded within the event log text. Logstash/Kibana are great for things like Syslog, Eventlog, application logs (like Apache logs) and so on, where you're more interested in seeing how many events matching a certain pattern occurred over time.
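To make "event rate over time" concrete, here is a minimal sketch that counts irregular events in fixed-width time buckets, which is the general idea behind an event-rate graph. It does not call Elasticsearch or Kibana; the function name `events_per_bucket` and the timestamps are assumptions for illustration.

```python
from collections import Counter
from datetime import datetime, timezone


def events_per_bucket(timestamps, bucket_seconds=300):
    """Count events per fixed-width time bucket.

    Returns a dict mapping bucket start (epoch seconds) to event count,
    ordered by bucket start.
    """
    counts = Counter()
    for ts in timestamps:
        epoch = int(ts.timestamp())
        # Round down to the start of the bucket this event falls in.
        bucket_start = epoch - epoch % bucket_seconds
        counts[bucket_start] += 1
    return dict(sorted(counts.items()))


# Hypothetical event times, e.g. parsed from log lines.
utc = timezone.utc
events = [
    datetime(2014, 10, 23, 11, 0, 10, tzinfo=utc),
    datetime(2014, 10, 23, 11, 2, 0, tzinfo=utc),
    datetime(2014, 10, 23, 11, 7, 30, tzinfo=utc),
    datetime(2014, 10, 23, 11, 12, 0, tzinfo=utc),
]
buckets = events_per_bucket(events, bucket_seconds=300)
for start, count in buckets.items():
    print(datetime.fromtimestamp(start, tz=utc).isoformat(), count)
```

Note that buckets with no events are simply absent from the result; a graphing front end would normally fill those with zero.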

You've not provided enough information about the actual nature of your data, nor of what sort of 'readable' analysis your users require, so this is necessarily a high-level summary of capabilities.

Steve Shipway
  • I can give you an example: I have data from a type of equipment where I know how many checks and violations there are for a period of time (the period is of my choosing, depending on how I aggregate the data, and yes, it's timestamped). I want to graph this and interpret the data so we can identify patterns and avoid big problems (too many violations, no limit checks). Thank you for your answer. – J. Castellanos Oct 23 '14 at 11:12
  • If you have some timestamped data, at generally regular intervals, with a count of violations since the last sample, then this would be perfect for MRTG/RRDTool using the 'ABSOLUTE' data type, which would convert it into a rate of violations/sec that you could then summarise and graph over time. If, however, you had generated one log entry per violation, then Elasticsearch might have been more appropriate. – Steve Shipway Oct 23 '14 at 20:45
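The count-to-rate conversion described in the comment above can be sketched roughly as follows. This only shows the basic arithmetic (count since the last sample divided by elapsed seconds); RRDTool itself also normalises samples to its step interval and consolidates them, which this sketch ignores. The function name and the sample data are hypothetical.

```python
def absolute_to_rates(samples):
    """Convert (epoch_seconds, count_since_previous_sample) pairs to rates.

    Returns a list of (epoch_seconds, events_per_second). The first sample
    only provides a starting timestamp, so it yields no rate.
    """
    rates = []
    for (prev_t, _), (t, count) in zip(samples, samples[1:]):
        elapsed = t - prev_t
        if elapsed <= 0:
            raise ValueError("timestamps must be strictly increasing")
        rates.append((t, count / elapsed))
    return rates


# Hypothetical violation counts, sampled every 5 minutes.
samples = [(0, 0), (300, 30), (600, 15), (900, 60)]
rates = absolute_to_rates(samples)
for t, rate in rates:
    print(f"t={t}s: {rate:.3f} violations/sec")
```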
0

Elasticsearch is effective for storing and searching documents, such as text. Logstash is an example of how to structure data for effective queries.

MRTG/RRD is a tool for recording values at fixed time intervals: every X time units, log value Y. MRTG/RRD is not effective for storing text; its job does not overlap with Elasticsearch's use case.

Graphite might be a tool to consider if you already have a Logstash installation up and running. Logstash can fire events to Graphite or StatsD as well as store your event data in Elasticsearch. The nice thing about Graphite/Carbon is that it's not as tied to a fixed time interval as MRTG is. You can fire information into Graphite as often or as infrequently as you like.
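For a sense of how little is needed to fire a value into Graphite, here is a minimal sketch using Carbon's plaintext protocol, where each line is `path value timestamp`. The metric path is invented for the example, and the host and port (`localhost`, 2003) are assumptions based on a default Carbon setup; adjust them for your installation.

```python
import socket
import time


def format_metric(path, value, timestamp=None):
    """Build one line of Carbon's plaintext protocol: 'path value timestamp\\n'."""
    if timestamp is None:
        timestamp = int(time.time())
    return f"{path} {value} {int(timestamp)}\n"


def send_metric(line, host="localhost", port=2003):
    """Send one formatted line to a Carbon listener (host/port are assumptions)."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(line.encode("ascii"))


# Hypothetical metric name and timestamp; nothing is sent here.
line = format_metric("equipment.unit1.violations", 3, 1414062720)
print(line, end="")

# To actually send it, with a Carbon listener running:
# send_metric(line)
```

Because each line carries its own timestamp, values can be sent whenever an event occurs rather than on a fixed polling schedule.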

The use case you gave in the other answer would be an excellent fit for Graphite or a similar tool. You can graph and report on many value-based events in Graphite, then use Elasticsearch to correlate data back to an event. (I don't mean that there is integration between ES and Graphite, just that if you use Logstash to push events, the times will be easy to look for.)

BDM
  • I think I just realized you were asking about Kibana's graphing ability based on an event query, weren't you? – BDM Oct 29 '14 at 05:41