This repository has been archived by the owner on Nov 8, 2022. It is now read-only.

Issues #29 and #30 #31

Open · evilezh wants to merge 8 commits into master

Conversation

@evilezh commented Nov 15, 2016

This is an approximate implementation of the ideas in #29 and #30.
I tested #29 and it works; I will test #30 a bit later. I need to roll it out and change some things before I can say it works correctly.

It is roughly the first code I have written in Go, so feel free to modify it and make it nicer.

Sorry, no tests and no documentation changes.

Here is what I added to the configuration:
key: "plugin_running_on"
output_type: "tree"

Both are optional; if not specified, the plugin defaults to the previous behavior.
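For illustration, the publish section of a task manifest using both options might look like the snippet below. The key and output_type entries are what this PR adds; the surrounding field names and values (topic, brokers) are assumptions for the sketch, not taken from this PR:

```json
"publish": [
  {
    "plugin_name": "kafka",
    "config": {
      "topic": "snap",
      "brokers": "localhost:9092",
      "key": "plugin_running_on",
      "output_type": "tree"
    }
  }
]
```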

@evilezh (Author) commented Dec 28, 2016

I switched to the new plugin API.
New defaults:

  • output type = tree. My guess is that anyone who uses Kafka and has more than 5-10 servers would like messages reduced in size by 80-90%. In my case the array format produces a 250+ kB message every 10 seconds; in tree format, with redundant attributes removed, it is ~25 kB.
  • the tag "plugin_running_on" is used as the partitioning key. There is no option to disable the partitioning key entirely (e.g. to use random partitioning); I can add one if someone thinks it would be useful. (A sketch of how such a key is attached with sarama appears at the end of this comment.)

P.S.
If someone could fix the tests and the build scripts/godeps, that would be awesome.
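To make the partitioning-key default concrete, here is a minimal sketch of how a tag value could be attached as the Kafka message key with sarama. The buildKey and publish helpers are illustrative, not the PR's actual code:

```go
package publisher

import "github.com/Shopify/sarama"

// buildKey picks the partitioning key from a metric's tags. With the
// "plugin_running_on" tag, all messages from one host hash to the same
// partition; returning nil lets sarama fall back to its default partitioner.
func buildKey(tags map[string]string, keyTag string) sarama.Encoder {
	if v, ok := tags[keyTag]; ok {
		return sarama.StringEncoder(v)
	}
	return nil
}

// publish sends one serialized document, keyed by the configured tag.
func publish(producer sarama.SyncProducer, topic string, tags map[string]string, payload []byte) error {
	msg := &sarama.ProducerMessage{
		Topic: topic,
		Key:   buildKey(tags, "plugin_running_on"),
		Value: sarama.ByteEncoder(payload),
	}
	_, _, err := producer.SendMessage(msg)
	return err
}
```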

@mbbroberg

Thanks for the update @evilezh. The maintainers of Snap are unfamiliar with Kafka, so this PR will take much longer to review than we'd like. I might use this as an excuse to dig into it. We'll keep you posted. Happy New Year!

@katarzyna-z (Contributor) commented Jan 31, 2017

I've launched the plugin with your changes and I see that some information is lost: unit, version, and last_advertised_time.

Output from the current kafka publisher:

[
  {
    "timestamp": "2017-01-30T11:02:29.933141255+01:00",
    "namespace": "/intel/psutil/load/load1",
    "data": 0.44,
    "unit": "Load/1M",
    "tags": {
      "plugin_running_on": "kujawka-Z97X-UD7-TH"
    },
    "version": 0,
    "last_advertised_time": "2017-01-30T11:02:29.93389848+01:00"
  },
  {
    "timestamp": "2017-01-30T11:02:29.933141825+01:00",
    "namespace": "/intel/psutil/load/load15",
    "data": 0.56,
    "unit": "Load/15M",
    "tags": {
      "plugin_running_on": "kujawka-Z97X-UD7-TH"
    },
    "version": 0,
    "last_advertised_time": "2017-01-30T11:02:29.933898807+01:00"
  }
]

Output from the kafka publisher with your changes:

{
  "_tags": {
    "plugin_running_on": "kujawka-Z97X-UD7-TH"
  },
  "_timestamp": 1485772518024965428,
  "intel": {
    "psutil": {
      "load": {
        "load1": 0.15,
        "load15": 0.5
      }
    }
  }
}

@evilezh (Author) commented Feb 7, 2017

That is the general idea of the conversion: drop unused fields and keep only the actual data.
So, what happens:
There is a single timestamp per document, converted to nanoseconds.
Tags likewise appear once per document.
last_advertised_time, version, and unit are dropped as unnecessary.

The format is optional and depends on your requirements. If you need the full output, you can use the old method. If the path and the data are enough for you, you are welcome to use the tree-like output, which in general saves around 90% of traffic and storage on Kafka.
It is a big difference whether each host sends 300 kB or 30 kB documents every 10 seconds.
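As a rough illustration of that conversion, here is a minimal sketch that folds a batch of metrics into one tree document. It assumes each metric is reduced to its namespace path and data value; the Metric type and buildTree helper are mine, not the PR's actual code:

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// Metric is a stripped-down stand-in for a Snap metric: just the
// namespace path and the value; everything else is dropped.
type Metric struct {
	Namespace string
	Data      float64
}

// buildTree produces one document per batch: a single _timestamp (in
// nanoseconds) and a single _tags object, with values nested by path.
func buildTree(metrics []Metric, tags map[string]string, tsNano int64) map[string]interface{} {
	root := map[string]interface{}{
		"_timestamp": tsNano,
		"_tags":      tags,
	}
	for _, m := range metrics {
		parts := strings.Split(strings.Trim(m.Namespace, "/"), "/")
		node := root
		for _, p := range parts[:len(parts)-1] {
			child, ok := node[p].(map[string]interface{})
			if !ok {
				child = map[string]interface{}{}
				node[p] = child
			}
			node = child
		}
		node[parts[len(parts)-1]] = m.Data
	}
	return root
}

func main() {
	doc := buildTree(
		[]Metric{{"/intel/psutil/load/load1", 0.15}, {"/intel/psutil/load/load15", 0.5}},
		map[string]string{"plugin_running_on": "kujawka-Z97X-UD7-TH"},
		1485772518024965428,
	)
	out, _ := json.Marshal(doc)
	fmt.Println(string(out)) // matches the tree-format example above
}
```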

@mkleina commented Feb 13, 2017

I think data like unit and version should be kept for future filtering and debugging (to know which version of the plugin was used in case a value is incorrect), so the output should look like:

{
  "_tags": {
    "plugin_running_on": "kujawka-Z97X-UD7-TH"
  },
  "_timestamp": 1485772518024965428,
  "intel": {
    "psutil": {
      "load": {
        "load1": {
          "value": 0.15,
          "version": 0,
          "unit": "Load/1M",
          "last_advertised_time": "2017-01-30T11:02:29.93389848+01:00"
        },
        "load15": {
          "value": 0.5,
          "version": 0,
          "unit": "Load/15M",
          "last_advertised_time": "2017-01-30T11:02:29.933898807+01:00"
        }
      }
    }
  }
}

Anyway, I don't think that changing the data format is a good way to achieve lower bandwidth. The correct way to optimize data transfer is to compress it. It seems that data compression is supported both by Kafka and by sarama (Kafka's client library for Go). You can find a simple compression example here:
https://github.com/Shopify/sarama/blob/9a9e66f928cbc9febba598484ab34dcef6f8f43e/examples/http_server/http_server.go#L219

I see that in snap-plugin-publisher-kafka, nil is passed as the config parameter to NewSyncProducer, so the config's Compression parameter is left at 0, which corresponds to CompressionNone (https://godoc.org/github.com/Shopify/sarama#CompressionCodec).

Maybe it's worth playing with the compression parameter instead of refactoring the whole output format?
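For concreteness, a minimal sketch of that change, assuming Snappy as the codec (the function name and broker list are placeholders):

```go
package publisher

import "github.com/Shopify/sarama"

// newCompressedProducer replaces the nil config currently passed to
// NewSyncProducer with one that enables on-the-wire compression.
func newCompressedProducer(brokers []string) (sarama.SyncProducer, error) {
	config := sarama.NewConfig()
	config.Producer.Compression = sarama.CompressionSnappy // CompressionGZIP also works
	config.Producer.Return.Successes = true                // required by SyncProducer
	return sarama.NewSyncProducer(brokers, config)
}
```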

@evilezh (Author) commented Mar 3, 2017

There is nothing to argue about: whoever wants the previous format can still output it. I added a new output format which throws out all the unnecessary data that in real life maybe 0.01% of users need. This is not only about compression, but also about the data representation for further processing.
Do you know what last_advertised_time is and what it can be used for? Do you need version?
If you want the extended format, you can still use [{},{},{}]; if you want something simpler, I suggest using my way.
My use case is systems based on Logstash/Elasticsearch, which take documents as input rather than single metrics. In that case all the extra fields like version, unit, and last_advertised_time are garbage.

@PatrykMatyjasek

@evilezh maybe it can be resolved like in the elasticsearch publisher. I've added a configuration option which lets the user choose which fields of a metric to publish. The default is to publish the whole metric structure, but you can choose, for example, only Namespace and Data, and then only those parts are published to the database.
Sample manifest:
https://github.com/intelsdi-x/snap-plugin-publisher-elasticsearch/blob/master/examples/tasks/logs-openstack-elasticsearch.json#L40
Maybe it should be configurable for other publisher plugins as well.

@evilezh (Author) commented Mar 10, 2017

As for Elasticsearch: if you use it, you will agree that this format is the most effective. In Elasticsearch you define mappings anyway, so anything except the value is garbage. You do not store a single metric per document; you do it like this:

{ "timestamp":12123123,
  "data1": 123,
  "data2": 321
}

not like:

{ "data1" : { 
    "value": 123,
    "timestamp":123123123
   },
  "data2": { 
    "value": 321,
    "timestamp":123123123
  }
}

I did some research on the plugins I use: all the fields other than the value are usually filled with the same repeated value.

And about compression on the wire: yes, it reduces the amount of data to transfer, but it is still a huge amount to store and a huge amount to parse. I would also need to post-process it, as I did before, to cut it all out. It makes much more sense to cut all the unnecessary data out before it is even sent to Kafka. Otherwise the pipeline is server -> kafka -> post-processing -> kafka -> many readers.

And you can always add other output formats for Kafka, like CSV, syslog, etc. :) I just added one, because I think any Elasticsearch user would benefit a lot from it, as the data can be consumed very easily without expensive post-processing.
