This repository has been archived by the owner on Dec 4, 2018. It is now read-only.

Simple Filter (remove) By Key Example? #124

Open
petergdoyle opened this issue Apr 3, 2017 · 3 comments

Comments

@petergdoyle

petergdoyle commented Apr 3, 2017

Hi
Can you suggest the simplest, most memory-efficient way to remove elements with a specific key? The docs hint at this with the JSONStream.parse(pattern, map) form of the parse method. Given your example doc:

{"total_rows":129,"offset":0,"rows":[
  { "id":"change1_0.6995461115147918"
  , "key":"change1_0.6995461115147918"
  , "value":{"rev":"1-e240bae28c7bb3667f02760f6398d508"}
  , "doc":{
      "_id":  "change1_0.6995461115147918"
    , "_rev": "1-e240bae28c7bb3667f02760f6398d508","hello":1}
  },
  { "id":"change2_0.6995461115147918"
  , "key":"change2_0.6995461115147918"
  , "value":{"rev":"1-13677d36b98c0c075145bb8975105153"}
  , "doc":{
      "_id":"change2_0.6995461115147918"
    , "_rev":"1-13677d36b98c0c075145bb8975105153"
    , "hello":2
    }
  }
]}

If I wanted to remove all the "_id" and "_rev" elements wherever they occur in the document (recursively), with the keys to remove defined in a separate filter object:

{
"filter": [
    "_id", 
    "_rev"
  ]
}

What is the "best" (most efficient) way to do this: with a JSONPath, or by applying a map function with JSONStream.parse(pattern, map) and a general "star.star.star" filter (which you have mentioned before for processing large documents without loading the entire object into memory)? Also, since this is a stream processor, is the order of elements in the input document preserved?

@dominictarr
Owner

I think the answer would be to set those keys to undefined or null. Don't use delete, because it's really slow (for some weird reason). There isn't a JSONStream feature to do this, but you could mutate the objects in the map function.
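
A minimal sketch of that suggestion, under some assumptions: `stripKeys` and `filterRows` are hypothetical helper names, the `rows.*` pattern is chosen only to match the example document, and the keys are set to `undefined` (which `JSON.stringify` omits from objects) rather than removed with `delete`. The JSONStream wiring is not exercised here and needs the JSONStream package installed.

```javascript
// Hypothetical helper: recursively blank out the given keys in place.
// Assigning undefined (instead of using `delete`) follows the advice above.
function stripKeys(value, keys) {
  if (Array.isArray(value)) {
    for (const item of value) stripKeys(item, keys);
  } else if (value !== null && typeof value === 'object') {
    for (const name of Object.keys(value)) {
      if (keys.has(name)) {
        value[name] = undefined; // dropped later by JSON.stringify
      } else {
        stripKeys(value[name], keys);
      }
    }
  }
  return value; // return the same object so the map function emits it
}

// Assumed wiring: `source` and `dest` are streams supplied by the caller,
// `filter` is the array from the filter object, e.g. ['_id', '_rev'].
function filterRows(source, dest, filter) {
  const JSONStream = require('JSONStream');
  const keys = new Set(filter);
  return source
    .pipe(JSONStream.parse('rows.*', (row) => stripKeys(row, keys)))
    .pipe(JSONStream.stringify())
    .pipe(dest);
}
```

With this wiring, rows are emitted in the order they appear in the input, but the output is only the array of rows: the surrounding `total_rows` and `offset` fields are not reproduced.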

@petergdoyle
Author

petergdoyle commented Apr 4, 2017

Thanks for the quick reply.

So, first things first: I am having trouble with the parse function. In my case I want every element in the input stream, so I want an "any" JSONPath selection, but I also want to recurse to whatever depth the elements go. I also want to emit keys, since I am trying to rewrite the JSON input directly to output with a few of the keyed values removed (filtered out), AND I want to preserve the order of elements in the input stream. It sounded so simple.

So if I use a parse pattern like '$*' I can get all keys and values, but it doesn't recurse...


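    const JSONStream = require('JSONStream');
    const es = require('event-stream'); // assumed: `es` is the event-stream package
    // `source` is a readable stream of the JSON document above
    source.pipe(JSONStream.parse('$*'))
    .pipe(es.mapSync(function (data) {
      console.log(data);
    }));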

==>
{ value: 129, key: 'total_rows' }
{ value: 0, key: 'offset' }
{ value: 
   [ { id: 'change1_0.6995461115147918',
       key: 'change1_0.6995461115147918',
       value: [Object],
       doc: [Object] },
     { id: 'change2_0.6995461115147918',
       key: 'change2_0.6995461115147918',
       value: [Object],
       doc: [Object] } ],
  key: 'rows' }

According to your documentation, it seems you can only recurse using {recurse: true}, but that seems to require the array form of the "match" value.

this selects nothing:
source.pipe(JSONStream.parse('$..*'))
this selects nothing:
source.pipe(JSONStream.parse(['$*',{recurse: true}]))
this selects nothing:
source.pipe(JSONStream.parse(['*',{recurse: true}, {returnKeys: true}]))
this throws an error:
source.pipe(JSONStream.parse('*..*'))

So I give up, what is the parse value that will select all nodes, will return all keys and values starting at and including the top level, and will recurse into every value?

Thanks!
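
The thread does not settle on a parse pattern that does all of this inside JSONStream. As a fallback sketch only, the `{ key, value }` chunks that `$*` already emits could be walked recursively in plain JavaScript. `walk` is a hypothetical helper, not a JSONStream feature, and it assumes each top-level value fits in memory once it has been emitted.

```javascript
// Hypothetical helper: depth-first walk that yields every nested entry
// in the order the keys appear, together with its full path.
function* walk(value, path = []) {
  // Primitives and null have no children to visit.
  if (value === null || typeof value !== 'object') return;
  const keys = Array.isArray(value) ? value.keys() : Object.keys(value);
  for (const key of keys) {
    const childPath = path.concat(key);
    yield { path: childPath, key, value: value[key] };
    yield* walk(value[key], childPath); // recurse into the child
  }
}

// Example: list every path in a small document.
const sample = { total_rows: 1, rows: [{ id: 'a', doc: { _id: 'a' } }] };
for (const node of walk(sample)) {
  console.log(node.path.join('.'));
}
```

Each parent is yielded before its children, so a consumer sees keys in the same order as the parsed input.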

@mattmackay76

mattmackay76 commented Mar 8, 2018

I have the same issue of what to pass into parse. I tried many of the same things you mentioned above, like $..* and such. What I want to do is OR things together. I also tried regex, to no avail.

Example json:
{
"Name": "abcName",
"LargeCollection1" : [ ...you get the idea..... ],
"LargeCollection2" : [ ...you get the idea, this is large hence needing to stream 15+ megs in here..... ],
"Address": "123 some st",
"City": "New York",
"State":"NY"
}

So, I want something like JSONStream.parse("Name|Address|City") to fire on('data') so I can capture these while ignoring megs of data I don't care about. I should say it would be like filtering through the entire stream, but it should produce a new JSON object rather than just catching "Name" and returning "abcName".

Is it possible to get back something like:

{
"Name": "abcName",
"Address": "123 some st",
"City": "New York",
"State":"NY"
}
So basically only the root level properties I care about, ignore everything else, output as JSON.

Any advice on this would be awesome!..

-Matt
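
One possible sketch, building on the `$*` pattern shown earlier in the thread, which emits one `{ key, value }` chunk per root-level property: keep only the wanted keys and assemble them into a new object. `keepPair` and `pickFromStream` are hypothetical names, and the stream wiring assumes the JSONStream package is installed. Note that with `$*` each large collection is still parsed into a value before it is discarded, so this is not a confirmed way to skip that work.

```javascript
// Hypothetical helper: copy one { key, value } chunk into `picked`
// when its key is in the wanted list.
function keepPair(picked, pair, wanted) {
  if (wanted.includes(pair.key)) picked[pair.key] = pair.value;
  return picked;
}

// Assumed wiring: `source` is a readable stream of the JSON text,
// `done` is a caller-supplied callback that receives the result.
function pickFromStream(source, wanted, done) {
  const JSONStream = require('JSONStream');
  const picked = {};
  source
    .pipe(JSONStream.parse('$*'))
    .on('data', (pair) => keepPair(picked, pair, wanted))
    .on('end', () => done(picked));
}
```

A call such as `pickFromStream(source, ['Name', 'Address', 'City', 'State'], (obj) => console.log(JSON.stringify(obj)))` would then print the reduced object once the input ends.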
