No Offset Based Pagination in Search #187

Open
vroommm opened this issue Apr 13, 2020 · 18 comments

@vroommm

vroommm commented Apr 13, 2020

Description of the Issue

There doesn't seem to be a way to use offset-based pagination with the box search command:
https://developer.box.com/guides/api-calls/pagination/offset-based/

Instead, there is a hard limit of 100 on the size of the search result. This is smaller than the maximum result size in the REST API, which is 200 results:
https://developer.box.com/reference/get-search/

The limitation is significant because search is the only method I have found to recurse over a folder structure and get all items matching a pattern in one step. For example: find all the PDFs in the folders under a given start folder.

  • It would be better if the search command applied a default and a maximum limit, something like:
const RESULTS_LIMIT = 100;
const MAX_RESULTS_LIMIT = 200;
// ... and then later ...
class SearchCommand extends BoxCommand {
	async run() {
		const { flags, args } = this.parse(SearchCommand);
		let options = {};
		// Cap a user-supplied limit at the maximum; otherwise use the default
		if (flags.limit) {
			options.limit = Math.min(flags.limit, MAX_RESULTS_LIMIT);
		} else {
			options.limit = RESULTS_LIMIT;
		}
		// Pass the offset through when one is given
		if (flags.offset) {
			options.offset = flags.offset;
		}
		// ...

(deep apologies if the code is pants I'm no expert!!!)
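
The clamping idea above can be tried in isolation. This is only a minimal sketch: `resolveLimit` is a hypothetical helper name, not part of the Box CLI, and the two constants are the values quoted in this issue (CLI default of 100, REST API maximum of 200).

```javascript
// Values taken from the issue text, not read from the CLI source.
const RESULTS_LIMIT = 100;
const MAX_RESULTS_LIMIT = 200;

// Hypothetical helper: choose the limit to send with the search request.
function resolveLimit(requested) {
  // Missing or unusable input falls back to the default.
  if (!Number.isInteger(requested) || requested < 1) {
    return RESULTS_LIMIT;
  }
  // Anything above the maximum is capped rather than rejected.
  return Math.min(requested, MAX_RESULTS_LIMIT);
}

console.log(resolveLimit(undefined), resolveLimit(50), resolveLimit(5000));
```

Capping silently is one possible design choice; throwing an error for an out-of-range value would be equally reasonable.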

  • There is no way to pass an offset to get a specific set of results.

  • There is no way to get total_count, the absolute size of the result set, as described in the docs.

Maybe the best option here is to wrap all of this into the search options interface, so that we could have:

box search abc would fetch the first 100 records, as it does now
box search:total_count --limit=200 would return the total number of pages if the limit is 200 per page
box search abc* --limit=200 --offset=2 would fetch 200 entries from the 3rd page of the collection

Versions Used

Box CLI: @box/cli/2.4.0 win32-x64 node-v12.6.0
Operating System: Windows 10

Steps to Reproduce

  1. Run a search on an area with more than 100 items in the folder hierarchy as reported by the box API.
  2. You can only ever get the top 100 results.

Error Message, Including Stack Trace

N/A

@ianhorn

ianhorn commented Apr 13, 2020

I ran into the same issue where I needed more than 100 results. I modified the max results in the search.js file to 2500 or 5000. The search takes a little longer, but I get all the results I need.

Location on a PC

C:\Program Files\@boxcli\client\src\commands\users\search.js

'use strict';

const BoxCommand = require('../box-command');
const { flags } = require('@oclif/command');
const _ = require('lodash');
const BoxCLIError = require('../cli-error');

const RESULTS_LIMIT = 5000;

@sujaygarlanka
Contributor

Thank you @ianhorn for the answer. @vroommm We will create a ticket on our backlog to allow a user to specify the RESULTS_LIMIT.

@vroommm
Author

vroommm commented Apr 13, 2020

Thanks @ianhorn , I think I can work with that just enough to get me out of a hole. :-)

@sujaygarlanka Thank you.

Just so I understand, do I take it the code is essentially interpreted at run time?

@ianhorn

ianhorn commented Apr 13, 2020

@vroommm I'm not sure I'm following. You will need to drill down to the search.js in the @boxcli directory in your computer's applications and edit that file. As seen from above, it will alter the default limit of 100 to whatever limit you need.

If I understand your question correctly, once you have edited the search.js file, you can run your query without changing any of your options for limit because you've reconfigured the BoxCLI program.

Example

box search FILENAME -s 

@vroommm
Author

vroommm commented Apr 13, 2020

@ianhorn The file I was looking at is under ...\@boxcli\client\src\commands\search.js, not @boxcli\client\src\commands\users\search.js.

I've made the change you suggested and it WORKS!!! (but you knew that!)

box search 97* --file-extensions=pdf --json -y --save-to-file-path=allfiles.json --fields=name,parent,shared_link -v

  box-cli:output Filtering output with fields: [ 'type', 'id', 'name', 'parent', 'shared_link' ] +1ms
  box-cli:output Filtering output with fields: [ 'type', 'id', 'name', 'parent', 'shared_link' ] +2ms
  box-cli:output Formatted 5000 output entries for display +18ms
  box-cli:output Using json output format +2ms
  box-cli:output Processed output as JSON +20ms
  box-cli:output File already exists at d:\BoxSync\allfiles.json +4ms
  box-cli:output Writing output to specified location on disk: d:\BoxSync\allfiles.json +1ms
Output written to d:\BoxSync\allfiles.json
  box-cli:output Finished writing output +11ms

So although I can't get back the marbles I lost battling with this, I can at least keep the few I have remaining for a little longer...

Thanks...

FYI: The above results also answer my secondary question... Although I can read Java, and have dabbled, I've never knowingly used Node. An understanding of the architecture, however superficial, is always nice...

@ianhorn

ianhorn commented Apr 13, 2020

@vroommm I'm glad you found the correct search.js file. Sorry if I provided the wrong one. I wouldn't worry too much about the JavaScript (Node). I don't really understand it either, but I know enough to break things.

Out of curiosity, once you changed RESULTS_LIMIT = XXX, did you still try using a limit in your query? I've been wanting to try that because my search returns 5000 results and I only really need 250.

@vroommm
Author

vroommm commented Apr 13, 2020

@ianhorn Haven't tried anything fancy with limit stuff yet. I will if/when I get a moment. I'm resisting the urge for one more compile and moving on...

@vroommm
Author

vroommm commented Apr 13, 2020

@ianhorn The one thing I was trying to do, with moderate success, was to batch the files by date range... I was using a bulk import input file like this...

ancestor_folder_ids,query,type,created-at-from,created-at-to
44871803643,97*,file,-721d,-720d
44871803643,97*,file,-722d,-721d

That kind of worked... but only if there were fewer than 100 files on the day in question, which is not something I could guarantee.
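
For anyone wanting to generate such a bulk input file rather than type it by hand, here is a minimal sketch. `buildDailyRows` is a hypothetical helper, not part of the Box CLI; the header and the relative date format (`-721d`) are copied from the two rows above. It does not remove the caveat that a single day's window may still exceed the result limit.

```javascript
// Hypothetical helper: build bulk-input rows, one row per one-day window.
// firstDayAgo is inclusive and lastDayAgo is exclusive, both counted in days before today.
function buildDailyRows(folderId, query, firstDayAgo, lastDayAgo) {
  const rows = ['ancestor_folder_ids,query,type,created-at-from,created-at-to'];
  for (let d = firstDayAgo; d < lastDayAgo; d++) {
    // Each window runs from (d + 1) days ago to d days ago.
    rows.push(`${folderId},${query},file,-${d + 1}d,-${d}d`);
  }
  return rows;
}

// Reproduces the two rows shown above.
console.log(buildDailyRows('44871803643', '97*', 720, 722).join('\n'));
```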

@vroommm
Author

vroommm commented Apr 15, 2020

@ianhorn @sujaygarlanka
I managed to get the limit to work as a simple command parameter, so you can use it like this to get, say, the first 250 results:

  • "box search [QUERY TERM] --limit=250" --> fetches the first 250 results
  • "box search [QUERY TERM]" --> fetches the default 100 results
  • "box search [QUERY TERM] --limit=1" --> fetches the 1st record.

The last one can be really useful to combine with sort so you can use it to get the latest, oldest, first, last based on the sort direction and sort term...

As it turns out the code is pretty simple (even I managed it):

const RESULTS_LIMIT = 100;
//... omit a bit...
class SearchCommand extends BoxCommand {
	async run() {
		const { flags, args } = this.parse(SearchCommand);
	
		// Set the limit to the default unless we passed a value
		let options = {};
		if (flags.limit){
			options.limit = flags.limit;
		} else {
			options.limit = RESULTS_LIMIT;
		}
		
//... omit a bit more
		
		// Limit the search results to avoid slamming the API
		let limitedResults = [];
		for await (let result of { [Symbol.asyncIterator]: () => results }) {
			let numResults = limitedResults.push(result);
			//edit by vroommm to use the current options value
			if (numResults >= options.limit) {
				break;
			}
		}
		await this.output(limitedResults);
			
//... gosh how much are we leaving out

SearchCommand.description = 'Search for files and folders in your Enterprise';
SearchCommand.examples = [
	'box search "Q3 OKR"',
	'box search --mdfilter "enterprise.employeeRecord.name=John Doe"',
	'box search *.pdf --limit=250' 
];
SearchCommand._endpoint = 'get_search';

SearchCommand.flags = {
	...BoxCommand.flags,
	limit: flags.integer({
		description: 'The max number of records to return in the result set DEFAULT: 100'
	}),
//... phew we're done leaving stuff out
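
The "collect until the limit is reached, then stop" loop above can be exercised without the CLI or the SDK. In this sketch `fakeResults` is a stand-in for the SDK's result iterator and `takeFirst` is a hypothetical helper name; neither exists in the Box code.

```javascript
// Stand-in for the SDK's result iterator (an assumption for this sketch).
async function* fakeResults(total) {
  for (let i = 0; i < total; i++) {
    yield { type: 'file', id: String(i) };
  }
}

// Collect at most `limit` items, then stop pulling from the source.
async function takeFirst(source, limit) {
  const collected = [];
  if (limit <= 0) {
    return collected;
  }
  for await (const item of source) {
    collected.push(item);
    if (collected.length >= limit) {
      break; // leaving the loop also stops the generator
    }
  }
  return collected;
}

takeFirst(fakeResults(1000), 250).then((items) => console.log(items.length));
```

Because the loop breaks as soon as the limit is reached, a source with fewer items than the limit simply returns everything it has.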

@vroommm
Author

vroommm commented Apr 15, 2020

@sujaygarlanka @ianhorn
Thirty seconds after I posted the above I fully answered my own questions with

//... stuff before
		if (flags.offset){
			options.offset = flags.offset;
		} 
//... really you're omitting some more
	offset: flags.integer({
		description: 'The 0-based index of the first record to return; defaults to 0 (start of the result set) when omitted'
	}),
	limit: flags.integer({
		description: 'The max number of records to return in the result set DEFAULT: 100'
	}),

Which means you can get the nth value for any search by using...

"box search [QUERY TERM] --limit=1 --offset=0" --> fetches the 1st record.
"box search [QUERY TERM] --limit=1 --offset=[n-1]" --> fetches the nth record.
eg.
"box search [QUERY TERM] --limit=1 --offset=9" --> fetches the 10th record.

And of course you can use it to get pages of multiple records:

"box search [QUERY TERM] --offset=0" --> fetches the first 100 records (0-99).
"box search [QUERY TERM] --offset=900" --> fetches records 900-999
"box search [QUERY TERM] --limit=1000 --offset=9000" --> fetches records 9,000-9,999 (if they exist)

Finally, this approach should work for all the offset-paginated Box CLI methods...
:-)
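
Walking through every page with limit and offset could look roughly like the sketch below. `fetchPage` is a hypothetical function (here backed by an in-memory array so the example runs on its own); a real one would issue the search request. The `total_count` and `entries` field names follow the Box search response described in the docs linked at the top of this issue.

```javascript
// Hypothetical page fetcher backed by an array; a real one would call the search endpoint.
function makeFetchPage(allItems) {
  return async (limit, offset) => ({
    total_count: allItems.length,
    entries: allItems.slice(offset, offset + limit),
  });
}

// Fetch page after page, advancing offset by limit until the result set is exhausted.
async function fetchAll(fetchPage, limit) {
  const all = [];
  let offset = 0;
  while (true) {
    const page = await fetchPage(limit, offset);
    all.push(...page.entries);
    offset += limit;
    // Stop on an empty page or once the offset has passed the reported total.
    if (page.entries.length === 0 || offset >= page.total_count) {
      break;
    }
  }
  return all;
}

const items = Array.from({ length: 250 }, (_, i) => i);
fetchAll(makeFetchPage(items), 100).then((all) => console.log(all.length));
```

Stepping the offset by the limit keeps every offset an exact multiple of the page size.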

@vroommm
Author

vroommm commented Apr 15, 2020

@sujaygarlanka @ianhorn

Minor note: when --offset is non-zero, it must be an exact multiple of --limit (that is, --limit must divide --offset evenly), so the following are allowed:

  • --limit=1 --offset=99 the 100th record
  • --limit=10 --offset=0 10 records 0-9
  • --limit=10 --offset=90 10 records 90-99
  • --limit=3 --offset=9 3 records 9-11

but these will cause errors:

  • --limit=10 --offset=9
  • --limit=3 --offset=8
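
The rule behind those two lists can be written as a small check. `isValidPaging` is a hypothetical helper, not something the CLI provides; it only encodes the behaviour observed above.

```javascript
// Hypothetical check: an offset is accepted only when it is a whole multiple of the limit.
function isValidPaging(limit, offset) {
  if (!Number.isInteger(limit) || limit < 1) {
    return false;
  }
  if (!Number.isInteger(offset) || offset < 0) {
    return false;
  }
  // An offset of 0 always passes because 0 % limit === 0.
  return offset % limit === 0;
}

// The six combinations listed above.
const combos = [[1, 99], [10, 0], [10, 90], [3, 9], [10, 9], [3, 8]];
for (const [limit, offset] of combos) {
  console.log(`--limit=${limit} --offset=${offset}`, isValidPaging(limit, offset));
}
```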

@ianhorn

ianhorn commented Apr 15, 2020

Good job @vroommm. That's impressive. I really struggle with node javascript but have started to have some success.

@sujaygarlanka
Contributor

@vroommm and @ianhorn Glad you guys were able to figure this out. Are there any specific feature requests for the search command? It seems like the two are the ability to update the results limit and the ability to pass in limit and offset parameters.

@sujaygarlanka
Contributor

sujaygarlanka commented Apr 15, 2020

@vroommm Also, regarding the issues with the bulk command, it may be limited to 100 because you set RESULTS_LIMIT to 100.

@vroommm
Author

vroommm commented Apr 15, 2020

@sujaygarlanka

Are there any specific feature requests for the search command? It seems like the two are the ability to update the results limit and the ability to pass in limit and offset parameters.

I'd agree with that. The only possible extra would be a form of validation, e.g.

if (offset != 0 && (offset % limit != 0)) { ...then it's not a valid combo, so throw an error }

The default RESULTS_LIMIT of 100 is a safe bet to leave as is, because as written:

  • --limit=nnn will always override the default to get more results if you need them
    and,
  • --offset=nnn will qualify where to start returning data if it's really too big a set to return in one go

Happy Days :-)

@vroommm
Author

vroommm commented Apr 15, 2020

Are there any specific feature requests for the search command? It seems like the two are the ability to update the results limit and the ability to pass in limit and offset parameters.

@sujaygarlanka If it were possible to have a search:total-count that would be useful...

You would be able to get the number of results to expect without actually pulling them over the network...

so if

  • search:total-count xyx returns {total_count:0} then don't bother
    and if
  • search:total-count xyx returns {total_count: 999999999} then you'd better think about your criteria
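
As a rough illustration of how a script might act on such a count, here is a sketch. `planSearch` and its `maxAcceptable` threshold are hypothetical, and `search:total-count` itself is only a proposal in this thread, so the count is passed in as a plain number.

```javascript
// Hypothetical planner: decide what to do from a total count before fetching anything.
function planSearch(totalCount, limit, maxAcceptable) {
  // Number of requests needed to page through the whole result set.
  const pages = Math.ceil(totalCount / limit);
  if (totalCount === 0) {
    return { action: 'skip', pages: 0 };
  }
  if (totalCount > maxAcceptable) {
    // Too many hits: better to tighten the search criteria first.
    return { action: 'narrow', pages };
  }
  return { action: 'fetch', pages };
}

console.log(planSearch(0, 200, 10000));
console.log(planSearch(450, 200, 10000));
console.log(planSearch(999999999, 200, 10000));
```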

@sujaygarlanka
Contributor

SDK-1379

@stale

stale bot commented Aug 2, 2022

This issue has been automatically marked as stale because it has not been updated in the last 30 days. It will be closed if no further activity occurs within the next 7 days. Feel free to reach out or mention Box SDK team member for further help and resources if they are needed.

@stale stale bot added the stale label Aug 2, 2022
@mgrytsai mgrytsai added enhancement and removed stale labels Aug 9, 2022