Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

How to get remaining arguments #12

Closed
cloudhead opened this issue Jan 25, 2022 · 22 comments
Closed

How to get remaining arguments #12

cloudhead opened this issue Jan 25, 2022 · 22 comments

Comments

@cloudhead
Copy link

To implement external sub-commands, eg. git foo --bar --baz, I need to invoke a binary eg. git-foo in this case with arguments --bar --baz.

I can't figure out though, how to get the remaining arguments as plain strings, and not as Long/Short/Value. Any ideas how I might achieve that?

@cloudhead
Copy link
Author

I found a solution though I don't think it's very nice:

            Value(val) if command.is_none() => {
                let cmd = val.to_string_lossy().into_owned();
                let mut args = vec![cmd];

                while let Some(a) = parser.next()? {
                    match a {
                        Long(s) => args.push(format!("--{}", s)),
                        Short(c) => args.push(format!("-{}", c)),
                        Value(v) => args.push(v.to_string_lossy().into_owned()),
                    }
                }
                command = Some(Command::External(args))
            }

@blyxxyz
Copy link
Owner

blyxxyz commented Jan 25, 2022

Assuming the last argument was a Value, you can do this by repeatedly calling parser.value(). (You can wrap this up in an iterator with std::iter::from_fn(|| parser.value().ok()).) It was not originally intended, but it works out exactly right.

It's mentioned in a single sentence in the docs but I would like to have a proper API for this.

@cloudhead
Copy link
Author

Yeah, this works perfectly, and I had missed it from the docs. I would suggest creating a function for it to make it easier to find, eg. parser.rest() or parser.collect(). You could even make parser an Iterator, then collect() would work as expected.

@blyxxyz
Copy link
Owner

blyxxyz commented Jan 26, 2022

My current plan is a raw_args() method that borrows self and returns a peekable iterator (or an error if part of an argument is left over). That way you can take as many arguments as you want and still continue parsing afterwards if you don't take them all.

@blyxxyz
Copy link
Owner

blyxxyz commented Feb 5, 2022

I've implemented raw_args() in e0026c3. How does it look?

I'll make a new release in a few days.

@cloudhead
Copy link
Author

Looks promising!

One related issue is when delegating arguments to a sub-parser, I wonder if you have an ideas:

        while let Some(arg) = parser.next()? {
            match arg {
                Long("foo") => {
                    foo = true;
                }
                _ => {
                    // Here we basically want to delegate the rest of the arguments to be parsed to another parser.
                    // This doesn't quite work because we've already consumed `arg`!
                    let remaining = parser.raw_args();
                    let result = sub_parser.parse_args(remaining)?;
                    sub_options = result;
                    break;
                }
            }
        }

@blyxxyz
Copy link
Owner

blyxxyz commented Feb 7, 2022

If arg is a Value then this should work:

Value(arg) => {
    let remaining = std::iter::once(arg).chain(parser.raw_args()?);
    ...

It gets trickier if it doesn't have to be a Value. One problem is that if the raw argument is --option=123, arg will be Long("option") and you're not allowed to call raw_args until you consume 123 by calling .value().

raw_args might not be the right approach for a subparser. If the subparser also uses lexopt then it'd make more sense to pass parser itself. But you'd also want to pass arg to have the subparser handle it first, and you can't pass parser until arg is gone (or has had its lifetime readjusted using an ugly trick).

I'm going to think about this. I have a few halfbaked ideas.

Do you actually need the _ pattern in this case or would Value(arg) (with another catch-all branch that returns an error) do?

@cloudhead
Copy link
Author

Yes this is exactly the stuff I've been running into.

I don't think I can match on Value because the sub-parser might need to handle flags as well.

The workaround I've been using so far is this:

let mut unparsed: Vec<OsString> = Vec::new();

while let Some(arg) = parser.next()? {
  match arg {
  ...
  _ => unparsed.push(args::format(arg))
  }
}

let sub = lexopt::Parser::from_args(unparsed);
...

And the implementation of args::format is:

pub fn format(arg: lexopt::Arg) -> OsString {
    match arg {
        lexopt::Arg::Long(flag) => format!("--{}", flag).into(),
        lexopt::Arg::Short(flag) => format!("-{}", flag).into(),
        lexopt::Arg::Value(val) => val,
    }
}

It would already be helpful to have a fmt::Display instance of Arg like the one above as part of the library.

@cloudhead
Copy link
Author

cloudhead commented Feb 7, 2022

One option could be to have a way to put back an arg into the parser, eg.

arg => {
  parser.input(arg); // "un-parse" arg
  sub_parser.parse(parser.raw_args());
}

Another interesting function might be Parser::into_raw_args which would consume the parser, returning the remaining args.

EDIT:

Just saw this comment:

It gets trickier if it doesn't have to be a Value. One problem is that if the raw argument is --option=123, arg will be Long("option") and you're not allowed to call raw_args until you consume 123 by calling .value().

This is something I'm not handling indeed, so it would break at the moment, though I don't use those types of options.

@blyxxyz
Copy link
Owner

blyxxyz commented Feb 7, 2022

Does the subparser use lexopt too?

Do I understand correctly that the arguments can be interleaved, so that e.g. for the command line a b --foo c d the subparser should see a b c d?


A solution I have been pondering is a .current() method that returns the last return value of .next(), or a .peek() method that returns the next one. That'd work best if Value held an &OsStr instead of an OsString (which also has other benefits). But there are edge cases I'm unsure about, particularly around errors.

That would be somewhat similar to your parser.input() idea.


Another solution could be to turn your logic inside-out. In the outermost parser, define a closure like this:

let mut foo = false;
let handle = |arg| -> bool {
    match arg {
        Long("foo") => foo = true,
        _ => return false,
    }
    true
};

Then pass handle to the subparser, and let it drive the parsing but give the closure the first say:

while let Some(arg) = parser.next()? {
    match arg {
        arg if handle(&arg) => (),
        ...

(The closure indicates that it "consumed" arg by returning true.)

I'm not sure if that's worthwhile as-is. It's kind of complicated and doesn't let you call parser.value() inside the closure. But there might be something to the basic idea.

@cloudhead
Copy link
Author

Does the subparser use lexopt too?

Yes

Do I understand correctly that the arguments can be interleaved, so that e.g. for the command line a b --foo c d the subparser should see a b c d?

That's right, at least I support that with my current workaround in some cases.

Will think about some of the ideas you proposed, thanks! I've gone through many options parsers over the last few years: pico-args, argh, structopt and clap. So for this one is the best, so kudos!

@cloudhead
Copy link
Author

cloudhead commented Feb 10, 2022

After some more exploration, I've found that the method above with some tweaks is the cleanest and most flexible. Basically, I have a trait like this, implemented by all parsers:

pub trait Args: Sized {
    fn from_args(args: Vec<OsString>) -> anyhow::Result<(Self, Vec<OsString>)>;
}

Then when parsing, I do something like this:

impl Args for MainOptions {
    fn from_args(args: Vec<OsString>) -> anyhow::Result<(Self, Vec<OsString>)> {
        let (sub_opts1, unparsed) = SubOptions1::from_args(args)?;
        let (sub_opts2, unparsed) = SubOptions2::from_args(unparsed)?;
        let (sub_opts3, unparsed) = SubOptions3::from_args(unparsed)?;
        // etc ...

        // Now the main option parser working with the remaining options.
        let mut parser = lexopt::Parser::from_args(unparsed);
        while let Some(arg) = parser.next()? { ... }
        // ...

The sub-parsers build an unparsed list from ignored args with _ => unparsed.push(args::format(arg)). It would of course be cleaner if there was a way to have a lexopt::Arg::to_owned or something to store these unparsed args, and for lexopt::Parser::from_args to work with a Vec<lexopt::Arg>.

@blyxxyz
Copy link
Owner

blyxxyz commented Feb 10, 2022

Ah, that's a nice way to think about it!

I don't think I'll ever support that properly, because to parse arguments the correct/pedantic way you have to do a single pass with all the information available. An argument -ab could mean either -a -b or -a=b. If SubOptions1 handles -b and SubOptions2 handles -a then you get in a tangle because SubOptions2 has to decide whether -b occurs at all.

But I think you said none of your options take arguments, so you don't suffer from this problem.

I've had an OwnedArg implementation stashed for months but I'm not really happy with it because Arg is already half-borrowed, half-owned. And it doesn't need to be part of lexopt proper, it can be implemented outside the crate.

Would something like this work?

use std::ffi::OsString;

use lexopt::Arg;

enum OwnedArg {
    Value(OsString),
    Short(char),
    Long(String),
}

impl From<Arg<'_>> for OwnedArg {
    fn from(arg: Arg<'_>) -> OwnedArg {
        match arg {
            Arg::Value(value) => OwnedArg::Value(value),
            Arg::Short(opt) => OwnedArg::Short(opt),
            Arg::Long(opt) => OwnedArg::Long(opt.to_owned()),
        }
    }
}

fn main() -> anyhow::Result<()> {
    let mut args = Vec::new();
    let mut parser = lexopt::Parser::from_env();
    while let Some(arg) = parser.next()? {
        args.push(OwnedArg::from(arg));
    }
    let (parsed, _) = MainOptions::from_args(args)?;
    Ok(())
}

from_args would take a Vec<Arg> instead of a Vec<OsString>, and you would only ever use a Parser once. I think that if you're never calling .value(), just looping through a Vec is enough.

@cloudhead
Copy link
Author

Yes, I think that would work!

Parser::from_args<T: Into<OsString>>(args: Vec<T>) perhaps and have an impl for OwnedArg -> OsString.

@cloudhead
Copy link
Author

I'd probably also implement Arg: ToOwned instead of From/Into

@blyxxyz
Copy link
Owner

blyxxyz commented Feb 10, 2022

What I was getting at is that you only need to use a Parser once. If you have an args: Vec<OwnedArg>, instead of doing this:

let mut parser = lexopt::Parser::from_args(args);
while let Some(arg) = parser.next()? {
    match arg {

Can't you just do this?

for arg in args {
    match arg {

If you're only doing options and positional arguments (and no options with values) then that seems like it would be enough.

A fundamental problem with processing multiple times is that you lose the meaning of --. -a -b -- -c becomes Short('a'), Short('b'), Value("-c"), but if you naively transform that back you get -a -b -c, which is then parsed as Short('a'), Short('b'), Short('c').

Running the arguments through a Parser exactly once avoids that problem, and possibly others as well.

@cloudhead
Copy link
Author

cloudhead commented Feb 10, 2022

Can't you just do this?

Ah I see, but how about optional values and stuff? Eg. --foo 123 wouldn't work with the above.

A fundamental problem with processing multiple times is that you lose the meaning of --

Interesting.. This could be solved if -- returned Arg::EndOfFlags or something, at which point you could break out of the loop. But that would break other things. Somehow though you'd want to parse up to -- and leave everything after as positional values for the main parser..

EDIT: Actually doesn't getting a Value(_) signify the end of the flags? At which point the sub-parser could break out, but I guess you'll lose the -- distinction, ie. -a -b -- -c will turn into -a -b - c after the first parser, unless you do:

Value(a) => {
  unparsed.push("--");
  unparsed.push(a);
  unparsed.extend(parser.raw_args());
  break;
}

Or simply split the unparsed flags from the unparsed values..

@blyxxyz
Copy link
Owner

blyxxyz commented Feb 10, 2022

Ah I see, but how about optional values and stuff? Eg. --foo 123 wouldn't work with the above.

Yep, that's a limitation. I thought you didn't use options like those, but I guess you only meant you didn't spell them --foo=123?

EDIT: Actually doesn't getting a Value(_) signify the end of the flags?

That's how POSIX does it, but most modern parsers (including lexopt) don't do it like that. You can put flags after positional arguments if you want. If you do want the POSIX behavior I wrote up an example here.

But I don't think that works for your use case, because the 123 argument to --foo will look like a Value until you get to the subparser that consumes it.


If you want to be able to do --foo 123 then to get all the details right I think you really do need to do it in a single pass. So that would mean only a single while let Some(arg) = parser.next()? { loop, and somehow delegating the arguments to the subparsers inside that loop.

That, or ditching the subparsers and combining it all into one big match. Would that be possible/practical?

@cloudhead
Copy link
Author

Can you remind me what the issue is with --foo 123 with my approach? Is it not supporting -- to separate flags from values? I'll try a few things and see if I can break it.

The reason I have sub-parsers is simply that some flag sets are shared across tools, and I don't want to have to duplicate the matches and parsing. So it's a question of keeping things in one place.

@blyxxyz
Copy link
Owner

blyxxyz commented Feb 10, 2022

Can you remind me what the issue is with --foo 123 with my approach? Is it not supporting -- to separate flags from values? I'll try a few things and see if I can break it.

The main issue I meant is that the Vec<OwnedArg> approach doesn't work flawlessly. If it did work you wouldn't need to re-parse, and re-parsing indeed breaks --.

Spellings like --foo=123 and -ovalue also break. And if you write --foo -123 where -123 is supposed to be a negative number you run into trouble.

(You could decide that these shortcomings are acceptable.)

The reason I have sub-parsers is simply that some flag sets are shared across tools, and I don't want to have to duplicate the matches and parsing. So it's a question of keeping things in one place.

Ah, I understand now! I don't have a good solution yet but I'll keep thinking about it.

@cloudhead
Copy link
Author

Ok interesting, these are ok trade-offs for me at the moment, but obviously a more robust solution would be interesting.

@blyxxyz
Copy link
Owner

blyxxyz commented Jul 10, 2022

raw_args() is now released in version 0.2.1.

Feel free to open another issue for subparsers.

@blyxxyz blyxxyz closed this as completed Jul 10, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants