Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

how handle use of files in lab50? #8

Open
dmalan opened this issue Aug 13, 2018 · 19 comments
Open

how handle use of files in lab50? #8

dmalan opened this issue Aug 13, 2018 · 19 comments
Assignees
Labels
question Further information is requested

Comments

@dmalan
Copy link
Member

dmalan commented Aug 13, 2018

So right now, lab50 supports a file key, which is either a string or a list of strings, which is based on CS50 Sandbox's support for a file HTTP parameter, https://cs50.readthedocs.io/sandbox/#configuration. In the context of HTTP, didn't seem appropriate to call it files, since multiple files have to be provided one at a time, a la:

?file=foo.c&file=bar.c

But in the context of .cs50.yaml, it seems a bit confusing to have lab50 use file and have check50 and submit50 use files. I could easily change over to files, but then I'd run afoul of the tag requirement, which shouldn't apply to lab50's use of files:

https://cs50.readthedocs.io/sandbox/#configuration

How best to handle?

@dmalan dmalan added the question Further information is requested label Aug 13, 2018
@Jelleas
Copy link
Contributor

Jelleas commented Aug 13, 2018

The files key does make sense for lib50....

When we wrote this we envisioned files to be something used across tools (submit50 and check50) for which we wanted the exact same behaviour. Now that there's a third (lab50) that differs we should probably move validation for the files key to somewhere else in lib50.

You probably do want all the work we do for the files key (validating the key, figuring out wrong tags, creating the appropriate objects during parsing), but that to be customizable.

Something like this perhaps?:

loader = lib50.Loader()
loader.register(tag1, lambda value: Foo(value)) # loader.register(tag1, Foo)
loader.register(tag2, lambda value: Bar(value))
loader.load(config, "lab50")

This would let lib50 do the ugly work, and validate all tags for you. And we could just export a IncludeExcludeRequireLoader (better name wanted) for submit50 and check50.

@dmalan
Copy link
Member Author

dmalan commented Aug 13, 2018

Relying on lib50 for tag parsing for lab50 too would be great, since I've been meaning to introduce syntax like

lab50:
  files:
    - !opened foo.c
    - !closed bar.c

so as to signal which tabs to open. So this would be handy. Could we perhaps avoid the need for a custom class per tag? Simpler would be a single wrapper class (e.g., Value) that defines the tags as attributes and has a __str__ method so that you can just write code like:

for file in files:
    print(f"Processing {file}")
    if file.opened:
        # do this
    else:
        # do that

Feels like a lot less overhead than defining one class per tag?

@cmlsharp
Copy link
Contributor

Sure, that's what we do with !include/!exclude/!require. There's a single FilePattern class that has a type field with three variants. So it ends up looking like:

for file in files:
    if file.type is PatternType.Required:
        # do this
    elif file.type is PatternType.Include:
        # do that

@dmalan
Copy link
Member Author

dmalan commented Aug 13, 2018

Should we perhaps simplify, though, so that we don't need to pre-define types, instead just setting properties to True for any present tags? Having to define PatternType.Required etc. feels like unnecessary work?

@cmlsharp
Copy link
Contributor

cmlsharp commented Aug 13, 2018

We could, it just (a) makes it tougher to validate that the user didn't specify invalid tags, which we would want to error on and (b) the syntax wouldn't be quite as nice as your proposal (without resorting to some __getattr__ tricks) anyway since we can't define an attribute for all possible tags. Thus it would end up being something like if hasattr(file, "opened") or something like that. We could override __getattr__ to return None if the attribute isn't found, but that seems pretty hackish.

Also, as a general rule I think it is generally good practice to not allow an API to represent invalid states. It is possible for file.included and file.excluded to both be true even though that doesn't make any sense whereas in the current model, a file only has one type.

@dmalan
Copy link
Member Author

dmalan commented Aug 16, 2018

Hm, having to register valid tags seems reasonable, in the spirit of:

loader = lib50.Loader()
loader.register(tag1, lambda value: Foo(value)) # loader.register(tag1, Foo)
loader.register(tag2, lambda value: Bar(value))
loader.load(config, "lab50")

But:

  1. Could/should we associate tags with specific keys somehow, since !required and such don't make sense outside the context of files?
  2. Do we really need to wrap the value with a custom class (e.g., Foo or Bar)? What functionality are those classes meant to provide, besides a constant for == checking? That feels like slightly annoying overhead for a caller to define?

@Jelleas
Copy link
Contributor

Jelleas commented Aug 16, 2018

  1. We're hooking into the PyYAML parser, and I don't see a direct way to add support for associating tags with keys. It seems YAML tags are supposed to be types with a global or document scope (https://camel.readthedocs.io/en/latest/yamlref.html#tags).

  2. We don't need custom classes for this. As an interface I propose .register(tag, callable(tag)) to take a tag and a callable that takes in a tag and returns something (we need to return the parsed YAML after all). What happens in that callable is up to the author. It could just be setting an instance variable, or adding something to a dict.

@dmalan
Copy link
Member Author

dmalan commented Aug 16, 2018

  1. Sounds good, not a big deal.
  2. What might that look like specifically? Still feels like more work than should be needed just to check "is this string tagged with foo?", no? Might we just want to wrap all str values in the YAML with our own wrapper, a la the below?
class String:
  def __init__(self, value, tag = None):
      self.value = string
      self.tag = tag
      def __getattr__(self, name):
          return name == self.tag
      def __str__(self):
          return self.value

@Jelleas
Copy link
Contributor

Jelleas commented Aug 16, 2018

Oh! We could do something like that!

We can register a so-called multi-constructor (matching a prefix of a tag). That would just wrap every tagged string with a custom String (better name pending) class. Then the interface could just be:

config = lib50.Loader("require", "include", "exclude").load("<YAML_CONTENT>", "<TOOL>")

We don't even need to hack in __getattr__ if we require the author to specify the tags beforehand. We can simply set those tags to False by default.

@cmlsharp
Copy link
Contributor

The __getattr__ trick does feel pretty hackish to me. Why not just do if string.tag == "foo" instead of if string.foo? It's such a small syntactic difference and makes it more clear to anyone reading it. It should be noted though that not all strings will be of this type, only strings that are tagged (which I think is a feature, but does seem slightly different than David's suggestion).

@dmalan
Copy link
Member Author

dmalan commented Aug 16, 2018

not all strings will be of this type

Would it not be safer to wrap all str values, since tags themselves, per Jelle's note, can appear globally in a YAML file? And it'd definitely be annoying to have to adopt a convention of

if hasattr(s, "tag")

before accessing s.tag, no?

@Jelleas
Copy link
Contributor

Jelleas commented Aug 16, 2018

You're probably not checking str.tag at every part of the config though, only where you expect values to be tagged (like in the key files)? Seems weird to then want to wrap everything. So I agree with Chad, this is a feature!

That said we don't need __getattr__ in either case, because for the sake of lib50 validating the config, we need to know all tags beforehand anyway. So we know what tags can possibly exist, and can thus properly instantiate them on the object.

Coming back to my point earlier, since lib50.load() effectively serves as higher level (.cs50.yaml) specific parser. We could support local tags by parsing the whole .yaml file once, then parsing the parts where we expect tags individually again. This would allow us to raise more specific errors, and avoid tag conflicts.

@dmalan
Copy link
Member Author

dmalan commented Aug 16, 2018

Okay, not every part, but how about when tags are optional? For labs, I think the common case will be "install these files and open them for the student," via:

lab50:
  files:
    - foo.c
    - foo.h

And it seems silly to require instead

lab50:
  files:
    - !opened foo.c
    - !opened foo.h

for the common case. So I'd envision a common scenario of, e.g.:

lab50:
  files:
    - application.py
    - !closed static/*
    - templates/*.html

or, equivalently:

lab50:
  files:
    - !opened application.py
    - !closed static/*
    - !opened templates/*.html

How best to accommodate optionality, then, without forcing use of hasattr or such?

@Jelleas
Copy link
Contributor

Jelleas commented Aug 16, 2018

I'd envision a flag in the loader (for that scope if we support local tag scope) something like default=True, that wraps everything in a tagless String by default. And causes lib50 to not throw an error in case of a missing tag (we want to throw an error in check/submit50).

So for lab50 it could be:

lib50.Loader("closed", default=True).load(...)

Or:

loader = lib50.Loader()
loader.scope("files", "closed", default=True)
loader.load(...)

Edit:
Could also support a default tag, default="opened"

@dmalan
Copy link
Member Author

dmalan commented Aug 16, 2018

Yeah, I think values can only have one tag, right? If so, setting a default tag is probably better than a default value, since (accidental) code like

lib50.Loader("closed", default=True).load(...)
lib50.Loader("opened", default=True).load(...)

would seem to create an ambiguity otherwise?

@Jelleas
Copy link
Contributor

Jelleas commented Aug 17, 2018

Implemented a working concept of Loader in lib50.config

class Loader:

Unittests here:

class TestLoader(unittest.TestCase):

Interface is as follows:

loader = lib50.Loader("<TOOL>", "<GLOBAL_TAG_1>", "<GLOBAL_TAG_2>", ..., default="<DEFAULT_TAG>")
loader.scope("<KEY>", "<LOCAL_TAG_1>", "<LOCAL_TAG_2>", ..., default="<DEFAULT_TAG>")
loader.load(content)

For check50 we would do:

loader = lib50.Loader("check50")
loader.scope("files', "include", "exclude", "require")
loader.load(content)

For lab50:

loader = lib50.Loader("lab50")
loader.scope("files", "closed", default="opened") 
# or if !opened needs to be a valid tag
# loader.scope("files", "closed", "opened", default="opened") 
loader.load(content)

Currently you can only scope keys exactly one level deep (top-level is reserved for tools). That means you can scope keys like files, dependencies etc. Reason I did this was to avoid any complications with nesting scopes, or keys occurring multiple times in the YAML file.

As a result from .load() you get the same result as before, however every tagged value (or where a default was applied) is now an instance of TaggedValue. That has a tag, tags and value attribute, and a boolean attribute for every tag so that you can safely access tagged_value.<any_valid_tag> without having to check hasattr first.

I really like scopes here even in the case of check50, as we can now raise errors from lib50 if you tag a value where we're not expecting tags.

ps. I have not integrated any of this with lib50.files() yet, waiting for feedback :)

@Jelleas
Copy link
Contributor

Jelleas commented Jun 14, 2019

Here is how we currently envision lab50 can use lib50.files.

lib50/tests/lib50_tests.py

Lines 433 to 471 in 7356e65

def test_lab50_tags(self):
# Four dummy files
open("foo.py", "w").close()
open("bar.py", "w").close()
open("baz.py", "w").close()
open("qux.py", "w").close()
# Dummy config file (.cs50.yml)
content = \
"lab50:\n" \
" files:\n" \
" - !open \"foo.py\"\n" \
" - !include \"bar.py\"\n" \
" - !exclude \"baz.py\"\n" \
" - \"qux.py\"\n"
# Create a config Loader for a tool called lab50
loader = lib50.config.Loader("lab50")
# Scope the files section of lab50 with the tags: open, include and exclude
loader.scope("files", "open", "include", "exclude", default="include")
# Load the config
config = loader.load(content)
# Figure out which files have an open tag
opened_files = [tagged_value.value for tagged_value in config["files"] if tagged_value.tag == "open"]
# Have lib50.files figure out which files should be included and excluded
# Simultaneously ensure all open files exist
included, excluded = lib50.files(config["files"], require_tags=["open"])
# Make sure that files tagged with open are also included
opened_files = [file for file in opened_files if file in included]
# Assert
self.assertEqual(included, {"foo.py", "bar.py", "qux.py"})
self.assertEqual(excluded, {"baz.py"})
self.assertEqual(set(opened_files), {"foo.py"})

@dmalan
Copy link
Member Author

dmalan commented Jun 14, 2019

Few questions:

  1. What's the difference between !open and !include?
  2. What does a filename without a tag imply?

@Jelleas
Copy link
Contributor

Jelleas commented Jun 14, 2019

  1. !include would be, just include whatever matches the pattern, but don't open them. Whereas !open includes this specific file and opens it. This behavior is completely customizable.

  2. I set a default tag to !include. If a tag is missing, it gets tagged with !include.

loader.scope("files", "open", "include", "exclude", default="include")

Implemented the default behavior for this comment from last year (#8 (comment)):

Okay, not every part, but how about when tags are optional? For labs, I think the common case will be "install these files and open them for the student," via:

lab50:
  files:
    - foo.c
    - foo.h

And it seems silly to require instead

lab50:
  files:
    - !opened foo.c
    - !opened foo.h

for the common case. So I'd envision a common scenario of, e.g.:

lab50:
  files:
    - application.py
    - !closed static/*
    - templates/*.html

or, equivalently:

lab50:
  files:
    - !opened application.py
    - !closed static/*
    - !opened templates/*.html

How best to accommodate optionality, then, without forcing use of hasattr or such?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
question Further information is requested
Projects
None yet
Development

No branches or pull requests

3 participants