how handle use of `files` in lab50? #8

dmalan · 2018-08-13T15:29:04Z

So right now, lab50 supports a file key, which is either a string or a list of strings, which is based on CS50 Sandbox's support for a file HTTP parameter, https://cs50.readthedocs.io/sandbox/#configuration. In the context of HTTP, didn't seem appropriate to call it files, since multiple files have to be provided one at a time, a la:

?file=foo.c&file=bar.c

But in the context of .cs50.yaml, it seems a bit confusing to have lab50 use file and have check50 and submit50 use files. I could easily change over to files, but then I'd run afoul of the tag requirement, which shouldn't apply to lab50's use of files:

https://cs50.readthedocs.io/sandbox/#configuration

How best to handle?

The text was updated successfully, but these errors were encountered:

Jelleas · 2018-08-13T15:57:12Z

The files key does make sense for lib50....

When we wrote this we envisioned files to be something used across tools (submit50 and check50) for which we wanted the exact same behaviour. Now that there's a third (lab50) that differs we should probably move validation for the files key to somewhere else in lib50.

You probably do want all the work we do for the files key (validating the key, figuring out wrong tags, creating the appropriate objects during parsing), but that to be customizable.

Something like this perhaps?:

loader = lib50.Loader()
loader.register(tag1, lambda value: Foo(value)) # loader.register(tag1, Foo)
loader.register(tag2, lambda value: Bar(value))
loader.load(config, "lab50")

This would let lib50 do the ugly work, and validate all tags for you. And we could just export a IncludeExcludeRequireLoader (better name wanted) for submit50 and check50.

dmalan · 2018-08-13T16:46:06Z

Relying on lib50 for tag parsing for lab50 too would be great, since I've been meaning to introduce syntax like

lab50:
  files:
    - !opened foo.c
    - !closed bar.c

so as to signal which tabs to open. So this would be handy. Could we perhaps avoid the need for a custom class per tag? Simpler would be a single wrapper class (e.g., Value) that defines the tags as attributes and has a __str__ method so that you can just write code like:

for file in files:
    print(f"Processing {file}")
    if file.opened:
        # do this
    else:
        # do that

Feels like a lot less overhead than defining one class per tag?

cmlsharp · 2018-08-13T17:16:53Z

Sure, that's what we do with !include/!exclude/!require. There's a single FilePattern class that has a type field with three variants. So it ends up looking like:

for file in files:
    if file.type is PatternType.Required:
        # do this
    elif file.type is PatternType.Include:
        # do that

dmalan · 2018-08-13T18:36:40Z

Should we perhaps simplify, though, so that we don't need to pre-define types, instead just setting properties to True for any present tags? Having to define PatternType.Required etc. feels like unnecessary work?

cmlsharp · 2018-08-13T23:12:10Z

We could, it just (a) makes it tougher to validate that the user didn't specify invalid tags, which we would want to error on and (b) the syntax wouldn't be quite as nice as your proposal (without resorting to some __getattr__ tricks) anyway since we can't define an attribute for all possible tags. Thus it would end up being something like if hasattr(file, "opened") or something like that. We could override __getattr__ to return None if the attribute isn't found, but that seems pretty hackish.

Also, as a general rule I think it is generally good practice to not allow an API to represent invalid states. It is possible for file.included and file.excluded to both be true even though that doesn't make any sense whereas in the current model, a file only has one type.

dmalan · 2018-08-16T14:05:13Z

Hm, having to register valid tags seems reasonable, in the spirit of:

loader = lib50.Loader()
loader.register(tag1, lambda value: Foo(value)) # loader.register(tag1, Foo)
loader.register(tag2, lambda value: Bar(value))
loader.load(config, "lab50")

But:

Could/should we associate tags with specific keys somehow, since !required and such don't make sense outside the context of files?
Do we really need to wrap the value with a custom class (e.g., Foo or Bar)? What functionality are those classes meant to provide, besides a constant for == checking? That feels like slightly annoying overhead for a caller to define?

Jelleas · 2018-08-16T15:03:29Z

We're hooking into the PyYAML parser, and I don't see a direct way to add support for associating tags with keys. It seems YAML tags are supposed to be types with a global or document scope (https://camel.readthedocs.io/en/latest/yamlref.html#tags).
We don't need custom classes for this. As an interface I propose .register(tag, callable(tag)) to take a tag and a callable that takes in a tag and returns something (we need to return the parsed YAML after all). What happens in that callable is up to the author. It could just be setting an instance variable, or adding something to a dict.

dmalan · 2018-08-16T15:15:46Z

Sounds good, not a big deal.
What might that look like specifically? Still feels like more work than should be needed just to check "is this string tagged with foo?", no? Might we just want to wrap all str values in the YAML with our own wrapper, a la the below?

class String:
  def __init__(self, value, tag = None):
      self.value = string
      self.tag = tag
      def __getattr__(self, name):
          return name == self.tag
      def __str__(self):
          return self.value

Jelleas · 2018-08-16T15:49:01Z

Oh! We could do something like that!

We can register a so-called multi-constructor (matching a prefix of a tag). That would just wrap every tagged string with a custom String (better name pending) class. Then the interface could just be:

config = lib50.Loader("require", "include", "exclude").load("<YAML_CONTENT>", "<TOOL>")

We don't even need to hack in __getattr__ if we require the author to specify the tags beforehand. We can simply set those tags to False by default.

cmlsharp · 2018-08-16T15:53:15Z

The __getattr__ trick does feel pretty hackish to me. Why not just do if string.tag == "foo" instead of if string.foo? It's such a small syntactic difference and makes it more clear to anyone reading it. It should be noted though that not all strings will be of this type, only strings that are tagged (which I think is a feature, but does seem slightly different than David's suggestion).

dmalan · 2018-08-16T16:06:55Z

not all strings will be of this type

Would it not be safer to wrap all str values, since tags themselves, per Jelle's note, can appear globally in a YAML file? And it'd definitely be annoying to have to adopt a convention of

if hasattr(s, "tag")

before accessing s.tag, no?

Jelleas · 2018-08-16T16:19:03Z

You're probably not checking str.tag at every part of the config though, only where you expect values to be tagged (like in the key files)? Seems weird to then want to wrap everything. So I agree with Chad, this is a feature!

That said we don't need __getattr__ in either case, because for the sake of lib50 validating the config, we need to know all tags beforehand anyway. So we know what tags can possibly exist, and can thus properly instantiate them on the object.

Coming back to my point earlier, since lib50.load() effectively serves as higher level (.cs50.yaml) specific parser. We could support local tags by parsing the whole .yaml file once, then parsing the parts where we expect tags individually again. This would allow us to raise more specific errors, and avoid tag conflicts.

dmalan · 2018-08-16T16:46:38Z

Okay, not every part, but how about when tags are optional? For labs, I think the common case will be "install these files and open them for the student," via:

lab50:
  files:
    - foo.c
    - foo.h

And it seems silly to require instead

lab50:
  files:
    - !opened foo.c
    - !opened foo.h

for the common case. So I'd envision a common scenario of, e.g.:

lab50:
  files:
    - application.py
    - !closed static/*
    - templates/*.html

or, equivalently:

lab50:
  files:
    - !opened application.py
    - !closed static/*
    - !opened templates/*.html

How best to accommodate optionality, then, without forcing use of hasattr or such?

Jelleas · 2018-08-16T17:16:40Z

I'd envision a flag in the loader (for that scope if we support local tag scope) something like default=True, that wraps everything in a tagless String by default. And causes lib50 to not throw an error in case of a missing tag (we want to throw an error in check/submit50).

So for lab50 it could be:

lib50.Loader("closed", default=True).load(...)

Or:

loader = lib50.Loader()
loader.scope("files", "closed", default=True)
loader.load(...)

Edit:
Could also support a default tag, default="opened"

dmalan · 2018-08-16T18:18:48Z

Yeah, I think values can only have one tag, right? If so, setting a default tag is probably better than a default value, since (accidental) code like

lib50.Loader("closed", default=True).load(...)
lib50.Loader("opened", default=True).load(...)

would seem to create an ambiguity otherwise?

Jelleas · 2018-08-17T04:45:51Z

Implemented a working concept of Loader in lib50.config

lib50/lib50/config.py

Line 26 in 7250882

class Loader:

Unittests here:

lib50/tests/config_tests.py

Line 6 in 7250882

class TestLoader(unittest.TestCase):

Interface is as follows:

loader = lib50.Loader("<TOOL>", "<GLOBAL_TAG_1>", "<GLOBAL_TAG_2>", ..., default="<DEFAULT_TAG>")
loader.scope("<KEY>", "<LOCAL_TAG_1>", "<LOCAL_TAG_2>", ..., default="<DEFAULT_TAG>")
loader.load(content)

For check50 we would do:

loader = lib50.Loader("check50")
loader.scope("files', "include", "exclude", "require")
loader.load(content)

For lab50:

loader = lib50.Loader("lab50")
loader.scope("files", "closed", default="opened") 
# or if !opened needs to be a valid tag
# loader.scope("files", "closed", "opened", default="opened") 
loader.load(content)

Currently you can only scope keys exactly one level deep (top-level is reserved for tools). That means you can scope keys like files, dependencies etc. Reason I did this was to avoid any complications with nesting scopes, or keys occurring multiple times in the YAML file.

As a result from .load() you get the same result as before, however every tagged value (or where a default was applied) is now an instance of TaggedValue. That has a tag, tags and value attribute, and a boolean attribute for every tag so that you can safely access tagged_value.<any_valid_tag> without having to check hasattr first.

I really like scopes here even in the case of check50, as we can now raise errors from lib50 if you tag a value where we're not expecting tags.

ps. I have not integrated any of this with lib50.files() yet, waiting for feedback :)

Jelleas · 2019-06-14T19:49:26Z

Here is how we currently envision lab50 can use lib50.files.

lib50/tests/lib50_tests.py

Lines 433 to 471 in 7356e65

    
           def test_lab50_tags(self): 
        
               # Four dummy files 
        
               open("foo.py", "w").close() 
        
               open("bar.py", "w").close() 
        
               open("baz.py", "w").close() 
        
               open("qux.py", "w").close() 
        
               # Dummy config file (.cs50.yml) 
        
               content = \ 
        
                   "lab50:\n" \ 
        
                   "  files:\n" \ 
        
                   "    - !open \"foo.py\"\n" \ 
        
                   "    - !include \"bar.py\"\n" \ 
        
                   "    - !exclude \"baz.py\"\n" \ 
        
                   "    - \"qux.py\"\n" 
        
               # Create a config Loader for a tool called lab50 
        
               loader = lib50.config.Loader("lab50") 
        
               # Scope the files section of lab50 with the tags: open, include and exclude 
        
               loader.scope("files", "open", "include", "exclude", default="include") 
        
               # Load the config 
        
               config = loader.load(content) 
        
               # Figure out which files have an open tag 
        
               opened_files = [tagged_value.value for tagged_value in config["files"] if tagged_value.tag == "open"] 
        
               # Have lib50.files figure out which files should be included and excluded 
        
               # Simultaneously ensure all open files exist 
        
               included, excluded = lib50.files(config["files"], require_tags=["open"]) 
        
               # Make sure that files tagged with open are also included 
        
               opened_files = [file for file in opened_files if file in included] 
        
               # Assert 
        
               self.assertEqual(included, {"foo.py", "bar.py", "qux.py"}) 
        
               self.assertEqual(excluded, {"baz.py"}) 
        
               self.assertEqual(set(opened_files), {"foo.py"})

dmalan · 2019-06-14T19:51:33Z

Few questions:

What's the difference between !open and !include?
What does a filename without a tag imply?

Jelleas · 2019-06-14T20:12:02Z

!include would be, just include whatever matches the pattern, but don't open them. Whereas !open includes this specific file and opens it. This behavior is completely customizable.
I set a default tag to !include. If a tag is missing, it gets tagged with !include.

lib50/tests/lib50_tests.py

Line 453 in 7356e65

loader.scope("files", "open", "include", "exclude", default="include")

Implemented the default behavior for this comment from last year (#8 (comment)):

Okay, not every part, but how about when tags are optional? For labs, I think the common case will be "install these files and open them for the student," via:
lab50:
  files:
    - foo.c
    - foo.h
And it seems silly to require instead
lab50:
  files:
    - !opened foo.c
    - !opened foo.h
for the common case. So I'd envision a common scenario of, e.g.:
lab50:
  files:
    - application.py
    - !closed static/*
    - templates/*.html
or, equivalently:
lab50:
  files:
    - !opened application.py
    - !closed static/*
    - !opened templates/*.html
How best to accommodate optionality, then, without forcing use of hasattr or such?

dmalan added the question Further information is requested label Aug 13, 2018

dmalan assigned Jelleas and cmlsharp Aug 13, 2018

Jelleas mentioned this issue Sep 14, 2018

lib50.files returns all files when arg is empty list #10

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

how handle use of `files` in lab50? #8

how handle use of `files` in lab50? #8

dmalan commented Aug 13, 2018

Jelleas commented Aug 13, 2018

dmalan commented Aug 13, 2018 •

edited

Loading

cmlsharp commented Aug 13, 2018

dmalan commented Aug 13, 2018

cmlsharp commented Aug 13, 2018 •

edited

Loading

dmalan commented Aug 16, 2018

Jelleas commented Aug 16, 2018

dmalan commented Aug 16, 2018 •

edited

Loading

Jelleas commented Aug 16, 2018

cmlsharp commented Aug 16, 2018

dmalan commented Aug 16, 2018 •

edited

Loading

Jelleas commented Aug 16, 2018

dmalan commented Aug 16, 2018

Jelleas commented Aug 16, 2018 •

edited

Loading

dmalan commented Aug 16, 2018

Jelleas commented Aug 17, 2018

Jelleas commented Jun 14, 2019

dmalan commented Jun 14, 2019

Jelleas commented Jun 14, 2019 •

edited

Loading

how handle use of files in lab50? #8

how handle use of files in lab50? #8

Comments

dmalan commented Aug 13, 2018

Jelleas commented Aug 13, 2018

dmalan commented Aug 13, 2018 • edited Loading

cmlsharp commented Aug 13, 2018

dmalan commented Aug 13, 2018

cmlsharp commented Aug 13, 2018 • edited Loading

dmalan commented Aug 16, 2018

Jelleas commented Aug 16, 2018

dmalan commented Aug 16, 2018 • edited Loading

Jelleas commented Aug 16, 2018

cmlsharp commented Aug 16, 2018

dmalan commented Aug 16, 2018 • edited Loading

Jelleas commented Aug 16, 2018

dmalan commented Aug 16, 2018

Jelleas commented Aug 16, 2018 • edited Loading

dmalan commented Aug 16, 2018

Jelleas commented Aug 17, 2018

Jelleas commented Jun 14, 2019

dmalan commented Jun 14, 2019

Jelleas commented Jun 14, 2019 • edited Loading

how handle use of `files` in lab50? #8

how handle use of `files` in lab50? #8

dmalan commented Aug 13, 2018 •

edited

Loading

cmlsharp commented Aug 13, 2018 •

edited

Loading

dmalan commented Aug 16, 2018 •

edited

Loading

dmalan commented Aug 16, 2018 •

edited

Loading

Jelleas commented Aug 16, 2018 •

edited

Loading

Jelleas commented Jun 14, 2019 •

edited

Loading