Emoji within Sling stdout causing failures #27239

Open
TCronino opened this issue Jan 21, 2025 · 1 comment
Labels
type: bug Something isn't working

Comments

@TCronino

What's the issue?

Discussed here in Dagster Slack - https://dagster.slack.com/archives/C06LQ9H1064/p1737370879891879

Occasionally, the Sling output includes a message containing an emoji. I'm not sure why it only appears some of the time, but it happens often enough that it has been causing failures in my pipelines.

The error seen is:
UnicodeEncodeError: 'charmap' codec can't encode character '\U0001f449' in position 25: character maps to <undefined>
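
For context, the same error can be reproduced outside Dagster. This is a minimal sketch that assumes the failing stream uses a Windows code page such as cp1252 (the issue does not say which encoding is in use; `charmap` is the codec name Python reports for these code pages). The helper `try_encode` and the sample line are hypothetical and exist only for illustration.

```python
# U+1F449 is the character named in the traceback above.
line = "some sling output \U0001F449 more text"


def try_encode(text: str, encoding: str) -> str:
    """Return 'ok' if text encodes cleanly, else the failing codec name and reason."""
    try:
        text.encode(encoding)
        return "ok"
    except UnicodeEncodeError as exc:
        return f"{exc.encoding}: {exc.reason}"


utf8_result = try_encode(line, "utf-8")      # UTF-8 can represent the emoji
cp1252_result = try_encode(line, "cp1252")   # assumed console encoding; cannot
print(utf8_result)
print(cp1252_result)
```

If this assumption holds, the failure is on the encoding side (writing the decoded line to a non-UTF-8 stream), not in decoding the bytes that Sling emits.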

What did you expect to happen?

Can any emojis be stripped from the output to ensure this failure doesn't occur?

I assume an approach like this in the SlingResource would resolve the issue? (Note: I've not tested this, it's just a suggestion!)

import re
from typing import IO, AnyStr, Iterator

def _process_stdout(self, stdout: IO[AnyStr], encoding="utf8") -> Iterator[str]:
    """Process stdout from the Sling CLI, stripping emoji from each line."""
    emoji_pattern = re.compile(
        "["
        "\U0001F600-\U0001F64F"  # emoticons
        "\U0001F300-\U0001F5FF"  # symbols & pictographs
        "\U0001F680-\U0001F6FF"  # transport & map symbols
        "\U0001F700-\U0001F77F"  # alchemical symbols
        "\U0001F780-\U0001F7FF"  # Geometric Shapes Extended
        "\U0001F800-\U0001F8FF"  # Supplemental Arrows-C
        "\U0001F900-\U0001F9FF"  # Supplemental Symbols and Pictographs
        "\U0001FA00-\U0001FA6F"  # Chess Symbols
        "\U0001FA70-\U0001FAFF"  # Symbols and Pictographs Extended-A
        "\U00002702-\U000027B0"  # Dingbats
        "\U000024C2"  # circled Latin capital letter M
        "\U0001F200-\U0001F251"  # Enclosed Ideographic Supplement
        "]+"
    )

    for line in stdout:
        assert isinstance(line, bytes)
        # Decode leniently, then drop emoji that a non-UTF-8 stream cannot encode.
        fmt_line = line.decode(encoding, errors="replace")
        fmt_line = emoji_pattern.sub("", fmt_line)
        yield self._clean_line(fmt_line)
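
An alternative to enumerating emoji ranges is to round-trip each line through whatever encoding the output stream uses, so that anything the stream cannot represent is replaced or dropped. The sketch below is only an illustration, not the Dagster implementation: `make_encodable` is a hypothetical helper, and cp1252 is an assumed target encoding.

```python
def make_encodable(text: str, encoding: str, errors: str = "replace") -> str:
    """Round-trip text through `encoding` so the result is always encodable.

    With errors="replace", unencodable characters become "?".
    With errors="ignore", they are dropped.
    """
    return text.encode(encoding, errors=errors).decode(encoding)


sample = "wrote 10 rows \U0001F449 done"

replaced = make_encodable(sample, "cp1252")                   # emoji -> "?"
dropped = make_encodable(sample, "cp1252", errors="ignore")   # emoji removed
unchanged = make_encodable(sample, "utf-8")                   # nothing to strip

print(replaced)
print(dropped)
print(unchanged == sample)
```

The trade-off is that this also alters any other character the target encoding lacks, not just emoji, which may or may not be what is wanted for log output.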

How to reproduce?

When running a large number of Sling resources, I found that roughly 1 in 20 of them seemed to have the message appear.

https://github.com/slingdata-io/sling-cli/blob/main/cmd/sling/sling_media.go

Dagster version

1.9.8

Deployment type

Local

Deployment details

Only observed locally; I didn't try pushing to Dagster+.

Additional information

No response

Message from the maintainers

Impacted by this issue? Give it a 👍! We factor engagement into prioritization.

@TCronino
Author

Heads up: this shouldn't be an issue after the next Sling release, as they will remove the emoji:
slingdata-io/sling-cli#490

However, it's probably still a good issue to solve anyway!
