feat: Add JSON field extraction and enhanced URL validation #6051
Merged
…on by using dictionary unpacking instead of manual key-value pairs
♻️ (url.py): refactor JSON URL validation for better readability and consistency
…LComponent class ♻️ (url.py): refactor ensure_url method to simplify logic and improve readability 🐛 (url.py): fix error handling in URLComponent class for invalid JSON content
🔧 (test_database.py): Refactor test_read_flows_components_only_paginated function for better readability and maintainability
italojohnny
approved these changes
Feb 26, 2025
github-merge-queue bot
pushed a commit
that referenced
this pull request
Feb 26, 2025
* URL component improvement - JSON URL
* [autofix.ci] apply automated fixes
* [autofix.ci] apply automated fixes (attempt 2/3)
* ♻️ (url.py): refactor URLComponent class to simplify data_dict creation by using dictionary unpacking instead of manual key-value pairs
* [autofix.ci] apply automated fixes
* [autofix.ci] apply automated fixes (attempt 2/3)
* 📝 (url.py): improve formatting of info string for DropdownInput in URLComponent class
  ♻️ (url.py): refactor ensure_url method to simplify logic and improve readability
  🐛 (url.py): fix error handling in URLComponent class for invalid JSON content
* [autofix.ci] apply automated fixes
* ✨ (url.py): Add BoolInput and StrInput to support new features in URLComponent
  📝 (url.py): Update description in URLComponent to provide more detailed information about its functionality
  ♻️ (url.py): Refactor update_build_config method in URLComponent to dynamically update fields based on selected format
  🐛 (url.py): Fix ensure_url method in URLComponent to ensure valid URLs are provided and handle exceptions properly
  🐛 (url.py): Fix fetch_content method in URLComponent to handle cases where no valid URLs are provided and improve error handling
  🐛 (url.py): Fix fetch_content_text method in URLComponent to correctly format and clean text output based on selected format and settings
  🐛 (url.py): Fix as_dataframe method in URLComponent to return fetched content as a DataFrame object
* [autofix.ci] apply automated fixes
* [autofix.ci] apply automated fixes (attempt 2/3)
* ♻️ (url.py): remove unnecessary comments and improve code readability by removing redundant comments and adjusting code structure.
* [autofix.ci] apply automated fixes
* 📝 (url.py): improve readability by splitting long description and info strings into multiple lines
  🐛 (url.py): handle cases where invalid URLs or JSON URLs are provided, and provide informative error messages
  🐛 (url.py): handle cases where no valid URLs are provided and raise an error with a clear message
* 🔧 (Blog Writer.json, Custom Component Maker.json, Graph Vector Store RAG.json): resolve merge conflicts in JSON files related to the 'format' field options to ensure consistency across starter projects.
* [autofix.ci] apply automated fixes
* 🐛 (url.py): fix validation of JSON content from URLs to ensure correct handling of JSON data
  ✨ (url.py): introduce async validation of JSON content from URLs using aiohttp to improve performance and reliability
* [autofix.ci] apply automated fixes
* [autofix.ci] apply automated fixes (attempt 2/3)
* merge fix
* ✅ (test_audio_file.wav): update test_audio_file.wav to fix binary file differences in the test asset
* 🐛 (test_url_component.py): update error message format to improve clarity and consistency
* update templates
* 🐛 (test_database.py): fix error handling in test_read_flows_components_only_paginated to properly catch and log exceptions during test execution
* 📝 (backend): Add noqa comments to files to ignore specific linting rule A005
  🔧 (test_database.py): Remove duplicate import statement for sqlalchemy
  ♻️ (test_database.py): Refactor test_read_flows_components_only_paginated function for better readability and maintainability
* 🐛 (test_database.py): fix test_read_flows_components_only_paginated to handle exceptions and provide more context in case of failure
* 📝 (test_database.py): remove unnecessary comment to improve code readability and maintainability
* 📝 (backend): Remove unnecessary noqa comments from __init__.py files
  🔧 (test_database.py): Refactor test_read_flows_components_only_paginated function for better readability and maintainability
* [autofix.ci] apply automated fixes

---------

Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
Yukiyukiyeah
pushed a commit
that referenced
this pull request
Mar 31, 2025
Labels
enhancement
New feature or request
lgtm
This PR has been approved by a maintainer
size:XL
This PR changes 500-999 lines, ignoring generated files.
This pull request introduces enhancements to the `URLComponent` class in the `src/backend/base/langflow/components/data/url.py` file, primarily focusing on adding support for JSON format extraction from HTML content. The most important changes include importing the `json` module, updating the output format options, and implementing additional logic to handle JSON content.

Enhancements to the `URLComponent` class:

* `src/backend/base/langflow/components/data/url.py`: Imported the `json` module to handle JSON content.
* `class URLComponent(Component)`: Updated the `DropdownInput` options to include "JSON" and modified the `info` attribute to reflect the new option.
* `def ensure_url(self, string: str) -> str`: Added validation to ensure that URLs end with ".json" when the format is set to "JSON".
* `def fetch_content(self) -> list[Data]`: Implemented logic to parse and validate JSON content, and to structure the data accordingly.
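
The validation and parsing steps described above might look roughly like the following. This is a minimal standalone sketch, not the code merged in this PR: the free functions, the `output_format` parameter, the `parse_json_content` helper, the `"result"` wrapper key, and the error messages are all assumptions made for illustration, and the real `ensure_url` and `fetch_content` are methods on `URLComponent`.

```python
import json
from urllib.parse import urlparse


def ensure_url(string: str, output_format: str = "Text") -> str:
    """Normalize a URL; for the "JSON" format, require a path ending in ".json".

    `output_format` is a hypothetical parameter standing in for the
    component's selected format.
    """
    url = string.strip()
    # Assume a scheme is prepended when the user omitted one.
    if not url.startswith(("http://", "https://")):
        url = "http://" + url

    parsed = urlparse(url)
    if not parsed.netloc:
        raise ValueError(f"Invalid URL: {string}")

    # The JSON-specific check: only the path is inspected, so a query
    # string after ".json" does not cause a rejection.
    if output_format == "JSON" and not parsed.path.endswith(".json"):
        raise ValueError(f"Invalid JSON URL (must end with '.json'): {string}")
    return url


def parse_json_content(raw: str, source: str) -> dict:
    """Parse fetched text as JSON and shape it into a flat record (hypothetical helper)."""
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"Invalid JSON content from {source}: {exc}") from exc

    # Non-object payloads (lists, numbers, strings) are wrapped so the
    # result is always a dict; the "result" key is an assumption.
    data = payload if isinstance(payload, dict) else {"result": payload}
    # Dictionary unpacking instead of copying key-value pairs by hand.
    return {**data, "source": source}
```

Because `"source"` is placed after the unpacked payload, it overwrites any `source` key that the fetched JSON already contains; reversing the order would let the payload win instead.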