feat: Add JSON field extraction and enhanced URL validation #6051

Merged: 45 commits, Mar 10, 2025

Cristhianzl
Member

This pull request introduces enhancements to the URLComponent class in the src/backend/base/langflow/components/data/url.py file, primarily focusing on adding support for JSON format extraction from HTML content. The most important changes include importing the json module, updating the output format options, and implementing additional logic to handle JSON content.
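To make the JSON handling described above concrete, here is a minimal sketch of parsing a fetched body with the `json` module. It is not the code from `url.py`: the function name `extract_json` and the `"result"` wrapper key are assumptions chosen for illustration.

```python
import json
from typing import Any


def extract_json(body: str) -> dict[str, Any]:
    """Parse a fetched response body as JSON (hypothetical helper, not from the PR)."""
    try:
        parsed = json.loads(body)
    except json.JSONDecodeError as exc:
        # Surface a clear error instead of letting the decoder exception leak out.
        msg = f"Invalid JSON content: {exc}"
        raise ValueError(msg) from exc
    # Wrap lists and scalars so callers always receive a dict (an assumption).
    return parsed if isinstance(parsed, dict) else {"result": parsed}
```

A JSON object is returned unchanged, while a top-level array such as `[1, 2]` would come back as `{"result": [1, 2]}` under this sketch.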

Enhancements to the URLComponent class:

@Cristhianzl Cristhianzl self-assigned this Jan 31, 2025
@dosubot dosubot bot added size:M This PR changes 30-99 lines, ignoring generated files. enhancement New feature or request labels Jan 31, 2025
@github-actions github-actions bot added enhancement New feature or request and removed enhancement New feature or request labels Jan 31, 2025
…on by using dictionary unpacking instead of manual key-value pairs
♻️ (url.py): refactor JSON URL validation for better readability and consistency
@dosubot dosubot bot added size:S This PR changes 10-29 lines, ignoring generated files. and removed size:M This PR changes 30-99 lines, ignoring generated files. labels Jan 31, 2025
@dosubot dosubot bot added size:M This PR changes 30-99 lines, ignoring generated files. and removed size:S This PR changes 10-29 lines, ignoring generated files. labels Jan 31, 2025
…LComponent class

♻️ (url.py): refactor ensure_url method to simplify logic and improve readability
🐛 (url.py): fix error handling in URLComponent class for invalid JSON content
@github-actions github-actions bot added enhancement New feature or request and removed enhancement New feature or request labels Feb 1, 2025
🔧 (test_database.py): Refactor test_read_flows_components_only_paginated function for better readability and maintainability
@github-actions github-actions bot added enhancement New feature or request and removed enhancement New feature or request labels Feb 25, 2025
@Cristhianzl Cristhianzl added lgtm This PR has been approved by a maintainer and removed lgtm This PR has been approved by a maintainer labels Feb 25, 2025
@github-actions github-actions bot added enhancement New feature or request and removed enhancement New feature or request labels Feb 26, 2025
@italojohnny italojohnny self-requested a review February 26, 2025 13:01
@Cristhianzl Cristhianzl added lgtm This PR has been approved by a maintainer and removed lgtm This PR has been approved by a maintainer labels Feb 26, 2025
@github-actions github-actions bot added enhancement New feature or request and removed enhancement New feature or request labels Feb 26, 2025
@Cristhianzl Cristhianzl added this pull request to the merge queue Feb 26, 2025
github-merge-queue bot pushed a commit that referenced this pull request Feb 26, 2025
* URL component improvement - JSON URL

* [autofix.ci] apply automated fixes

* [autofix.ci] apply automated fixes (attempt 2/3)

* ♻️ (url.py): refactor URLComponent class to simplify data_dict creation by using dictionary unpacking instead of manual key-value pairs

* [autofix.ci] apply automated fixes

* [autofix.ci] apply automated fixes (attempt 2/3)

* 📝 (url.py): improve formatting of info string for DropdownInput in URLComponent class
♻️ (url.py): refactor ensure_url method to simplify logic and improve readability
🐛 (url.py): fix error handling in URLComponent class for invalid JSON content

* [autofix.ci] apply automated fixes

* ✨ (url.py): Add BoolInput and StrInput to support new features in URLComponent
📝 (url.py): Update description in URLComponent to provide more detailed information about its functionality
♻️ (url.py): Refactor update_build_config method in URLComponent to dynamically update fields based on selected format
🐛 (url.py): Fix ensure_url method in URLComponent to ensure valid URLs are provided and handle exceptions properly
🐛 (url.py): Fix fetch_content method in URLComponent to handle cases where no valid URLs are provided and improve error handling
🐛 (url.py): Fix fetch_content_text method in URLComponent to correctly format and clean text output based on selected format and settings
🐛 (url.py): Fix as_dataframe method in URLComponent to return fetched content as a DataFrame object

* [autofix.ci] apply automated fixes

* [autofix.ci] apply automated fixes (attempt 2/3)

* ♻️ (url.py): remove unnecessary comments and improve code readability by removing redundant comments and adjusting code structure.

* [autofix.ci] apply automated fixes

* 📝 (url.py): improve readability by splitting long description and info strings into multiple lines
🐛 (url.py): handle cases where invalid URLs or JSON URLs are provided, and provide informative error messages
🐛 (url.py): handle cases where no valid URLs are provided and raise an error with a clear message

* 🔧 (Blog Writer.json, Custom Component Maker.json, Graph Vector Store RAG.json): resolve merge conflicts in JSON files related to the 'format' field options to ensure consistency across starter projects.

* [autofix.ci] apply automated fixes

* 🐛 (url.py): fix validation of JSON content from URLs to ensure correct handling of JSON data
✨ (url.py): introduce async validation of JSON content from URLs using aiohttp to improve performance and reliability

* [autofix.ci] apply automated fixes

* [autofix.ci] apply automated fixes (attempt 2/3)

* merge fix

* ✅ (test_audio_file.wav): update test_audio_file.wav to fix binary file differences in the test asset

* 🐛 (test_url_component.py): update error message format to improve clarity and consistency

* update templates

* 🐛 (test_database.py): fix error handling in test_read_flows_components_only_paginated to properly catch and log exceptions during test execution

* 📝 (backend): Add noqa comments to files to ignore specific linting rule A005
🔧 (test_database.py): Remove duplicate import statement for sqlalchemy
♻️ (test_database.py): Refactor test_read_flows_components_only_paginated function for better readability and maintainability

* 🐛 (test_database.py): fix test_read_flows_components_only_paginated to handle exceptions and provide more context in case of failure

* 📝 (test_database.py): remove unnecessary comment to improve code readability and maintainability

* 📝 (backend): Remove unnecessary noqa comments from __init__.py files
🔧 (test_database.py): Refactor test_read_flows_components_only_paginated function for better readability and maintainability

* [autofix.ci] apply automated fixes

---------

Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
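
Several of the commit messages above mention that `ensure_url` should accept only valid URLs and raise a clear error otherwise. The following is a rough sketch of that kind of check using only the standard library. It is an assumption about the general approach, not the method as merged: the default `http://` scheme, the dot-in-host heuristic, and the error text are all illustrative.

```python
from urllib.parse import urlparse


def ensure_url(raw: str) -> str:
    """Normalize and validate a URL string (illustrative sketch, not the PR's code)."""
    candidate = raw.strip()
    # Assumption: bare hosts such as "example.com" default to http://.
    if not candidate.startswith(("http://", "https://")):
        candidate = "http://" + candidate
    parsed = urlparse(candidate)
    # Heuristic: require an http(s) scheme and a host containing a dot.
    if parsed.scheme not in ("http", "https") or "." not in parsed.netloc:
        msg = f"Invalid URL: {raw}"
        raise ValueError(msg)
    return candidate
```

The dot check is deliberately crude and would reject hosts like `localhost`; a real component might use a stricter pattern or a dedicated validator instead.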
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Feb 26, 2025
@carlosrcoelho carlosrcoelho added this pull request to the merge queue Feb 27, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Feb 27, 2025
@Cristhianzl Cristhianzl added this pull request to the merge queue Mar 5, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Mar 5, 2025
@Cristhianzl Cristhianzl added this pull request to the merge queue Mar 10, 2025
Merged via the queue into main with commit 47753d3 Mar 10, 2025
37 checks passed
@Cristhianzl Cristhianzl deleted the cz/url-improve branch March 10, 2025 12:40
Yukiyukiyeah pushed a commit that referenced this pull request Mar 31, 2025
Labels
* enhancement (New feature or request)
* lgtm (This PR has been approved by a maintainer)
* size:XL (This PR changes 500-999 lines, ignoring generated files.)
2 participants