Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: implementation #1

Merged
merged 44 commits into from
Feb 5, 2025
Merged

feat: implementation #1

merged 44 commits into from
Feb 5, 2025

Conversation

MQ37
Copy link
Collaborator

@MQ37 MQ37 commented Jan 31, 2025

This is a new integration package for LangChain that integrates Apify. This is the new recommended way instead of using the LangChain community.

This project was bootstrapped using the langchain-cli template for integration packages.

  • Implemented LangChain tool to call Apify Actors
  • Ported ApifyDatasetLoader from langchain community
  • Ported ApifyWrapper from langchain community
  • Wrote basic unit and integration tests - see DEVELOPMENT.md file

@MQ37 MQ37 self-assigned this Jan 31, 2025
@MQ37 MQ37 requested a review from jirimoravcik February 2, 2025 19:41
@MQ37 MQ37 requested a review from jirispilka February 3, 2025 10:21
@MQ37 MQ37 requested a review from jirimoravcik February 3, 2025 13:47
Copy link
Member

@jirimoravcik jirimoravcik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey Kuba @MQ37,
first, thanks for the pull request, it's a lot of work and it looks great.
I have some comments to improve the code:

  • At Apify, we write the word "Actor" capitalized in any places where it's text (and doesn't need to follow things like snake_case)
  • I've noticed you like to use msg = 'error message' and then raise ValueError(msg). I personally feel like it's not necessary to separate the statements like that.
  • I see that you use double quotes in your code. I believe single quotes would be better, e.g. see configuration from our Python SDK https://github.com/apify/apify-sdk-python/blob/master/pyproject.toml#L115
  • I noticed some inconsistencies between how you formatted function arguments, some have the :param notation, some not. It would be nice to unify.
  • It would be nice to have a README file that gives users more insight into what the package and Apify can do for them

Thank you

@MQ37
Copy link
Collaborator Author

MQ37 commented Feb 4, 2025

Hey Kuba @MQ37, first, thanks for the pull request, it's a lot of work and it looks great. I have some comments to improve the code:

* At Apify, we write the word "Actor" capitalized in any places where it's text (and doesn't need to follow things like `snake_case`)

* I've noticed you like to use `msg = 'error message'` and then `raise ValueError(msg)`. I personally feel like it's not necessary to separate the statements like that.

* I see that you use double quotes in your code. I believe single quotes would be better, e.g. see configuration from our Python SDK https://github.com/apify/apify-sdk-python/blob/master/pyproject.toml#L115

* I noticed some inconsistencies between how you formatted function arguments, some have the `:param` notation, some not. It would be nice to unify.

* It would be nice to have a README file that gives users more insight into what the package and Apify can do for them

Thank you

Hey Jirka @jirimoravcik, thank you for your valuable review 👍

  • capitalized the word Actor, it should not appear in lowercase form anywhere in the text, thank you for noticing this
  • kept msg and raise separate instead of using literal, thank you for the personal suggestion but I think ruff docs explanation makes sense - https://docs.astral.sh/ruff/rules/raw-string-in-exception/
  • changed the ruff config to use single quotes and added additional rules
  • fixed the inconsistencies and set google pydoc style convention, so all docstrings should follow this convention
  • added README header and improved based on your suggestions, thank you (also changed the repo description)

I'd even go further and reach out to marketing to help write a punchy paragraph here, I think it's worth it

Do you know who I can reach to help me with improving the README header?

Thank you for your time and effort 👍

@MQ37 MQ37 requested a review from jirimoravcik February 4, 2025 13:13
@jirimoravcik
Copy link
Member

Do you know who I can reach to help me with improving the README header?

Try #apify-content-tasks Slack channel

msg = f'Failed to get the Actor object ID for {actor_id}.'
raise ValueError(msg)

url = f'https://api.apify.com/v2/acts/{actor_obj_id}/builds/default'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@danpoletaev will you be adding this to the Python client please?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, it will not be added to the python/js client unless really needed. Because it will be breaking change.

Currently for BuildClients we use /actor-builds namespace instead of /acts namespace

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So it won't be accessible at all via cilents? That doesn't feel right. Is there some discussion I could read? Thank you

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is the link to zenhub task: https://app.zenhub.com/workspaces/platform-team-5f6454160d9f82000fa6733f/issues/gh/apify/apify-core/18314

"adding the /default subroute to the API client would require some major restructuring, since currently ActorClient.build() builds the Actor, it does not retrieve a specific build"

Copy link
Member

@jirimoravcik jirimoravcik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you

MQ37 and others added 2 commits February 4, 2025 19:17
Co-authored-by: Jiří Moravčík <jiri.moravcik@gmail.com>
Copy link
Contributor

@jirispilka jirispilka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like it, thanks!

A couple of notes from me:

  • Github repo about: please provide text there, this is searchable field (not this Apify + 🦜🔗 = 🤖🔧🌐)
  • use are missing line-length in pyproject.toml. I would suggest to use 120
[tool.ruff]
line-length = 120
  • Again, some of the issues I saw should be reported by IDE. Like that you are using a variable in doc that is not in a function definition

Otherwise, good job 👍🏻

T = TypeVar('T', ApifyClient, ApifyClientAsync)


def create_apify_client(t: type[T], token: str) -> T:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please do not use just variable t. Never.
Also. why to create typevar, what is benefit over

def create_apify_client(apify_client: ApifyClient | ApifyClientAsync, token: str) -> ApifyClient | ApifyClientAsync:

I know, it is a bit longer, but readable

Copy link
Collaborator Author

@MQ37 MQ37 Feb 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please do not use just variable t. Never.

makes sense, this is unreadable for someone who did not write the code. Renamed to client_cls

Also. why to create typevar, what is benefit over

TypeVar is needed when you need to work with a single type in the scope. If simple union was used then you would have to always check the client type using isinstanceof and after that you could work with the client, otherwise type checker would be angry - this way you can just pass the client type class that you want to create (sync or async) and since it is typevar the type checker knows that if you passed ApifyClient it has to return ApifyClient and cannot return ApifyClientAsync and vice versa

MQ37 and others added 12 commits February 5, 2025 09:25
Co-authored-by: Jiří Spilka <jiri.spilka@apify.com>
Co-authored-by: Jiří Spilka <jiri.spilka@apify.com>
Co-authored-by: Jiří Spilka <jiri.spilka@apify.com>
Co-authored-by: Jiří Spilka <jiri.spilka@apify.com>
Co-authored-by: Jiří Spilka <jiri.spilka@apify.com>
Co-authored-by: Jiří Spilka <jiri.spilka@apify.com>
Co-authored-by: Jiří Spilka <jiri.spilka@apify.com>
Co-authored-by: Jiří Spilka <jiri.spilka@apify.com>
Co-authored-by: Jiří Spilka <jiri.spilka@apify.com>
Co-authored-by: Jiří Spilka <jiri.spilka@apify.com>
@MQ37
Copy link
Collaborator Author

MQ37 commented Feb 5, 2025

I like it, thanks!

A couple of notes from me:

* Github repo about: please provide text there, this is searchable field (not this Apify + 🦜🔗 = 🤖🔧🌐)

* use are missing line-length in pyproject.toml. I would suggest to use 120
[tool.ruff]
line-length = 120
* Again, some of the issues I saw should be reported by IDE. Like that you are using a variable in doc that is not in a function definition

Otherwise, good job 👍🏻

Thank you for your review and effort 👍

  • refactored the code
  • changed the repo description to "Apify integration for LangChain 🦜🔗 "
  • set the ruff line length to 120
  • found out when you run ruff in preview mode it reports these issues (for example the docstring issues and method could be static) - will check the code also in preview mode

@MQ37 MQ37 requested a review from jirispilka February 5, 2025 10:39
Copy link
Contributor

@jirispilka jirispilka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Looks good

@MQ37 MQ37 merged commit 0163fce into main Feb 5, 2025
6 checks passed
@MQ37 MQ37 deleted the feat/implementation branch February 5, 2025 13:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants