Copy all collections by default in Firestore to Firestore Dataflow Copy #3448

Open
pacoavila808 wants to merge 38 commits into GoogleCloudPlatform:main from jingqizz:dataflow-copy-all-collections

@pacoavila808

If the user provides no flag, the template copies all collections in the database.
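
A minimal sketch of the fallback described above, not code from this PR. It assumes the flag is a comma-separated string of collection IDs and that the collections in the source database have already been listed; `CollectionSelection` and `resolve` are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper illustrating the default; not taken from the PR. */
final class CollectionSelection {
  private CollectionSelection() {}

  /**
   * Returns the collections to copy: the user-supplied IDs if any were given,
   * otherwise every collection discovered in the source database.
   * Assumption: the flag value is a comma-separated list of IDs.
   */
  static List<String> resolve(String userCollectionIds, List<String> discovered) {
    List<String> requested = new ArrayList<>();
    if (userCollectionIds != null) {
      for (String id : userCollectionIds.split(",")) {
        String trimmed = id.trim();
        if (!trimmed.isEmpty()) {
          requested.add(trimmed);
        }
      }
    }
    // No usable IDs from the user: fall back to copying everything.
    return requested.isEmpty() ? new ArrayList<>(discovered) : requested;
  }

  public static void main(String[] args) {
    List<String> discovered = Arrays.asList("users", "orders", "invoices");
    System.out.println(resolve(null, discovered));            // [users, orders, invoices]
    System.out.println(resolve("users, orders", discovered)); // [users, orders]
  }
}
```

Treating a blank flag the same as a missing one is a design choice in this sketch, not something the PR description confirms.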

jingqizz and others added 30 commits February 20, 2026 17:15
… add a pipeline option to log errors to a given GCS path in the next commit.
…rom writes will be written into the gcs path.
…tract documents from the RunQueryResponse and step 6. write documents to destination db.
… Being consistent with DataStreamMongoDBToFirestore.java
…e function to return PCollectionTuple with tags.
@gemini-code-assist

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a new Dataflow template that enables efficient batch copying of data between Firestore databases. A significant enhancement is the template's ability to automatically discover and copy all collections from the source database if no specific collections are designated, making it a robust solution for full database replication or migration scenarios. The changes also include foundational components for testing and deploying Firestore-related Dataflow jobs.

Highlights

  • New Dataflow Template: Firestore to Firestore: Introduced a new Dataflow template designed to facilitate the batch copying of documents from one Firestore database to another. This template supports both default and specified collection groups for migration.
  • Default Collection Copying: The template now defaults to copying all collections within a source Firestore database if no specific collection group IDs are provided by the user, simplifying full database migrations.
  • Firestore Resource Management for Testing: Added dedicated resource managers (FirestoreAdminResourceManager and FirestoreResourceManager) and their associated exceptions to streamline the creation, management, and cleanup of Firestore databases and collections within integration tests.
  • Modularized Dataflow Transforms: Implemented new Apache Beam DoFn and PTransform classes (CreatePartitionQueryRequestFn, PrepareWritesFn, RunQueryResponseToDocumentFn) to handle the partitioning of Firestore queries, preparation of write operations, and extraction of documents from query responses, enhancing the template's modularity and testability.
  • Comprehensive Testing and Documentation: Included extensive integration tests (FirestoreToFirestoreIT) to validate the template's functionality, including fuzz testing for various data types and selective collection copying. Also added detailed README.md documentation and Terraform configurations for deployment.
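
To illustrate the kind of path adjustment that PrepareWritesFn is described as performing, here is a sketch under stated assumptions, not the PR's implementation. It relies on the Firestore document resource name format `projects/{project}/databases/{database}/documents/{path}`; `DocumentNameRewriter` and `rewrite` are hypothetical names.

```java
/** Hypothetical helper; shows one way to retarget a document name. */
final class DocumentNameRewriter {
  private static final String DOCUMENTS_SEGMENT = "/documents/";

  private DocumentNameRewriter() {}

  /**
   * Swaps the project and database in a full document name while keeping
   * the path below "/documents/" unchanged.
   */
  static String rewrite(String sourceName, String destProject, String destDatabase) {
    // Project and database IDs cannot contain '/', so the first match is the root segment.
    int idx = sourceName.indexOf(DOCUMENTS_SEGMENT);
    if (!sourceName.startsWith("projects/") || idx < 0) {
      throw new IllegalArgumentException("Not a full document name: " + sourceName);
    }
    String relativePath = sourceName.substring(idx + DOCUMENTS_SEGMENT.length());
    return "projects/" + destProject
        + "/databases/" + destDatabase
        + DOCUMENTS_SEGMENT + relativePath;
  }

  public static void main(String[] args) {
    String source = "projects/src-project/databases/(default)/documents/users/alice";
    // Prints projects/dst-project/databases/copy-db/documents/users/alice
    System.out.println(rewrite(source, "dst-project", "copy-db"));
  }
}
```

Only the prefix changes, so nested subcollection paths carry over unchanged.
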
Changelog
  • it/google-cloud-platform/pom.xml
    • Added google-cloud-firestore and google-cloud-firestore-admin dependencies to support Firestore operations.
  • it/google-cloud-platform/src/main/java/org/apache/beam/it/gcp/firestore/FirestoreAdminResourceManager.java
    • Added a new resource manager for creating, deleting, and managing Firestore administrative resources like databases.
  • it/google-cloud-platform/src/main/java/org/apache/beam/it/gcp/firestore/FirestoreAdminResourceManagerException.java
    • Added a custom exception class for errors encountered during Firestore administrative resource management.
  • it/google-cloud-platform/src/main/java/org/apache/beam/it/gcp/firestore/FirestoreResourceManager.java
    • Added a new resource manager for performing data operations (read, write, delete collections) on Firestore databases.
  • it/google-cloud-platform/src/main/java/org/apache/beam/it/gcp/firestore/FirestoreResourceManagerException.java
    • Added a custom exception class for errors encountered during Firestore data resource management.
  • it/google-cloud-platform/src/main/java/org/apache/beam/it/gcp/firestore/package-info.java
    • Added package-info documentation for the Firestore resource management classes.
  • v2/firestore-to-firestore/README_Cloud_Firestore_to_Firestore.md
    • Added comprehensive documentation for the new Firestore to Firestore Dataflow template, including parameters, getting started guide, and Terraform usage.
  • v2/firestore-to-firestore/pom.xml
    • Added a new Maven module for the firestore-to-firestore template, including necessary dependencies for Firestore and Beam.
  • v2/firestore-to-firestore/src/main/java/com/google/cloud/teleport/v2/templates/FirestoreToFirestore.java
    • Added the main Java class for the Firestore to Firestore Dataflow template, implementing the pipeline logic for reading from source and writing to destination Firestore databases.
  • v2/firestore-to-firestore/src/main/java/com/google/cloud/teleport/v2/templates/package-info.java
    • Added package-info documentation for the Firestore to Firestore template package.
  • v2/firestore-to-firestore/src/main/java/com/google/cloud/teleport/v2/transforms/CreatePartitionQueryRequestFn.java
    • Added a PTransform to create Firestore PartitionQueryRequest objects from collection IDs, enabling parallelized reads.
  • v2/firestore-to-firestore/src/main/java/com/google/cloud/teleport/v2/transforms/PrepareWritesFn.java
    • Added a DoFn to transform source Firestore Document objects into Write requests suitable for the destination database, adjusting document paths as needed.
  • v2/firestore-to-firestore/src/main/java/com/google/cloud/teleport/v2/transforms/RunQueryResponseToDocumentFn.java
    • Added a DoFn to extract Document objects from Firestore RunQueryResponse messages.
  • v2/firestore-to-firestore/src/main/java/com/google/cloud/teleport/v2/transforms/package-info.java
    • Added package-info documentation for the Firestore transforms package.
  • v2/firestore-to-firestore/src/test/java/com/google/cloud/teleport/v2/templates/FirestoreToFirestoreIT.java
    • Added integration tests for the Firestore to Firestore template, covering selective collection copying and fuzz testing with various data types.
  • v2/firestore-to-firestore/src/test/java/com/google/cloud/teleport/v2/transforms/CreatePartitionQueryRequestFnTest.java
    • Added unit tests for the CreatePartitionQueryRequestFn to verify correct generation of partition queries.
  • v2/firestore-to-firestore/src/test/java/com/google/cloud/teleport/v2/transforms/PrepareWritesFnTest.java
    • Added unit tests for the PrepareWritesFn to ensure accurate document path transformation and field preservation.
  • v2/firestore-to-firestore/src/test/java/com/google/cloud/teleport/v2/transforms/RunQueryResponseToDocumentFnTest.java
    • Added unit tests for the RunQueryResponseToDocumentFn to confirm proper document extraction from query responses.
  • v2/firestore-to-firestore/terraform/Cloud_Firestore_to_Firestore/dataflow_job.tf
    • Added Terraform configuration for deploying the Firestore to Firestore Dataflow template.
  • v2/pom.xml
    • Added the new firestore-to-firestore module to the parent pom.xml.
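
The extraction step described for RunQueryResponseToDocumentFn can be pictured as a filter: a Firestore query response does not always carry a document (for example, when it only reports progress), so such responses are skipped. The sketch below uses plain Java stand-ins rather than the Firestore or Beam classes; `ResponseDocuments` and `Response` are hypothetical and are not the types used in the PR.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration of filtering documents out of query responses. */
final class ResponseDocuments {
  private ResponseDocuments() {}

  /** Stand-in for a query response; documentName is null when no document is attached. */
  static final class Response {
    final String documentName;

    Response(String documentName) {
      this.documentName = documentName;
    }

    boolean hasDocument() {
      return documentName != null;
    }
  }

  /** Keeps only the responses that carry a document, preserving order. */
  static List<String> extract(List<Response> responses) {
    List<String> documents = new ArrayList<>();
    for (Response response : responses) {
      if (response.hasDocument()) {
        documents.add(response.documentName);
      }
    }
    return documents;
  }

  public static void main(String[] args) {
    List<Response> responses =
        Arrays.asList(new Response("users/alice"), new Response(null), new Response("users/bob"));
    System.out.println(extract(responses)); // [users/alice, users/bob]
  }
}
```
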
Activity
  • No human activity has been recorded on this pull request yet.