feat(ci,server,worker): implement schemata and items copy #1331

Open · wants to merge 23 commits into main

Conversation

nourbalaha
Contributor

@nourbalaha nourbalaha commented Dec 3, 2024

Overview

This PR implements copying of a model's schemata and items.
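
For reference, exercising the new endpoint looks roughly like this; a minimal sketch in Go, where the host, bearer token, model ID, and base path are placeholders (the path shape follows the sequence diagram and integration spec reviewed below):

package main

import (
	"bytes"
	"fmt"
	"net/http"
)

func main() {
	// POST /models/{modelId}/copy with an optional new name for the copy.
	// Host, token, and model ID are placeholders, not values from this PR.
	body := bytes.NewBufferString(`{"name": "Copied Model"}`)
	req, err := http.NewRequest(http.MethodPost,
		"https://cms.example.com/api/models/<modelId>/copy", body)
	if err != nil {
		panic(err)
	}
	req.Header.Set("Authorization", "Bearer <token>")
	req.Header.Set("Content-Type", "application/json")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	fmt.Println(resp.Status) // 200 OK with the copied model on success
}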

Summary by CodeRabbit

Release Notes

  • New Features

    • Introduced a new endpoint to copy models, allowing users to duplicate schemas and items.
    • Added functionality for a copier service to manage copying operations within the application.
  • Improvements

    • Enhanced CI/CD workflows for building and pushing Docker images.
    • Updated API documentation to reflect new capabilities and naming conventions.
    • Streamlined repository management by transitioning to a repository-based approach.
  • Bug Fixes

    • Improved error handling in various components related to copying functionality.
  • Tests

    • Added comprehensive tests for the new copying features and updated existing tests for better coverage.

@nourbalaha nourbalaha self-assigned this Dec 3, 2024

coderabbitai bot commented Dec 3, 2024

Walkthrough

This pull request introduces a comprehensive feature for copying models in the ReEarth CMS system. The changes span multiple components, including server-side API endpoints, worker infrastructure, and testing suites. A new workflow for building a copier Docker image has been added, and the system now supports copying models with their associated schemas and items. The implementation includes modifications to repositories, use cases, and integration API specifications to enable this functionality across different layers of the application.

Changes

| File | Change Summary |
| --- | --- |
| .github/workflows/build_copier.yml | New GitHub Actions workflow for building and pushing the Docker image for the ReEarth CMS copier |
| .github/workflows/build_worker.yml | Added docker_copier job for building the copier Docker image |
| server/e2e/integration_model_test.go | Added integration test for model copy functionality |
| server/internal/adapter/integration/model.go | Implemented CopyModel method for server-side model copying |
| server/internal/adapter/integration/server.gen.go | Updated server interface and generated code for the model copy endpoint |
| server/internal/usecase/interactor/model.go | Added complex logic for copying models, schemas, and items |
| server/internal/usecase/interfaces/model.go | Defined CopyModelParam and updated the model interface |
| worker/cmd/copier/main.go | Implemented the main entry point for the copier service |
| worker/internal/infrastructure/mongo/copier.go | Added MongoDB copier implementation with document copying logic |
| worker/internal/usecase/interactor/copier.go | Implemented the use case for copying documents |
| server/internal/infrastructure/gcp/config.go | Added CopierImage field to the TaskConfig struct |
| server/internal/infrastructure/gcp/taskrunner.go | Introduced a copy method for handling copy tasks |
| worker/internal/adapter/http/copy.go | Created CopyController for managing copy operations |
| worker/internal/app/main.go | Refactored initialization logic for repositories and gateways |
| worker/internal/app/repo.go | Updated repository initialization to return both gateways and repositories |
| worker/internal/usecase/repo/container.go | Introduced Container struct for managing repositories |
| worker/internal/usecase/repo/copier.go | Defined Copier interface for copy operations |

Sequence Diagram

sequenceDiagram
    participant Client
    participant ModelController
    participant ModelInteractor
    participant ModelRepository
    participant TaskRunner
    participant CopierWorker

    Client->>ModelController: POST /models/{modelId}/copy
    ModelController->>ModelInteractor: Copy model
    ModelInteractor->>ModelRepository: Create copied model
    ModelInteractor->>ModelRepository: Copy schema
    ModelInteractor->>ModelRepository: Copy items
    ModelInteractor->>TaskRunner: Trigger copy event
    TaskRunner->>CopierWorker: Execute copy task
    CopierWorker->>ModelRepository: Copy documents
    CopierWorker-->>TaskRunner: Task completed
    ModelInteractor-->>ModelController: Return copied model
    ModelController-->>Client: 200 OK with copied model

Poem

🐰 Hop, hop, copy with glee!
A model's journey, now set free
From schema to items, all in line
Duplicating data, simply divine
ReEarth CMS, magic unfurled! 🚀


@github-actions github-actions bot added the server label Dec 3, 2024

netlify bot commented Dec 3, 2024

Deploy Preview for reearth-cms canceled.

🔨 Latest commit: e80cf07
🔍 Latest deploy log: https://app.netlify.com/sites/reearth-cms/deploys/6768e5c02515600008009ef6

@nourbalaha nourbalaha changed the title feat(server): implement schemata and items copy feat(server, worker): implement schemata and items copy Dec 10, 2024
@nourbalaha nourbalaha changed the title feat(server, worker): implement schemata and items copy feat(server,worker): implement schemata and items copy Dec 10, 2024
@nourbalaha nourbalaha changed the title feat(server,worker): implement schemata and items copy feat(ci,server,worker): implement schemata and items copy Dec 11, 2024
@nourbalaha nourbalaha marked this pull request as ready for review December 20, 2024 06:04

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 20

🧹 Nitpick comments (32)
worker/internal/app/main.go (2)

29-30: Combine comments for clarity or remove extraneous comment.
The comment “// repos and gateways” adds context, but consider grouping lines 29-30 under a single explanatory comment if more logic is added in the future.


42-42: Use a descriptive field name in ServerConfig if “Repos” might expand.
Currently, “Repos” is clear enough, but if different repository sets are introduced down the line, consider distinguishing them or grouping them in nested fields.

worker/internal/app/repo.go (1)

19-19: Confirm naming convention for init functions.
“initReposAndGateways” is descriptive. Ensure it’s not confused with Go’s built-in init functions.

worker/internal/infrastructure/mongo/copier.go (3)

44-60: Index listing logic might become expensive for large collections.
Listing and decoding each index can be costly if this function runs frequently. Consider caching or a more direct approach if frequent re-invocation is expected.
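
A minimal sketch of the caching idea, assuming the copier can key the cache by collection name (the helper name and shape are illustrative, not from the PR):

package mongo

import (
	"context"
	"sync"

	"go.mongodb.org/mongo-driver/bson"
	"go.mongodb.org/mongo-driver/mongo"
)

var (
	indexCache   = map[string][]bson.D{}
	indexCacheMu sync.Mutex
)

// cachedIndexSpecs lists a collection's indexes once and serves later calls
// from memory, avoiding a round trip on every copy run.
func cachedIndexSpecs(ctx context.Context, col *mongo.Collection) ([]bson.D, error) {
	indexCacheMu.Lock()
	defer indexCacheMu.Unlock()
	if specs, ok := indexCache[col.Name()]; ok {
		return specs, nil
	}
	cur, err := col.Indexes().List(ctx)
	if err != nil {
		return nil, err
	}
	var specs []bson.D
	if err := cur.All(ctx, &specs); err != nil {
		return nil, err
	}
	indexCache[col.Name()] = specs
	return specs, nil
}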


61-73: Index creation for expiring documents.
SetExpireAfterSeconds suggests these documents expire after 24 hours. Verify that this meets business requirements or consider making it configurable.
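
If configurability is wanted, a sketch like the following would work; the environment variable name is an assumption modeled on the REEARTH_CMS_COPIER_* variables used elsewhere in this PR:

package mongo

import (
	"os"
	"strconv"
)

// expireAfterSeconds returns the TTL for copied documents, falling back to
// the currently hardcoded 24 hours when the variable is unset or invalid.
func expireAfterSeconds() int32 {
	const defaultTTL = int32(86400) // 24 hours
	raw := os.Getenv("REEARTH_CMS_COPIER_DOC_TTL_SECONDS") // assumed name
	if raw == "" {
		return defaultTTL
	}
	n, err := strconv.ParseInt(raw, 10, 32)
	if err != nil || n <= 0 {
		return defaultTTL
	}
	return int32(n)
}

The result would then feed options.Index().SetExpireAfterSeconds(...) where the index is created.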


86-93: Check memory usage with large result sets.
Using Find and reading all documents could be memory-intensive if the filter is broad. Consider streaming or chunk-based copying if applicable.
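
A chunk-based sketch of the alternative, assuming access to the source and destination collections (the batch size and function name are illustrative):

package mongo

import (
	"context"

	"go.mongodb.org/mongo-driver/bson"
	"go.mongodb.org/mongo-driver/mongo"
)

// copyInBatches streams matching documents through a cursor and writes them
// in fixed-size batches instead of materializing the full result set.
func copyInBatches(ctx context.Context, src, dst *mongo.Collection, filter bson.M) error {
	cur, err := src.Find(ctx, filter)
	if err != nil {
		return err
	}
	defer cur.Close(ctx)

	const batchSize = 1000
	batch := make([]interface{}, 0, batchSize)
	for cur.Next(ctx) {
		var doc bson.M
		if err := cur.Decode(&doc); err != nil {
			return err
		}
		batch = append(batch, doc)
		if len(batch) == batchSize {
			if _, err := dst.InsertMany(ctx, batch); err != nil {
				return err
			}
			batch = batch[:0]
		}
	}
	if len(batch) > 0 {
		if _, err := dst.InsertMany(ctx, batch); err != nil {
			return err
		}
	}
	return cur.Err()
}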

server/internal/usecase/interactor/model.go (4)

5-5: Import of encoding/json.
Appropriate for copy logic. Ensure that repeated JSON marshalling/unmarshalling does not hamper performance in tight loops.


7-7: Keep fmt usage minimal in production code.
fmt is used in triggerCopyEvent; evaluate whether structured logging would be more consistent there.


390-413: Metadata schema creation.
The logic to clone the old meta schema is well structured and updates the new model’s metadata pointer. Confirm no references to the old model’s ID remain in the new schema.


440-458: Asynchronous copy triggered.
The gateway’s TaskRunner is used to queue a copy event. Logging is minimal, but sufficient. If copy processes are large, consider adding monitoring or progress logs.

server/e2e/integration_model_test.go (4)

60-62: Add concise test documentation.
A brief comment at the start of the test function about what scenarios it validates would help future maintainers quickly discern its purpose.


63-78: Increase negative test coverage.
In addition to unauthorized requests, consider adding test cases for invalid payloads or non-existent model IDs to cover edge scenarios (e.g., model ID is syntactically valid but does not exist).


79-86: Move test data retrieval closer to usage.
For clarity and maintainability, consider retrieving the old model ID and object just before it’s used, keeping the test logic more localized. This helps future maintainers see data context in a single glance.


107-140: Ensure metadata schema relationships remain consistent.
You correctly validate that the copied metadata schema differs from the original. However, for completeness, you might also ensure that any references (if any) are updated or resolved consistently in your system (e.g., if the metadata schema references yet another entity).

worker/internal/usecase/repo/copier.go (1)

1-9: Establish richly documented interface.
Add doc comments to clarify possible implementations of the “Copier” interface, explaining expectations regarding data integrity, performance constraints, and error handling.
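
For example, assuming the interface takes the filter and changes as JSON strings (as the CopyInput fields reviewed above suggest; the exact signature in the PR may differ):

package repo

import "context"

// Copier copies documents matching a JSON-encoded filter, applying the
// JSON-encoded changes to each copy. Implementations are expected to:
//   - validate filter and changes before touching any data,
//   - either copy all matched documents or return an error (no silent
//     partial results),
//   - stream rather than buffer when collections are large.
type Copier interface {
	Copy(ctx context.Context, filter, changes string) error
}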

worker/internal/usecase/interactor/copier.go (1)

7-9: Add contextual logging or tracing.
When calling “u.repos.Copier.Copy”, consider adding logs or distributed tracing to help diagnose potential data consistency issues or performance bottlenecks, especially as copy operations can be data-intensive.
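
A sketch of what that could look like; the method signature and the log import path are assumptions based on the snippets quoted in this review:

package interactor

import (
	"context"
	"time"

	"github.com/reearth/reearthx/log" // assumed import path
)

func (u *Usecase) Copy(ctx context.Context, filter, changes string) error {
	start := time.Now()
	log.Infof("copier usecase: starting copy, filter=%s", filter)
	if err := u.repos.Copier.Copy(ctx, filter, changes); err != nil {
		log.Errorf("copier usecase: copy failed after %s: %v", time.Since(start), err)
		return err
	}
	log.Infof("copier usecase: copy finished in %s", time.Since(start))
	return nil
}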

worker/internal/usecase/interactor/usecase.go (1)

10-10: Rename field for clarity.
The field “repos” is clear in context, but to improve readability, you might consider a name like “repositories” or “repoContainer” to make the usage more self-descriptive.

worker/internal/adapter/http/copy.go (1)

19-22: Add documentation for CopyInput fields

The Filter and Changes fields lack documentation explaining their purpose, format, and expected values. This is important for maintainability and API understanding.

 type CopyInput struct {
+    // Filter specifies the MongoDB query filter in JSON format
     Filter  string `json:"filter"`
+    // Changes specifies the modifications to be applied in JSON format
     Changes string `json:"changes"`
 }
worker/internal/usecase/interactor/webhook.go (1)

Line range hint 13-19: Consider adding retry mechanism for concurrent webhook processing

The GetAndSet operation might have race conditions in a distributed environment. Consider implementing a retry mechanism with exponential backoff:

+	maxRetries := 3
+	for i := 0; i < maxRetries; i++ {
 		found, err := u.repos.Webhook.GetAndSet(ctx, eid)
 		if err != nil {
 			log.Errorf("webhook usecase: failed to get webhook sent: %v", err)
+			time.Sleep(time.Duration(1<<i) * time.Second) // exponential backoff
+			continue
 		}
 		if found {
 			return nil
 		}
+		break
+	}
server/pkg/task/task.go (1)

60-64: Consider using structured Changes type

The Changes field in CopyPayload is a string, but the Changes type is defined as a structured map. Consider using the structured type directly.

 type CopyPayload struct {
 	Collection string
 	Filter     string
-	Changes    string
+	Changes    Changes
 }
worker/cmd/copier/main.go (1)

67-70: Avoid hardcoding database name

The database name "reearth_cms" is hardcoded. Consider making it configurable through environment variables.

-	db := client.Database("reearth_cms")
+	dbName := os.Getenv("REEARTH_CMS_DB_NAME")
+	if dbName == "" {
+		dbName = "reearth_cms" // fallback to default
+	}
+	db := client.Database(dbName)
worker/internal/infrastructure/mongo/webhook.go (1)

30-34: Consider adding context timeout for index initialization

While the implementation is correct, it would be better to use a context with timeout for the index initialization to prevent long-running operations.

 func (r *Webhook) Init() error {
+	ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
+	defer cancel()
-	return r.InitIndex(
-		context.Background(),
-	)
+	return r.InitIndex(ctx)
 }
server/internal/usecase/interfaces/model.go (1)

23-26: Add validation tags to CopyModelParam struct

Consider adding validation tags to ensure ModelId is not empty and Name follows any required format/length constraints.

 type CopyModelParam struct {
-	ModelId id.ModelID
-	Name    *string
+	ModelId id.ModelID `validate:"required"`
+	Name    *string    `validate:"omitempty,min=1,max=100"`
 }
server/pkg/schema/schema.go (1)

176-182: Add error return for validation failures

Consider modifying the method to return an error when validation fails.

-func (s *Schema) CopyFrom(s2 *Schema) {
+func (s *Schema) CopyFrom(s2 *Schema) error {
 	if s == nil || s2 == nil {
-		return
+		return errors.New("source or destination schema is nil")
 	}
 	s.fields = slices.Clone(s2.fields)
 	s.titleField = s2.TitleField().CloneRef()
+	return nil
 }
.github/workflows/build_copier.yml (2)

33-41: Improve shell script syntax and readability

Use -n instead of ! -z for better shell script readability and maintainability.

-            if [[ ! -z "$TAG" ]]; then
+            if [[ -n "$TAG" ]]; then

93-95: Optimize multiple redirects to GITHUB_OUTPUT

Consider using a grouped redirect for better efficiency.

-          echo "platforms=$PLATFORMS" >> "$GITHUB_OUTPUT"
-          echo "version=$VERSION" >> "$GITHUB_OUTPUT"
-          echo "tags=$TAGS" >> "$GITHUB_OUTPUT"
+          {
+            echo "platforms=$PLATFORMS"
+            echo "version=$VERSION"
+            echo "tags=$TAGS"
+          } >> "$GITHUB_OUTPUT"
server/internal/infrastructure/gcp/taskrunner.go (1)

161-162: Extract duplicated timeout values

The timeout values are duplicated across the file. Consider extracting them to constants.

+const (
+    buildTimeout = "86400s" // 1 day
+    buildQueueTtl = "86400s" // 1 day
+)

 build := &cloudbuild.Build{
-    Timeout:  "86400s", // 1 day
-    QueueTtl: "86400s", // 1 day
+    Timeout:  buildTimeout,
+    QueueTtl: buildQueueTtl,
server/internal/adapter/integration/model.go (1)

115-128: Consider enhancing error handling with more specific error types.

The error handling could be more granular to provide better feedback to API consumers. Consider handling specific error cases like validation errors, permission errors, etc., separately.

 func (s *Server) CopyModel(ctx context.Context, request CopyModelRequestObject) (CopyModelResponseObject, error) {
 	uc := adapter.Usecases(ctx)
 	op := adapter.Operator(ctx)

 	m, err := uc.Model.Copy(ctx, interfaces.CopyModelParam{
 		ModelId: request.ModelId,
 		Name:    request.Body.Name,
 	}, op)
 	if err != nil {
-		if errors.Is(err, rerror.ErrNotFound) {
-			return CopyModel404Response{}, err
-		}
-		return CopyModel500Response{}, err
+		switch {
+		case errors.Is(err, rerror.ErrNotFound):
+			return CopyModel404Response{}, err
+		case errors.Is(err, rerror.ErrInvalidParams):
+			return CopyModel400Response{}, err
+		case errors.Is(err, rerror.ErrPermissionDenied):
+			return CopyModel403Response{}, err
+		default:
+			return CopyModel500Response{}, err
+		}
 	}
server/internal/usecase/interactor/model_test.go (3)

576-629: Add more test cases for comprehensive coverage.

The test suite should include additional cases:

  • Invalid model name (empty or too long)
  • Permission denied scenarios
  • Concurrent copy operations
  • Copy with metadata validation
 	tests := []struct {
 		name       string
 		param      interfaces.CopyModelParam
 		setupMock  func()
 		wantErr    bool
 		validate   func(t *testing.T, got *model.Model)
 	}{
+		{
+			name: "empty name",
+			param: interfaces.CopyModelParam{
+				ModelId: m.ID(),
+				Name:    lo.ToPtr(""),
+			},
+			setupMock: func() {
+				mRunner.EXPECT().Run(ctx, gomock.Any()).Times(0)
+			},
+			wantErr: true,
+			validate: func(t *testing.T, got *model.Model) {
+				assert.Nil(t, got)
+			},
+		},
+		{
+			name: "permission denied",
+			param: interfaces.CopyModelParam{
+				ModelId: m.ID(),
+				Name:    lo.ToPtr("Copied Model"),
+			},
+			setupMock: func() {
+				mRunner.EXPECT().Run(ctx, gomock.Any()).Times(1).Return(rerror.ErrPermissionDenied)
+			},
+			wantErr: true,
+			validate: func(t *testing.T, got *model.Model) {
+				assert.Nil(t, got)
+			},
+		},

546-555: Consider using test helper functions for setup.

The test setup code could be refactored into helper functions to improve readability and reusability.

+func setupTestSchema(t *testing.T, wid accountdomain.WorkspaceID, pid id.ProjectID) (*schema.Schema, *schema.Schema) {
+	fId1 := id.NewFieldID()
+	sfKey1 := id.RandomKey()
+	sf1 := schema.NewField(schema.NewBool().TypeProperty()).ID(fId1).Key(sfKey1).MustBuild()
+	s1 := schema.New().NewID().Workspace(wid).Project(pid).Fields([]*schema.Field{sf1}).MustBuild()
+
+	fId2 := id.NewFieldID()
+	sfKey2 := id.RandomKey()
+	sf2 := schema.NewField(schema.NewBool().TypeProperty()).ID(fId2).Key(sfKey2).MustBuild()
+	s2 := schema.New().NewID().Workspace(wid).Project(pid).Fields([]*schema.Field{sf2}).MustBuild()
+
+	return s1, s2
+}

567-574: Add error handling for database setup.

The test setup should handle database errors more gracefully.

-	err := db.Project.Save(ctx, p.Clone())
-	assert.NoError(t, err)
-	err = db.Model.Save(ctx, m.Clone())
-	assert.NoError(t, err)
-	err = db.Schema.Save(ctx, s1.Clone())
-	assert.NoError(t, err)
-	err = db.Schema.Save(ctx, s2.Clone())
-	assert.NoError(t, err)
+	for _, err := range []error{
+		db.Project.Save(ctx, p.Clone()),
+		db.Model.Save(ctx, m.Clone()),
+		db.Schema.Save(ctx, s1.Clone()),
+		db.Schema.Save(ctx, s2.Clone()),
+	} {
+		if err != nil {
+			t.Fatalf("failed to setup test database: %v", err)
+		}
+	}
.github/workflows/build_worker.yml (1)

144-167: Optimize script by combining redirects.

The script logic is correct, but we can improve it by combining the redirects to $GITHUB_OUTPUT.

Here's the optimized version:

        run: |
          if [[ -n $TAG ]]; then
            PLATFORMS=linux/amd64,linux/arm64
            VERSION=$TAG
            TAGS=$IMAGE_NAME:$TAG
            if [[ ! $TAG =~ '-' ]]; then
              TAGS+=,${IMAGE_NAME}:${TAG%.*}
              TAGS+=,${IMAGE_NAME}:${TAG%%.*}
              TAGS+=,${IMAGE_NAME}:latest
            fi
          else
            PLATFORMS=linux/amd64
            VERSION=$SHA
            TAGS=$IMAGE_NAME:$NAME
          fi
-          echo "platforms=$PLATFORMS" >> "$GITHUB_OUTPUT"
-          echo "version=$VERSION" >> "$GITHUB_OUTPUT"
-          echo "tags=$TAGS" >> "$GITHUB_OUTPUT"
+          {
+            echo "platforms=$PLATFORMS"
+            echo "version=$VERSION"
+            echo "tags=$TAGS"
+          } >> "$GITHUB_OUTPUT"
🧰 Tools
🪛 actionlint (1.7.4)

150-150: shellcheck reported issue in this script: SC2129:style:15:1: Consider using { cmd1; cmd2; } >> file instead of individual redirects

(shellcheck)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8dbb8dc and 2fac5ce.

📒 Files selected for processing (34)
  • .github/workflows/build_copier.yml (1 hunks)
  • .github/workflows/build_worker.yml (1 hunks)
  • server/e2e/integration_model_test.go (1 hunks)
  • server/internal/adapter/integration/model.go (1 hunks)
  • server/internal/adapter/integration/server.gen.go (7 hunks)
  • server/internal/infrastructure/gcp/config.go (1 hunks)
  • server/internal/infrastructure/gcp/taskrunner.go (4 hunks)
  • server/internal/infrastructure/gcp/taskrunner_test.go (2 hunks)
  • server/internal/usecase/interactor/model.go (3 hunks)
  • server/internal/usecase/interactor/model_test.go (2 hunks)
  • server/internal/usecase/interfaces/model.go (2 hunks)
  • server/pkg/integrationapi/types.gen.go (3 hunks)
  • server/pkg/schema/schema.go (1 hunks)
  • server/pkg/schema/schema_test.go (1 hunks)
  • server/pkg/task/task.go (2 hunks)
  • server/schemas/integration.yml (2 hunks)
  • worker/Dockerfile (0 hunks)
  • worker/cmd/copier/main.go (1 hunks)
  • worker/copier.Dockerfile (1 hunks)
  • worker/internal/adapter/http/copy.go (1 hunks)
  • worker/internal/adapter/http/main.go (1 hunks)
  • worker/internal/app/main.go (4 hunks)
  • worker/internal/app/repo.go (2 hunks)
  • worker/internal/infrastructure/mongo/common_test.go (1 hunks)
  • worker/internal/infrastructure/mongo/container.go (1 hunks)
  • worker/internal/infrastructure/mongo/copier.go (1 hunks)
  • worker/internal/infrastructure/mongo/copier_test.go (1 hunks)
  • worker/internal/infrastructure/mongo/webhook.go (1 hunks)
  • worker/internal/infrastructure/mongo/webhook_test.go (0 hunks)
  • worker/internal/usecase/interactor/copier.go (1 hunks)
  • worker/internal/usecase/interactor/usecase.go (1 hunks)
  • worker/internal/usecase/interactor/webhook.go (2 hunks)
  • worker/internal/usecase/repo/container.go (1 hunks)
  • worker/internal/usecase/repo/copier.go (1 hunks)
💤 Files with no reviewable changes (2)
  • worker/internal/infrastructure/mongo/webhook_test.go
  • worker/Dockerfile
✅ Files skipped from review due to trivial changes (3)
  • worker/internal/infrastructure/mongo/common_test.go
  • worker/internal/usecase/repo/container.go
  • worker/copier.Dockerfile
🧰 Additional context used
🪛 actionlint (1.7.4)
.github/workflows/build_copier.yml

31-31: shellcheck reported issue in this script: SC2236:style:4:9: Use -n instead of ! -z

(shellcheck)


78-78: shellcheck reported issue in this script: SC2129:style:15:1: Consider using { cmd1; cmd2; } >> file instead of individual redirects

(shellcheck)

.github/workflows/build_worker.yml

150-150: shellcheck reported issue in this script: SC2129:style:15:1: Consider using { cmd1; cmd2; } >> file instead of individual redirects

(shellcheck)

🔇 Additional comments (34)
worker/internal/app/main.go (3)

14-14: Adopt consistent import ordering if required by linter.
No functional issues, but check if there's a project-wide rule about import grouping or ordering.


55-55: Field addition is valid.
This newly introduced field, Repos *repo.Container, seamlessly integrates with the earlier changes. No issues found.


33-33: Validate concurrency or dependency injection patterns.
Here, we pass both gateways and repos to initialize the usecases. Ensure that no concurrency or data-race issues arise if these containers are used in parallel.

worker/internal/app/repo.go (5)

9-9: Import alias usage seems appropriate.
Nice use of an alias (rmongo) to differentiate from the standard mongo package.


11-15: Imports provide needed references.
All newly introduced dependencies (repo, mongo, otelmongo, etc.) align well with the new repository approach.


71-71: Return signature updated.
Returning both gateways and repos aligns with the updated use case. Implementation details are consistent with the function’s name.


5-5: Check for potential time package usage in other files.
If the project already has a custom time wrapper, consider whether to use it for consistency.

✅ Verification successful

Based on the search results, I can see that the time package is used extensively throughout the codebase for various purposes like:

  1. Timestamps in tests
  2. Setting update times for models, views, projects, etc.
  3. Handling datetime values in items and fields
  4. Managing expiration times for uploads
  5. Authentication timing

There is no evidence of a custom time wrapper being used - all usages directly reference the standard time package. The codebase consistently uses the standard Go time package for all time-related operations.

Standard time package usage is appropriate here
The standard time package is being used consistently across the codebase, and there's no custom time wrapper to maintain consistency with.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Searching for any custom time wrapper usage
rg -A 3 "time.Now|customTime"

Length of output: 52794


53-70: Validate MongoDB connection error handling.
The code handles connection failure by logging a fatal error, which is acceptable for a worker service. Check if graceful retries or backoff are desired if DB connectivity is crucial.

✅ Verification successful

Let me search for any retry mechanisms or connection handling patterns in the codebase.


MongoDB connection handling is appropriate for the worker service.

The current implementation using log.Fatalf for MongoDB connection errors is suitable for a worker service because:

  1. The code sets a reasonable connection timeout of 10 seconds
  2. The worker service requires a functioning database to operate, and failing fast is appropriate
  3. The service is likely managed by container orchestration (as seen in docker-compose files) which will handle restarts
  4. The codebase shows a pattern of using fatal errors for critical infrastructure initialization failures

While retry mechanisms exist in other parts of the codebase (e.g., task runners, locks), they are used for operational tasks rather than infrastructure initialization. The current approach aligns with the application's architecture and deployment strategy.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Searching for repeated usage or definition of DB connection logic that might conflict with this approach
rg -A 10 "mongo.Connect"

Length of output: 3274


Script:

#!/bin/bash
# Search for retry-related code or connection handling patterns
rg -i "retry|backoff|reconnect" -A 5

# Also check for any MongoDB connection configuration in config files
fd -e yaml -e yml -e json -e toml | xargs rg -i "mongo|database"

Length of output: 17573

worker/internal/infrastructure/mongo/copier.go (3)

39-42: Fallback error handling for uninitialized collection.
Returning rerror.ErrInternal is a good practice. Ensure higher-level calls handle or log the error.


133-144: ID generation strategy.
generateId function only supports “item”, “schema”, “model”. For unrecognized types, it returns an empty string. If the calling function is not prepared for that scenario, it may be an issue.

✅ Verification successful

Ignoring the boolean return value is safe in this context

The code is actually safe. The generateId function is only called in the ChangeTypeNew case where change.Value is expected to be one of the supported types. The boolean return value is intentionally ignored (newId, _ := generateId(change.Value)) as the empty string returned for unsupported types is an acceptable fallback value for the result[k] = newId assignment.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for usage of generateId results to see unhandled false returns
rg -A 3 "generateId("

Length of output: 96


Script:

#!/bin/bash
# Search for usage of generateId function with proper escaping
rg -A 3 "generateId\("

Length of output: 632


Script:

#!/bin/bash
# Get more context around the usage of generateId to understand the error handling
rg -B 5 -A 10 "generateId\("

Length of output: 2162
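
For readers following along, the change-application step discussed above has roughly this shape; this is a reconstruction from the quoted snippets, not the verbatim worker code:

// applyChanges rebuilds a document, overwriting fields marked "set" and
// minting fresh IDs for fields marked "new" (sketch).
func applyChanges(doc bson.M, changes task.Changes) bson.M {
	out := bson.M{}
	for k, v := range doc {
		out[k] = v
	}
	for k, change := range changes {
		switch change.Type {
		case task.ChangeTypeSet:
			// Fixed replacement value, e.g. the new schema or model ID.
			out[k] = change.Value
		case task.ChangeTypeNew:
			// Fresh ID; unsupported types fall back to "" as noted above.
			newId, _ := generateId(change.Value)
			out[k] = newId
		}
	}
	return out
}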


75-84: Verify filter parsing.
JSON unmarshal from a string is convenient but can raise runtime errors if the input is not verified. Confirm that upstream logic sanitizes or validates the filter string.
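
A minimal validation sketch for that path, assuming the filter arrives as a JSON string as in CopyInput above (the helper name is illustrative):

package mongo

import (
	"encoding/json"
	"errors"
	"fmt"

	"go.mongodb.org/mongo-driver/bson"
)

// parseFilter rejects malformed or empty filters before any query is issued,
// so a bad request cannot silently match an entire collection.
func parseFilter(raw string) (bson.M, error) {
	if raw == "" {
		return nil, errors.New("copier: empty filter")
	}
	var filter bson.M
	if err := json.Unmarshal([]byte(raw), &filter); err != nil {
		return nil, fmt.Errorf("copier: invalid filter JSON: %w", err)
	}
	if len(filter) == 0 {
		return nil, errors.New("copier: filter must not be empty")
	}
	return filter, nil
}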

server/internal/usecase/interactor/model.go (6)

16-16: Task package usage for copying.
The introduction of “task” references is consistent with the worker approach. No immediate issues.


355-375: copyModel logic.
We see the creation of a new model with a random key. This is correct to avoid key collisions. Confirm uniqueness constraints are adequately tested.


415-438: Building filter and changes for the item copy event.
The JSON marshalling ensures it’s compatible with the worker’s copying logic. It’s good that “id” is re-generated. Confirm that no references to the old item ID remain.


22-22: Import of go.mongodb.org/mongo-driver/bson
Used for building filter objects. Ensure no direct DB logic is creeping into the business logic layer.

✅ Verification successful

Let me search for any direct MongoDB usage in the business logic layer.


MongoDB usage is limited to filter serialization only
The bson package is only used to serialize a filter object to JSON before passing it to a task. This is not direct database logic but rather a data structure conversion, which is acceptable in the business logic layer.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Look for direct calls to Mongo driver in the model interactor beyond tasks
rg -A 5 "mongo\."

Length of output: 16017


Script:

#!/bin/bash
# Look for bson usage in the usecase/interactor directory
rg -A 5 "bson\." server/internal/usecase/interactor/

Length of output: 474


321-353: Full copy operation entry point.
The transaction-based approach is good, as it ensures partial results are rolled back. Thoroughly test large copies or concurrency.

✅ Verification successful

Model copy operation is properly implemented with transaction safety

The code analysis reveals that the Copy operation is well-implemented with proper transaction handling and no concurrent access issues:

  • The operation is wrapped in a transaction using Run1 with proper rollback on failure
  • No concurrent goroutines or mutex locks are used/needed as the operation is already transaction-protected
  • Copy operations are properly isolated through the repository layer
  • The implementation includes proper error handling at each step (model, schema, items, metadata)
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Identify concurrency usage around 'Copy' in the codebase
rg -A 7 "i\.Copy\(.*context\.Context"

Length of output: 39


Script:

#!/bin/bash
# Let's try a broader search for Copy method usage and concurrent operations
rg -i "Copy.*context" -A 5

# Also search for any goroutine or concurrent operations near model operations
rg "go\s+func.*model" -A 5

# Check for mutex usage in model-related files
fd "model.*\.go" --exec rg "sync\.Mutex|Lock\(\)|Unlock\(\)" {} \;

Length of output: 8609


377-388: Schema duplication.
We see newSchema.CopyFrom(oldSchema). Confirm that relations, fields, or constraints are properly replicated if they exist.

✅ Verification successful

Let me check the actual implementation of CopyFrom to verify what exactly is being copied.


Let me check the Field type to understand what's being cloned along with the schema.


Schema duplication is properly implemented

The implementation of CopyFrom in server/pkg/schema/schema.go shows that it correctly handles the duplication of schema components:

  • Fields are deep copied using slices.Clone
  • Title field reference is cloned using CloneRef()
  • The Field struct contains all necessary schema components including id, name, description, key, constraints (unique, multiple, required), type properties, and default values
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Searching for calls to 'CopyFrom' to see if there are additional constraints/fields
rg -A 3 "CopyFrom"

Length of output: 1440


Script:

#!/bin/bash
# Let's look at the Schema struct and CopyFrom implementation
ast-grep --pattern 'type Schema struct {
  $$$
}'

# Also get the full implementation of CopyFrom
ast-grep --pattern 'func (s *Schema) CopyFrom($_) {
  $$$
}'

Length of output: 3522


Script:

#!/bin/bash
# Get the Field type definition
ast-grep --pattern 'type Field struct {
  $$$
}'

# Also check if Field has any Clone method
ast-grep --pattern 'func (f *Field) Clone() {
  $$$
}'

Length of output: 1884

server/e2e/integration_model_test.go (1)

87-106: Validate additional properties after copy.
While the “id”, “projectId”, “schemaId”, etc. are verified, consider confirming that fields like “key” or “public” are handled properly (especially if the copy logic changes these fields).
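
Concretely, the extra checks could be factored into a helper like this; the field names ("key", "public") are assumptions about the response shape:

package e2e

import (
	"testing"

	"github.com/stretchr/testify/assert"
)

// assertCopiedModel pins down fields the copy logic might silently change;
// oldModel and newModel are the decoded JSON bodies of the two models.
func assertCopiedModel(t *testing.T, oldModel, newModel map[string]any) {
	assert.NotEqual(t, oldModel["id"], newModel["id"])   // a new ID is minted
	assert.NotEqual(t, oldModel["key"], newModel["key"]) // keys stay unique per project
	assert.Equal(t, oldModel["public"], newModel["public"])
}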

worker/internal/usecase/interactor/usecase.go (1)

13-17: Validate container dependencies.
Before building the Usecase, ensure that “r” is not nil and that required repository fields within the container are initialized. Consider adding a quick nil check to avoid runtime panics.
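
A sketch of such a guard; the constructor and field names are assumptions based on the container refactor in this PR:

func NewUsecase(g *gateway.Container, r *repo.Container) *Usecase {
	// Fail fast on wiring mistakes instead of panicking on first use.
	if r == nil || r.Copier == nil || r.Webhook == nil {
		log.Fatalf("usecase: repo container is not fully initialized")
	}
	return &Usecase{gateways: g, repos: r}
}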

worker/internal/adapter/http/main.go (1)

8-8: LGTM!

The addition of CopyController follows the existing pattern and is properly initialized in the constructor.

Also applies to: 12-15

server/internal/infrastructure/gcp/taskrunner_test.go (2)

17-18: LGTM: Clear variable naming for image configurations

The separation of decompressorImage and copierImage variables improves clarity and aligns with the single responsibility principle.


36-39: Verify copier configuration integration

The configuration looks good, but we should ensure the copier topic is properly configured for the copy operations.

Consider adding a CopierTopic configuration similar to DecompressorTopic for consistency and future extensibility.
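
The suggested symmetry would look roughly like this (the field set is illustrative, not the actual TaskConfig from the PR):

type TaskConfig struct {
	DecompressorImage string
	DecompressorTopic string
	CopierImage       string
	CopierTopic       string // suggested addition, mirroring DecompressorTopic
}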

server/pkg/task/task.go (1)

67-70: LGTM: Well-defined change types

The ChangeType constants are clearly defined and provide good type safety for copy operations.

server/internal/usecase/interfaces/model.go (1)

64-64: LGTM! Interface method signature is well-defined

The Copy method signature properly includes context and operator parameters, maintaining consistency with other interface methods.

server/pkg/schema/schema.go (1)

176-182: 🛠️ Refactor suggestion

Verify deep copy of field references

While the implementation handles basic copying, there might be issues with field references not being properly deep copied. Consider the following improvements:

  1. Add validation to ensure field types are compatible
  2. Implement proper deep copy for field references
 func (s *Schema) CopyFrom(s2 *Schema) {
 	if s == nil || s2 == nil {
 		return
 	}
-	s.fields = slices.Clone(s2.fields)
+	s.fields = make([]*Field, len(s2.fields))
+	for i, f := range s2.fields {
+		s.fields[i] = f.Clone() // Assuming Field has a Clone method
+	}
 	s.titleField = s2.TitleField().CloneRef()
 }
✅ Verification successful

Based on the gathered evidence, I can now provide the final response:

Field references are properly deep copied

The implementation correctly handles deep copying of field references:

  1. Field.Clone() properly clones all fields including typeProperty and defaultValue
  2. TypeProperty.Clone() performs a deep copy of all its fields, including the reference field
  3. Each field type (text, number, reference, etc.) has its own Clone method

The current implementation in CopyFrom using slices.Clone is sufficient since Field objects are properly cloned through their Clone() method.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check for field reference usage in schemas
ast-grep --pattern 'type Field struct {
  $$$
  TypeProperty() $_ {
    $$$
  }
  $$$
}'

Length of output: 90


Script:

#!/bin/bash
# Let's check for Field struct definition and its methods
rg "type Field struct" -A 10

# Check for Clone or similar methods on Field
rg "func.*Field.*Clone" -A 5

# Check for field-related methods in Schema
rg "func.*Schema.*field" -i -A 5

Length of output: 84186


Script:

#!/bin/bash
# Let's check the Field.Clone() implementation
rg "func \(f \*Field\) Clone\(\)" -A 10 server/pkg/schema/field.go

Length of output: 274


Script:

#!/bin/bash
# Let's see the complete Field.Clone() implementation and check if it properly clones all fields
rg "func \(f \*Field\) Clone\(\)" -B 5 -A 20 server/pkg/schema/field.go

# Also check if TypeProperty has a Clone method
rg "func.*TypeProperty.*Clone" -A 5 server/pkg/schema/

Length of output: 940


Script:

#!/bin/bash
# Let's check the TypeProperty struct and its Clone implementation
rg "type TypeProperty struct" -A 10 server/pkg/schema/type_property.go

# And see the complete Clone implementation
rg "func \(t \*TypeProperty\) Clone\(\)" -B 5 -A 20 server/pkg/schema/type_property.go

Length of output: 1159

server/pkg/integrationapi/types.gen.go (2)

708-711: LGTM! Simple and focused request body for model copy operation.

The request body only contains an optional name field, keeping the API minimal and focused.


542-542: Verify API compatibility for TitleField JSON tag change.

The JSON tag for TitleField has been changed from PascalCase to camelCase. While this aligns with JSON naming conventions, it's a breaking change that could affect API consumers.

Search the repository for existing TitleField usages to confirm that no API consumer relies on the old casing.

server/schemas/integration.yml (1)

452-484: LGTM! Well-defined endpoint with comprehensive error handling.

The endpoint definition is complete with:

  • Clear path parameters
  • Bearer token security
  • Proper error responses (400, 401, 404, 500)
  • Reuse of existing model schema for response
server/internal/adapter/integration/server.gen.go (3)

78-80: LGTM! Clear method documentation and consistent interface definition.

The CopyModel method is well-documented and follows the same pattern as other API methods.


513-529: LGTM! Robust parameter handling and error checking.

The handler implementation includes:

  • Proper parameter binding
  • Standard error handling
  • Security scope setting

2278-2324: LGTM! Complete response type implementations.

The response types are well-defined with:

  • Proper content type headers
  • Error response implementations
  • Consistent with OpenAPI spec
.github/workflows/build_worker.yml (3)

127-131: LGTM! Job definition follows established patterns.

The new docker_copier job is well-structured and consistent with existing jobs in the workflow.


132-143: LGTM! Setup steps are properly configured.

The setup steps use the latest stable versions of actions and follow the same secure pattern for Docker authentication.


168-178: Verify the existence of copier.Dockerfile.

The build configuration looks correct, but we should verify that ./worker/copier.Dockerfile exists and contains the necessary configuration for the copier service.

✅ Verification successful

Dockerfile exists and follows best practices

The ./worker/copier.Dockerfile exists and follows recommended Docker practices:

  • Uses multi-stage build to minimize image size
  • Copies only the necessary binary from builder
  • Uses scratch as base image for runtime
  • Sets proper ENTRYPOINT
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check if the Dockerfile exists and show its content
if [ -f "./worker/copier.Dockerfile" ]; then
    echo "Dockerfile exists. Content:"
    cat "./worker/copier.Dockerfile"
else
    echo "Error: ./worker/copier.Dockerfile not found"
    exit 1
fi

Length of output: 302

Resolved review threads:
  • worker/internal/adapter/http/copy.go (2 threads)
  • server/internal/infrastructure/gcp/taskrunner.go (outdated)
  • server/internal/adapter/integration/model.go
  • server/pkg/schema/schema_test.go (outdated)

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 5

♻️ Duplicate comments (2)
server/internal/infrastructure/gcp/taskrunner.go (2)

167-174: 🛠️ Refactor suggestion

Extract hardcoded build arguments to configuration

Similar to the decompressAsset function, these build arguments should be configurable.

+type CopierConfig struct {
+    ThreadCount string
+    GCLimit     string
+    ChunkSize   string
+    DiskLimit   string
+    SkipTop     bool
+    OldWindows  bool
+}

 type TaskConfig struct {
     // ... existing fields ...
+    CopierConfig *CopierConfig
 }

-Args: []string{
-    "-v",
-    "-n=192",
-    "-gc=5000",
-    "-chunk=1m",
-    "-disk-limit=20g",
-    "-skip-top",
-    "-old-windows",
-},
+args := []string{"-v"}
+if conf.CopierConfig != nil {
+    if conf.CopierConfig.ThreadCount != "" {
+        args = append(args, "-n="+conf.CopierConfig.ThreadCount)
+    }
+    // ... add other configurable arguments
+}

176-178: ⚠️ Potential issue

Add input validation for environment variables

The Copy payload fields are used directly in environment variables without validation. This could lead to injection vulnerabilities.

+func sanitizeEnvValue(value string) string {
+    // Remove any characters that could cause issues in environment variables
+    // This is a basic example - implement proper sanitization based on your requirements
+    return strings.Map(func(r rune) rune {
+        if unicode.IsLetter(r) || unicode.IsNumber(r) || r == '-' || r == '_' || r == '.' {
+            return r
+        }
+        return -1
+    }, value)
+}

 Env: []string{
-    "REEARTH_CMS_COPIER_COLLECTION=" + p.Copy.Collection,
-    "REEARTH_CMS_COPIER_FILTER=" + p.Copy.Filter,
-    "REEARTH_CMS_COPIER_CHANGES=" + p.Copy.Changes,
+    "REEARTH_CMS_COPIER_COLLECTION=" + sanitizeEnvValue(p.Copy.Collection),
+    "REEARTH_CMS_COPIER_FILTER=" + sanitizeEnvValue(p.Copy.Filter),
+    "REEARTH_CMS_COPIER_CHANGES=" + sanitizeEnvValue(p.Copy.Changes),
 },
🧹 Nitpick comments (5)
server/internal/infrastructure/gcp/taskrunner.go (1)

182-184: Consider making disk size configurable for copy operations

The copy operation uses a hardcoded defaultDiskSizeGb value. Consider making this configurable similar to the decompressor.

 Options: &cloudbuild.BuildOptions{
-    DiskSizeGb: defaultDiskSizeGb,
+    DiskSizeGb: conf.CopierDiskSizeGb,
 },
server/pkg/task/task.go (2)

64-68: Add documentation for Changes and Change types

The types lack documentation explaining their purpose and usage.

Add documentation:

+// Changes represents a map of field changes to be applied during copy operations
 type Changes map[string]Change
+
+// Change represents a single field modification with its type and new value
 type Change struct {
 	Type  ChangeType
 	Value string
 }

71-74: Consider adding validation for ChangeType values

The ChangeType constants should be validated when used.

Add validation:

 const (
 	ChangeTypeSet ChangeType = "set"
 	ChangeTypeNew ChangeType = "new"
 )
+
+// IsValid checks if the ChangeType is a valid value
+func (ct ChangeType) IsValid() bool {
+	switch ct {
+	case ChangeTypeSet, ChangeTypeNew:
+		return true
+	default:
+		return false
+	}
+}
.github/workflows/build_copier.yml (1)

7-10: Consider adding a job-level timeout

The workflow has no timeout, which could lead to stuck runs. Note that GitHub Actions does not accept timeout-minutes inside the concurrency block; it belongs on the job (job name illustrative):

 concurrency:
   group: ${{ github.workflow }}-${{ github.event.workflow_run.head_branch }}
   cancel-in-progress: true

 jobs:
   docker_build_push:
+    timeout-minutes: 60
server/internal/usecase/interactor/model.go (1)

415-439: Add retry mechanism for copyItems

The copyItems method should handle transient failures during the copy operation.

Add retry logic:

 func (i Model) copyItems(ctx context.Context, oldSchemaID, newSchemaID id.SchemaID, newModelID id.ModelID) error {
+	maxRetries := 3
+	var lastErr error
+	
+	for attempt := 1; attempt <= maxRetries; attempt++ {
 		collection := "item"
 		filter, err := json.Marshal(bson.M{"schema": oldSchemaID.String()})
 		if err != nil {
 			return err
 		}
 		changes, err := json.Marshal(task.Changes{
 			"id": {
 				Type:  task.ChangeTypeNew,
 				Value: "item",
 			},
 			"schema": {
 				Type:  task.ChangeTypeSet,
 				Value: newSchemaID.String(),
 			},
 			"model": {
 				Type:  task.ChangeTypeSet,
 				Value: newModelID.String(),
 			},
 		})
 		if err != nil {
 			return err
 		}
-		return i.triggerCopyEvent(ctx, collection, string(filter), string(changes))
+		if err := i.triggerCopyEvent(ctx, collection, string(filter), string(changes)); err != nil {
+			lastErr = err
+			log.Warnf("attempt %d failed: %v", attempt, err)
+			continue
+		}
+		return nil
+	}
+	return fmt.Errorf("failed after %d attempts: %v", maxRetries, lastErr)
 }
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d2f3636 and e80cf07.

📒 Files selected for processing (4)
  • .github/workflows/build_copier.yml (1 hunks)
  • server/internal/infrastructure/gcp/taskrunner.go (4 hunks)
  • server/internal/usecase/interactor/model.go (3 hunks)
  • server/pkg/task/task.go (2 hunks)
🧰 Additional context used
🪛 actionlint (1.7.4)
.github/workflows/build_copier.yml

31-31: shellcheck reported issue in this script: SC2236:style:4:9: Use -n instead of ! -z

(shellcheck)


78-78: shellcheck reported issue in this script: SC2129:style:15:1: Consider using { cmd1; cmd2; } >> file instead of individual redirects

(shellcheck)

🔇 Additional comments (3)
server/internal/infrastructure/gcp/taskrunner.go (1)

119-120: 🛠️ Refactor suggestion

Extract hardcoded build step arguments to configuration

The build step arguments contain multiple hardcoded values that should be configurable. This makes the application less flexible and harder to maintain.

+type DecompressorConfig struct {
+    ThreadCount string
+    GCLimit     string
+    ChunkSize   string
+    DiskLimit   string
+    SkipTop     bool
+    OldWindows  bool
+}

 type TaskConfig struct {
     // ... existing fields ...
+    DecompressorConfig *DecompressorConfig
 }

-Args: []string{"-v", "-n=192", "-gc=5000", "-chunk=1m", "-disk-limit=20g", "-gzip-ext=" + conf.DecompressorGzipExt, "-skip-top", "-old-windows", src, dest},
+args := []string{"-v"}
+if conf.DecompressorConfig != nil {
+    if conf.DecompressorConfig.ThreadCount != "" {
+        args = append(args, "-n="+conf.DecompressorConfig.ThreadCount)
+    }
+    // ... add other configurable arguments
+}
+args = append(args, "-gzip-ext="+conf.DecompressorGzipExt, src, dest)

Likely invalid or redundant comment.

server/pkg/task/task.go (1)

48-52: 🛠️ Refactor suggestion

Consider enhancing CopyPayload validation

The CopyPayload struct contains critical fields for copy operations but lacks comprehensive validation.

Consider adding field-level validation:

 type CopyPayload struct {
 	Collection string
 	Filter     string
 	Changes    string
 }
+
+// Validate checks if the payload fields are valid
+func (p *CopyPayload) Validate() error {
+	if p == nil {
+		return errors.New("payload cannot be nil")
+	}
+	if p.Collection == "" {
+		return errors.New("collection is required")
+	}
+	if p.Filter == "" {
+		return errors.New("filter is required")
+	}
+	if !json.Valid([]byte(p.Changes)) {
+		return errors.New("changes must be valid JSON")
+	}
+	return nil
+}

Likely invalid or redundant comment.

.github/workflows/build_copier.yml (1)

97-106: 🛠️ Refactor suggestion

Add a timeout to the Docker build step

The Docker build step has no timeout, so a hung build can run until the workflow's default limit. docker/build-push-action@v6 does not expose memory or CPU inputs, so a step-level timeout-minutes is the available guardrail (it sits beside uses:, not under with:):

       uses: docker/build-push-action@v6
+      timeout-minutes: 60
       with:
         context: ./worker
         file: ./worker/copier.Dockerfile
         platforms: ${{ steps.options.outputs.platforms }}
         push: true
         build-args: VERSION=${{ steps.options.outputs.version }}
         tags: ${{ steps.options.outputs.tags }}
         cache-from: type=gha
         cache-to: type=gha,mode=max

Likely invalid or redundant comment.
