Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add collectionName/databaseName attributes to MongoDB provider #3322

Open
wants to merge 10 commits into
base: main
Choose a base branch
from

Conversation

jesmith17
Copy link

Added support for defining the target collectionName and databaseName for the logs to the config file. This allows users to define the values via config instead of only configuring them via the connectionString.

@ppkarwasz ppkarwasz changed the title Main Add collectionName/databaseName attributes to MongoDB provider Dec 25, 2024
@ppkarwasz
Copy link
Contributor

@jesmith17,

Thank you for your contribution.

While I am not a MongoDB user, the current behavior of using the connection string to infer the name of the database and collection does not make sense to me. Your proposal looks like a valid improvement. According to the MongoDB connection string documentation the connection string has the form:

mongodb://[username:password@]host1[:port1][,...hostN[:portN]][/[defaultauthdb][?options]]

where defaultauthdb is documented as:

The authentication database to use if the connection string includes username:password@ authentication credentials but the authSource option is unspecified.

Therefore:

  • defaultauthdb does not seem to be the right value to be used as database name to store the logs: I imagine that some MongoDB installation might use a different database for authentication and to store the data, right?
  • the collection name does not even appear in the official connection string. It only appears in the Java-specific ConnectionString class. I guess that many users don't know that syntax.

@garydgregory, what do you think?

Copy link
Contributor

@ppkarwasz ppkarwasz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jesmith17,

This looks like a very valid improvement! 💯

The changes to the MongoDB provider should be documented. Can you:

  • Modify database.adoc and add a description of the databaseName and collectionName configuration attributes. IMHO these attributes should be marked as "required" in the documentation and the description should mention their fallback values (which are kept only for backward compatibility). The fallback values can be marked as deprecated and we might remove them in a future Log4j Core version.
  • Since IMHO providing the database and collection name explicitly seems like the right thing to do, all the examples and integration tests should be rewritten to use the databaseName and collectionName attributes.

@ppkarwasz
Copy link
Contributor

@jesmith17,

The integration tests in the log4j-mongodb module use Docker. If you want to run the tests locally, you need to have Docker available on your system and run:

./mvnw verify -pl log4j-mongodb -Pdocker

@garydgregory
Copy link
Member

I'm not a fan because this creates confusion with the connection string. IMO: It should be that EITHER you provide a connection string OR you provide all the bits and pieces to build a connection string. I don't like the idea of doing it half one way and half the other.

IMO, we should follow the KISS principle and only provide connection string support.

Just like with JDBC, you just need to know a driver's connection string syntax, which invariably changes from time to time, and I certainly don't want to have to change this code because there is some Mongo driver change in some version that breaks our code.

@ppkarwasz
Copy link
Contributor

@garydgregory,

I'm not a fan because this creates confusion with the connection string. IMO: It should be that EITHER you provide a connection string OR you provide all the bits and pieces to build a connection string. I don't like the idea of doing it half one way and half the other.

These were my initial thoughts too, until I dug deeper into the connection string.
From what I have understood from the connection string (mainly this documentation), it does not provide neither the database nor the collection name:

  • it does not contain the database name, because the defaultauthdb part is the name of the database, where the user is defined, not the database that will be used by the MongoDB client. The MongoDB client can use any database available on the server. Did I misinterpret this part?
  • the official connections string does not even have a "collectionName" part. The collection name is only defined by the Java-specific ConnectionString. So Java developers might know this alternative syntax, but MongoDB administrators might not know it.

As I mentioned before, I have almost no experience with MongoDB, so these interpretations might not make sense.

Just like with JDBC, you just need to know a driver's connection string syntax, which invariably changes from time to time, and I certainly don't want to have to change this code because there is some Mongo driver change in some version that breaks our code.

For JDBC you also need to provide the name of the table.

@garydgregory
Copy link
Member

So the real problem is that the Mongo documentation could be better, like in any software.

I am concerned this opens the door for more and more options to be added to "override" the connection string components, making maintenance harder.

We should only, IMO, offer configuration hooks for behavior that can't be achieved otherwise. Let's review:

The database as part of the connection string is documented here (where Google took me):
https://www.mongodb.com/docs/manual/reference/connection-string-examples/#std-label-connections-connection-examples
For example. 2 examples I picked from the page:
Self-Hosted records Database Running Locally:

mongodb://myDatabaseUser:D1fficultP%40ssw0rd@localhost/records

and fancier:
MongoDB Atlas Cluster that Authenticates with AWS IAM credentials:

mongodb+srv://<aws access key id>:<aws secret access key>@cluster0.example.com/testdb?authSource=$external&authMechanism=MONGODB-AWS

The collection is specified as "databaseName.collectionName", see our XML configuration files.

The classic problem is that we do not want to create support tickets like "My appender connects to A instead of B, why?" where we then have to say "Oh, that's because you specified the foo in both the connection string and in the other thing and this takes precedence over that, blah blah".

For these use cases of double configuration, we should throw an exception.

Who would port this to log4j-mongodb4? to master?

@vy
Copy link
Member

vy commented Dec 25, 2024

Agreeing with @garydgregory's sentiment that we should not add more configuration knobs, if the shared use case can be served with just the connection string. I can spare time for this further after the vacation (i.e., 2025-01-02), if not earlier.

@jesmith17
Copy link
Author

I agree that there is a slippery slope here, but specifying a table name for a db connection in the connection string is very very odd. Allowing the collection name via config instead of connection string puts this in line how the JDBC appender works which required a table name as part of the config. I would actually prefer to make this a config only part and have it override the connection string value. With the way the code currently functions you have to extract the db and collection name from the connection string to pass them to other code blocks anyway. But to preserve backwards compatibility I left it in.

@ppkarwasz
Copy link
Contributor

I looked into the Spring Boot configuration properties for MongoDB:

  • Spring Boot offers a set of properties (username, password, host, port, additional-hosts) that are mutually exclusive with uri. We should certainly not introduce those.
  • The database (and authentication-database) property can however be used **together** with uri`.

It seems that Spring Boot interprets the /defaultauthdb part as "default database to use for authentication and data", while it allows users to specify a different database for each one of these functionalities. We should probably offer a similar configuration option. As documented in Authentication Database, you can create a user in database foo and allow the user to access database bar, so we need at least (and at most 😉) an additional configuration knob for the database to use (or authentication database to use).

@jesmith17
Copy link
Author

I would also point out that spring doesn't have a configuration property for the collection, and that's really the key item here. They assume, appropriately, that you will have various objects being written to various collections inside the app. So you configure the collection for items differently than you do the DB.

That's really the pattern I think is needed here.

@jesmith17
Copy link
Author

So I am working on making the unit test changes but I am finding 2 key challenges.

  1. The MongoDbProvider and the MongoDbConnection classes aren't easily unitTested. Essentially they don't expose anyways to ask them about the database or collection (or even the connection for that matter) based on how they currently sit. So Im finding that I have to make some more changes to make them really testable.
  2. I am getting blocked by the io.fabric5:docker-maven-plugin and it's unhappiness that I am running on a M1 Mac. It can't seem to resolve any of the docker server connections so it fails when trying to start docker. I have verified that it's all running, but seem to be stuck. Has anyone recently hit this issue with the Mac M1 process and docker? According to the docker-maven-plugin this was supposed to be resolved in 0.38 and the version mine is finding is 0.45.1.

Copy link
Contributor

@ppkarwasz ppkarwasz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jesmith17,

This looks almost done to me, thanks! It is still missing some details:

  • can you modify the config files in log4j-mongodb/src/test/resources to use the databaseName and collectionName attributes?
  • can you add a changelog entry to src/changelog/.2.x.x? Just copy/paste an existing one and modify the data.

@jesmith17
Copy link
Author

Unable to do the changelog entry as this work is done on the main (3.x) branch and that file doesn't exist. I can work to back-port (or recreate these in the 2.x structure and can make those changes to that version as needed.

@ppkarwasz
Copy link
Contributor

Unable to do the changelog entry as this work is done on the main (3.x) branch and that file doesn't exist. I can work to back-port (or recreate these in the 2.x structure and can make those changes to that version as needed.

Sorry, I meant src/changelog/.3.x.x 😉

Copy link
Author

@jesmith17 jesmith17 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Look good to me

@ppkarwasz ppkarwasz requested review from garydgregory and vy February 12, 2025 08:42
@ppkarwasz
Copy link
Contributor

@garydgregory, @vy,

Could you review this PR?

Copy link

Job Requested goals Build Tool Version Build Outcome Build Scan®
build-macos-latest clean install 3.9.8 Build Scan PUBLISHED
build-ubuntu-latest clean install 3.9.8 Build Scan PUBLISHED
build-windows-latest clean install 3.9.8 Build Scan PUBLISHED
Generated by gradle/develocity-actions

@vy
Copy link
Member

vy commented Feb 12, 2025 via email

Copy link
Member

@vy vy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jesmith17, thanks so much for the changes! I have requested some changes. But I think we have a more fundamental problem: Your PR targets main, i.e., Log4j 3, that is yet to be released. Would you mind editing your PR to target the 2.x branch, please? I will share further remarks once the target is fixed.

@@ -817,6 +817,29 @@ for its format.

**Required**

| [[MongoDbProvider-attr-databaseName]]databaseName
| 'string'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
| 'string'
| `string`

**Required**

| [[MongoDbProvider-attr-collectionName]]collectionName
| 'string'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
| 'string'
| `string`

|
It specifies the name of the collection for the appender to use.
For backward compatibility, the collection name can also be specified in the
https://mongodb.github.io/mongo-java-driver/5.0/apidocs/mongodb-driver-core/com/mongodb/ConnectionString.html[Java-specific connection string]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
https://mongodb.github.io/mongo-java-driver/5.0/apidocs/mongodb-driver-core/com/mongodb/ConnectionString.html[Java-specific connection string]
https://mongodb.github.io/mongo-java-driver/5.0/apidocs/mongodb-driver-core/com/mongodb/ConnectionString.html[Java-specific connection string].

Comment on lines +6 to +7
<issue id="3321" link="https://github.com/apache/logging-log4j2/discussions/3321"/>
<description format="asciidoc">Update log4j-mongodb to allow collectionName and databaseName to be defined via the config file</description>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
<issue id="3321" link="https://github.com/apache/logging-log4j2/discussions/3321"/>
<description format="asciidoc">Update log4j-mongodb to allow collectionName and databaseName to be defined via the config file</description>
<issue id="3322" link="https://github.com/apache/logging-log4j2/pull/3322"/>
<description format="asciidoc">Add `collectionName` and `databaseName` arguments to the MongoDB appender</description>

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see you switched all IT configuration to use the new {database,collection}Name attributes, don't. This effectively removes the checks verifying that the old style is still working. Could you revert all your resources/*.xml changes, please? (Keep the new resource files you added for new tests.)

Comment on lines +27 to +32
private String validConnectionStringWithoutDatabase = "mongodb://localhost:27017";
private String validConnectionStringWithDatabase = "mongodb://localhost:27017/logging";
private String validConnectionStringWithDatabaseAndCollection = "mongodb://localhost:27017/logging.logs";

private String collectionName = "logsTest";
private String databaseName = "loggingTest";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Could you make all these private static final, please? (I think you can also rename validConnectionStringFoo to CON_STR_FOO to save some visual real estate.)

Comment on lines +43 to +51
assertNotNull(provider, "Returned provider is null");
assertEquals(
this.collectionName,
provider.getConnection().getCollection().getNamespace().getCollectionName(),
"Collection names do not match");
assertEquals(
this.databaseName,
provider.getConnection().getCollection().getNamespace().getDatabaseName(),
"Database names do not match");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All checks are pretty self-explanatory. Could you remove all messages (Returned provider is null, Collection names do not match, etc.), please?

@@ -151,7 +150,7 @@
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-surefire-plugin</artifactId>
<configuration>
<skip>true</skip>
<skip>false</skip>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can remove the entire <configuration>, since <skip>false is the default.

@jesmith17
Copy link
Author

@vy Because the filestrucutres between main and 2.x are different I don't think there is a way to just retarget the code changes. In the 2.x branch the MongoDbProvider class is just a shell that wraps calls to MongoDb4Provider (which is in a different project). I can create a new PR with these changes (and the feedback above) and do that if that makes sense. but trying to port these changes to the 2.x branch ends up being a much bigger change to the core of the appender itself.

To your point on the changes to the IT tests to use the new format in all cases, I agree that a test needs to exist for using the old format, but I made those changes at the request of @ppkarwasz on Dec 29th. I don't personally have a strong opinion either way, so if you can both get connected and let me know which way you prefer it, I can make the changes.

@jesmith17
Copy link
Author

Actually for the IT tests, I just added an additional one that explicitly tests the old format. That's more descriptive and clear anyway.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants