Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: avoid turning huge filepaths to atoms #18

Merged
merged 5 commits into from
Oct 18, 2024

Conversation

thalesmg
Copy link
Contributor

Specially when running with CT, used filepaths may become larger than the maximum atom
size of 255 bytes:

2024-10-17T14:18:10.117220+00:00 [error] GenServer #PID<0.31263.0> terminating, ** (SystemLimitError) a system limit has been reached, :erlang.binary_to_atom("/emqx/_build/emqx-enterprise-test/logs/ct_run.test@127.0.0.1.2024-10-17_14.18.07/apps.emqx_bridge_kafka.emqx_bridge_v2_kafka_producer_SUITE.t_fixed_topic_recovers_in_disk_mode.logs/run.2024-10-17_14.18.07/log_private/emqx_bridge_v2_kafka_producer_SUITE_data/kafka/kafka:fixed_topic_disk_recover:test@127.0.0.1/action=3akafka_producer=3afixed_topic_disk_recover=3aconnector=3akafka_producer=3ac_test-topic-one-partition/0/committer", :utf8), (replayq 0.3.7) /emqx/deps/replayq/src/replayq.erl:520: :replayq.spawn_committer/2, (replayq 0.3.7) /emqx/deps/replayq/src/replayq.erl:103: :replayq.open/1, (wolff 4.0.2) /emqx/deps/wolff/src/wolff_producer.erl:258: :wolff_producer.do_init/1, (wolff 4.0.2) /emqx/deps/wolff/src/wolff_producer.erl:298: :wolff_producer.handle_info/2, (stdlib 5.2.3.1) gen_server.erl:1095: :gen_server.try_handle_info/3, (stdlib 5.2.3.1) gen_server.erl:1183: :gen_server.handle_msg/6, (stdlib 5.2.3.1) proc_lib.erl:241: :proc_lib.init_p_do_apply/3, Last message: {:do_init, %{config: %{group: "action:kafka_producer:fixed_topic_disk_recover:connector:kafka_producer:c", max_linger_bytes: 10485760, max_batch_bytes: 917504, required_acks: :all_isr, compression: :no_compression, replayq_dir: "/emqx/_build/emqx-enterprise-test/logs/ct_run.test@127.0.0.1.2024-10-17_14.18.07/apps.emqx_bridge_kafka.emqx_bridge_v2_kafka_producer_SUITE.t_fixed_topic_recovers_in_disk_mode.logs/run.2024-10-17_14.18.07/log_private/emqx_bridge_v2_kafka_producer_SUITE_data/kafka/kafka:fixed_topic_disk_recover:test@127.0.0.1", replayq_offload_mode: false, replayq_max_total_bytes: 2147483648, replayq_seg_bytes: 104857600, drop_if_highmem: false, ack_timeout: 10000, partitioner: :random, partition_count_refresh_interval_seconds: 60, max_linger_ms: 0, max_send_ahead: 9, telemetry_meta_data: %{bridge_id: "action:kafka_producer:fixed_topic_disk_recover:connector:kafka_producer:c"}, max_partitions: :all_partitions, reconnect_delay_ms: 2000}, partition: 0, topic: "test-topic-one-partition", client_id: "connector:kafka_producer:c", conn: #PID<0.31253.0>, linger_expire_timer: false}}

@thalesmg thalesmg force-pushed the fix-register-huge-filepath-as-atom branch 3 times, most recently from efbcc00 to 1989efa Compare October 17, 2024 14:41
Specially when running with CT, used filepaths may become larger than the maximum atom
size of 255 bytes:

```
2024-10-17T14:18:10.117220+00:00 [error] GenServer #PID<0.31263.0> terminating, ** (SystemLimitError) a system limit has been reached, :erlang.binary_to_atom("/emqx/_build/emqx-enterprise-test/logs/ct_run.test@127.0.0.1.2024-10-17_14.18.07/apps.emqx_bridge_kafka.emqx_bridge_v2_kafka_producer_SUITE.t_fixed_topic_recovers_in_disk_mode.logs/run.2024-10-17_14.18.07/log_private/emqx_bridge_v2_kafka_producer_SUITE_data/kafka/kafka:fixed_topic_disk_recover:test@127.0.0.1/action=3akafka_producer=3afixed_topic_disk_recover=3aconnector=3akafka_producer=3ac_test-topic-one-partition/0/committer", :utf8), (replayq 0.3.7) /emqx/deps/replayq/src/replayq.erl:520: :replayq.spawn_committer/2, (replayq 0.3.7) /emqx/deps/replayq/src/replayq.erl:103: :replayq.open/1, (wolff 4.0.2) /emqx/deps/wolff/src/wolff_producer.erl:258: :wolff_producer.do_init/1, (wolff 4.0.2) /emqx/deps/wolff/src/wolff_producer.erl:298: :wolff_producer.handle_info/2, (stdlib 5.2.3.1) gen_server.erl:1095: :gen_server.try_handle_info/3, (stdlib 5.2.3.1) gen_server.erl:1183: :gen_server.handle_msg/6, (stdlib 5.2.3.1) proc_lib.erl:241: :proc_lib.init_p_do_apply/3, Last message: {:do_init, %{config: %{group: "action:kafka_producer:fixed_topic_disk_recover:connector:kafka_producer:c", max_linger_bytes: 10485760, max_batch_bytes: 917504, required_acks: :all_isr, compression: :no_compression, replayq_dir: "/emqx/_build/emqx-enterprise-test/logs/ct_run.test@127.0.0.1.2024-10-17_14.18.07/apps.emqx_bridge_kafka.emqx_bridge_v2_kafka_producer_SUITE.t_fixed_topic_recovers_in_disk_mode.logs/run.2024-10-17_14.18.07/log_private/emqx_bridge_v2_kafka_producer_SUITE_data/kafka/kafka:fixed_topic_disk_recover:test@127.0.0.1", replayq_offload_mode: false, replayq_max_total_bytes: 2147483648, replayq_seg_bytes: 104857600, drop_if_highmem: false, ack_timeout: 10000, partitioner: :random, partition_count_refresh_interval_seconds: 60, max_linger_ms: 0, max_send_ahead: 9, telemetry_meta_data: %{bridge_id: "action:kafka_producer:fixed_topic_disk_recover:connector:kafka_producer:c"}, max_partitions: :all_partitions, reconnect_delay_ms: 2000}, partition: 0, topic: "test-topic-one-partition", client_id: "connector:kafka_producer:c", conn: #PID<0.31253.0>, linger_expire_timer: false}}
```
@thalesmg thalesmg force-pushed the fix-register-huge-filepath-as-atom branch from 1989efa to 1708143 Compare October 17, 2024 14:50
src/replayq.erl Outdated
@@ -515,13 +519,17 @@ ensure_deleted(Filename) ->
%% The committer writes consumer's acked segmeng number + item ID
%% to a file. The file is only read at start/restart.
spawn_committer(ReaderSegno, Dir) ->
Name = iolist_to_binary(filename:join([Dir, committer])),
%% register a name to avoid having two committers spawned for the same dir
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this was a lazy implementation to guard bugs in caller modules.
e.g. open without close.

we can maybe find a better way for this exclusiveness check.
e.g. put pid in app env, and assert if pid is alive.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

having a named process sounds like a simple implementation supported by the vm itself, with automatic cleanup of the name in case the committer terminates.

what would be the motivation to get rid of this mechanism?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed in the latest commit

thalesmg added a commit to thalesmg/replayq that referenced this pull request Oct 17, 2024
#{committers := Committers0} = State0,
case Committers0 of
#{Dir := Pid} ->
%% Already registered to the same pid.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should not be ok, because it means the same process called open twice without closing first.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

forgot the pid is actually the committer pid.
then the question is: is this possible to happen ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no, in the current implementation.

thinking of this as a general registry, repeating the exact same registration should be idempotent.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

after we discussed, I changed the last commit to attempt to register the called of open as soon as possible to avoid side-effects.

@thalesmg thalesmg force-pushed the fix-register-huge-filepath-as-atom branch from 730ad81 to 3ab41ae Compare October 17, 2024 20:50
{ok, State}.

handle_call(#register_committer{dir = Dir, pid = Pid}, _From, State0) ->
%% Should we expand the directory path to avoid tricks with links and relative paths?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

noticed that we should also normalize the dir (i.e.: don't treat an iolist, string and binary representations of the same path differently). pushed a small fix for that

@thalesmg thalesmg merged commit f84359e into emqx:master Oct 18, 2024
2 checks passed
@thalesmg thalesmg deleted the fix-register-huge-filepath-as-atom branch October 18, 2024 12:19
@thalesmg
Copy link
Contributor Author

tagged 0.3.9

thalesmg added a commit to thalesmg/emqx that referenced this pull request Oct 18, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants