Persistent stream provider to lifecycle#4042
Conversation
|
Side note: There is lots of crazy in all this, but the updated pattern is much cleaner and gives us a good foundation for future improvements. @galvesribeiro, can you spare a glance at the external stream providers you've added. Maybe plan in some time to help me test them later in the week? It would be much appreciated! :) |
| /// <summary> | ||
| /// Configure cluster client to use SQS persistent streams. | ||
| /// </summary> | ||
| public static IServiceCollection AddClusterClientSqsStreams(this IServiceCollection services, string name, |
There was a problem hiding this comment.
I think we should not add any extension methods which hang from IServiceCollection and we should endeavor to remove the existing ones.
(I'd be happy if they were private or internal)
There was a problem hiding this comment.
Sorry, not following... Why?
There was a problem hiding this comment.
Because otherwise we will have methods which are ambiguous between client & silo & SignalR & ASP.NET. I'm thinking about the future, but the issue is present here today: we need to name the methods differently for client and silo instead of having a nice simple "AddSqsStreams" method.
There was a problem hiding this comment.
I see... Well, other framework (including asp.net) does that IIRC. However, I think we should have diff assemblies for client and silo for the providers. For example, in Membership, makes no sense to bring the Membership table to the client, but instead it should be only have the gateway. So Orleans.Clustering.XXXX and Orleans.Clustering.XXXX.Client(or Gateway). I know, extra nuget package to maintain, but the benefits are:
- Avoid this confusion with extensions
- Avoid having 2 methods for client and silo
- In the case of clustering, remove the dependency on the runtime/silo assemblies on teh client.
There was a problem hiding this comment.
Will discuss then address in separate PR based on consensus.
There was a problem hiding this comment.
Maybe the extension methods should at least live in their own namespace? One for the clientbuilder and one for the silo? Better for discoverability.
|
@jason-bragg absolutely. Will grab your branch here and check it. Meanwhile, feel free to ping me on skype of gitter to talk about it. |
|
|
||
| namespace Orleans.Hosting | ||
| { | ||
| public class SqsStreamOptions : PersistentStreamOptions |
There was a problem hiding this comment.
Move options to the Orleans.Configuration namespace
| formatted.AddRange(new[] | ||
| { | ||
| OptionFormattingUtilities.Format(nameof(this.options.DeploymentId), this.options.DeploymentId), | ||
| OptionFormattingUtilities.Format(nameof(this.options.DeploymentId), "<-Redacted->"), |
There was a problem hiding this comment.
Error here, duplicated line. I think the latter is supposed to be DataConnectionString. We can use the existing ConfigUtils.RedactConnectionStringInfo method as in other providers - unless you feel it isn't adequate?
There was a problem hiding this comment.
Good catch on the dupe, tnx.
The redact utility, by my understanding, expects an azure connection string, and knows how to pull the secrets out of that. I don't know if we can trust it to redact secrets from other service connection strings. It would be valuable to be able to strip out secrets for all of these, but I don't have the time to investigate each format, so I've been just placing the generic <-redact-> line in there.
There was a problem hiding this comment.
I added logic in ConfigUtils.RedactConnectionStringInfo to just return <--SNIP--> by default if none of those format we are looking for is there. And i'm off the opinion we should only keep the default logic.
There was a problem hiding this comment.
That's quite fair - I simply don't know how to ensure that we can redact SQS keys correctly, or if we even should. Redacting the entire thing is safer
There was a problem hiding this comment.
Looking at the RedactConnectionStringInfo logic, I'm not confident it's safe. If a connection string has one secret in the list, and one not in the list, it's secrets will be logged. I'm going to stick with the <--SNIP--> for now. Developers/maintainers of a component can develop component specific redactors if it's important. IMO it's better to log less info than risk logging users secrets.
| { | ||
| public string DeploymentId { get; set; } | ||
|
|
||
| public string DataConnectionString { get; set; } |
There was a problem hiding this comment.
Since this is provider-specific, could we call this ConnectionString instead?
There was a problem hiding this comment.
I think it should not be standardazied as connection string on all providers... others don't have it but instead, they ask for something like endpoint and API key... for those cases, our options should reflect that with 2 properties just like I did with the CosmosDB providers.
This DataConnectionString there was a hack back in the day just to "follow the standard".
I'll submit a PR to @jason-bragg branch changing that as well
There was a problem hiding this comment.
Yes, specific names per-provider is fine. Whatever is the typical lingo for that system is what we should use.
| { | ||
| public class SqsStreamOptions : PersistentStreamOptions | ||
| { | ||
| public string DeploymentId { get; set; } |
There was a problem hiding this comment.
Should this value be configured here or should it come from some other, more central place? My understanding is that this should be replaced with SiloOptions.ClusterId, since we smooshed DeploymentId and ClusterId together here: #3728
There was a problem hiding this comment.
Yes, this value should be dropped from there and inject the SiloOptions. The newer providers are already doing that. Will do that on a PR.
|
@galvesribeiro, I've done very little testing with this so far. I should have it stable enough for you to try by mid-day tomorrow. If you want to play with it before then, you're welcome to do so, but I'd expect there to be issues outside the providers I requested your help on. Just don't want to use up your patience on issues I should be solving. :) |
|
@jason-bragg Ok. Take your time. Once you feel it is ready let me know and I'll amend a PR to your branch. I will run all tests as I've essentially ported all providers to .net core back in day and not just the ones I introduced and then will report back. |
|
I'll page @joeybrown, since he is working on #634. |
|
woke up today and a BIG present is awaiting on GitHub |
|
That was my valentines day present to the team/community.. I'm so romantic, as you can all see. |
| Action<OptionsBuilder<SqsStreamOptions>> configureOptions = null) | ||
| { | ||
| return services.TryConfigureFormatterResolver<SqsStreamOptions, SqsStreamOptionsFormatterResolver>() | ||
| .AddSiloPersistentStreams<SqsStreamOptions>(name, SQSAdapterFactory.Create, configureOptions); |
There was a problem hiding this comment.
missing ConfigureNamedOptionForLogging
|
Heads up, for anyone tinkering with this, PR is hilariously broken, as provider initialization needs be run in Orleans scheduling context, which is still under development. I'll update the thread when there is a faint chance this works :/ |
| { | ||
| IOptionsSnapshot<SqsStreamOptions> streamOptionsSnapshot = services.GetRequiredService<IOptionsSnapshot<SqsStreamOptions>>(); | ||
| var factory = ActivatorUtilities.CreateInstance<SQSAdapterFactory>(services, name, streamOptionsSnapshot.Get(name)); | ||
| factory.Init(); |
There was a problem hiding this comment.
should factory handle its init or lifecycle?
There was a problem hiding this comment.
Init is not async, so no reason I can see to link adapters to lifecycle, they are just pluggable components.
| return new Formatter(name, optionsSnapshot.Get(name)); | ||
| } | ||
|
|
||
| public class Formatter : IOptionFormatter<SqsStreamOptions> |
There was a problem hiding this comment.
formatter name should has its option name
There was a problem hiding this comment.
Subclass, so scoped with proper name.
| formatted.AddRange(new[] | ||
| { | ||
| OptionFormattingUtilities.Format(nameof(this.options.DeploymentId), this.options.DeploymentId), | ||
| OptionFormattingUtilities.Format(nameof(this.options.DeploymentId), "<-Redacted->"), |
There was a problem hiding this comment.
I added logic in ConfigUtils.RedactConnectionStringInfo to just return <--SNIP--> by default if none of those format we are looking for is there. And i'm off the opinion we should only keep the default logic.
| Action<OptionsBuilder<AzureQueueStreamOptions>> configureOptions = null) | ||
| where TDataAdapter : IAzureQueueDataAdapter | ||
| { | ||
| return services.TryConfigureFormatterResolver<AzureQueueStreamOptions, AzureQueueStreamOptionsFormatterResolver>() |
There was a problem hiding this comment.
missing ConfigureNamedOptionForLogging
| Action<OptionsBuilder<AzureQueueStreamOptions>> configureOptions = null) | ||
| where TDataAdapter : IAzureQueueDataAdapter | ||
| { | ||
| return services.TryConfigureFormatterResolver<AzureQueueStreamOptions, AzureQueueStreamOptionsFormatterResolver>() |
There was a problem hiding this comment.
many occurrence of missing ConfigureNamedOptionForLogging
There was a problem hiding this comment.
it was being done in AddPersistentStreams, but that's not the right place. They should be there now.
| this.logger = logger; | ||
| } | ||
|
|
||
| private async Task Init(CancellationToken token) |
There was a problem hiding this comment.
what's your plan on help user migrate their persistent stream provider, which they developed on top of the old streaming interfaces, to this new one?
There was a problem hiding this comment.
Most of the adapter interfaces have not changed. The main difference is that they'd need to move to constructor injection as nothing is passed in during the init, and add an options class for their adapter. These are breaking changes, but not dramatic ones (imo).
| public RunState StartupState = DEFAULT_STARTUP_STATE; | ||
| public const RunState DEFAULT_STARTUP_STATE = RunState.AgentsStarted; | ||
|
|
||
| public int InitStage { get; set; } |
There was a problem hiding this comment.
we should set default stages. This is advanced feature only advanced users should touch. We shouldn't create extra friction for less advanced users to understand lifecycle and configure a correct stage
| private IStreamQueueBalancer queueBalancer; | ||
| private readonly IQueueAdapterFactory adapterFactory; | ||
| private PersistentStreamProviderState managerState; | ||
| private RunState managerState; |
There was a problem hiding this comment.
RunState? should just state be sufficient here ? What does run imply here ?
There was a problem hiding this comment.
it replaces PersistentStreamProviderState and tracks if the provider is running or not.
| var balancerConfig = new LeaseBasedQueueBalancerConfig(providerConfig); | ||
| this.leaseProvider = this.serviceProvider.GetRequiredService(balancerConfig.LeaseProviderType) as ILeaseProvider; | ||
| this.leaseLength = balancerConfig.LeaseLength; | ||
| var options = this.serviceProvider.GetServiceByName<LeaseBasedQueueBalancerOptions>(strProviderName) |
There was a problem hiding this comment.
we should have providerBuilder and provider lifecycle. :)
There's too many components in streams. The complexity can benefits from lifecycle and builder pattern, in both perspective of configuring and start and shutdown sequence.
Some of those are managed by Silo(Client)ProviderRuntime, some are managed by the PersistentStreamProvider. Those unnecessary coupling partially result in bad consequence, such as streaming infra has to live in core.
Just throwing ideas out there, not necessary the scope of this PR.
There was a problem hiding this comment.
I'm not convinced of the lifecycle aspect, but it's definitely worth consideration. In the description I've started a list of further refactoring we may want to do in the future. Please feel free to edit the description to add more suggestions.
| if (this.ReceiverMonitorFactory == null) | ||
| this.ReceiverMonitorFactory = (dimensions, telemetryProducer) => new DefaultQueueAdapterReceiverMonitor(dimensions, telemetryProducer); | ||
| if (adapterConfig.GeneratorConfigType != null) | ||
| generatorConfig = this.serviceProvider.GetServiceByName<IStreamGeneratorConfig>(this.Name); |
There was a problem hiding this comment.
constructor injection instead of this.
There was a problem hiding this comment.
generator config is not required, it can be provided at runtime, hence constructor injection will not work :/
|
@galvesribeiro I think core is solid enough for you to test against in the latest PR. Still has test failures, and hacks i'm resolving, but it's mostly their I think.. |
|
@jason-bragg sure, will do my changes in a PR early tomorrow. Probably when you come to office it should be there. Thanks! |
|
@jason-bragg Submitted a PR to your branch regarding migrate SMS stream provider to use option and lifecycle. |
062cf40 to
ed644c4
Compare
|
Sorry for my delay. The PR with the AWS updates were made to your branch @jason-bragg (jason-bragg#4). I've also ran other provider tests and they looks fine so far. |
d5eede5 to
b337ea1
Compare
|
Ready (assuming tests pass) |
|
@jason-bragg did you saw the PR on your branch? |
|
@galvesribeiro, I think. I was a bit confused. |
|
@dotnet-bot test functional please |
There was a problem hiding this comment.
Should this be [RedactConnectionString]? Xiao made a change to redact the entire thing by default if no redactions were made.
BTW, @xiazen, that change also means that UseDevelopmentStorage=true would be redacted, but better to be safe than sorry.
There was a problem hiding this comment.
I don't think so. I'm unconvinced RedactConnectionString is safe for unknown formats. By the logic of the call, if any of the patterns match the string is logged with the recognized patterns redacted. This means if a connection string has a matched, and an unmatched secret, the unmatched secret will logged. which is not acceptable.
There was a problem hiding this comment.
ok, no worries. In the future we can create a version which only includes known-good portions of the connection string, instead. We can wait until someone complains, if they ever do.
|
Functionals seem to be failing again due to test timeout. Pushing a possible optimization of lifecycle to see if it helps. If it doesn't I'll role it back. |
|
I kicked out a test build in vso to run azure tests, which were skipped here. Also hoping the time out in vso is longer, so the build won't fail due to test timeout |
c191b02 to
ecff572
Compare
|
Functionals timed out again. Trying something different. This time I'm running silo lifecycle actions outside the lifecycle scheduling context. |
|
hmm.. functionals seem say they took 30~ish minutes.. I don't trust this. |
- Was not setting the blancer type.
ecff572 to
82c4335
Compare
|
Removed lifecycle perf fix, and rebased. Should be good to merge after tests run. |

Pushing this PR out to get eyes on it, but it's not passing testing yet. Still WIP.
Since persistent stream provider is the base for most of our steam providers, we couldn't refactor these one at a time.
Following stream providers now use options and lifecycle.
TBD:
Fix tests.
Move options to Orleans.Configuration namespace.
Replace DataConnectionStream with ConnectionString in options to conform to other options.
Possibly validators, but would like to push that to separate PR, given the size of this change.
NOTE: Persistent stream providers and various adapters are complex and patterns can be improved now that we can inject dependencies, but current patterns are best effort port. Please note where further refactoring in the future is warranted.
Future Refactor: