Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fixed mutex issue #80

Merged
merged 18 commits into from
Jun 28, 2024
Merged

fixed mutex issue #80

merged 18 commits into from
Jun 28, 2024

Conversation

terryluan12
Copy link
Contributor

Fixes #79

* fixed stat function in overlay.ts not throwing error properly
@james-pre
Copy link
Member

@terryluan12 If you don't mind, please add tests. Thanks.

Copy link
Member

@james-pre james-pre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you don't mind, I would really appreciate it if you could add tests. isLocked adds complexity that I think is not needed— see my review comment. Also, I think using async/await would really help.

src/mutex.ts Outdated Show resolved Hide resolved
src/mutex.ts Outdated Show resolved Hide resolved
src/mutex.ts Outdated Show resolved Hide resolved
src/mutex.ts Outdated Show resolved Hide resolved
src/mutex.ts Outdated Show resolved Hide resolved
src/mutex.ts Outdated Show resolved Hide resolved
src/mutex.ts Outdated Show resolved Hide resolved
src/mutex.ts Outdated Show resolved Hide resolved
@james-pre james-pre added the bug Something isn't working label Jun 17, 2024
@james-pre james-pre added this to the 1.0 milestone Jun 17, 2024
@terryluan12
Copy link
Contributor Author

@terryluan12 If you don't mind, please add tests. Thanks.

Yep absolutely! I'll probably get the tests done on Wednesday

@james-pre
Copy link
Member

Sounds good. Thanks again for the PR.

@terryluan12
Copy link
Contributor Author

terryluan12 commented Jun 24, 2024

Hey @james-pre should be good to review again, sorry about the delay. I think the isLocked is still necessary; In the following code:

const mutex = new Mutex();

const foo = () => {
mutex.lock("path")
// critical section
mutex.unlock("path")
}

spawnThread(foo)
spawnThread(foo)

in the scenario above, if the first thread were to be inside the critical section at the same time as the second thread were to start locking, and the second thread were to check the length of the queue, it would see it as 0.

Copy link
Member

@james-pre james-pre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is why Mutex.lock is awaited every time it is called. Your example incorrectly uses the functions, so naturally it won't work correctly. Mutex.lock should await the previous lock.

const mutex = new Mutex();

async function foo(n: number): void {
    console.log(`#${n} locking...`);
    await mutex.lock('path');
    console.log(`#${n} locked`);
    // something important
    console.log(`#${n} unlocking...`);
    mutex.unlock('path');
}

foo(1);
foo(2);

In essence, this means the above example should always behave like so:

#1 locking...
#2 locking... OR #1 locked
#1 locked OR #2 locking...
#1 unlocking...
#2 locked
#2 unlocking...

I'd prefer the code to only store one promise in the mutex for each path.

@terryluan12
Copy link
Contributor Author

terryluan12 commented Jun 25, 2024

Sorry, I forgot to add the await keyword before mutex.lock in the example. In the case it is added however, it should work though right?

To clarify, I was just attempting to justify why I thought the isLocked field within the mutex is needed. Since after the await mutex.lock('path'); line is executed, the queue should be empty. Therefore, you need the isLocked field to signify to other threads/Web Workers, attempting to access resources that the path is locked.

@james-pre
Copy link
Member

In any case, there should only ever be one lock on a path and therefore one promise. To be honest, my current implementation is not good and does not follow that correctly. I think ideally it would look something like this:

interface MutexLock<T> {
	promise: Promise<T>;
	resolve(): T;
	reject(): Error;
}

class Mutex {
	protected locks: Map<string, MutexLock<void>> = new Map();

	public async lock(path: string): Promise<void> {
		if(this.locks.has(path)) {
			await this.locks.get(path).promise;
		}

		let resolve: () => T;
		let reject: () => Error;
		const promise = new Promise<void>((_resolve, _reject) => {
			resolve = _resolve;
			reject = _reject;
		});
		this.locks.set(path, { promise, resolve, reject });

		// Do not return the promise!
		// It will be resolved on unlock()
	}

	public unlock(path: string): void {
		if(!this.locks.has(path)) {
			throw new ErrnoError(/* ... */);
		}

		// We have already checked that locks has path
		this.locks.get(path)!.resolve();
	}
}

I had already some code similar to this about a month ago. I did not push it after you opened the PR since I didn't want to create any confusion. Please let me know what you think.

Since after the await mutex.lock('path'); line is executed, the queue should be empty.

The "queue" is meant to track active locks. Since there should only ever be one lock at a time for a path, having an array doesn't really make sense. Does this clear up what the locks is meant to do?

Thanks,
- JP

@terryluan12
Copy link
Contributor Author

Kind of? I took a look over the ideal code you had sent, and I'm a little confused on how the code should run. I'm assuming a similar setup to the test code up above with the await mutex.lock() line? In that case, I believe the same issue occurs that currently is plaguing the mutex.

Since the code awaits until the promise resolves, the execution freezes. However, within the lock function, the resolve, reject, and promise is only added to the queue, never called. Since resolve and reject is never called, I believe that means the promise never resolves, and since execution is frozen, there is no chance for the program to call unlock(). Please let me know if I'm missing something

@james-pre
Copy link
Member

I took a look over the ideal code you had sent, and I'm a little confused on how the code should run. I'm assuming a similar setup to the test code up above with the await mutex.lock() line? In that case, I believe the same issue occurs that currently is plaguing the mutex.

Yes, it should be used like the example.

Since the code awaits until the promise resolves, the execution freezes. However, within the lock function, the resolve, reject, and promise is only added to the queue, never called.

Please note the comment that says "do not return this promise". I was trying to explain that we create a new promise for the lock, but do not await it (that way code execution continues). Lock essentially works like so:

  • If a lock already exists for a path, wait until unlock
  • create a new lock
  • add the new lock to locks, but do not wait for unlock

Also, resolve is called by unlock.

This all means that from when lock is called to when unlock is called, no other code can run that needs a lock on the path.

@terryluan12
Copy link
Contributor Author

terryluan12 commented Jun 25, 2024

Oh okay, I think that makes sense! In that case, would you like to implement it, or should I?

@james-pre
Copy link
Member

@terryluan12 I went ahead and implemented it, I hope you don't mind. Note the new implementation uses Promise.withResolvers(), which is relatively new. Please let me know if you want to make more changes, I'd welcome them.

Copy link
Member

@james-pre james-pre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Almost everything looks ready. After this is merged, I plan on folding Mutex into LockedFS, since that is the only place it is used.

src/backends/index/index.ts Outdated Show resolved Hide resolved
src/mutex.ts Show resolved Hide resolved
@terryluan12
Copy link
Contributor Author

@terryluan12 I went ahead and implemented it, I hope you don't mind. Note the new implementation uses Promise.withResolvers(), which is relatively new. Please let me know if you want to make more changes, I'd welcome them.

Sounds good, thank you for your help! I'm trying to fix an issue with the createParentDirectories function, but will update when it's fixed.

I'm also still getting the hanging issue in the mutex input. I have updated the zen-fs example program I've been using here (https://github.com/terryluan12/zen-fs-example).

@terryluan12
Copy link
Contributor Author

Just as an update, I figured out that the issue with the hanging wasn't a problem with the mutex, but with locked.ts haha. It turns out if an error is thrown, when the mutex is locked, it doesn't get a chance to unlock. Fixing a couple of final minor issues that occur when the local IndexedDB is initializing.

@terryluan12
Copy link
Contributor Author

terryluan12 commented Jun 28, 2024

I think I have narrowed down the two bugs.

From the zentest repo:
Error 1:

Error: EPERM: Can not unlock an already unlocked path, '/desktop/dir'
    ErrnoError error.js:261
    unlock mutex.js:34
    stat locked.js:54
    realpath promises.js:705
    realpath promises.js:702
    exists promises.js:315
    _open promises.js:387
    open promises.js:425
    readFile promises.js:430
    readFile async.js:85
    useTestFileSystem fileSystem.ts:32

This error only occurs when reactStrictMode is turned on.

Error two:

Error: EEXIST: File exists, '/desktop/dir/test'
    ErrnoError error.js:261
    With error.js:249
    commitNew fs.js:585
    createFile fs.js:202
    sync overlay.js:61
    sync file.js:211
    read file.js:365
    readFile promises.js:109
    readFile promises.js:432
    readFile async.js:85
    useTestFileSystem fileSystem.ts:28

This error only occurs when you are calling fs.readFile twice, and only on initialization of the IndexedDB system (therefore, if it's already got data within, you'll have to delete it manually within the browser).

Both seem to be race conditions, with the first one occurring due to the re-render of the component, and the second one occurring due to the async fs.readFile functions.

I think I'm a bit stumped as to good ways to resolve these race conditions though, and would love your help/feedback. Please let me know!

@james-pre
Copy link
Member

In response to your first comment:

Note this try/finally logic can be nicely abstracted away with using (explicit resource management). The using syntax is already supported by Typescript and esbuild, so the only thing that might need to be updated is ts-jest (see kulshekhar/ts-jest#4303). After that, and merging the Mutex logic with LockedFS, methods can be cleaned up to look like:

using _ = await this.lock(path);
// ... ops ...

Since the @@dispose and @@asyncDispose symbol properties on a returned MutexLock could just call LockedFS.unlock(path).

I digress, these are some future ideas that would really help the codebase.

Thank you again for the PR, I really appreciate it.

@james-pre
Copy link
Member

I think I'm a bit stumped as to good ways to resolve these race conditions though, and would love your help/feedback. Please let me know!

Of course!

... the first one occurring due to the re-render of the component

For this error, the only advice I have is to check what other FS functions were called before. I will be out of town this weekend so can't do any in depth debugging, though I will try to help as much as I can.

... the second one occurring due to the async fs.readFile functions.

This error only occurs when you are calling fs.readFile twice, and only on initialization of the IndexedDB system (therefore, if it's already got data within, you'll have to delete it manually within the browser).

This makes sense. IndexedDB caches content by first copying everything to a sync FS on initialization. Attempting to initialize it twice will try to copy everything twice. To fix this, I recommend you modifer your code so it is only initialized once.

This problem also hints at a potential bug with AsyncFS.crossCopy not checking if the files exist prior to attempting to copy them.

Copy link
Member

@james-pre james-pre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Everything looks good so far. Please explain the change to backends/index.ts. I put some questions in a separate comment. Thanks again.

@terryluan12
Copy link
Contributor Author

Everything looks good so far. Please explain the change to backends/index.ts. I put some questions in a separate comment. Thanks again.

Sounds good, I believe I responded! I just made a change to hopefully improve readability? Let me know

@terryluan12
Copy link
Contributor Author

In response to your first comment:

Note this try/finally logic can be nicely abstracted away with using (explicit resource management). The using syntax is already supported by Typescript and esbuild, so the only thing that might need to be updated is ts-jest (see kulshekhar/ts-jest#4303). After that, and merging the Mutex logic with LockedFS, methods can be cleaned up to look like:

using _ = await this.lock(path);
// ... ops ...

Since the @@dispose and @@asyncDispose symbol properties on a returned MutexLock could just call LockedFS.unlock(path).

I digress, these are some future ideas that would really help the codebase.

Thank you again for the PR, I really appreciate it.

Sounds good, thanks for the knowledge! I hadn't heard of explicit resource management before! I'll give it a go a bit later (I may also put it in a new issue if that's alright). For now, I want to get all the bugs out of the zentest repo.

@terryluan12
Copy link
Contributor Author

terryluan12 commented Jun 28, 2024

I think I'm a bit stumped as to good ways to resolve these race conditions though, and would love your help/feedback. Please let me know!

Of course!

... the first one occurring due to the re-render of the component

For this error, the only advice I have is to check what other FS functions were called before. I will be out of town this weekend so can't do any in depth debugging, though I will try to help as much as I can.

... the second one occurring due to the async fs.readFile functions.

This error only occurs when you are calling fs.readFile twice, and only on initialization of the IndexedDB system (therefore, if it's already got data within, you'll have to delete it manually within the browser).

This makes sense. IndexedDB caches content by first copying everything to a sync FS on initialization. Attempting to initialize it twice will try to copy everything twice. To fix this, I recommend you modifer your code so it is only initialized once.

This problem also hints at a potential bug with AsyncFS.crossCopy not checking if the files exist prior to attempting to copy them.

No worries, thanks for your help so far! I'll focus on the second problem for now. You mentioned that IndexedDB caches content on initialization. Is it supposed to cache the entire readable database? I don't believe it's working properly if this is the expected behaviour. Instead, what it's currently doing is starting as a blank database, and only copying to the writable database as the user reads from the readable database

@james-pre
Copy link
Member

You mentioned that IndexedDB caches content on initialization. Is it supposed to cache the entire readable database?

This caching is unrelated to Overlay. The caching is done by AsyncFS so synchronous functions can be used on a purely asynchronous filesystem.

@james-pre
Copy link
Member

Why the change to using a regex in Index.fromJSON? Please note that the index entries are supposed to be sorted so the directory will come after all of the files contained in it, that way Index.dirEntries, which depends on the inherited Map.keys works. I've hopefully fixed the entry ordering in ec86a6a.

@james-pre
Copy link
Member

james-pre commented Jun 28, 2024

I think I understand the change made. I believe it is not needed because of the dirname check and usage of basename in Index.dirEntries. I'm going to roll back that specific change since I believe it should be handled in a separate PR, please let me know what you think.

Also, I managed to get the using/await using syntax working.

Copy link
Member

@james-pre james-pre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@terryluan12,

Everything looks good.

I reverted the change you made to Index.fromJSON since I think it is outside the scope of this PR. Feel free to open another PR.

Ready to merge?

@terryluan12
Copy link
Contributor Author

Sounds good! In that case, everything looks good to me, feel free to merge!

@terryluan12
Copy link
Contributor Author

Thanks for all your help!

@james-pre james-pre merged commit 6a9cb63 into zen-fs:main Jun 28, 2024
3 checks passed
@terryluan12 terryluan12 deleted the mutex_fix branch June 28, 2024 17:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Mutex does not work correctly
2 participants