Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SY-1374 Fix Windows Channel Deletion Bug #888

Merged
merged 29 commits into from
Nov 10, 2024
Merged

Conversation

LeonLiur
Copy link
Contributor

@LeonLiur LeonLiur commented Nov 4, 2024

Feature Pull Request Template

Key Information

  • Linear Issue: SY-#

Description

Fixed the following bugs that were failing the Windows CI since deletion of a file to which a handle is open is allowed on Unix, but not Windows.

  1. index persist is not closed upon unary server close
  2. after an unsuccessful unaryDB close, the db is marked as close even though the close failed, leading it be marked as closed although file handles are still open.
  3. test cases asserting on opening a pre-existing DB in an FS does not close the DB.
  4. windows implementation of dirInfo does not return size correctly
  5. test in channel_test and testutil_test forgot to close file handle
  6. test in unary_test and index_test forgot to close DB
  7. when opening cesium DB fails, meta file handle was never closed.

Basic Readiness

  • I have performed a self-review of my code.
  • I have added relevant tests to cover the changes to CI.
  • I have needed QA steps to the release
    candidate
    template that cover these changes.
  • I have updated in-code documentation to reflect the changes.
  • I have updated user-facing documentation to reflect the changes.

Backwards Compatibility

The following makes sure that this feature does not break backwards compatability.

Data Structures

  • Server - I have ensured that previous versions of stored data structures are
    properly migrated to new formats.
  • Console - I have ensured that previous versions of stored data structures are
    properly migrated to new formats.

API Changes

  • Server - The server API is backwards-compatible
  • The following client APIs are backwards-compatible:
    • C++
    • TypeScript
    • Python

Breaking Changes

If anything in this section is not true, please list all breaking changes.

@codecov-commenter
Copy link

codecov-commenter commented Nov 4, 2024

⚠️ Please install the 'codecov app svg image' to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 45.45%. Comparing base (d18d902) to head (cd31822).

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@            Coverage Diff             @@
##               rc     #888      +/-   ##
==========================================
- Coverage   45.46%   45.45%   -0.02%     
==========================================
  Files        1121     1121              
  Lines       69114    69095      -19     
  Branches     3597     3597              
==========================================
- Hits        31424    31408      -16     
+ Misses      36620    36617       -3     
  Partials     1070     1070              
Flag Coverage Δ
clientpy 86.04% <ø> (ø)
clientts 62.10% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

cesium/db.go Outdated
@@ -103,7 +103,7 @@ func (db *DB) Read(ctx context.Context, tr telem.TimeRange, keys ...core.Channel
// database, a deadlock is caused since the signal context is closed while the writers
// attempt to send to relay.
func (db *DB) Close() error {
if !db.closed.CompareAndSwap(false, true) {
if db.closed.Load() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is sussy. Just loading the closed state here could cause a race condition if close is called concurrently with another DB method

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

resolved by first doing compare and swap then storing false if close failed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isnt this still an issue? why is this not a compare and swap rather than storing true at the end. I assume the intention of doing this is so that if a step later on fails, the db isnt still marked as closed. But doesn't this introduce the issue of another thread interpereting the db as open while its in the process of getting closed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No – during the process of the db being closed, the CompareAndSwap call makes it so that closed is true – meaning other operations are not allowed. If the close failed, then those operations are allowed again

cesium/internal/domain/db.go Outdated Show resolved Hide resolved
.github/workflows/test.aspen.yaml Outdated Show resolved Hide resolved
.github/workflows/test.cesium.yaml Outdated Show resolved Hide resolved
.github/workflows/test.x.yaml Outdated Show resolved Hide resolved
@pjdotson pjdotson changed the title Leo fix windows delete Fix Windows Channel Deletion Bug Nov 9, 2024
@pjdotson pjdotson changed the title Fix Windows Channel Deletion Bug SY-1374 Fix Windows Channel Deletion Bug Nov 9, 2024
cesium/db.go Outdated
@@ -103,7 +103,7 @@ func (db *DB) Read(ctx context.Context, tr telem.TimeRange, keys ...core.Channel
// database, a deadlock is caused since the signal context is closed while the writers
// attempt to send to relay.
func (db *DB) Close() error {
if !db.closed.CompareAndSwap(false, true) {
if db.closed.Load() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isnt this still an issue? why is this not a compare and swap rather than storing true at the end. I assume the intention of doing this is so that if a step later on fails, the db isnt still marked as closed. But doesn't this introduce the issue of another thread interpereting the db as open while its in the process of getting closed?

Copy link
Contributor

@emilbon99 emilbon99 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved after my comments are addressed

cesium/db.go Outdated Show resolved Hide resolved
cesium/internal/domain/db.go Outdated Show resolved Hide resolved
cesium/internal/unary/db.go Show resolved Hide resolved
@LeonLiur LeonLiur merged commit 12c3c3e into rc Nov 10, 2024
42 of 43 checks passed
@pjdotson pjdotson deleted the leo-fix-windows-delete branch November 11, 2024 23:06
Lham42 added a commit that referenced this pull request Nov 13, 2024
* Auto Update JSON File

* [ops] - merged main and bumped RC version

* Auto Update JSON File

* Auto Update JSON File

* Auto Update JSON File

* Auto Update JSON File

* SY-1416 fix issues with renaming multiple channels (#894)

* [synnax] - fixed issues with renaming multiple channels of different types (virtual, free virtual, persisted)

* [synnax] - improved naming

* SY-1374 Fix Windows Channel Deletion Bug (#888)

* [cesium] - test focusing on the happy path in delete

* [ops] - added code to run windows tests on pushes on this branch

* [ops] - added code to run windows tests on pushes on this branch

* [ops] - changed order of testing on different OS

* [ops] - changed order of testing on different OS

* [cesium] - added one more test case for the case where no writer was ever opened

* [cesium] fixed 3 bugs:
1. first cause of unable to delete is because index persist is not closed upon unary server close
2. second bug fix was after an unsuccessful unaryDB close, the db is marked as close even though the close failed, leading it be marked as closed although file handles are still open.
3. test cases asserting on opening a pre-existing DB in an FS does not close the DB.

* [cesium] fixed flakey tests and windows List() fs

* [cesium] fixed various leaking resources in tests and meta file handle

* [server] removed blocking windows deletes

* edited workflow files for aspen, cesium, and synnax

* added back in quotes

* ops(x): run x workflows on CI

* fix file path

* ops(go): remove fail-fast from go test ci workflows

* [cesium] - addressed load race condition and fixed counter test

* [x] - fixed a flakey test

* [x] - fixed a flakey test

* [x] - fixed a flakey test

* [server] - skipped a failing test on windows

* remove hyphens

* [cesium] - concurrency patch for cesium DB

* [cesium] - concurrency patch for cesium DB

* [cesium] - changed db close mechanism so that closed is only marked as false when it fails in a nominal case

* [cesium] - added comment explaining close conditions in unary close

* [cesium] - added comment explaining db close

---------

Co-authored-by: Synnax Windows EC2 <ebonilla@synnaxlabs.com>
Co-authored-by: pjdotson <patrick@synnaxlabs.com>

* SY-1453 Add Scientific and Engineering Notation to Schematic Values (#909)

* [driver/labjack] move open call to constructor to shift work from start to configure

* checkpoint

* checkpoint

* got notation working

* formatting

* move value telem form to two lines

---------

Co-authored-by: Elham Islam <elham@synnaxlabs.com>

* SY-1467 Labjack reconnect/disconnect (#907)

* [driver/labjack] move open call to constructor to shift work from start to configure

* [driver/labjack] global map for serial nums and handles

* [HOTFIX]  SY-1441 Update NI channel config for the python client (#896)

* [client/py/ni] update ni channel config on python

* [ops] bump py client version

* [ops] bump py client version

* [client/py] update ni tests

* [client/py] - added default device check to NI

* [client/py] - remove unused imports and formattting

---------

Co-authored-by: Emiliano Bonilla <56323762+emilbon99@users.noreply.github.com>

* [driver/labjack] append last 4 digits of ser number to dev name

* [console] - fixed issues with task dialog flashing (#900)

* [console] - fixed issues with task dialog flashing

* [console] - rollback windows gen schema

* Sy 1461 fix issues with improper window shutdown throwing errors (#901)

* [console] - fixed issues with task dialog flashing

* [console] - updated various layout hooks to tolerate delayed main window closure

* [console] - updated store to disable debug

* [console] - rollback windows gen schema

* [ops] version bumps

* [console] - added aliases to labjack port selection (#905)

* [driver/labjack] refactor open for writer as well

* SY-1435 Fix Add to New Plot on Line Plot Range Highlight (#899)

* fix open new plot on range highlight

* ops: bump version

* [console] - updated windows schema to match the one one main

* SY-1444 Fix Taring on Multiple Channels (#898)

* [driver/labjack] create device manager

* [client/py] - added type coersion for primitive non-string values in … (#903)

* [client/py] - added type coersion for primitive non-string values in range KV

* [client/py] - updated range kv to accept any stringer

* [pluto] - improved value font sizing and spacing (#902)

* [pluto] - improved value font sizing and spacing

* [pluto] - updated value positioning

* [pluto] - updated value text positioning and cnsolidated units level

* [pluto] - adjusted value positioning to smoothly handle negatives

* [ops] bump pluto version

* [driver/labjack] format code

* [ops] bump drift version

* [driver/labjack] fix handling disconnected devices while write/read tasks are actively running

* [console] missed merge conflict

* [console] accepted wrong changes

* [driver/labjack] remove commented out code

* [driver/labjack] add other read errors that are caused by disconnect

* [driver/labjack] change error sent

* [driver/labjack] format code

---------

Co-authored-by: Emiliano Bonilla <56323762+emilbon99@users.noreply.github.com>
Co-authored-by: Patrick Dotson <123601024+pjdotson@users.noreply.github.com>

* Auto Update JSON File

* SY-1365 protect main branch but allow auto pushes from workflows (#911)

* [ops] use deploy key to checkout code so github runner can push to main

* [ops] remove fake param

* [ops] - updated dependencies (#912)

* Auto Update JSON File

* SY-1486 Improve Schematic Connection Lines (#910)

* Auto Update JSON File

* SY-1390 ESLint New Configuration (#913)

* Auto Update JSON File

* [console] - fixed LabJack port indices (#917)

* [console] - fixed LabJack port indices

* [console] - reverted default parameter in LabJack write task

* [console] - fixed LabJack DI ports

* Sy 1403 ni scanner toggle on/off from command palette (#916)

* [driver/ni] add device mutex for scanner

* [driver/ni] undo mutex

* [driver/ni] white space

* [driver/ni] ws

* [console] - implemented NI task toggling

* [driver] verify can disable scanner

* [driver] first join thread before releasing ni handles

* [console] invert scanner toggle message

---------

Co-authored-by: Emiliano Bonilla <56323762+emilbon99@users.noreply.github.com>

* [drift/tauri] - fixed logical and physical position calculations (#919)

* SY-1442 v.033, v0.34 Release Notes (#920)

---------

Co-authored-by: pjdotson <pjdotson@users.noreply.github.com>
Co-authored-by: Emiliano Bonilla <56323762+emilbon99@users.noreply.github.com>
Co-authored-by: emilbon99 <emilbon99@users.noreply.github.com>
Co-authored-by: pjdotson <patrick@synnaxlabs.com>
Co-authored-by: Leo Liu <50342884+LeonLiur@users.noreply.github.com>
Co-authored-by: Synnax Windows EC2 <ebonilla@synnaxlabs.com>
Co-authored-by: Patrick Dotson <123601024+pjdotson@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants