Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use server for concurrency of 1 when writing analysis output & readability factors #54

Merged
merged 11 commits into from
Mar 24, 2024

Conversation

EllAchE
Copy link
Collaborator

@EllAchE EllAchE commented Mar 10, 2024

No description provided.

@EllAchE EllAchE requested a review from bennyrubanov March 10, 2024 10:48
@EllAchE EllAchE changed the title rename and create some functions Use server for concurrency of 1 when writing analysis output & readability factors Mar 11, 2024
src/metrics/promotions.ts Show resolved Hide resolved
src/metrics/promotions.ts Show resolved Hide resolved
src/metrics/promotions.ts Show resolved Hide resolved
// will just write to wherever the process is running, but the server needs to be launched from the same directory so we use an abs path
export const RESULTS_PATH = `${__dirname}/results.json`;

function launchQueueServer() {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i like this. curious if it works - have you tested the decompression script with this?
as discussed on call, if this doesn't work, I think you could make a bunch of unique results.json files (one for each analysis), and then append all of them in the end
Alternatively, could also make one results.json file for each item of the batch that's being run. I.e. if the code is run in batches of 13, then make 13 results.json files (results1.json, results2.json, etc) and then make sure each item writes to one of them, then append the 13 results.json files at the end. i think this actually might be the easiest thing to implement (if server doesn't work... i don't know much about the createServer function)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is even in results.json? I assumed there were issues if we did separate files bc we'd have to track weighted averages (which we don't rn) which is why I went with the queue process.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

objects that look like this (after running the script):
{
"Number of games analyzed": 23,
"maxKDRatio": 14,
"pieceWithHighestKDRatio": [
"K"
],
"KDRatios": {
"RA": 1.4,
"NB": 0.8235294117647058,
"BC": 0.7777777777777778,
"Q": 1.8571428571428572,
"K": 14,
"BF": 0.8333333333333334,
"NG": 1.4615384615384615,
"RH": 1.0833333333333333,
"PA": 0.25,
"PB": 0.8333333333333334,
"PC": 0.7142857142857143,
"PD": 0.5714285714285714,
"PE": 0.9285714285714286,
"PF": 1.2,
"PG": 0.42857142857142855,
"PH": 0.2857142857142857,
"ra": 1.4545454545454546,
"nb": 0.8235294117647058,
"bc": 1.2941176470588236,
"q": 3.1,
"k": 7.5,
"bf": 0.6470588235294118,
"ng": 1.4,
"rh": 1,
"pa": 0.8571428571428571,
"pb": 0.6,
"pc": 0.5,
"pd": 0.5,
"pe": 0.35714285714285715,
"pf": 0.2222222222222222,
"pg": 0.6666666666666666,
"ph": 0
},

src/results.json Outdated Show resolved Hide resolved
src/run_metrics_on_file.ts Show resolved Hide resolved
src/run_metrics_on_file.ts Show resolved Hide resolved
const base_path = `${__dirname}/../../data/${file.replace('.zst', '')}`;

// Create a new file path
const newFilePath = `${base_path}_${randomUUID()}`;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does randomUUID create a random file path number? i prefer counting them. corresponds better with the log here too console.log(Creating file #${file_counter} at ${newFilePath});

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the issue with counting is that in high concurrency they will not match up. We could do it by game identifier or institute locking if necessary, but for uniquness I use UUID

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's do both? one for logs, one for avoiding concurrency/overwrites?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since the file was renamed, it's hard for me to see what specific changes you've made here. please test this script before merging into main to make sure it's working properly. I'd prefer above all if you could loom/screen record the running of the script on a smaller dataset, and I can test if it works properly. Or I can run it myself - let me know when it's ready to be run (now?)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll see if i can run this

Comment on lines 85 to 86
const base_path = `${__dirname}/../../data/${file.replace('.zst', '')}`;

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not accurately tracking

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Creating file #1 at /Users/bennyrubanov/Coding_Projects/chessanalysis/src/../../data/lichess_db_standard_rated_2013-01.pgn_e261b15e-1f2b-4e3b-b3c9-d4c352af17c1

should be chessanalysis/data/lichess...

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you run this with ts node the paths will be different than if running the transpiled cod.e That's probably why you saw all the differences

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think my changes addressed this properly?

@bennyrubanov
Copy link
Owner

Tried running the decompression script using ts-node src/zst_decompressor.ts and was running into lots of issues. I fixed the file path ones, but then it started creating way too many decompressed files (like 7.5k of them that were each 8kb in size as opposed to being equivalent to SIZE_LIMIT).

It's a pain to try to figure out what you changed because the whole file got renamed and so i can't see specific line changes. Somewhere, in handling the writing to the file and the check for SIZE_LIMIT, something got moved into the wrong order. Can you review this and figure out what got broken?

@EllAchE
Copy link
Collaborator Author

EllAchE commented Mar 20, 2024

Tried running the decompression script using ts-node src/zst_decompressor.ts and was running into lots of issues. I fixed the file path ones, but then it started creating way too many decompressed files (like 7.5k of them that were each 8kb in size as opposed to being equivalent to SIZE_LIMIT).

It's a pain to try to figure out what you changed because the whole file got renamed and so i can't see specific line changes. Somewhere, in handling the writing to the file and the check for SIZE_LIMIT, something got moved into the wrong order. Can you review this and figure out what got broken?

note that using ts-node vs. transpiled code will cause issues with paths

@@ -108,9 +114,10 @@ const decompressAndAnalyze = async (file, start = 0) => {

// https://www.npmjs.com/package/node-zstandard#decompressionstreamfromfile-inputfile-callback
zstd.decompressionStreamFromFile(
`${__dirname}/../../data/${file}`,
`${compressedFilePath}/${file}`,
Copy link
Collaborator Author

@EllAchE EllAchE Mar 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bennyrubanov fyi these aren't bugs. They are differences between your workflow using ts-node and mine which transpiles to JS first.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

understood. can we write code to work with both?

@EllAchE
Copy link
Collaborator Author

EllAchE commented Mar 20, 2024

Tried running the decompression script using ts-node src/zst_decompressor.ts and was running into lots of issues. I fixed the file path ones, but then it started creating way too many decompressed files (like 7.5k of them that were each 8kb in size as opposed to being equivalent to SIZE_LIMIT).

It's a pain to try to figure out what you changed because the whole file got renamed and so i can't see specific line changes. Somewhere, in handling the writing to the file and the check for SIZE_LIMIT, something got moved into the wrong order. Can you review this and figure out what got broken?

the 8kb files are bc I'm reassigning new file path each time, probably. It should just be when size is too big

@EllAchE
Copy link
Collaborator Author

EllAchE commented Mar 20, 2024

Tried running the decompression script using ts-node src/zst_decompressor.ts and was running into lots of issues. I fixed the file path ones, but then it started creating way too many decompressed files (like 7.5k of them that were each 8kb in size as opposed to being equivalent to SIZE_LIMIT).

It's a pain to try to figure out what you changed because the whole file got renamed and so i can't see specific line changes. Somewhere, in handling the writing to the file and the check for SIZE_LIMIT, something got moved into the wrong order. Can you review this and figure out what got broken?

The specific issue of small files is fixed

@EllAchE
Copy link
Collaborator Author

EllAchE commented Mar 20, 2024

@bennyrubanov I've made some updates to this script. There is remaining work before this works out of the box but I'd prefer to merge then come back with followups. Specifically:

  • results.json should be read by the queue process & combined there to ensure no race conditions
  • orphaned processes should be killed
  • max concurrency should be set and enforced

@EllAchE EllAchE requested a review from bennyrubanov March 20, 2024 07:59
…size and file size limit, as well as error catch for missing files
@bennyrubanov bennyrubanov merged commit 4294a7d into main Mar 24, 2024
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants