Use server for concurrency of 1 when writing analysis output & readability factors #54
Conversation
// will just write to wherever the process is running, but the server needs to be launched from the same directory so we use an abs path
export const RESULTS_PATH = `${__dirname}/results.json`;

function launchQueueServer() {
I like this. Curious if it works - have you tested the decompression script with this?
As discussed on call, if this doesn't work, I think you could make a bunch of unique results.json files (one for each analysis), and then append all of them at the end.
Alternatively, could also make one results.json file for each item of the batch that's being run. I.e. if the code is run in batches of 13, then make 13 results.json files (results1.json, results2.json, etc.), make sure each item writes to one of them, then append the 13 results.json files at the end. I think this actually might be the easiest thing to implement (if the server doesn't work... I don't know much about the createServer function).
What is even in results.json? I assumed there would be issues if we did separate files because we'd have to track weighted averages (which we don't right now), which is why I went with the queue process.
objects that look like this (after running the script):
{
"Number of games analyzed": 23,
"maxKDRatio": 14,
"pieceWithHighestKDRatio": [
"K"
],
"KDRatios": {
"RA": 1.4,
"NB": 0.8235294117647058,
"BC": 0.7777777777777778,
"Q": 1.8571428571428572,
"K": 14,
"BF": 0.8333333333333334,
"NG": 1.4615384615384615,
"RH": 1.0833333333333333,
"PA": 0.25,
"PB": 0.8333333333333334,
"PC": 0.7142857142857143,
"PD": 0.5714285714285714,
"PE": 0.9285714285714286,
"PF": 1.2,
"PG": 0.42857142857142855,
"PH": 0.2857142857142857,
"ra": 1.4545454545454546,
"nb": 0.8235294117647058,
"bc": 1.2941176470588236,
"q": 3.1,
"k": 7.5,
"bf": 0.6470588235294118,
"ng": 1.4,
"rh": 1,
"pa": 0.8571428571428571,
"pb": 0.6,
"pc": 0.5,
"pd": 0.5,
"pe": 0.35714285714285715,
"pf": 0.2222222222222222,
"pg": 0.6666666666666666,
"ph": 0
  }
}
src/zst_decompressor.ts
Outdated
const base_path = `${__dirname}/../../data/${file.replace('.zst', '')}`;

// Create a new file path
const newFilePath = `${base_path}_${randomUUID()}`;
Does randomUUID create a random file path number? I prefer counting them - corresponds better with the log here too: console.log(`Creating file #${file_counter} at ${newFilePath}`);
The issue with counting is that under high concurrency they will not match up. We could do it by game identifier or institute locking if necessary, but for uniqueness I use UUID.
let's do both? one for logs, one for avoiding concurrency/overwrites?
Since the file was renamed, it's hard for me to see what specific changes you've made here. Please test this script before merging into main to make sure it's working properly. I'd prefer above all if you could loom/screen-record running the script on a smaller dataset, and I can check that it works properly. Or I can run it myself - let me know when it's ready to be run (now?)
I'll see if I can run this.
src/zst_decompressor.ts
Outdated
const base_path = `${__dirname}/../../data/${file.replace('.zst', '')}`;
not accurately tracking
Creating file #1 at /Users/bennyrubanov/Coding_Projects/chessanalysis/src/../../data/lichess_db_standard_rated_2013-01.pgn_e261b15e-1f2b-4e3b-b3c9-d4c352af17c1
should be chessanalysis/data/lichess...
If you run this with ts-node the paths will be different than if running the transpiled code. That's probably why you saw all the differences.
i think my changes addressed this properly?
Tried running the decompression script using ts-node src/zst_decompressor.ts and ran into lots of issues. I fixed the file path ones, but then it started creating way too many decompressed files (like 7.5k of them that were each 8kb in size, as opposed to being equivalent to). It's a pain to try to figure out what you changed because the whole file got renamed, so I can't see specific line changes. Somewhere in the handling of writing to the file and the check for SIZE_LIMIT, something got moved into the wrong order. Can you review this and figure out what got broken?
Note that using ts-node vs. transpiled code will cause issues with paths.
@@ -108,9 +114,10 @@ const decompressAndAnalyze = async (file, start = 0) => {

   // https://www.npmjs.com/package/node-zstandard#decompressionstreamfromfile-inputfile-callback
   zstd.decompressionStreamFromFile(
-    `${__dirname}/../../data/${file}`,
+    `${compressedFilePath}/${file}`,
@bennyrubanov fyi these aren't bugs. They are differences between your workflow using ts-node and mine which transpiles to JS first.
understood. can we write code to work with both?
The 8kb files are because I'm reassigning a new file path each time, probably. It should only happen when the size is too big.
The specific issue of small files is fixed.
@bennyrubanov I've made some updates to this script. There is remaining work before this works out of the box, but I'd prefer to merge and then come back with followups. Specifically:
Force-pushed (…size and file size limit, as well as error catch for missing files) from 56fb2fc to 1905908.