Use server for concurrency of 1 when writing analysis output & readability factors #54
```diff
@@ -1,4 +1,5 @@
 import { randomUUID } from 'crypto';
+import * as path from 'path';

 // TODO: This should use type checking
 const fs = require('fs');
```
```diff
@@ -31,9 +32,11 @@ async function runAnalysis(filePath: string) {
   // Run the analysis script
   console.log(`Running analysis script on ${filePath}...`);

+  const analysisFileBasePath = path.resolve(__dirname, '..', 'src');
+
   const child = spawn('ts-node', [
     // '/Users/bennyrubanov/Coding_Projects/chessanalysis/src/index_with_decompressor.ts',
-    `${__dirname}/../../run_metrics_on_input.ts`,
+    `${analysisFileBasePath}/run_metrics_on_input.ts`,
     filePath,
   ]);
```
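The PR title mentions writing analysis output with a concurrency of 1. One common way to get that in Node is a single promise chain that serializes writes, so concurrently running analyses never interleave their output. This is a minimal sketch of that idea, not the PR's actual implementation; the `queue` and `enqueue` names are hypothetical.

```typescript
// Sketch: funnel all output writes through one promise chain so tasks run
// strictly one at a time (concurrency of 1). Hypothetical names, not from the PR.
let queue: Promise<void> = Promise.resolve();

function enqueue(task: () => Promise<void>): Promise<void> {
  // Chain the task after whatever is already queued; run it even if the
  // previous task rejected, so one failure does not stall the queue.
  queue = queue.then(task, task);
  return queue;
}

const written: string[] = [];
enqueue(async () => { written.push('analysis A'); });
enqueue(async () => { written.push('analysis B'); });
```

Because each task is chained onto the previous one, ordering is preserved even when the tasks themselves are asynchronous.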
```diff
@@ -82,7 +85,10 @@ const decompressAndAnalyze = async (file, start = 0) => {
   const filesProduced = new Set();

   // const base_path = `/Users/bennyrubanov/Coding_Projects/chessanalysis/data/${file.replace(
-  const base_path = `${__dirname}/../../data/${file.replace('.zst', '')}`;
+  // base_path used to enumerate where new files should go
+  const base_path = path.resolve(__dirname, '..', 'data', file.replace('.zst', ''));
+  // for use in decompressionStreamFromFile
+  const compressedFilePath = path.resolve(__dirname, '..', 'data');

   // Create a new file path
   const newFilePath = `${base_path}_${randomUUID()}`;
```
**Review comment:** does `randomUUID` create a random file path number? I prefer counting them; it corresponds better with the log here too.

**Reply (author):** the issue with counting is that under high concurrency they will not match up. We could do it by game identifier or institute locking if necessary, but for uniqueness I use a UUID.

**Reply (reviewer):** let's do both? One for logs, one for avoiding concurrency/overwrites?
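The "do both" suggestion in the thread above could be sketched like this: an incrementing counter for human-readable log messages, plus a UUID in the file path so concurrent writers never collide. The `makeFilePath` helper and `fileCounter` variable are hypothetical names, not from the PR.

```typescript
// Sketch of combining a counter (for logs) with a UUID (for unique paths).
// Hypothetical helper, not the PR's implementation.
import { randomUUID } from 'crypto';

let fileCounter = 0;

function makeFilePath(basePath: string): { filePath: string; index: number } {
  const index = ++fileCounter; // human-readable, for log messages
  const filePath = `${basePath}_${index}_${randomUUID()}`; // unique, avoids overwrites
  return { filePath, index };
}

const { filePath, index } = makeFilePath('/tmp/lichess_db');
console.log(`Decompressing batch #${index} to ${filePath}`);
```

The counter alone is not safe under concurrency (two workers can read the same value before either increments it in a multi-process setup), which is why the UUID stays in the path itself.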
```diff
@@ -108,9 +114,10 @@ const decompressAndAnalyze = async (file, start = 0) => {

   // https://www.npmjs.com/package/node-zstandard#decompressionstreamfromfile-inputfile-callback
   zstd.decompressionStreamFromFile(
-    `${__dirname}/../../data/${file}`,
+    `${compressedFilePath}/${file}`,
     (err, result) => {
       if (err) return reject(err);
+      console.log(`Decompressing file located at ${compressedFilePath}/${file}`);

       let fileLength = 0;
       let batch_files_total_decompressed_size = 0;
```

**Review comment:** @bennyrubanov fyi these aren't bugs. They are differences between your workflow using ts-node and mine, which transpiles to JS first.

**Reply:** understood. Can we write code to work with both?
```diff
@@ -140,6 +147,7 @@ const decompressAndAnalyze = async (file, start = 0) => {
       console.log(
         `Total number of chunks decompressed so far: ${total_chunk_counter}`
       );
+
       // Increment the file counter
       file_counter++;
```
@@ -210,7 +218,7 @@ const decompressAndAnalyze = async (file, start = 0) => { | |
|
||
// Function to process all files | ||
const processFiles = async (files: string[]) => { | ||
console.log(`Initiating decompression and analysis of ${files}...`); | ||
console.log(`Initiating decompression and analysis of files: ${files}...`); | ||
console.time('Final Total Compressed File Analysis Execution Time'); | ||
for (const file of files) { | ||
await decompressAndAnalyze(file); | ||
|
```diff
@@ -232,6 +240,6 @@ module.exports = processFiles;
 // run if main
 if (require.main === module) {
   // List of all the database files you want to analyze (these need to be downloaded and in data folder)
-  const files = ['lichess_db_standard_rated_2013-02.pgn.zst' /*...*/];
+  const files = ['lichess_db_standard_rated_2013-01.pgn.zst' /*...*/];
   processFiles(files);
 }
```
**Review comment:** since the file was renamed, it's hard for me to see what specific changes you've made here. Please test this script before merging into main to make sure it's working properly. I'd prefer above all if you could Loom/screen-record a run of the script on a smaller dataset, and I can check whether it works properly. Or I can run it myself; let me know when it's ready to be run (now?).

**Reply:** I'll see if I can run this.