-
Notifications
You must be signed in to change notification settings - Fork 10
Bugfix/data loading #10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
📝 WalkthroughWalkthroughThis PR restructures the local setup README, consolidating deployment and container initialization steps, and adds automated data-loading scripts that download, extract, execute SQL files against MySQL with per-file error handling, then clean up temporary artifacts. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
actor User
participant Script as Data Loader<br/>(loaddummydata.sh/.bat)
participant Remote as Download Service
participant FS as Local File System
participant MySQL as MySQL DB
User->>Script: run script
Script->>Remote: GET AmritMasterData.zip
Remote-->>Script: ZIP stream
Script->>FS: extract ZIP -> AmritMasterDataFiles/
FS-->>Script: extraction complete
loop for each SQL file in SQL_DIR
Script->>FS: read SQL file
Script->>MySQL: execute SQL
alt exit 0
MySQL-->>Script: OK
Script->>Script: log success
else ERROR 1062
MySQL-->>Script: ERROR 1062 (duplicate)
Script->>Script: log duplicate-skip
else other error
MySQL-->>Script: ERROR
Script->>Script: log error + output
end
end
Script->>FS: delete AmritMasterDataFiles/ and ZIP
FS-->>Script: cleanup done
Script-->>User: finished (summary)
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~22 minutes Poem
Pre-merge checks❌ Failed checks (1 inconclusive)
✅ Passed checks (2 passed)
📜 Recent review detailsConfiguration used: defaults Review profile: CHILL Plan: Pro 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 5
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
amrit-local-setup/loaddummydata.bat (1)
24-38:ERRORLEVELcheck inside theforblock will not behave as expected.
if %ERRORLEVEL% equ 0 (is evaluated once when the block is parsed, so it won’t reliably reflect the exit code of eachmysqlinvocation inside theforloop. You’ll likely get the same success/failure branch for all files, regardless of individual outcomes.A robust fix is to enable delayed expansion and use
!ERRORLEVEL!:@echo off +setlocal enabledelayedexpansion ... mysql -h %HOST% -P %PORT% -u %USER% -p%PASSWORD% %%B < "%SQL_DIR%\%%A" - - if %ERRORLEVEL% equ 0 ( + + if !ERRORLEVEL! equ 0 ( echo Successfully executed %%A on %%B. ) else ( echo Error executing %%A on %%B. ) ... +endlocal
🧹 Nitpick comments (3)
amrit-local-setup/loaddummydata.sh (1)
30-41: MySQL execution and duplicate-key handling look good; consider optionally surfacing non‑duplicate failures.The pattern of capturing
OUTPUTandEXIT_CODE, treatingEXIT_CODE=0as success andERROR 1062as “duplicates skipped,” is reasonable for idempotent sample-data loads. For non‑duplicate failures, you’re correctly logging the full output.If you ever need this script in automation (CI or pre‑commit), consider:
- Exiting with a non‑zero status in the
elsebranch so calling processes can fail fast.- Optionally printing a brief summary at the end if any files hit non‑duplicate errors.
amrit-local-setup/README.md (2)
11-19: Prerequisites and clone/cdinstructions align with the scripts.Adding “MySQL Client” and explicitly
cd‑ing intoAMRIT-DevOps/amrit-local-setupmatches howloaddummydata.shandloaddummydata.batare written (they assumemysqlis inPATHand run relative to that directory). You might optionally call out thatwgetandunzip(or equivalents) are required on Linux/Unix for the data-loading step, but the current wording is already a solid improvement.
42-42: Tighten wording and capitalization in the container verification note.Minor grammar/style nit; suggest:
-**Note:** Before proceeding, Verify that the Docker containers are running for mysql, redis and mongo +**Note:** Before proceeding, verify that the Docker containers are running for MySQL, Redis and Mongo.This matches typical capitalization and fixes the mid-sentence capital “Verify”. (Based on static analysis hints.)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
amrit-local-setup/README.md(2 hunks)amrit-local-setup/loaddummydata.bat(2 hunks)amrit-local-setup/loaddummydata.sh(2 hunks)
🧰 Additional context used
🪛 Gitleaks (8.30.0)
amrit-local-setup/loaddummydata.sh
[high] 10-10: Detected a Generic API Key, potentially exposing access to various services and sensitive operations.
(generic-api-key)
[high] 11-11: Detected a Generic API Key, potentially exposing access to various services and sensitive operations.
(generic-api-key)
amrit-local-setup/loaddummydata.bat
[high] 11-11: Detected a Generic API Key, potentially exposing access to various services and sensitive operations.
(generic-api-key)
🪛 LanguageTool
amrit-local-setup/README.md
[grammar] ~42-~42: Ensure spelling is correct
Context: ...ainers are running for mysql, redis and mongo ### 2. Database Schema Management Service De...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
🔇 Additional comments (1)
amrit-local-setup/loaddummydata.sh (1)
45-47: Cleanup of extracted files is straightforward and appropriate.Removing
AmritMasterDataFilesafter the run keeps the workspace clean and forces a fresh download on the next execution, which is desirable for a reproducible sample-data loader.
| :: Download the file using PowerShell's Invoke-WebRequest (equivalent to wget) | ||
| echo Downloading AmritMasterData.zip... | ||
| powershell -Command "Invoke-WebRequest -Uri 'https://1865391384-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FYfDZFIsUuulWkRHaq4c1%2Fuploads%2F1WdSAf0fQBeJOea70EXE%2FAmritMasterData.zip?alt=media&token=18e0b6d6-487c-4c0c-967a-02cdd94d61ad' -OutFile 'AmritMasterData.zip'" | ||
|
|
||
| :: Extract the file using PowerShell's Expand-Archive (equivalent to unzip) | ||
| echo Extracting AmritMasterData.zip... | ||
| powershell -Command "Expand-Archive -Path 'AmritMasterData.zip' -DestinationPath 'AmritMasterDataFiles'" | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
External download URL embeds a token; consider treating it as a secret and hardening failure handling.
The GitBook URL used in Invoke-WebRequest contains a token= query parameter, which static analysis correctly flags as a potential generic API key. If this token is sensitive or revocable, committing it to the repo risks accidental exposure and future breakage when it’s rotated. Also, neither the download nor the extraction has error checks, so the script will continue even if the file is missing or the network fails.
Consider:
- Moving the URL (or just the token) into a config/env variable, or hosting the ZIP at a stable, unauthenticated location.
- Adding basic checks on
%ERRORLEVEL%afterInvoke-WebRequest/Expand-Archiveand aborting with a clear message on failure.
🧰 Tools
🪛 Gitleaks (8.30.0)
[high] 11-11: Detected a Generic API Key, potentially exposing access to various services and sensitive operations.
(generic-api-key)
🤖 Prompt for AI Agents
In amrit-local-setup/loaddummydata.bat around lines 9 to 16, the script embeds a
tokenized GitBook URL and lacks error handling; remove the hardcoded token by
reading the full URL or only the token from an environment variable or a config
file (do not commit the secret), then after each PowerShell call check
%ERRORLEVEL% (or inspect PowerShell's exit code) and if non-zero echo a clear
error message and exit /b with a non‑zero code so the script stops on download
or extraction failures.
| :: Path to the extracted SQL files | ||
| set SQL_DIR=<PATH to the extracted files directory> | ||
| set SQL_DIR="AmritMasterDataFiles/AmritMasterData" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix SQL_DIR quoting to avoid malformed paths.
set SQL_DIR="AmritMasterDataFiles/AmritMasterData" stores the quotes inside the variable, and later you wrap %SQL_DIR% in another pair of quotes: "%SQL_DIR%\%%A". This yields a doubly quoted, slightly malformed path which can cause file resolution issues.
A safer pattern is:
- set SQL_DIR="AmritMasterDataFiles/AmritMasterData"
+ set "SQL_DIR=AmritMasterDataFiles\AmritMasterData"Then keep the existing usage "%SQL_DIR%\%%A".
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| :: Path to the extracted SQL files | |
| set SQL_DIR=<PATH to the extracted files directory> | |
| set SQL_DIR="AmritMasterDataFiles/AmritMasterData" | |
| :: Path to the extracted SQL files | |
| set "SQL_DIR=AmritMasterDataFiles\AmritMasterData" |
🤖 Prompt for AI Agents
In amrit-local-setup/loaddummydata.bat around lines 17-18, the SQL_DIR variable
is assigned with embedded quotes which become part of the variable value and
lead to double-quoting later; change the assignment to remove the quotes (e.g.,
set SQL_DIR=AmritMasterDataFiles\AmritMasterData) so the variable contains a
plain path, then keep using it as "%SQL_DIR%\%%A" elsewhere.
| # Clean up the extracted files | ||
| del AmritMasterData.zip | ||
| rmdir /s /q AmritMasterDataFiles | ||
| echo Cleaned up AmritMasterDataFiles folder. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Batch comment syntax: use REM or :: instead of #.
# Clean up the extracted files in a .bat file is executed as a command (and will print an error), not treated as a comment.
Change it to something like:
-# Clean up the extracted files
+:: Clean up the extracted files
del AmritMasterData.zip
rmdir /s /q AmritMasterDataFiles
echo Cleaned up AmritMasterDataFiles folder.📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| # Clean up the extracted files | |
| del AmritMasterData.zip | |
| rmdir /s /q AmritMasterDataFiles | |
| echo Cleaned up AmritMasterDataFiles folder. | |
| :: Clean up the extracted files | |
| del AmritMasterData.zip | |
| rmdir /s /q AmritMasterDataFiles | |
| echo Cleaned up AmritMasterDataFiles folder. |
🤖 Prompt for AI Agents
In amrit-local-setup/loaddummydata.bat around lines 42 to 45, the line starting
with "#" is being treated as a command rather than a comment in a .bat file;
replace the "#" comment with a valid batch comment token (for example prefix the
line with REM or ::) so the cleanup note is a true comment, leaving the del and
rmdir commands unchanged and preserving the final echo.
| # download and extract the SQL files from the given url https://1865391384-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FYfDZFIsUuulWkRHaq4c1%2Fuploads%2F1WdSAf0fQBeJOea70EXE%2FAmritMasterData.zip?alt=media&token=18e0b6d6-487c-4c0c-967a-02cdd94d61ad | ||
| wget -O AmritMasterData.zip "https://1865391384-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FYfDZFIsUuulWkRHaq4c1%2Fuploads%2F1WdSAf0fQBeJOea70EXE%2FAmritMasterData.zip?alt=media&token=18e0b6d6-487c-4c0c-967a-02cdd94d61ad" | ||
| unzip AmritMasterData.zip -d AmritMasterDataFiles | ||
| rm AmritMasterData.zip | ||
| SQL_DIR="AmritMasterDataFiles/AmritMasterData" | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Download URL with embedded token may be sensitive and is a single point of failure.
The hard-coded GitBook URL (with token=) for AmritMasterData.zip is flagged by Gitleaks as a generic API key. If the token is private or short‑lived, committing it to the repo both exposes it and makes the script fragile when it’s rotated or expires. Additionally, if wget/unzip fail, the script will still proceed to the MySQL step with an invalid SQL_DIR.
Recommendations:
- Move the URL (or token) into configuration (env var, .env file ignored by Git, or a central config script) or host the ZIP at a stable, unauthenticated location.
- After
wgetandunzip, check the exit status and/or the presence of$SQL_DIRand abort early with a clear message if the data isn’t available.
🧰 Tools
🪛 Gitleaks (8.30.0)
[high] 10-10: Detected a Generic API Key, potentially exposing access to various services and sensitive operations.
(generic-api-key)
[high] 11-11: Detected a Generic API Key, potentially exposing access to various services and sensitive operations.
(generic-api-key)
🤖 Prompt for AI Agents
In amrit-local-setup/loaddummydata.sh around lines 10 to 15, the script embeds a
GitBook URL with a token and does not validate download/unzip success; move the
URL or token into a configuration source (eg. an environment variable or .env
entry that is gitignored) and reference that variable in the script instead of
the hard-coded string, and after running wget and unzip check their exit codes
and verify the expected SQL_DIR exists (abort with a clear error and non-zero
exit status if any step fails) so the script never proceeds to the MySQL step
with missing data.
| cp `src/main/environment/common_example.properties` to `src/main/environment/common_local.properties` | ||
| mvn clean install -DENV_VAR=local | ||
| ``` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix the cp command syntax and polish the data-loading sentence.
The line inside the bash block is not a valid shell command as written:
cp `src/main/environment/common_example.properties` to `src/main/environment/common_local.properties`It will be interpreted as command substitution and a literal to, and will fail. Also, the sentence introducing the data-loading step can be made a bit clearer.
Suggested changes:
```bash
git clone https://github.com/PSMRI/AMRIT-DB.git
cd AMRIT-DB
-cp `src/main/environment/common_example.properties` to `src/main/environment/common_local.properties`
-mvn clean install -DENV_VAR=local
+cp src/main/environment/common_example.properties src/main/environment/common_local.properties
+mvn clean install -DENV_VAR=local...
-Run the below script to load sample Data into the database tables created in the previous step
+Run the following script to load sample data into the database tables created in the previous step.
Also applies to: 58-58
<details>
<summary>🤖 Prompt for AI Agents</summary>
In amrit-local-setup/README.md around lines 51-53 and line 58, fix the invalid
cp command and polish the data-loading sentence: replace the backticked and
"to"-containing line with a proper copy command using plain paths (cp
src/main/environment/common_example.properties
src/main/environment/common_local.properties) and ensure the subsequent mvn line
remains unchanged; update the sentence introducing the script to "Run the
following script to load sample data into the database tables created in the
previous step." Keep the corrected lines inside the bash code block.
</details>
<!-- fingerprinting:phantom:poseidon:olive -->
<!-- This is an auto-generated comment by CodeRabbit -->
|
|
||
| :: Download the file using PowerShell's Invoke-WebRequest (equivalent to wget) | ||
| echo Downloading AmritMasterData.zip... | ||
| powershell -Command "Invoke-WebRequest -Uri 'https://1865391384-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FYfDZFIsUuulWkRHaq4c1%2Fuploads%2F1WdSAf0fQBeJOea70EXE%2FAmritMasterData.zip?alt=media&token=18e0b6d6-487c-4c0c-967a-02cdd94d61ad' -OutFile 'AmritMasterData.zip'" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My worry is that this DB dump can change in the documentation (with the same file name). Also, I am not sure if the token will expire any time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Valid Point. I was thinking about it.
File url will change when new data dump is added to the documentation. We need to update the script as well whenever we update the dump file in gitbook. Token doesnt seem to expire as long as the file name is same
Other option is to maintain the dump file in github rather than gitbook. As the file size is large, we need to use git LFS
Do you have any other ideas?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@drtechie - End goal I am trying to achieve in future is to have a single script to start database containers, start schema management service and load dummy data. I believe we can have this single script in schema management service git repository itself. Load dummy data can be integrated with schema management service based one some flag
📋 Description
JIRA ID:
Please provide a summary of the change and the motivation behind it. Include relevant context and details.
Enhanced the script which loads sample data to download the sql file, unzip it and run the script. Earlier this was a manual step. Updated the readme accordingly
✅ Type of Change
ℹ️ Additional Information
Please describe how the changes were tested, and include any relevant screenshots, logs, or other information that provides additional context.
Summary by CodeRabbit
Documentation
New Features
✏️ Tip: You can customize this high-level summary in your review settings.