QA the Output Random Telecom Payments Dataset #5
Comments
oislen added a commit that referenced this issue on Jun 10, 2023
oislen added a commit that referenced this issue on Jun 10, 2023
oislen added a commit that referenced this issue on Jun 14, 2023
oislen added a commit that referenced this issue on Jun 14, 2023
oislen added a commit that referenced this issue on Jun 14, 2023: …s when numeric iso country code is less than 3 digits i.e. belgium and albania
oislen added a commit that referenced this issue on Jun 14, 2023: …ional categories card, wallet and points
oislen added a commit that referenced this issue on Jun 14, 2023: …t instead of linking from price in application class object
oislen added a commit that referenced this issue on Jun 14, 2023
oislen added a commit that referenced this issue on Jun 14, 2023
oislen added a commit that referenced this issue on Jun 14, 2023
oislen added a commit that referenced this issue on Jun 15, 2023
oislen added a commit that referenced this issue on Jun 15, 2023
oislen added a commit that referenced this issue on Jun 15, 2023
oislen added a commit that referenced this issue on Jun 15, 2023
oislen added a commit that referenced this issue on Jun 15, 2023
oislen added a commit that referenced this issue on Jun 15, 2023
oislen added a commit that referenced this issue on Sep 17, 2024
oislen added a commit that referenced this issue on Sep 17, 2024
oislen added a commit that referenced this issue on Sep 17, 2024
oislen added a commit that referenced this issue on Sep 17, 2024
oislen added a commit that referenced this issue on Sep 17, 2024
oislen added a commit that referenced this issue on Sep 24, 2024
QA Inconsistencies / Suggestions:
Possible duplication of uids / userids from multiprocessing.
i. Set uid to be an incremental integer from 1 to 999999999999999 (already implemented).
ii. Add month and day to the date element of userid.
iii. Ensure userid is a 16 digit number.
Refine logic for transaction status and error codes:
i. Set rejected transaction status, and associated error codes, to only occur when card hash is not null.
ii. Set occurrence probabilities for each rejected transaction error code per underlying rejection rate feature.
iii. Create rejection rates based on fraudulent users (is not behavioural based).
iv. Create rejection rates based on high transaction amounts / frequencies (not necessarily realistic).
Set hashes to only contain digits and the letters a to f.
Add time stamps to registration dates and transaction dates; set distribution to be normal like across day time hours (not required).
Add transaction payment method as either np.nan, card, store_wallet, or store_points.
i. Add store_points or store_wallet as a transaction payment method when card hash is null and transaction amount > 0.
ii. Return np.nan when transaction amount is zero.
Transaction date has a normal like distribution; should be uniform.
i. Resample transaction date from registration date till period end date.
ii. Extended transaction end date to account for resampling.
Add a more detailed description of dataset in README, i.e. number of users, date ranges etc.
i. Add link to data dictionary in README.
Generate transaction price directly within transaction class object instead of linking from price in application class object.
Set a predefined column ordering for the final output data.
Increase total transaction amount / frequency based on high, medium and low identifiers within the user class object (naturally handled by Poisson distribution).
i. A single userid being associated with multiple different people; append additional uid values and set userid to be 18 digits long (can add this as a fraudulent feature too).
ii. Generating different application hashes and device types per multiprocessing batch; constant application hashes and device types should be shared among the multiprocessing batches.
iii. Add additional post-processing when running multiprocessing, due to random duplicates between iterations.
iv. Add unique identifier to distinguish between multiprocessing batches.
Device & IP Hashes
i. Should device hashes be allowed to be null / np.nan?
ii. Set transactions with missing ip hashes to rejected with error code E900.
Transaction Amounts
i. Have transaction amounts frequently round to nearest .00, .05, .09, .10, not a random remainder such as .01, .03, .07.
ii. https://www.kaggle.com/datasets/lava18/google-play-store-apps
Device Types
i. Replace random device type strings with actual phone / laptop device names.
ii. https://www.kaggle.com/datasets/abdurrahman22224/smartphone-new-data/data
Shared Entities
i. Currently shared entity logic results in one large connected component.
Payment Channels
i. Payment channels should be linked to application hash, such that each app has a dedicated payment channel e.g. adyen.
ii. Add AppStore payment channel type.
i. Use HuggingFace / Bedrock Llama instruct model to generate user first names and surnames based on country code from csv.
ii. Create 1000 generated user names per country code.
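The payment method rule listed earlier in this issue (np.nan when the transaction amount is zero, store_points or store_wallet when the card hash is null and the amount is > 0) could be sketched roughly as below. This is a minimal illustration, not the repo's code: the function name `assign_payment_method` is hypothetical, `math.nan` stands in for np.nan, and returning card whenever a card hash is present is an assumption the issue implies but does not state.

```python
import math
import random
from typing import Optional, Union


def assign_payment_method(
    card_hash: Optional[str], amount: float, rng: random.Random
) -> Union[str, float]:
    """Hypothetical helper: pick a payment method for one transaction."""
    # Zero-value transactions carry no payment method (np.nan in the issue).
    if amount == 0:
        return math.nan
    # Assumption: a non-null card hash means the payment was made by card.
    if card_hash is not None:
        return "card"
    # No card on the transaction but money was spent: use a store balance.
    return rng.choice(["store_wallet", "store_points"])
```

Passing a seeded `random.Random` keeps the choice reproducible, which may help when refreshing unit test data.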
i. https://developer.worldpay.com/products/access/reference/response-codes/scheme-codes
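As an illustration of the error code handling discussed in this issue, the sketch below applies the rule that a transaction with a missing ip hash is set to rejected with error code E900. The function names and the (status, error_code) tuple layout are hypothetical, and the "successful" status in the usage line is an assumed label; only "rejected" and E900 come from the issue.

```python
import math
from typing import Optional, Tuple


def is_missing(value: object) -> bool:
    """Treat None and float NaN (e.g. np.nan) as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def apply_missing_ip_rule(
    ip_hash: object, status: str, error_code: Optional[str]
) -> Tuple[str, Optional[str]]:
    """Hypothetical helper: override status when the ip hash is missing."""
    if is_missing(ip_hash):
        # Rule from the issue: missing ip hash -> rejected with E900.
        return "rejected", "E900"
    # Otherwise leave the existing status and error code untouched.
    return status, error_code
```

For example, `apply_missing_ip_rule(None, "successful", None)` would give a rejected transaction, while a populated ip hash leaves the inputs unchanged.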
i. Add logic to default date ranges to last year from today
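One possible reading of this item is sketched below: when no dates are supplied, the range defaults to the 365 days ending today. Both the function name `resolve_date_range` and the "trailing 365 days" interpretation are assumptions; the issue could equally mean the previous calendar year.

```python
from datetime import date, timedelta
from typing import Optional, Tuple


def resolve_date_range(
    start: Optional[date] = None,
    end: Optional[date] = None,
    today: Optional[date] = None,
) -> Tuple[date, date]:
    """Hypothetical helper: fill in missing range bounds."""
    # `today` is injectable so the default can be tested deterministically.
    today = today or date.today()
    end = end or today
    # Assumption: "last year" means the 365 days leading up to the end date.
    start = start or (end - timedelta(days=365))
    if start > end:
        raise ValueError("start date must not be after end date")
    return start, end
```

Explicitly supplied dates pass through unchanged, so existing callers that set both bounds would be unaffected.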
i. Incorporate test_gen_user_trans_data.py into GitHub Actions unit tests; requires reference data not currently stored in repo.
i. https://beartype.readthedocs.io/en/latest/eli5/#tutorial
i. Docker: install the Python virtual environment from repo/requirements.txt instead of the local tmp/requirements.txt.
ii. Use latest ubuntu image, not 20.04.
iii. Add GitHub Actions unittest environment flag:
echo "GITHUB_ACTIONS_UNITTEST_FLAG=1" >> $GITHUB_ENV
iv. Removed python3-pip install from dockerfile; instead call python3 -m pip when installing via requirements.txt.
v. Replace .bat files with .cmd files.
i. Renamed scripts directory to datagenerator.
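The GITHUB_ACTIONS_UNITTEST_FLAG variable set in the workflow could be read on the Python side along these lines. This is only a sketch: how the flag is actually consumed is not stated in the issue, so skipping reference-data tests when it is set is an assumption, and the helper and test class names are hypothetical.

```python
import os
import unittest
from typing import Mapping, Optional

FLAG_NAME = "GITHUB_ACTIONS_UNITTEST_FLAG"


def in_github_actions_unittest(environ: Optional[Mapping[str, str]] = None) -> bool:
    """Return True when the workflow has exported the flag as "1"."""
    env = os.environ if environ is None else environ
    return env.get(FLAG_NAME) == "1"


class TestGenUserTransData(unittest.TestCase):
    # Assumption: tests needing reference data are skipped in CI,
    # because that data is not currently stored in the repo.
    @unittest.skipIf(in_github_actions_unittest(), "reference data not in repo")
    def test_against_reference_data(self) -> None:
        # Placeholder body; the real comparison would load reference data.
        self.assertTrue(True)
```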
Note: will need to revise and refresh unit test data to reflect above changes.
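For the transaction date item above (resample uniformly from registration date till period end date), a minimal sketch at day granularity might look like this. The function name `resample_transaction_date` is hypothetical, and working at day resolution without time stamps is an assumption.

```python
import random
from datetime import date, timedelta


def resample_transaction_date(
    registration_date: date, period_end: date, rng: random.Random
) -> date:
    """Hypothetical helper: draw a date uniformly in [registration, end]."""
    if registration_date > period_end:
        raise ValueError("registration date is after the period end date")
    span_days = (period_end - registration_date).days
    # randint is inclusive on both ends, so every day is equally likely.
    return registration_date + timedelta(days=rng.randint(0, span_days))
```

Because each draw is bounded below by the registration date, no transaction can precede the registration of its user.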