QA the Output Random Telecom Payments Dataset #5

Open
oislen opened this issue Jun 6, 2023 · 0 comments
Labels: enhancement (New feature or request)

Comments

oislen (Owner) commented Jun 6, 2023

QA Inconsistencies / Suggestions:

  1. Possible Duplication of uids / userids from multiprocessing.
    i. Set uid to be an incremental integer from 1 to 999999999999999 (already implemented).
    ii. Add month and day to the date element of the userid.
    iii. Ensure the userid is a 16-digit number.
  2. Refine logic for transaction status and error codes:
    i. Set rejected transaction status, and associated error codes, to only occur when card hash is not null.
    ii. Set occurrence probabilities for each rejected transaction error code per underlying rejection rate feature.
    iii. Create rejection rates based on fraudulent users (not behaviour-based).
    iv. Create rejection rates based on high transaction amounts / frequencies (not necessarily realistic).
  3. Set hashes to contain only hexadecimal characters (the digits 0-9 and the letters a to f).
  4. Add timestamps to registration dates and transaction dates; set the time distribution to be normal-like across daytime hours (not required).
  5. Add a transaction payment method with one of the values np.nan, card, store_wallet or store_points:
    i. Set the payment method to store_points or store_wallet when the card hash is null and the transaction amount is > 0.
    ii. Return np.nan when the transaction amount is zero.
  6. Transaction dates have a normal-like distribution; they should be uniform.
    i. Resample the transaction date from the registration date until the period end date.
    ii. Extend the transaction end date to account for resampling.
  7. Add a more detailed description of the dataset to the README, e.g. number of users, date ranges, etc.
    i. Add a link to the data dictionary in the README.
  8. Generate the transaction price directly within the transaction class object instead of linking to the price in the application class object.
  9. Set a predefined column ordering for the final output data
  10. Increase total transaction amount / frequency based on the high, medium and low identifiers within the user class object (naturally handled by a Poisson distribution).
  11. Multiprocessing inconsistencies
    i. A single userid is associated with multiple different people; append additional uid values and set the userid to be 18 digits long (this could also be added as a fraudulent feature).
    ii. Different application hashes and device types are generated per multiprocessing batch; constant application hashes and device types should be shared among the multiprocessing batches.
    iii. Add additional post-processing when running with multiprocessing to remove random duplicates between iterations.
    iv. Add a unique identifier to distinguish between multiprocessing batches.
  12. Device & IP Hashes
    i. Should device hashes be allowed to be null / np.nan?
    ii. Set transactions with missing IP hashes to rejected, with error code E900.
  13. Transaction Amounts
    i. Have transaction amounts frequently round to the nearest .00, .05, .09 or .10, rather than to a random remainder such as .01, .03 or .07.
    ii. https://www.kaggle.com/datasets/lava18/google-play-store-apps
  14. Device Types
    i. Replace random device type strings with actual phone / laptop device names.
    ii. https://www.kaggle.com/datasets/abdurrahman22224/smartphone-new-data/data
  15. Shared Entities
    i. Currently shared entity logic results in one large connected component.
  16. Payment Channels
    i. Payment channels should be linked to the application hash, such that each app has a dedicated payment channel, e.g. Adyen.
    ii. Add an AppStore payment channel type.
  17. User First and Surname
    i. Use a HuggingFace / Bedrock Llama instruct model to generate user first names and surnames based on the country codes in a CSV file.
    ii. Generate 1000 user names per country code.
  18. Add New Error Codes
    i. https://developer.worldpay.com/products/access/reference/response-codes/scheme-codes
  19. Default Dates to Current Year
    i. Add logic to default the date range to the last year up to today.
  20. Add App Unittests to GitHub Actions Unittests
    i. Incorporate test_gen_user_trans_data.py into the GitHub Actions unittests; this requires reference data not currently stored in the repo.
  21. Add beartype type checks to functions
    i. https://beartype.readthedocs.io/en/latest/eli5/#tutorial
  22. Docker Environment Settings
    i. Docker installs the Python virtual environment from the local tmp/requirements.txt; replace this with repo/requirements.txt.
    ii. Use the latest Ubuntu image, not 20.04.
    iii. Add a GitHub Actions unittest environment flag: echo "GITHUB_ACTIONS_UNITTEST_FLAG=1" >> $GITHUB_ENV
    iv. Remove the python3-pip install from the Dockerfile; instead, call python3 -m pip when installing via requirements.txt.
    v. Replace .bat files with .cmd files
  23. Repo Folder Naming Convention
    i. Rename the scripts directory to datagenerator.

Note: the unit test data will need to be revised and refreshed to reflect the above changes.
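For item 3, a minimal sketch of hex-only hash generation, assuming hashes are fixed-length lowercase strings drawn from a seeded `random.Random`. The names `gen_hex_hash` and `is_hex_hash` and the 32-character default length are illustrative assumptions, not taken from the repo:

```python
import random
import string

# string.hexdigits is '0123456789abcdefABCDEF'; lowering it leaves 16 distinct characters.
HEX_CHARS = set(string.hexdigits.lower())


def gen_hex_hash(rng: random.Random, n_chars: int = 32) -> str:
    """Return a random lowercase hexadecimal string of exactly n_chars characters."""
    # getrandbits(4 * n) gives at most 4*n bits, i.e. at most n hex digits;
    # zero-padded formatting guarantees exactly n_chars digits.
    return format(rng.getrandbits(4 * n_chars), f"0{n_chars}x")


def is_hex_hash(value: str) -> bool:
    """QA check: non-empty and made only of the digits 0-9 and the letters a-f."""
    return len(value) > 0 and set(value) <= HEX_CHARS
```

Drawing from a seeded generator rather than `secrets.token_hex` keeps the output reproducible between runs, which helps when the unit test data has to be regenerated.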
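For item 5, one way the payment method rules could be expressed, sketched under a few assumptions: a zero amount returns NaN regardless of the card hash, a non-null card hash maps to `card`, and the split between `store_wallet` and `store_points` is a hypothetical `p_wallet` parameter. `math.nan` stands in for `np.nan` (both are a float NaN):

```python
import math
import random
from typing import Optional, Union

# np.nan is a plain float NaN, so math.nan stands in for it here.
PaymentMethod = Union[str, float]
CardHash = Optional[Union[str, float]]


def _is_null(value: CardHash) -> bool:
    """Treat both None and float NaN (as produced by pandas/numpy) as null."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def assign_payment_method(
    rng: random.Random,
    card_hash: CardHash,
    amount: float,
    p_wallet: float = 0.5,  # hypothetical split between store_wallet and store_points
) -> PaymentMethod:
    if amount < 0:
        raise ValueError("transaction amounts are assumed to be non-negative")
    if amount == 0:
        return math.nan  # item 5.ii: no payment method for zero-value transactions
    if not _is_null(card_hash):
        return "card"
    # item 5.i: card hash is null and amount > 0
    return "store_wallet" if rng.random() < p_wallet else "store_points"
```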
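For item 6, a minimal sketch of uniform resampling of a transaction date between the registration date and the period end date, assuming both are plain `datetime.date` values and the range is inclusive at both ends. The function name is hypothetical:

```python
import random
from datetime import date, timedelta


def sample_transaction_date(rng: random.Random, registration_date: date, end_date: date) -> date:
    """Draw a transaction date uniformly from [registration_date, end_date], inclusive."""
    if end_date < registration_date:
        raise ValueError("end_date must not precede registration_date")
    span_days = (end_date - registration_date).days
    # randint is inclusive on both ends, so every day in the range is equally likely.
    return registration_date + timedelta(days=rng.randint(0, span_days))
```

Because each user's window starts at their own registration date, users who register late get short windows; extending the end date (item 6.ii) gives them room for transactions.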
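For items 11.iii and 11.iv, a sketch of post-processing that tags each record with the index of the multiprocessing batch that produced it and drops duplicates that reappear in later batches. It assumes records are dicts keyed by `userid`; the `batch_id` field name and the keep-first policy are hypothetical choices:

```python
from typing import Iterable, List


def merge_batches(batches: Iterable[List[dict]], key: str = "userid") -> List[dict]:
    """Concatenate batch outputs, keeping the first occurrence of each key."""
    seen = set()
    merged = []
    for batch_id, batch in enumerate(batches):
        for record in batch:
            if record[key] in seen:
                continue  # random duplicate already produced by an earlier batch
            seen.add(record[key])
            # Hypothetical batch identifier so records can be traced back to their batch.
            merged.append({**record, "batch_id": batch_id})
    return merged
```

Keeping the first occurrence is the simplest policy; regenerating the duplicated identifier instead would preserve the requested number of users.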
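For item 13, a sketch of one reading of the rounding rule: within each 10-cent band, snap the cent remainder to the nearest of .x0, .x5 or .x9, where .10 is simply the next band's .00. Only a fraction `p_snap` of amounts is snapped, so that rounding is "frequent" rather than universal. This interpretation, the tie-breaking rule and both function names are assumptions:

```python
import random
from decimal import Decimal, ROUND_HALF_UP

# Cent endings allowed within each 10-cent band; .10 is the next band's .00.
ALLOWED_CENT_ENDINGS = (0, 5, 9)


def snap_price(amount: float) -> Decimal:
    """Snap a non-negative amount so its last cent digit is 0, 5 or 9."""
    # Work in integer cents via Decimal to avoid binary float rounding errors.
    cents = int((Decimal(str(amount)) * 100).quantize(Decimal("1"), rounding=ROUND_HALF_UP))
    tens, remainder = divmod(cents, 10)
    # Nearest allowed ending; ties (e.g. remainder 7) go to the earlier ending.
    ending = min(ALLOWED_CENT_ENDINGS, key=lambda e: abs(e - remainder))
    return (Decimal(tens * 10 + ending) / 100).quantize(Decimal("0.01"))


def maybe_snap(rng: random.Random, amount: float, p_snap: float = 0.8) -> Decimal:
    """Snap only a fraction p_snap of amounts, so odd remainders still occur occasionally."""
    if rng.random() < p_snap:
        return snap_price(amount)
    return Decimal(str(amount)).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
```

For example, 2.98 becomes 2.99, 4.03 becomes 4.05 and 1.01 becomes 1.00.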

@oislen oislen self-assigned this Jun 6, 2023
@oislen oislen added the enhancement New feature or request label Jun 6, 2023
@oislen oislen added this to the RandomTeleComData Version 1 milestone Jun 6, 2023
oislen added a commit that referenced this issue Jun 9, 2023
@oislen oislen closed this as completed Jun 11, 2023
@oislen oislen reopened this Jun 14, 2023
oislen added a commit that referenced this issue Jun 14, 2023
…s when numeric iso country code is less than 3 digits i.e. belgium and albania
oislen added a commit that referenced this issue Jun 14, 2023
oislen added a commit that referenced this issue Jun 14, 2023
oislen added a commit that referenced this issue Jun 14, 2023
oislen added a commit that referenced this issue Jun 14, 2023
oislen added a commit that referenced this issue Jun 14, 2023
oislen added a commit that referenced this issue Jun 14, 2023
…t instead of linking from price in application class object
oislen added a commit that referenced this issue Jun 15, 2023
@oislen oislen closed this as completed Jun 15, 2023
oislen added a commit that referenced this issue Jun 15, 2023
oislen added a commit that referenced this issue Jun 15, 2023
@oislen oislen reopened this Jun 15, 2023
Projects
None yet
Development

No branches or pull requests

1 participant