Adds helper class for unbounded read test. #73

Conversation

prashastia
Collaborator

Adds utils.py, containing helper functions for dynamically adding data during the unbounded read test.
Also modifies parse_logs.py to use the same helpers for argument input.

/gcbrun

This module is similar to the BigQueryExample. A few changes to count the number of records and log them.
This test reads a simpleTable.
Shell script and python script to check the number of records read.
comments CODECOV_TOKEN usage.
…ds to different tables required for the e2e tests.
avro.io.DatumWriter(),
self.schema,
)
self.table_type.write_rows(
Collaborator

Collaborator Author

Fixed.

)
writer.close()

def transfer_avro_to_bq_table(self, thread_number):
Collaborator

transfer_avro_data_to_bq_table

Collaborator

or transfer_avro_rows_to_bq_table

Collaborator Author

Fixed.

Comment on lines 95 to 97
local_avro_file = self.avro_file_local.replace(
'.', '_' + thread_number + '.'
)
Collaborator

What exactly is being done here?

Collaborator Author

@prashastia prashastia Dec 29, 2023

avro_file_local has a generic name, e.g. "filename.avro". But we write and upload several
avro files concurrently, so to prevent race conditions each thread writes and reads its own
file, named according to its thread number. "filename.avro" is changed to
"filename_<thread_number>.avro"

Collaborator

How do you assign and keep track of "thread numbers"?
Also, as I understand it, thread number is an identifier/suffix and should be named as such. This method (or file in general) does not need to care how the identifier/suffix is derived. Please consider renaming to filename_identifier or filename_suffix.
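
The renaming discussed in this thread can be sketched as follows. This is only an illustration: the helper name `with_suffix` is hypothetical and not part of the PR. It assumes the identifier goes just before the file extension, and uses `os.path.splitext` instead of `str.replace('.', ...)` so that other dots in the path (for example a leading `./`) are left alone.

```python
import os


def with_suffix(filename, identifier):
    """Insert `_<identifier>` before the extension of `filename`.

    Hypothetical helper for illustration; not taken from the PR.
    """
    root, ext = os.path.splitext(filename)
    # The f-string formats ints and strs alike, so a loop index
    # can be passed directly without converting it first.
    return f"{root}_{identifier}{ext}"


print(with_suffix("filename.avro", 3))
print(with_suffix("./data/filename.avro", "0"))
```

Because the function only sees an opaque identifier, it does not need to know whether that value came from a thread index or anywhere else.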

Collaborator Author

The user (or we, since this is for testing) can decide how many threads to deploy.
The thread number is simply obtained from the iterator in the loop where we create the threads. In other words, thread_number is an integer in [0, 1, ..., n-1], where n is the total number of threads we (or the user) decide to use.

Collaborator Author

Fixed now.
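
A minimal sketch of the scheme described in this thread, where each worker thread receives its loop index as an identifier. The names `run_with_identifiers` and `record_identifier` are hypothetical, and the worker body is a stand-in for the real work (generating rows, writing the Avro file, uploading to BigQuery), which is not reproduced here.

```python
import threading


def run_with_identifiers(worker, number_of_threads):
    """Start one thread per identifier in 0..n-1 and wait for all of them."""
    threads = [
        # The identifier is passed as a string so it can be used
        # directly as a filename suffix by the worker.
        threading.Thread(target=worker, args=(str(identifier),))
        for identifier in range(number_of_threads)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()


seen = []
lock = threading.Lock()


def record_identifier(identifier):
    # Stand-in for the per-thread work; it only records which
    # identifier it was given.
    with lock:
        seen.append(identifier)


run_with_identifiers(record_identifier, 4)
print(sorted(seen))
```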

Comment on lines 83 to 84
thread_number: The number of threads to perform the function concurrently
to add the avro rows to.
Collaborator

I don't really get this description. What does thread_number represent?

Collaborator Author

The number of threads that concurrently perform the operation. The
operation here refers to generating records, storing them in avro files, and uploading
them to a BQ table.

file_name = self.avro_file_local.replace('.', '_' + thread_number + '.')
os.remove(file_name)

def create_transfer_records(
Collaborator

Need a better name here. This is ambiguous.

Collaborator Author

Fixed.

self.required_arguments = required_arguments
self.acceptable_arguments = acceptable_arguments

def __get_arguments(self):
Collaborator

No need to enforce name mangling here. Single underscore prefix (indicating internal use) is sufficient.
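
To illustrate the difference the comment above refers to, here is a small sketch. The class name `ArgumentReader` and the returned values are hypothetical. A single leading underscore is only a convention, while two leading underscores make Python rewrite the attribute name to include the class name.

```python
class ArgumentReader:
    def _get_arguments(self):
        # Single underscore: internal by convention, name is unchanged.
        return {"source": "single"}

    def __get_arguments(self):
        # Double underscore: stored as _ArgumentReader__get_arguments.
        return {"source": "double"}


reader = ArgumentReader()
print(reader._get_arguments())
# The unmangled name does not exist on the instance.
print(hasattr(reader, "__get_arguments"))
# The mangled name is still reachable, so mangling is not access control.
print(reader._ArgumentReader__get_arguments())
```

Name mangling mainly helps avoid accidental clashes in subclasses, which is why a single underscore is usually enough for a helper method like this one.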

) from exc
return argument_dictionary

def __validate_arguments(self, arguments_dictionary):
Collaborator

)


def generate_string():
Collaborator

nit: generate_random_string is more precise.

Collaborator Author

Fixed.



def generate_long():
return random.choice(range(0, 10000000))
Collaborator

nit: we can also use random.randint(0, 10000000) instead.

Collaborator Author

Fixed.

Collaborator

Still seeing return random.choice(range(0, 10000000))
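
For reference, a short sketch comparing the forms discussed in this thread. The function names and the `UPPER` constant are hypothetical. The bounds differ slightly: `range` excludes its stop value, while `random.randint` includes both endpoints. `random.randrange` is shown as a third option that keeps the original bounds.

```python
import random

UPPER = 10_000_000


def generate_long_choice():
    # Original form: picks from 0..UPPER-1 (range excludes its stop value).
    return random.choice(range(0, UPPER))


def generate_long_randrange():
    # Same bounds as the original form, expressed directly.
    return random.randrange(0, UPPER)


def generate_long_randint():
    # Suggested form: both endpoints are included, so UPPER itself can occur.
    return random.randint(0, UPPER)


# randint's upper bound is inclusive, so this can only return 5.
print(random.randint(5, 5))
```

For test data the off-by-one at the upper bound is unlikely to matter, but it is worth knowing when swapping one call for the other.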

self.delete_local_file(avro_file_local_identifier)


class ArgumentInputUtils:
Collaborator

This is ok, but we can also use argparse: https://docs.python.org/3/library/argparse.html

Collaborator

+1. Seems more pythonic
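
A rough sketch of what the argparse suggestion could look like. The flag names (`--project_name`, `--dataset_name`, `--table_name`, `--number_of_rows`) and the default value are assumptions for illustration and are not taken from the PR. Required and optional arguments map onto `required=True` and `default=...`, and argparse rejects unknown flags on its own.

```python
import argparse


def parse_arguments(argv=None):
    """Parse command-line arguments; flag names here are hypothetical."""
    parser = argparse.ArgumentParser(
        description="Add rows to a table during the unbounded read test."
    )
    # Required arguments: argparse exits with an error if one is missing.
    parser.add_argument("--project_name", required=True)
    parser.add_argument("--dataset_name", required=True)
    parser.add_argument("--table_name", required=True)
    # Optional argument with a type conversion and a default value.
    parser.add_argument("--number_of_rows", type=int, default=1000)
    # argv=None makes argparse read sys.argv[1:].
    return parser.parse_args(argv)


args = parse_arguments(
    ["--project_name", "p", "--dataset_name", "d", "--table_name", "t"]
)
print(args.project_name, args.dataset_name, args.table_name, args.number_of_rows)
```

This replaces hand-written required/acceptable argument checks with declarations, at the cost of following argparse's conventions for flag syntax and error messages.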

@jayehwhyehentee jayehwhyehentee merged commit 45a1bf2 into GoogleCloudDataproc:main Jan 2, 2024
4 checks passed
3 participants