Skip to content

Ion's table timeout code #41

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: master
Choose a base branch
from
Open

Ion's table timeout code #41

wants to merge 2 commits into from

Conversation

pcmoritz
Copy link
Contributor

No description provided.

table_done_cb done_cb,
table_fail_cb fail_cb,
table_retry_cb retry_cb,
void *user_context) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe since there are a lot of arguments here and it is hard for a user to parse, we could separate out the ones that only have to do with failures into their own struct argument (this would probably be retry_count, timeout, fail_cb). We could also have a version of object_table_lookup that specifies default values for these arguments.

table_done_cb done_cb,
table_fail_cb fail_cb,
table_retry_cb retry_cb,
void *user_context) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it make sense to call retry_cb once before exiting this function? It seems like every time we would want to set a table callback, we would also want to call it once initially.

if (cb_data->retry_count == 0) {
/* We didn't get a response from the database after exhausting all retries;
* let user know, cleanup the state, and remove the timer. */
cb_data->fail_cb(cb_data->id, cb_data->user_context);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a way we can make sure that fail_cb won't get called if done_cb has already been called? Also, a way to make sure that if we call done_cb, the callback in outstanding_callbacks will get cleaned up by destroy_table_callback.

We should probably also test for it. :) I think you test whether fail_cb will get called for the object table right now, but not the others. It would be good to also test whether outstanding_callbacks gets updated the way we think it is.

if (((bool *)cb_data->requests_info)[PUBLISH_INDEX] == true) {
if (cb_data->done_cb) {
task_log_done_cb done_cb = cb_data->done_cb;
done_cb(cb_data->id, cb_data->user_context);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should remove the timer here to prevent the failure callback from being called (see comment in state/table.c)

if (((bool *)cb_data->requests_info)[PUSH_INDEX] == true) {
if (cb_data->done_cb) {
task_log_done_cb done_cb = cb_data->done_cb;
done_cb(cb_data->id, cb_data->user_context);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should remove the timer here to prevent the failure callback from being called (see comment in state/table.c)

outstanding_callback *outstanding_callbacks_find(table_callback_data *key) {
outstanding_callback *outstanding_cb = NULL;

HASH_FIND_PTR(outstanding_cbs, &key, outstanding_cb);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if we malloc something later on that gets allocated the same pointer?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as you pointed out, one solution is to use the timer_id to distinguish (it is unique)

object_table_lookup_done_cb done_cb = cb_data->done_cb;
done_cb(cb_data->id, manager_count, manager_vector, cb_data->user_context);
/* remove timer */
event_loop_remove_timer(cb_data->db_handle->loop, cb_data->timer_id);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think the callback in outstanding_callbacks will get cleaned up.

event_loop_stop(loop);
}

TEST timeout_test(void) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add some documentation about what each test is testing for?

db_handle *db = db_connect("127.0.0.1", 6379, "plasma_manager", "127.0.0.1", 11235);
db_attach(db, loop);
void *context = (void *) 42;
object_table_lookup(db, NIL_ID, 5, 100, retry_done_cb, retry_fail_cb, context);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, I think this is really only a test for the object table timeout, not generic timeouts. This goes back to the other comment in state/table.c about making sure we clean up everything properly if done_cb gets called. Currently, the tables seem to do different things once they've called done_cb.

}


SUITE(db_timeout_tests) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should check in all cases that outstanding_cbs is cleaned up properly.

@istoica istoica force-pushed the object-table-retry branch from a1c1b13 to 92619c1 Compare October 21, 2016 01:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants