Skip to content

Conversation

@Zhicheng-Liu
Copy link

…ndices

Previously when combining multiple classic indices into a single classic
index, the contents of source indices are read in as char. During the
interleaving process, depending on the current position of the destination
index, both left and right shifts on the next char could be performed.

However, there are a few undefined behaviours that could affect the results
depending on the platform:

  1. The signedness of a char is an undefined behaviour. Hence when
    bit shifting, the usual arithmetic conversion performed on the char is
    undefined. The char could be promoted to either signed int or unsigned
    int.
  2. If the char is treated as signed int, the bit shifting (both left and
    right) is also undefined in pre-c++20 standards. The behaviour is platform
    dependent.

This change fixes the issue by declare the contents read from source
indices as unsigned char.

giang-nghg and others added 7 commits March 30, 2021 22:13
* Expose classic_combine function

* Add exposition code

* Fix syntax error

* Expose calc_signature_size
* Expose classic_combine function

* Add exposition code

* Fix syntax error

* Expose calc_signature_size

* Expose classic combine as cmd instead
…ndices

Previously when combining multiple classic indices into a single classic
index, the contents of source indices are read in as `char`. During the
interleaving process, depending on the current position of the destination
index, both left and right shifts on the next char could be performed.

However, there are a few undefined behaviours that could affect the results
depending on the platform:
1. The signedness of a `char` is an undefined behaviour. Hence when
bit shifting, the usual arithmetic conversion performed on the char is
undefined. The char could be promoted to either signed int or unsigned
int.
2. If the char is treated as signed int, the bit shifting (both left and
right) is also undefined in pre-c++20 standards. The behaviour is platform
dependent.

This change fixes the issue by declare the contents read from source
indices as `unsigned char`.
@Zhicheng-Liu
Copy link
Author

@bingmann and @leoisl, could you please review this pull request?

…ingmann#5)

When combining classic indices, for each batch the combinations of
rows from each constituent index are written to an output block.
The output block is reused for next batch.

As we use bitwise OR operation to combine rows from the constituent
indices, the output block should be reset to all 0s before being
reused. Otherwise, previous set bits will be carried over to next
batch and accumulating false positives till the end of the batch
processing loop.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants