Some reads not captured on Linux #45

@hombit

Bug report

I have a 1 GiB file, and I get very different results when I read it with plain Python file I/O versus pyarrow: the byte count reported for pyarrow is unrealistically small.

with open('train-00000-of-00007.parquet', 'rb') as gh:
    %iops data = gh.read()
del data
======================================================================
IOPS Profile Results (strace (per-process))
======================================================================
Execution Time:                18.2150 seconds
Read Operations:               2
Write Operations:              0
Total Operations:              2
Bytes Read:                    1.02 GB (1,091,305,162 bytes)
Bytes Written:                 0.00 B (0 bytes)
Total Bytes:                   1.02 GB (1,091,305,162 bytes)
----------------------------------------------------------------------
IOPS:                          0.11 operations/second
Throughput:                    57.14 MB/second
======================================================================
import pyarrow.parquet as pq
%iops pq.read_table('train-00000-of-00007.parquet')
======================================================================
IOPS Profile Results (strace (per-process))
======================================================================
Execution Time:                19.7621 seconds
Read Operations:               3
Write Operations:              3
Total Operations:              6
Bytes Read:                    3.63 MB (3,808,731 bytes)
Bytes Written:                 13.05 KB (13,360 bytes)
Total Bytes:                   3.65 MB (3,822,091 bytes)
----------------------------------------------------------------------
IOPS:                          0.30 operations/second
Throughput:                    188.87 KB/second
======================================================================

I tried dropping the page cache with sync; sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches', but it didn't help.
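
A possible cross-check that does not depend on strace is to compare the kernel's own per-process I/O counters in /proc/self/io before and after the call. The sketch below is only an illustration under a few assumptions: it is Linux-only, the helper names (read_proc_io, io_delta, demo) are hypothetical and not part of iops_profiler, and a small temporary file stands in for the Parquet file. In these counters, rchar is the number of bytes requested through read-style system calls (whether or not they hit the disk), and read_bytes is the number of bytes actually fetched from storage; both cover all threads of the process.

```python
import os
import tempfile


def read_proc_io(path="/proc/self/io"):
    """Return this process's kernel I/O counters as a dict, or None if unavailable."""
    try:
        with open(path) as f:
            # Each line looks like "rchar: 12345".
            return {k: int(v) for k, v in (line.split(":") for line in f)}
    except OSError:
        # Not on Linux, or /proc is not accessible.
        return None


def io_delta(func):
    """Call func() and return (result, counter deltas); deltas is None without /proc."""
    before = read_proc_io()
    result = func()
    after = read_proc_io()
    if before is None or after is None:
        return result, None
    return result, {k: after[k] - before[k] for k in after}


def demo(size=1 << 20):
    """Write a temporary file of `size` bytes, read it back, and report the deltas."""
    # Hypothetical stand-in for the Parquet file from the report.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(os.urandom(size))
        path = tmp.name
    try:
        def read_all():
            with open(path, "rb") as f:
                return len(f.read())
        return io_delta(read_all)
    finally:
        os.unlink(path)


if __name__ == "__main__":
    n, delta = demo()
    print("bytes returned by read():", n)
    print("counter deltas:", delta)
```

For the case in this report, the same helper could be wrapped around the pyarrow call, for example io_delta(lambda: pq.read_table('train-00000-of-00007.parquet')). If rchar then comes out close to the file size while the profiler shows only a few megabytes, that would suggest the reads are happening but are not being captured by the tracer.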

Environment Information

Linux 6.8.0, x86_64, ext4, Python 3.13, pyarrow 23, iops_profiler 0.2.0, IPython 9.9.0

Before submitting
Please check the following:

  • I have described the situation in which the bug arose, including what code was executed, and any applicable data others will need to reproduce the problem.
  • I have included information about my environment, including the version of this package (e.g. iops_profiler.__version__)
  • I have included available evidence of the unexpected behavior (including error messages, screenshots, and/or plots) as well as a description of what I expected instead.
  • If I have a solution in mind, I have provided an explanation and/or pseudocode and/or task list.

Metadata

Labels

bug: Something isn't working
