This is an example of a very wide table with a few thousand rows that is sparsely populated and has string values that are shared between columns. ZDW's shared dictionary has a strong impact in this case.
File Size | Filename | Type | Percent of Original |
---|---|---|---|
14,468,990 | analytics-hits.sql | TSV | Original |
1,895,648 | analytics-hits.parquet | Parquet | 13.1% |
1,689,359 | analytics-hits.orc | ORC | 11.7% |
1,303,440 | analytics-hits.zdw | ZDW | 9.0% |
1,159,669 | analytics-hits.snappy.parquet | Parquet+Snappy | 8.0% |
944,057 | analytics-hits.gzip.parquet | Parquet+GZIP | 6.5% |
940,332 | analytics-hits.lzo.orc | ORC+LZO | 6.5% |
937,876 | analytics-hits.snappy.orc | ORC+Snappy | 6.5% |
667,348 | analytics-hits.sql.gz | TSV+GZIP | 4.6% |
631,981 | analytics-hits.zlib.orc | ORC+ZLIB | 4.4% |
398,388 | analytics-hits.sql.xz | TSV+XZ | 2.8% |
312,272 | analytics-hits.zdw.gz | ZDW+GZIP | 2.2% |
227,456 | analytics-hits.zdw.xz | ZDW+XZ | 1.6% |
This is an example of a narrow table where string values are not shared between columns. That is, there are a single-digit number of columns and each column has its own distinct set of values.
File Size | Filename | Type | Percent of Original |
---|---|---|---|
32,653,800 | movie_tickets.sql | TSV | Original |
18,290,691 | movie_tickets.zdw | ZDW | 56.0% |
8,736,936 | movie_tickets.parquet | Parquet | 26.8% |
6,854,324 | movie_tickets.orc | ORC | 21.0% |
5,466,961 | movie_tickets.snappy.parquet | Parquet+Snappy | 16.7% |
4,365,887 | movie_tickets.zdw.gz | ZDW+GZIP | 13.4% |
4,099,301 | movie_tickets.gzip.parquet | Parquet+GZIP | 12.6% |
3,913,324 | movie_tickets.sql.gz | TSV+GZIP | 12.0% |
3,697,046 | movie_tickets.lzo.orc | ORC+LZO | 11.3% |
3,619,216 | movie_tickets.snappy.orc | ORC+Snappy | 11.1% |
2,732,214 | movie_tickets.zlib.orc | ORC+ZLIB | 8.4% |
2,668,936 | movie_tickets.sql.xz | TSV+XZ | 8.2% |
2,040,472 | movie_tickets.zdw.xz | ZDW+XZ | 6.2% |