Skip to content

C++ 中文周刊 2024-08-31 第167期

Compare
Choose a tag to compare
@wanghenshui wanghenshui released this 01 Sep 05:51
· 19 commits to dev since this release

本期文章由 HNY 赞助


资讯

标准委员会动态/ide/编译器信息放在这里

编译器信息最新动态推荐关注hellogcc公众号 本周更新 2024-08-21 第268期

文章

C++20 Modules 在阿里云的大规模应用

看一乐

MySQL 编译(打包

看一乐 感谢恒星投稿

Visualizing boost::unordered_map in GDB, with pretty-printer customization points

给boost unordered实现gdb pretty print

gdb使用pretty print很简单

第一步

(gdb) set print pretty on

第二步,如果你的脚本在二进制的section中

(gdb) add-auto-load-safe-path path/to/executable

如果没有,有脚本,可以加载脚本

(gdb) source path/to/boost/libs/unordered/extra/boost_unordered_printers.py

其实脚本内容和放进二进制section内容是一样的,怎么放进二进制?可以学习这个 https://github.com/boostorg/outcome/blob/master/include/boost/outcome/outcome_gdb.h

我相信大部分读者是第一次知道gdb printer可以放到二进制里的

接下来是如何实现gdb printer

很简单,接口就这样

class BoostUnorderedFcaPrinter:
    def __init__(self, val):
        self.val = val
    
    def to_string(self):
        return f"This is a {self.val.type}"

目标,实现to_string

注册也非常简单

def boost_unordered_build_pretty_printer():
    pp = gdb.printing.RegexpCollectionPrettyPrinter("boost_unordered")
    add_template_printer = lambda name, printer: pp.add_printer(name, f"^{name}<.*>$", printer)

    add_template_printer("boost::unordered::unordered_map", BoostUnorderedFcaPrinter)
    add_template_printer("boost::unordered::unordered_multimap", BoostUnorderedFcaPrinter)
    add_template_printer("boost::unordered::unordered_set", BoostUnorderedFcaPrinter)
    add_template_printer("boost::unordered::unordered_multiset", BoostUnorderedFcaPrinter)
    return pp

gdb.printing.register_pretty_printer(gdb.current_objfile(), boost_unordered_build_pretty_printer())

继续,咱们展开成员

def maybe_unwrap_foa_element(e):
    element_type = "boost::unordered::detail::foa::element_type<"
    if f"{e.type.strip_typedefs()}".startswith(element_type):
        return e["p"]
    else:
        return e

简单吧,现在你学会了gdb.Value.type.strip_typedefs

咱们进化一下

class BoostUnorderedFcaPrinter:
    def __init__(self, val):
        self.val = val
        self.name = f"{self.val.type.strip_typedefs()}".split("<")[0]
        self.name = self.name.replace("boost::unordered::", "boost::")
        self.is_map = self.name.endswith("map")

    def to_string(self):
        size = self.val["table_"]["size_"]
        return f"{self.name} with {size} elements"

这样打印

(gdb) print my_unordered_map
$1 = boost::unordered_map with 3 elements
(gdb) print my_unordered_multiset
$2 = boost::unordered_multiset with 5 elements

考虑遍历成员

def display_hint(self):
    return "map"

def children(self):
    def generator():
        # ...
        while condition:
            value = # ...
            if self.is_map:
                first = value["first"]
                second = value["second"]
                yield "", first
                yield "", second
            else:
                yield "", count
                yield "", value
    return generator()

后面就不展开了

完整代码

python api可以看这里

gdb python的玩法还是非常多的

PS: 吴乎提问: 放进二进制section这个,strip时不知道是跟着debug符号走,还是跟着binary走

笔者查了一下,使用的section debug_gdb_scripts也是debug symbol,strip会被删,
这种建议先strip后objcopy把debug_gdb_scripts搞回来 详情点击这个SO
gdb section可以看这里

SIMD Matters

图形生成算法使用simd性能提升显著。代码就不贴了

Honey, I shrunk {fmt}: bringing binary size to 14k and ditching the C++ runtime

介绍了fmtlib在减小二进制上的探索,比如查表优化

auto do_count_digits(uint32_t n) -> int {
// An optimization by Kendall Willets from https://bit.ly/3uOIQrB.
// This increments the upper 32 bits (log10(T) - 1) when >= T is added.
#  define FMT_INC(T) (((sizeof(#T) - 1ull) << 32) - T)
  static constexpr uint64_t table[] = {
      FMT_INC(0),          FMT_INC(0),          FMT_INC(0),           // 8
      FMT_INC(10),         FMT_INC(10),         FMT_INC(10),          // 64
      FMT_INC(100),        FMT_INC(100),        FMT_INC(100),         // 512
      FMT_INC(1000),       FMT_INC(1000),       FMT_INC(1000),        // 4096
      FMT_INC(10000),      FMT_INC(10000),      FMT_INC(10000),       // 32k
      FMT_INC(100000),     FMT_INC(100000),     FMT_INC(100000),      // 256k
      FMT_INC(1000000),    FMT_INC(1000000),    FMT_INC(1000000),     // 2048k
      FMT_INC(10000000),   FMT_INC(10000000),   FMT_INC(10000000),    // 16M
      FMT_INC(100000000),  FMT_INC(100000000),  FMT_INC(100000000),   // 128M
      FMT_INC(1000000000), FMT_INC(1000000000), FMT_INC(1000000000),  // 1024M
      FMT_INC(1000000000), FMT_INC(1000000000)                        // 4B
  };
  auto inc = table[__builtin_clz(n | 1) ^ 31];
  return static_cast<int>((n + inc) >> 32);
}


template <typename T> constexpr auto count_digits_fallback(T n) -> int {
  int count = 1;
  for (;;) {
    // Integer division is slow so do it for a group of four digits instead
    // of for every digit. The idea comes from the talk by Alexandrescu
    // "Three Optimization Tips for C++". See speed-test for a comparison.
    if (n < 10) return count;
    if (n < 100) return count + 1;
    if (n < 1000) return count + 2;
    if (n < 10000) return count + 3;
    n /= 10000u;
    count += 4;
  }
}

这两种用法,查表就是要多一堆符号的,如果业务要小二进制,就不用查表

另外就是 -nodefaultlibs -fno-exceptions 这种场景下 new delete基本也得去掉 最后减小到14K,如果去除main6k fmt整体小于10k

Faster random integer generation with batching

批量生成随机数相当于给一堆数打散,第一反应就是shuffle

void shuffle(mytype *storage, uint64_t size) {
  for (uint64_t i = size; i > 1; i--) {
    uint64_t nextpos = random(i); // random value in [0,i)
    std::swap(storage[i - 1], storage[nextpos]);
  }
}

显然这个shuffle中的random是瓶颈,我们常规的实现就是%i,有没有更快的做法?

uint64_t random_bounded(uint64_t range) {
  __uint128_t random64bit, multiresult;
  uint64_t leftover;
  uint64_t threshold;
  random64bit = rng(); // 64-bit random integer
  multiresult = random64bit * range;
  leftover = (uint64_t)multiresult;
  if (leftover < range) {
    threshold = -range % range;
    while (leftover < threshold) {
      random64bit = rng();
      multiresult = random64bit * range;
      leftover = (uint64_t)multiresult;
    }
  }
  return (uint64_t)(multiresult >> 64); // [0, range)
}
/ Fisher-Yates shuffle
void shuffle(uint64_t *storage, uint64_t size, uint64_t (*rng)(void)) {
    uint64_t i;
    for (i = size; i > 1; i--) {
        uint64_t nextpos = random_bounded(i, rng);
        uint64_t tmp = storage[i - 1];
        uint64_t val = storage[nextpos];
        storage[i - 1] = val;
        storage[nextpos] = tmp;
    }
}

这实际上也是gcc的实现,我们能不能拆成批量shuffle?

考虑一个场景,你需要多个shuffle,显然每次都执行random_bound代价大

能不能一个random_bound把多个shuffle一起计算?当然可以

然后多个shuffle组合成一个shuffle就是修改index的问题了是不是?

比如拆成两个

// product_bound can be any integer >= range1*range2
// it may be updated to become range1*range2
std::pair<uint64_t, uint64_t> 
 random_bounded_2(uint64_t range1, uint64_t range2,
                   uint64_t &product_bound) {
  __uint128_t random64bit, multiresult;
  uint64_t leftover;
  uint64_t threshold;
  random64bit = rng(); // 64-bit random integer
  multiresult = random64bit * range1;
  leftover = (uint64_t)multiresult;
  uint64_t result1 = (uint64_t)(multiresult >> 64); // [0, range1)
  multiresult = leftover * range2;
  leftover = (uint64_t)multiresult;
  uint64_t result2 = (uint64_t)(multiresult >> 64); // [0, range2)
  if (leftover < product_bound) {
    product_bound = range2 * range1;
    if (leftover < product_bound) {
      threshold = -product_bound % product_bound;
      while (leftover < threshold) {
        random64bit = rng();
        multiresult = random64bit * range1;
        leftover = (uint64_t)multiresult;
        result1 = (uint64_t)(multiresult >> 64); // [0, range1)
        multiresult = leftover * range2;
        leftover = (uint64_t)multiresult;
        result2 = (uint64_t)(multiresult >> 64); // [0, range2)
      }
    }
  }
  return std::make_pair(result1, result2);
}
void shuffle_2(mytype *storage, uint64_t size) {
  uint64_t i = size;
  for (; i > 1 << 30; i--) {
    uint64_t index = random_bounded(i, g); 
    // index is in [0, i-1] 
    std::swap(storage[i - 1], storage[index]);
  }

  // Batches of 2 for sizes up to 2^30 elements
  uint64_t product_bound = i * (i - 1);
  for (; i > 1; i -= 2) {
    auto [index1, index2] = random_bounded_2(i, i - 1, 
         product_bound, g);
    // index1 is in [0, i-1]
    // index2 is in [0, i-2]
    std::swap(storage[i - 1], storage[index1]);
    std::swap(storage[i - 2], storage[index2]);
  }
}

测试linux gcc快30%

Parsing tiny and very large floating-point values: a programming-language comparison

处理无限大无限小,各种语言的差别

python

>>> float("1e-1000")
0.0
>>> float("1e1000")
inf

golang

package main

import (
    "fmt"
    "strconv"
)

func main() {
    f, err := strconv.ParseFloat("1e-1000", 64)
    fmt.Println(f, err)
    f, err = strconv.ParseFloat("1e1000", 64)
    fmt.Println(f, err)
}
/*
0 
+Inf strconv.ParseFloat: parsing "1e1000": value out of range
*/

c

#include <stdio.h>
#include <errno.h>
#include <stdlib.h>
int main(void) {
  const char *p = "1e-1000 1e1000";
  printf("Parsing '%s':\n", p);
  char *end;
  for (double f = strtod(p, &end); p != end; f = strtod(p, &end)) {
    printf("'%.*s' -> ", (int)(end-p), p);
    p = end;
    if (errno == ERANGE){
      printf("range error, got ");
      errno = 0;
   }
   printf("%f\n", f);
 }
}
/*
Parsing '1e-1000 1e1000':
'1e-1000' -> range error, got 0.000000
' 1e1000' -> range error, got inf
*/

上面的行为还算同意,c++呢?

五花八门

#include <cstdio>
#include <charconv>
#include <string>
int main() {
  for(std::string str : {"1e-1000", "1e1000"}) {
   double value = -1;
   printf("parsing %s\n", str.c_str());
   auto r = std::from_chars(str.data(), str.data() + str.size(), value);
   if(r.ec == std::errc::result_out_of_range) { printf("out of range "); }
   printf("%f\n", value);
  }
  return EXIT_SUCCESS;
}

如果是clang直接编译不过,不支持浮点数from_chars,只好用c的或者三方库 absl::from_chars可以测试一下

如果是msvc

parsing 1e-1000
out of range 0.000000
parsing 1e1000
out of range inf

现象和上面相同

gcc使用fast_float库理应现象相同,但是加了点边界判断

parsing 1e-1000
out of range -1.000000
parsing 1e1000
out of range -1.00000

导致区分不出了我操, -1引入歧义了

当然标准也注意到了这一点,要不同,避免歧义

开源项目介绍

  • asteria 一个脚本语言,可嵌入,长期找人,希望胖友们帮帮忙,也可以加群753302367和作者对线

互动环节

最近周报总结内容太少了,所以更新晚,见谅


上一期 下一期