Proposal: Remove the custom memory allocators and replace with an arena #312
Comments
I'm largely in favor of trading -20% CPU for +100% memory, but maybe a more rational argument can be made with some benchmarks on standard hardware where Gumbo would realistically be used instead of other alternatives? (I don't think embedded systems fall in this category, but I might be wrong.) How about taking a regular Mac/PC and a few cloud instance types from EC2/GCE and seeing what becomes the bottleneck first?
Hard to do this without knowing what people are using Gumbo for. I'd initially designed it thinking it'd mostly be used for command-line tools to refactor or lint HTML, but I've since heard of people using it in GUI e-book readers, high-volume server-side webapps, scripting languages, and more. I've used it personally for some big-data passes over a corpus of web pages, and was planning on using it to manipulate WebComponents server-side but ended up scrapping that project. In some of those use cases, memory may be more precious than in others, e.g. if you're using it in a server with 100 requests in flight, then 4M/doc works out to an additional 400M. If you're just writing a command-line tool on a desktop, 4M/doc is negligible. So, in the spirit of gathering data: what are people using Gumbo for, if they don't mind sharing? Server, desktop, command-line, mobile, embedded, or other app? Memory-constrained or CPU-constrained? I'm a bit busy right now and probably won't be able to make any changes for a while, but it'll be useful to have the data if it comes time to make trade-offs in the future.
I use Gumbo in Sanitize, a Ruby library for sanitizing untrusted HTML based on a whitelist. Sanitize is used by many people, but as one example, in my day job at SmugMug, it's used to sanitize HTML fragments that our customers enter into image captions or use to customize their photo sites. I don't have exact numbers to share, but SmugMug feeds a fairly high volume of small HTML fragments through Sanitize and Gumbo on demand and displays the output to users. In general I'd be willing to trade memory for faster parsing, but I'm happy with Gumbo's current performance for our particular use case.
Got it. An interesting question would then be: how long do the pages need to stay in memory compared to the time spent parsing them? My use case is general web crawling from cloud instances, so we parse as much as we can in parallel, do some analysis on the content, and discard the pages as fast as possible. If 100 concurrent pages can saturate multiple CPUs with 400M of RAM (even the smallest compute-optimized EC2 instance has 3.75GB of RAM), that's quite a good ratio, but indeed it remains to be tested.
I've got a partial patch in #309 to replace Gumbo's custom memory allocators with an arena stored in the GumboOutput structure. Seeking input on situations where this may have unintended consequences.
Concrete proposal
Current workaround
Currently Gumbo allows the use of custom allocators (including arenas), but defaults to a simple wrapper around malloc. It's possible and quite easy to implement an arena with Gumbo today, but the design trade-offs in how Gumbo allocates memory differ depending on which allocator is used. As a result, users of the default allocator get reduced performance to support the possibility of an arena, while users of an arena get increased memory traffic to support the system malloc default. It may be simpler to remove the choice and instead provide good out-of-the-box performance with an arena.
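To make the "current workaround" concrete, here is a minimal sketch of a bump-pointer arena a user could plug into Gumbo's existing allocator hooks today. The two function signatures below mirror the shape of the `allocator`/`deallocator` function pointers Gumbo accepts (`void* (*)(void* userdata, size_t)` and `void (*)(void* userdata, void*)`); the `Arena` type and helper names are invented for this sketch, not part of Gumbo.

```c
#include <stdlib.h>

/* A chunk in a growable bump-pointer arena. */
typedef struct ArenaChunk {
  struct ArenaChunk* next;
  size_t used;
  size_t capacity;
  char data[];          /* C99 flexible array member */
} ArenaChunk;

typedef struct {
  ArenaChunk* head;     /* most recently allocated chunk */
  size_t chunk_size;    /* default capacity for new chunks */
} Arena;

static ArenaChunk* chunk_new(size_t capacity) {
  ArenaChunk* c = malloc(sizeof(ArenaChunk) + capacity);
  c->next = NULL;
  c->used = 0;
  c->capacity = capacity;
  return c;
}

void arena_init(Arena* a, size_t chunk_size) {
  a->chunk_size = chunk_size;
  a->head = chunk_new(chunk_size);
}

/* Matches the shape of Gumbo's allocator hook: userdata, then size. */
void* arena_alloc(void* userdata, size_t size) {
  Arena* a = userdata;
  size = (size + 7) & ~(size_t)7;  /* keep allocations 8-byte aligned */
  if (a->head->used + size > a->head->capacity) {
    /* Oversized requests get a chunk of their own. */
    size_t cap = size > a->chunk_size ? size : a->chunk_size;
    ArenaChunk* c = chunk_new(cap);
    c->next = a->head;
    a->head = c;
  }
  void* p = a->head->data + a->head->used;
  a->head->used += size;
  return p;
}

/* Matches the shape of Gumbo's deallocator hook: a no-op, because the
 * arena releases everything at once in arena_destroy(). */
void arena_free(void* userdata, void* ptr) {
  (void)userdata;
  (void)ptr;
}

void arena_destroy(Arena* a) {
  ArenaChunk* c = a->head;
  while (c) {
    ArenaChunk* next = c->next;
    free(c);
    c = next;
  }
  a->head = NULL;
}
```

With Gumbo as it stands, the user would point the options struct's allocator and deallocator fields at `arena_alloc`/`arena_free` and pass the `Arena*` as the userdata pointer, then call `arena_destroy` after freeing or copying out the parse output. This is exactly the per-user boilerplate the proposal would eliminate by moving the arena inside `GumboOutput`.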
Benefits
Drawbacks
Compromise solutions
Comment with a +1 or -1, or any additional comments or considerations.