Jul 19, 2025
Developer Deep Dive
It all started with our image processing service (here at brand.dev) mysteriously gobbling up memory. Occasionally, a large burst of traffic involving uploads or resizes would make our Node.js process’s memory spike suddenly, causing the machine to die, fail the request, then reboot.
At first glance, it looked like a memory leak – after processing a batch of images, the memory of the process would balloon and stay bloated. We were confused and slightly pissed. Was the Sharp library (our go-to Node.js image processing tool) leaking buffers? Or was our code not cleaning up something?
Night after night, we combed through logs and heap snapshots. Strangely, V8 heap usage stayed within reasonable limits, yet overall memory kept rising. This was our first clue that something more was at play. The usual suspects – unreferenced Buffers, promises never resolving, and so on – didn’t pan out.
We tried everything: disabling Sharp’s cache, limiting Sharp’s concurrency, even manually forcing garbage collection (out of sheer desperation). Nothing seemed to fully stop the upward memory creep. It was as if memory was “stuck” somewhere outside of Node’s typical garbage-collected heap.
We scoured GitHub issues and were even desperate enough to check Stack Overflow (who even does that these days with ChatGPT around?). Even our attempts at consulting AI gave us generic “maybe you have a leak” answers. It felt like shouting into the void. Meanwhile, the brand.dev service would happily handle ~100 images, then slow to a crawl and die. Restarting the server instance was the only way to reclaim RAM – a bandaid, not a real fix.
Little did we know, we were about to discover that what looked like a leak was actually something else entirely. The breakthrough came after diving into a rabbit hole of GitHub discussions (1, 2, and 3) – issue after issue on the Sharp repository – and finding fellow developers with the exact same pattern. Their conclusion turned our understanding on its head: it wasn’t a “leak” at all – it was memory fragmentation.
Fragmentation: The Hidden Culprit
Reading through a GitHub issue, one comment struck a chord: “The issue is not a memory leak, it’s memory fragmentation. This isn’t a bug with the library, but rather comes with the territory of multithreading.” Suddenly, things made sense. In a nutshell, memory fragmentation occurs when lots of small allocations and deallocations in a multi-threaded process leave memory in many tiny pieces (fragments). The allocator can’t easily reuse those pieces or return them to the OS, even if the process isn’t actively using them. The result is a high RSS that never shrinks – it looks like a leak, but it isn’t.
Why was this happening with Sharp? The Sharp library uses an underlying C++ image processing library (libvips) which is highly multi-threaded for performance. On typical Linux systems (Debian/Ubuntu, etc.), the default memory allocator (glibc’s malloc) doesn’t handle this pattern of lots of small, threaded allocations very well. glibc’s allocator is known to be “unsuitable for long-running, multi-threaded processes that involve lots of small memory allocations” – exactly our scenario. Over time, the heap gets fragmented: memory is freed back to the allocator but not returned to the OS, so RSS stays high.
In fact, the Sharp maintainers were well aware of this. We discovered in the documentation that on Linux, Sharp will actually limit its thread usage if it detects the glibc allocator, specifically to mitigate fragmentation issues. (By default, Sharp uses one thread per CPU core for an operation – great for speed, but on glibc this can fragment memory quickly.) So, on Linux without special tuning, Sharp falls back to using only 1 thread per image to reduce fragmentation. This was eye-opening: our memory woes were a known side effect of the platform’s memory allocator interacting with Sharp’s multithreading, not a straightforward bug.
Fighting Fragmentation with Jemalloc
The turning point of our journey was learning about jemalloc. Jemalloc is an alternative memory allocator known for its excellent fragmentation avoidance and multi-threaded performance. Other developers had reported dramatic improvements by using jemalloc in Node processes facing similar issues. In one case (in a Socket.io server), simply switching Node to use jemalloc halved the app’s memory usage.
We decided to give it a try. How do you use jemalloc with Node.js? It turns out to be simpler than we thought:
1. Install jemalloc on your system. On Ubuntu/Debian, for example, apt-get install libjemalloc2 will provide the library.
2. Preload the jemalloc library when starting Node. This is done by setting the LD_PRELOAD environment variable to the path of jemalloc’s .so file before launching the Node process.
On our Ubuntu server, we did exactly this. For example, in a shell we could launch our service with:
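A sketch of what that launch looks like – note that the .so path varies by distro and architecture, and server.js is a placeholder for your app’s entry point:

```shell
# Preload jemalloc so all native allocations (including libvips') go
# through it instead of glibc's malloc.
# /usr/lib/x86_64-linux-gnu/libjemalloc.so.2 is the libjemalloc2 location
# on Ubuntu x86_64; adjust for your system. server.js is hypothetical.
LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.2 node server.js
```

Setting the variable inline like this scopes the preload to that one process, so other programs on the box keep using the default allocator.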
If you’re on Heroku or a platform like it, you can’t directly set LD_PRELOAD, but there are buildpacks (e.g. heroku-buildpack-jemalloc) that make it a cinch. We added the jemalloc buildpack to our Heroku app, which effectively did the same thing under the hood for our dyno.
The results were immediate and jaw-dropping. Memory usage stopped ballooning and in fact dropped to a fraction of what it had been under load. As one Stack Overflow answer succinctly noted, Sharp’s own docs recommend jemalloc for exactly this reason, and now we understood why. With jemalloc handling memory, our process could churn through hundreds of images and maintain a stable memory profile. No more overnight memory bloat! 🎉
When Parallel Processing Backfires
With jemalloc in place, we solved the fragmentation puzzle. But we weren’t out of the woods yet. Another lesson we learned the hard way is that throwing too many images at Sharp in parallel can blow up memory usage (and sometimes even degrade performance). Early on, our code tried to process a bunch of images simultaneously – e.g., using Promise.all() on an array of Sharp operations – in hopes of speed. What we observed, though, was that memory usage during those spikes was enormous and occasionally led to out-of-memory crashes when the input images were large.
Note: In our use case, we need to do a bunch of crops on a single image – roughly 10 crops per image. So when we process 6 images, that’s about 60 Sharp operations, and each additional image processed in parallel adds roughly 10 more. It’s more complicated than this in practice, but it’s worth noting because it shows how quickly the parallelism multiplies.
It turns out there are a couple of factors at play here:
- Node’s thread pool limit: Each Sharp operation (like .resize().toBuffer() or .toFile()) is offloaded to libuv’s thread pool. By default Node allows 4 threads for these async operations, so even if you trigger, say, 10 Sharp operations at once, Node will actually only run 4 at a time under the hood (unless you increase UV_THREADPOOL_SIZE). We initially weren’t aware of this, so our “parallel” tasks weren’t as parallel as we thought. On a machine with many cores, though, you might bump the threadpool size up to match your CPU count – which means you could actually be doing, say, 8 or 16 Sharp operations truly concurrently. That’s a lot of heavy lifting happening at once.
- Sharp’s internal concurrency: As mentioned, Sharp (libvips) itself uses multiple worker threads per image operation, up to the number of CPU cores by default. This is great for speed on big images – an 8-core machine can chew through one image roughly 8x faster than a single core. But multiply that by several images in flight and you can see how resource usage compounds. E.g., 4 parallel Sharp tasks on an 8-core machine could spawn up to 32 threads in native code! Each of those threads will allocate memory for image data. If you’re not careful, you can overwhelm your CPU or memory bandwidth, leading to thrashing.
Our experience taught us that more parallelism is not always better. In fact, we got more predictable and stable performance by processing images more sequentially (or in smaller batches), especially under heavy load. For example, instead of doing 10 image transforms at once, doing them one by one (or 2-3 at a time) smoothed out memory spikes. Yes, it might slow overall throughput slightly, but in a web service environment, avoiding a crash or OOM error is well worth the slight latency hit of queuing work.
If your use case allows, try to limit the number of Sharp operations happening simultaneously. In practice this could mean using a job queue or simply awaiting each operation in a loop instead of Promise.all(). By doing so, at most one image is fully in memory at a time (plus whatever libvips’ threads need), rather than N images. It’s a simple way to cap peak memory usage.
On the flip side, if you truly need high throughput and concurrency (e.g. a thumbnail server handling many requests in parallel), consider scaling up your hardware – more cores and more RAM. Sharp will automatically make use of additional CPU cores to speed up each image task. In our tests on a beefier machine, Sharp chewed through images much faster, and we could handle more parallel requests before hitting limits. Just remember that more cores = more threads = potentially more memory, so monitor accordingly. There’s a balance to strike based on your app’s needs: either go sequential on a smaller machine or go parallel on a bigger machine (or a cluster of machines).
Streams over Buffers: Stream Your Images, Save Your Memory
Another major improvement in our Node image pipeline came from switching from buffer-based processing to stream-based processing. Initially, our code was doing something like:
This works, but think about what it means: the entire input image is read into memory (Sharp will decode it internally), then the entire output image is stored in the data Buffer. For large images, that’s a lot of memory held all at once. Doing this for many images in parallel makes it even worse.
The solution is to take advantage of Node streams. Sharp can operate as a transform stream, which means you can pipe data through it and it will output processed data in a streaming fashion without ever needing to hold the entire image in Node.js memory at once. For example:
In this streaming approach, Node is handling chunks of data as they flow from the file, through Sharp, to the output. At no point does Node.js need to load the entire image into a single Buffer. Sharp’s internal libvips might allocate some buffers for processing, but as soon as each chunk is processed and written out, memory can be reclaimed. This dramatically reduces peak memory usage, especially for large files or high volumes. As one article noted, streaming “helps reduce the memory footprint, which is essential for large media files.”
Tuning Sharp: Cache and Other Tweaks
With the big-ticket fixes in place (jemalloc, controlled parallelism, streaming), we also looked at smaller tuning options to squeeze out any remaining memory optimizations. Sharp provides a few global settings that are worth knowing:
- Disable Sharp’s cache: Sharp/libvips maintains an internal operation cache by default, to reuse results of recent operations (this can speed up repeated operations on the same image data). By default this cache can use up to 50 MB of memory and hold up to 100 items. In a long-running server that processes many unrelated images, this default cache might just add memory overhead without much benefit (since you’re rarely re-processing the same image data). We decided to turn it off with sharp.cache(false). This freed up any cached buffers after each operation. Unless you know your workload benefits from Sharp’s caching, disabling it for one-off image tasks is a good practice to avoid extra memory usage.
- Set Sharp concurrency: By default, Sharp’s concurrency (the number of threads libvips uses per image) is the number of CPU cores (or 1 on glibc Linux without jemalloc, as discussed). You can adjust this manually via sharp.concurrency(n). In our early attempts to mitigate memory problems, we tried sharp.concurrency(1) to force single-threaded processing. This can indeed reduce memory usage (since libvips won’t parallelize internally), but it can also severely hurt performance for large images (only one core used). After moving to jemalloc, we left sharp.concurrency at its default so we could use all cores. But if you must run Sharp without jemalloc for some reason, explicitly setting sharp.concurrency(1) is one way to limit fragmentation risk (essentially what Sharp itself does by default on such systems). Think of this as a fallback – use it if you can’t use jemalloc, or if you find Sharp is still using too many threads for your comfort.
- Other Sharp options: We discovered a few other useful knobs. For example, sharp.simd(true) ensures SIMD optimizations are on (they usually are by default if supported, giving a performance boost). There’s also sharp.limitInputPixels(width * height) to refuse processing ridiculously large images (avoiding an accidental DoS from huge input). And as of newer versions, sharp.options({ maximumMemory: <bytes> }) can set an upper bound on Sharp’s internal allocations (useful in constrained environments). We set a reasonable sharp.limitInputPixels in our service to prevent any astronomical image from ever being processed. We didn’t need maximumMemory in production, but it’s good to know it exists if you want to enforce an upper limit.
Finally, it’s worth acknowledging the community. The solution to our Sharp memory woes wasn’t obvious, and we likely wouldn’t have solved it alone. It was the collective knowledge from GitHub issues, maintainers’ comments, and fellow developers on forums that pointed us in the right direction.
Sources:
Sharp documentation – installation and performance notes on memory allocators (glibc vs jemalloc).
Sharp documentation – API for cache and concurrency settings.
Stack Overflow – Q&A on Sharp memory usage and the jemalloc solution.
Pipedream Blog – example of using Sharp with streams to reduce memory footprint.
GitHub issues on the sharp repo: Link 1, Link 2, Link 3, Link 4.