Closed Bug 670594 Opened 13 years ago Closed 12 years ago

Visualize the GC heap

Categories

(Core :: JavaScript Engine, defect)

Hardware: x86_64
OS: Linux
Type: defect
Priority: Not set
Severity: normal

Tracking

RESOLVED WONTFIX

People

(Reporter: justin.lebar+bug, Assigned: n.nethercote)

References

Details

(Whiteboard: [MemShrink:P2])

Attachments

(6 files, 3 obsolete files)

Out of 162.26 MB of JS memory, my system has 23.6 MB of gc-heap-chunk-unused.  I'm sure this is better than it was, but in comparison, jemalloc reports that it's wasting 2.3 MB (heap-dirty) out of 288.43 MB (heap-used).

Given how helpful about:memory has been in helping us track down memory issues, I wonder if we should write <bikeshed>about:gc-chunks</bikeshed>, which would give information about what was contained in each gc chunk.  We could use this to improve the allocator (e.g. bug 669245).
I'm partway through a patch that will visualize the GC heap -- a <canvas> with one pixel per arena, coloured according to how full it is.  I'm not sure if I want it to end up in production code, but the aim is for it to help with this sort of thing.
Assignee: general → nnethercote
Summary: Create about:gc-chunks → We need more info about layout of the GC heap
Whiteboard: [MemShrink]
Blocks: 670596
Summary: We need more info about layout of the GC heap → Visualize the GC heap
What does this gc-heap-chunk-unused number actually mean? Does this count the number of empty arenas or the number of free things in arenas?
For every empty arena in a chunk, it adds sizeof(Arena) to the gc-heap-chunk-unused number.

For every empty cell in an arena, it adds thingSize to the <compartment>/gc-heap/arena-unused number.
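A minimal sketch of that accounting, using hypothetical type and variable names (the real logic lives in the JS engine's memory reporters):

    // Sketch of the accounting described above (names are illustrative).
    // Every empty arena in a chunk contributes sizeof(Arena) to
    // gc-heap-chunk-unused; every free cell in a used arena contributes
    // thingSize to that compartment's gc-heap/arena-unused.
    #include <cstddef>
    #include <vector>

    struct ArenaStats { size_t thingSize; size_t freeCells; bool isEmpty; };
    struct ChunkStats { std::vector<ArenaStats> arenas; };

    void AccumulateUnused(const ChunkStats &chunk, size_t arenaSize,
                          size_t *gcHeapChunkUnused, size_t *gcHeapArenaUnused)
    {
        for (const ArenaStats &a : chunk.arenas) {
            if (a.isEmpty)
                *gcHeapChunkUnused += arenaSize;                 // whole arena unused
            else
                *gcHeapArenaUnused += a.freeCells * a.thingSize; // free cells only
        }
    }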
When I was filing this, I was originally going to suggest making an image. But one of the really good things about about:memory is that it's easy to paste into a bug.
(In reply to comment #3)
> For every empty arena in a chunk, it adds sizeof(Arena) to the
> gc-heap-chunk-unused number.

Makes sense! Do you perform a GC when (re)loading the page?
Could you add the separation between system and web-content chunks?
We should find out why, right after startup, we have:
15.89 MB (17.90%) -- gc-heap-chunk-unused
Nick, I looked at your implementation that calculates gc-heap-chunk-unused.
Basically you do 
gcHeapChunkUnused -=
           stats->gcHeapArenaHeaders + stats->gcHeapArenaPadding +
           stats->gcHeapArenaUnused +
           stats->gcHeapObjects + stats->gcHeapStrings +
           stats->gcHeapShapes + stats->gcHeapXml;

I think you are only accounting for the arenas, but you are missing all the additional information that is part of every chunk:

    Arena           arenas[ArenasPerChunk];
    ChunkBitmap     bitmap;
    MarkingDelay    markingDelay[ArenasPerChunk];
    ChunkInfo       info;

Maybe I just haven't seen this part in your code.
(In reply to comment #6)
> Nick, I looked at your implementation that calculates gc-heap-chunk-unused.
> Basically you do 
> gcHeapChunkUnused -=
>            stats->gcHeapArenaHeaders + stats->gcHeapArenaPadding +
>            stats->gcHeapArenaUnused +
>            stats->gcHeapObjects + stats->gcHeapStrings +
>            stats->gcHeapShapes + stats->gcHeapXml;
> 
> I think you are only accounting for the arenas, but you are missing all the
> additional information that is part of every chunk:
> 
>     Arena           arenas[ArenasPerChunk];
>     ChunkBitmap     bitmap;
>     MarkingDelay    markingDelay[ArenasPerChunk];
>     ChunkInfo       info;
> 
> Maybe I just haven't seen this part in your code.

Oh I guess you do compensate for the chunk administration stuff. I missed the adjustment at the end.
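For reference, the compensation amounts to subtracting the non-arena parts of the chunk layout quoted above once per chunk. A sketch, with the sizes passed in as parameters since the real types (ChunkBitmap, MarkingDelay, ChunkInfo) live in the GC headers:

    #include <cstddef>

    // Per-chunk administrative overhead: everything in the chunk that isn't
    // an Arena, so gc-heap-chunk-unused ends up counting only arena space.
    size_t ChunkAdminBytes(size_t bitmapSize, size_t markingDelaySize,
                           size_t chunkInfoSize, size_t arenasPerChunk)
    {
        return bitmapSize + arenasPerChunk * markingDelaySize + chunkInfoSize;
    }

    // gcHeapChunkUnused -= numChunks * ChunkAdminBytes(...);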
(In reply to comment #4)
> When I was filing this, I was originally going to suggest making an image.
> But one of the really good things about about:memory is that it's easy to
> paste into a bug.

True, but screenshots aren't that hard to make either.  And text won't give anything like the amount of information an image will in this case.  That's assuming it even ends up in production code.
I guess I was thinking that properly-summarized textual data might be more useful than a picture.  That is, the goal is not to provide as much raw data as possible, but to provide specifically the data we need.  Keeping in mind that I don't actually know how gc arenas work, I was thinking of something like:

 * Chunk 1 (111kb used, 913kb available)
   - Number of arenas: 2
   - Compartments: 3
     * http://google.com (20 objects, 100kb)
     * http://mozilla.com (1 object, 1kb)
     * System compartment (5 objects, 10kb)
   - Arena 1
     * Total used: 10 kb
     * Total unused: 50 kb
     * Compartment http://google.com
       5 objects, 3kb
     * Compartment http://mozilla.org
       1 object, 1kb
   - Arena 2
     ...

You'd be able to see that one chunk has allocations from 42 different compartments, and here's a list.  If this were a (static, because it's a screenshot) graphic, it'd be hard to drill down into exactly which compartments were in one chunk, or to tell exactly what changed between two screenshots.

If a chunk has just one small allocation from one compartment, something like a tree map [1] would under-emphasize that allocation, while we want to call it out with <blink>.  In other words, text seems better-suited to this because we care not so much about the general trend (chunk X contains mostly allocations from compartment Y, plus some noise) as about the specifics (chunk X contains allocations from these 42 different compartments).

> And text won't give anything like the amount of information an image will
> in this case.

Maybe you could provide a use-case that an image would serve better than the text example above?

[1] http://code.google.com/apis/chart/interactive/docs/gallery/treemap.html
Each chunk has 255 arenas, and you can easily have 100s of chunks.  Multiple lines of text per arena won't scale.

I'm going to finish implementing my idea now.
It might be fun to be able to visualize which compartments appear in which arenas or whatnot.

> Maybe you could provide a use-case that an image would serve better than the text example above?

Images are good for providing a high level overview of a large data set.  In my cycle collector heap visualization work, it was much easier to see some patterns in the data with an image, rather than a 20meg log file.  Once you've come up with some kind of hypothesis, it is nice to have a detailed text file that you can analyze with a script to see how possible solutions may work.
> Each chunk has 255 arenas, and you can easily have 100s of chunks.  Multiple 
> lines of text per arena won't scale.

Surely we could pare this down.  From the point of view of chunk fragmentation, do we even care which arenas the allocations are in?  If we do, we could show only the most-fragmented chunks / arenas, etc.

> I'm going to finish implementing my idea now.

I look forward to seeing it!
Also, Waldo and Khuey tell me on IRC that each arena has only one compartment.  So it's not necessary to report on each one...
Whiteboard: [MemShrink] → [MemShrink:P1]
Attached patch patch, v1 (horrid code quality) (obsolete) — Splinter Review
Here's a first cut; don't look at the code, it's horrendous, quick-and-dirty code I wrote just to get something up and running so I could see what it looks like.  Screenshot coming next.
Attached image screenshot, v1
First of all, this bug has epic bikeshedding potential.  I don't know how to avoid that.  I did try a bunch of different things and some things worked, some didn't, and you really can't tell if something will work until you try it.  If you want to experiment yourself, most of the presentation code is in XPConnectJSCompartmentsMultiReporter::CollectReports (please excuse the crappy code quality).  Comments along the lines of "I tweaked X and I think it looks better, here's a screenshot" will be much more useful than "hey, d'ya reckon tweaking X would make it look better?".  Also, you really have to try it live, see how it changes, because a single screenshot doesn't get that across.

Anyway, here's a description of what you're looking at in this screenshot.

- Each row represents a 1MB chunk. There's a single-pixel black line between each chunk.

- Each 3x3 square represents a 4KB arena.  I originally had 1 pixel per arena, but it was too hard to see.  3x3 is much better and gives a good width.  (4x4 makes it much wider than the about:memory text, at least on my machine.)

- Unused arenas are coloured white.

- Used arenas are coloured according to their owning compartment.  Ie. all arenas that belong to a single compartment have the same colour.

- There are 13 colours in play.  If there are more than 13 compartments, colours will be reused.  I chose the 13 colours fairly randomly from http://www.colorspire.com/color-names/;  the number 13 is arbitrary, and just where I felt I had enough for the moment.  Ideally we'd have a large number (eg. 100) colours that are easily distinguishable from each other and look good in arbitrary combinations -- eg. none are too bright or too dark.  If that's not possible, we'll need some interactivity to help (see below).

- Arenas that are full have a solid square;  arenas that are partially-used have a single white pixel in the centre.  I experimented with using darker colours to represent partial use, but it was too hard to distinguish from compartment colour differences.  One nice thing about the single white pixel is that if you have a partially-used arena surrounded by unused (white) arenas, you can tell it's partially-used.  I don't think there's much value in trying to represent more detail about how full an arena is.  (The patch also dumps each arena's fullness to stderr.)

- You can easily see the dividing line between the system chunks (containing blue and purple arenas) and the user chunks (everything else).

- No, there's no interactivity.  Yes, it would be nice to click on an arena and see (a) how full it is, (b) all the other arenas in the same compartment, and (c) that compartment's name.
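To make the drawing rules above concrete, here is a rough sketch of how a pixel buffer could be filled.  This is illustrative only -- the type names, palette values, and layout constants are made up, and the actual patch renders into a <canvas> from CollectReports:

    #include <cstddef>
    #include <cstdint>
    #include <vector>

    // One row of 3x3 squares per chunk, a 1-pixel black line between chunks,
    // white for unused arenas, one of 13 compartment colours otherwise, and a
    // single white centre pixel for partially-used arenas.
    struct ArenaCell { bool used; bool full; size_t compartmentId; };
    struct ChunkCells { std::vector<ArenaCell> arenas; };

    static const uint32_t kWhite = 0xFFFFFFFF, kBlack = 0xFF000000;
    static const uint32_t kPalette[13] = {  // arbitrary, distinguishable colours
        0xFF4363D8, 0xFF911EB4, 0xFFE6194B, 0xFF3CB44B, 0xFFFFE119, 0xFFF58231,
        0xFF46F0F0, 0xFFF032E6, 0xFFBCF60C, 0xFF008080, 0xFF9A6324, 0xFF800000,
        0xFFAAFFC3 };

    void PaintHeap(const std::vector<ChunkCells> &chunks, size_t arenasPerChunk,
                   std::vector<uint32_t> &pixels)   // row-major ARGB, filled here
    {
        size_t width = arenasPerChunk * 3;
        size_t height = chunks.size() * 4;          // 3 rows per chunk + separator
        pixels.assign(width * height, kBlack);      // separator rows stay black
        for (size_t c = 0; c < chunks.size(); c++) {
            size_t top = c * 4;
            for (size_t a = 0; a < chunks[c].arenas.size(); a++) {
                const ArenaCell &ar = chunks[c].arenas[a];
                uint32_t col = ar.used ? kPalette[ar.compartmentId % 13] : kWhite;
                for (size_t dy = 0; dy < 3; dy++)
                    for (size_t dx = 0; dx < 3; dx++)
                        pixels[(top + dy) * width + a * 3 + dx] = col;
                if (ar.used && !ar.full)            // partially-used marker
                    pixels[(top + 1) * width + a * 3 + 1] = kWhite;
            }
        }
    }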
I tried turning off Gregor's system/user chunk split (bug 666058).  I ran http://valgrind.org/njn/mem.html which opens and then closes 15 web pages.  I then did "minimize memory usage" a surprisingly large number of times to get down to the base three compartments.  Here's the heap visualization.  Lots of chunks are kept alive by a small number of arenas.
Screenshot with the system/user chunk split.  Much better, obviously!
In the second shot, do you know which color is the system?  Kind of funny that the shot with them separated actually has more intermixing of compartments.  I guess the intermixed compartments are all user, though.
(In reply to comment #17)
> Created attachment 545809 [details]
> screenshot, with system/user chunk split
> 
> Screenshot with the system/user chunk split.  Much better, obviously!

Wohoo! Love it!
(In reply to comment #18)
> In the second shot, do you know which color is the system?  Kind of funny
> that the shot with them separated actually has more intermixing of
> compartments.  I guess the intermixed compartments are all user, though.

In both cases, blue is the system principal, red is the atoms compartment, and the tiny green one is a moz-nullprincipal compartment (which should arguably be included in the system chunks, but I haven't worked out how to do it neatly).  There are no user compartments because I minimized until they disappeared.
Ah, right, now I understand.  Very cool.
FWIW, here's a heap visualization for a 7 hour browsing session.  There are three tabs open:  about:memory, gmail.com (which has been open the entire session), and a BMO bug.

Colour guide:
- red: system principal
- blue: atoms
- purple: gmail
- orange and green: www.google.com/calendar and BMO (not sure which is which)

Some notable numbers from about:memory:

              2 -- js-compartments-system
              4 -- js-compartments-user
  153,092,096 B -- js-gc-heap
   26,575,640 B -- js-gc-heap-arena-unused
   71,001,728 B -- js-gc-heap-chunk-unused
         63.73% -- js-gc-heap-unused-fraction

You can see that the gmail compartment's arenas are spread out a lot.  I don't think smaller chunks (bug 671702) will help much here -- there aren't many chunks that are only 1% full, say, but there are lots that are 5--10% full.
But a compacting GC would make a huge difference.
This is a follow-up to comment 22 -- I closed all tabs except about:memory, and minimized memory usage lots of times over several minutes.  I still couldn't get rid of a landfill.bugzilla.org compartment (the green one), but I'm unable to replicate its zombie-ness in a fresh session, annoyingly.

Anyway, the usual stats:

   61,865,984 B -- js-gc-heap
    9,584,784 B -- js-gc-heap-arena-unused
   22,471,104 B -- js-gc-heap-chunk-unused
         51.81% -- js-gc-heap-unused-fraction

Bug 668809 looks like it'll be impossible to achieve without a compacting GC.
(In reply to comment #23)
> I still
> couldn't get rid of a landfill.bugzilla.org compartment (the green one), but
> I'm unable to replicate its zombie-ness in a fresh session, annoyingly.

This is weird.  I eventually managed to get it to disappear by opening a tab to google.com, which created a google.com compartment which seemingly replaced the landfill.bugzilla.org one.  Then when I closed the google.com tab the corresponding compartment disappeared.  Sounds like some kind of zombie compartment caused by a piece of state that can be reset, or something.
I was also running the browser with this patch and saw similar results.

I am a little bit worried about the "killing fragmentation after a GC" goal.
In a naive way I could say that our current GC model relies on fragmentation because we don't compact; and if we always trigger a GC with an almost-full heap, we don't have a fragmentation problem.
Maybe we should also start looking at the fragmentation right when we trigger a GC.
> Bug 668809 looks like it'll be impossible to achieve without a compacting GC.

That assumes that we're doing a good job of avoiding fragmentation with our current allocator, which we're not: bug 669245.  (Also, smaller chunk sizes will help, at least somewhat.)
(In reply to comment #26)
> > Bug 668809 looks like it'll be impossible to achieve without a compacting GC.
> 
> That assumes that we're doing a good job of avoiding fragmentation with our
> current allocator, which we're not: bug 669245.  (Also, smaller chunk sizes
> will help, at least somewhat.)

Better chunk choice will help, but it only takes us so far.  We could have a perfectly unfragmented heap, but the next time we do a GC we'll end up fragmenting it significantly by removing lots of objects.

A generational GC will help, though; by getting rid of a lot of short-lived objects in the nursery, things that make it to the proper heap will tend to be longer-lived, so we'll end up collecting fewer objects on each major GC, and thus introduce fewer holes.

(In reply to comment #27)
> Couldn't bug 670596 help here?

If it can be made to work, yes.
> Better chunk choice will help, but it only takes us so far.  We could have a 
> perfectly unfragmented heap, but the next time we do a GC we'll end up 
> fragmenting it significantly by removing lots of objects.

Of course.  But just to put this in perspective, my heap-allocated / heap-committed ratio is 90%, and jemalloc's job is at least as hard as (I'd guess much harder than) the js allocator's.

(I'm not actually sure if heap-allocated / heap-committed is the right ratio, but no matter how you slice it, there's very little waste due to fragmentation in jemalloc.)
(In reply to comment #22)
> Created attachment 547615 [details]

Great data here!

>          63.73% -- js-gc-heap-unused-fraction

Wow.

> You can see that the gmail compartment's arenas are spread out a lot.  I
> don't think smaller chunks (bug 671702) will help much here -- there aren't
> many chunks that are only 1% full, say, but there are lots that are 5--10%
> full.

Can you model smaller chunks on this data? I.e., for various values of chunk size, see how many chunks you'd be able to free, and what the unused fraction would end up being?

> But a compacting GC would make a huge difference.

Yes.
> Can you model smaller chunks on this data? I.e., for various values of chunk 
> size, see how many chunks you'd be able to free, and what the unused fraction 
> would end up being?

That's hard to do right without the full log of allocations -- if we moved to a chunk of size X, we could potentially save much more than X times the number of contiguous gaps in the heap of size X.  Also, the chunk choice plays a huge role here, and it's currently suboptimal (bug 669245).
(In reply to comment #31)
> > Can you model smaller chunks on this data? I.e., for various values of chunk 
> > size, see how many chunks you'd be able to free, and what the unused fraction 
> > would end up being?
> 
> That's hard to do right without the full log of allocations

I know, but rough experiments are often informative enough.

>  -- if we moved to a chunk of size X, we could potentially save much more than
> X times the number of contiguous gaps in the heap of size X.  

Interesting, how does that work?

(In reply to comment #29)
> > Better chunk choice will help, but it only takes us so far.  We could have a 
> > perfectly unfragmented heap, but the next time we do a GC we'll end up 
> > fragmenting it significantly by removing lots of objects.
> 
> Of course.  But just to put this in perspective, my heap-allocated /
> heap-committed ratio is 90%, and jemalloc's job is at least as hard as (I'd
> guess much harder than) the js allocator's.
> 
> (I'm not actually sure if heap-allocated / heap-committed is the right
> ratio, but no matter how you slice it, there's very little waste due to
> fragmentation in jemalloc.)

billm says that might be mostly from decommitting unused pages. That's another question for modeling: on the data sets here, what would our usage ratio be if we decommitted all the unused pages?
A couple of top-level thoughts here:

1. Nick says in a couple of places that compacting would really solve the problem, and that generational would help further by not promoting the objects to a mark-sweep heap in the first place. That suggests maybe (key word is "maybe") we should just throw all our effort into getting that done.

It seems like we should be able to get some pretty suggestive data at least on how far small-scoped changes will take us. Specifically, we should be able to get reasonable estimates on what we would get out of (a) chunks of size N, (b) decommitting, and combinations of (a(N)) and (b).

Are there any other simple ideas we should be testing out? Any other big measurements to do?

2. Just for clarification, what exactly is the significance of fragmentation here? It seems to me that in a situation where the browser is running JS programs that roughly continuously allocate memory, with a stable working set of size W, it doesn't matter much, because the allocations will fill in those gaps until we get back to W or so, at which time we GC. Is that right, or would memory usage be lower or observably smaller in some sense we care about if fragmentation were lower?

The big difference seems to be when the working set size drops, e.g., after closing some tabs, or after a JS program reduces its WS for some reason. I guess that's bug 668809. Are there other major scenarios of interest?
>>  -- if we moved to a chunk of size X, we could potentially save much more than
>> X times the number of contiguous gaps in the heap of size X.  
>
> Interesting, how does that work?

I spoke with billm about this on IRC.  To summarize my understanding of how this works:

 * First we allocate gc chunks.  These are currently 1MB.
 * Then we allocate arenas within the gc chunks.  An arena holds objects all of the same size, and all arenas are the same size.
 * We choose which chunk to allocate into according to a hashtable, and we choose which arena to allocate into based on a free list (it's LIFO or FIFO).

If the allocator tried to keep arenas as close to one another as possible (e.g. by always allocating into the free slot with the lowest address), then I think you could say that moving to chunk size of X saves X times the number of contiguous gaps in the heap of size X.  But since the allocator picks locations for arenas basically at random (*), we could easily end up with something like:

    Chunk can hold 8 arenas.  Allocate two: A---B---

If we switched to a chunk which could hold 4 arenas, we'd save one chunk's worth of space, even though there are no chunks with space for 4 consecutive arenas.

(*) Not that allocating arenas randomly within the chunk is a bad thing!  Since they're all the same size, it doesn't matter how you allocate them.
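For what it's worth, the rough modelling asked for above could be approximated directly from the occupancy data.  A naive static sketch that keeps the current arena layout fixed (which, per the caveat above, understates the real savings):

    #include <cstddef>
    #include <vector>

    // Given each chunk's arena-occupancy bitmap, count how many hypothetical
    // sub-chunks of `newChunkArenas` arenas are entirely free and could be
    // released if the chunk size were reduced.  A lower bound, since it
    // ignores how allocation decisions would change with smaller chunks.
    size_t FreeableSubChunks(const std::vector<std::vector<bool> > &chunkOccupied,
                             size_t newChunkArenas)
    {
        size_t freeable = 0;
        for (size_t c = 0; c < chunkOccupied.size(); c++) {
            const std::vector<bool> &occ = chunkOccupied[c];
            for (size_t start = 0; start + newChunkArenas <= occ.size();
                 start += newChunkArenas) {
                bool anyUsed = false;
                for (size_t i = 0; i < newChunkArenas; i++)
                    anyUsed |= occ[start + i];
                if (!anyUsed)
                    freeable++;                    // whole sub-chunk is empty
            }
        }
        return freeable;
    }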
> 1. Nick says in a couple of places that compacting would really solve the 
> problem, and that generational would help further by not promoting the 
> objects to a mark-sweep heap in the first place. That suggests maybe (key 
> word is "maybe") we should just throw all our effort into getting that done.

I think there's some relatively low-hanging fruit here, as we're seeing with the experiments to change the chunk size.

> billm says [jemalloc's low heap-allocated to heap-commited ratio] might be 
> mostly from decommitting unused pages. That's another 
> question for modeling: on the data sets here, what would our usage ratio be if 
> we decommitted all the unused pages?

I don't think you can do this, at least on Windows.  My understanding from reading the jemalloc comments is that each VirtualFree call must correspond with exactly one VirtualAlloc call.  So you can't decommit part of a gc chunk which was allocated with a single VirtualAlloc.

IOW, "how much memory would we save by decommitting unused pages" is the same as "how much memory would we save by switching to 64kb gc chunks and decommitting empty chunks".  (Windows VirtualAlloc chunks must be aligned to 64kb -- you can allocate less, I think, but then you fragment your address space.)

> The big difference seems to be when the working set size drops, e.g., after 
> closing some tabs, or after a JS program reduces its WS for some reason.

Yes, I think this is right.  This is kind of the crux of much of the MemShrink effort -- we want to give memory back when we no longer need it.
(In reply to comment #35)
> I don't think you can do this, at least on Windows.  My understanding from
> reading the jemalloc comments is that each VirtualFree call must correspond
> with exactly one VirtualAlloc call.  So you can't decommit part of a gc
> chunk which was allocated with a single VirtualAlloc.

I don't think this is true. I don't know too much about jemalloc, but I found this in the code:
  http://hg.mozilla.org/mozilla-central/file/504a1a927d39/memory/jemalloc/jemalloc.c#l1833

It looks like it calls pages_map, which is just a big VirtualAlloc on Windows. Then it goes ahead and, in pages_decommit, decommits some of the memory it just allocated (the end of it, in fact).

I agree that the Windows manuals are worded to suggest that you shouldn't do this sort of thing, but it also seems like everyone does it anyway.
Looks like you're right.  Maybe the rule is that you can decommit in the middle of an allocation but map / unmaps need to line up?  That would be consistent with what I'd seen.

http://hg.mozilla.org/mozilla-central/file/504a1a927d39/memory/jemalloc/jemalloc.c#l2421
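For anyone wanting to experiment, the Win32 pattern being described -- a single reservation, with commit/decommit of page-aligned pieces inside it -- looks roughly like this (a minimal sketch; error handling omitted):

    #ifdef _WIN32
    #include <cstddef>
    #include <windows.h>

    // Reserve and commit a 1MB "chunk" in one VirtualAlloc call, then
    // decommit a 4KB "arena" inside it.  Commit/decommit can operate on
    // arbitrary page-aligned sub-ranges; only the final MEM_RELEASE has to
    // match the original reservation exactly.
    void DecommitExample()
    {
        const size_t chunkSize = 1024 * 1024;
        const size_t pageSize  = 4096;

        char *chunk = static_cast<char*>(
            VirtualAlloc(NULL, chunkSize, MEM_RESERVE | MEM_COMMIT,
                         PAGE_READWRITE));

        char *arena = chunk + 16 * pageSize;
        VirtualFree(arena, pageSize, MEM_DECOMMIT);                 // drop one empty arena

        VirtualAlloc(arena, pageSize, MEM_COMMIT, PAGE_READWRITE);  // reuse it later

        VirtualFree(chunk, 0, MEM_RELEASE);                         // release the whole chunk
    }
    #endif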
(In reply to comment #30)
> 
> Can you model smaller chunks on this data? I.e., for various values of chunk
> size, see how many chunks you'd be able to free, and what the unused
> fraction would end up being?

You can get a very rough model if you look at the visualization -- you can see that there aren't that many contiguous runs of free arenas.  So even if you shrunk the chunk size (and kept everything else equal) you can see that not many extra chunks would be freeable.  I don't have numbers on this; if you want them, it would be easiest just to change GC_CHUNK_SIZE (or whatever it is) and do the measurements.
 
> 2. Just for clarification, what exactly is the significance of fragmentation
> here? It seems to me that in a situation where the browser is running JS
> programs that roughly continuously allocate memory, with a stable working
> set of size W, it doesn't matter much, because the allocations will fill in
> those gaps until we get back to W or so, at which time we GC. Is that right,
> or would memory usage be lower or observable smaller in some sense we care
> about if fragmentation were lower?

It's rare to see js-gc-heap-unused-fraction drop below 20%.  So that's a lot of memory that's allocated but unused.  Completely empty arenas probably don't matter that much (other than perception) because they're unlikely to contribute much to paging.  Partially empty arenas are bad, though, because they will contribute to paging.  In other words, even though js-gc-heap-chunk-unused is usually bigger than js-gc-heap-arena-unused, the latter probably affects real performance more.  Hence bug 673840 (which also suggests generational GC as the long-term fix for this...)
> You can get a very rough model if you look at the visualization -- you can see 
> that there aren't that many contiguous runs of free arenas.

AIUI, the arenas are (re)allocated basically at random throughout a chunk.  See comment 34.  (Bill might correct me, but I'd guess what we do is allocate contiguously when we first get handed a chunk -- this explains why you sometimes see nice contiguous runs of arenas.  Then when we GC, we free arenas in the chunk -- which arenas we free and the order we free them in is effectively random.  Freed arenas get added to a free-list, presumably in the order they're GC'ed in.  So now when we allocate an arena off the free list, its position in the chunk is effectively random.)

I don't think the fact that arenas are scattered throughout the chunk necessarily means that switching to smaller chunk sizes wouldn't save us anything.  The relevant question is: How much of the scattering is due to us filling up a chunk and then freeing arenas randomly inside, and how much is due to us reallocating arenas in a chunk at random locations?

You could get a handle on this by changing the allocator to try to pack chunks together as tightly as possible.  This might improve the usefulness of the visualization, but I don't think it would have practical positive effects -- since all the arenas are the same size, it doesn't matter how we allocate them within the chunk.

> I don't have numbers on this, if you want them it would be easiest just to change 
> GC_CHUNK_SIZE (or whatever it is) and do the measurements.

Bug 671702 changes the chunk size and makes a chunk contain arenas from just one compartment.  If we're really interested, we could ask Igor to measure these two changes independently.

> Completely empty arenas probably don't matter that much (other than perception) because 
> they're unlikely to contribute much to paging.

I don't think completely empty arenas matter quite as much as partially-empty arenas, but they do matter.  Suppose we have a bunch of empty arenas.  Then the user loads The Big Picture, which eats up a lot of RAM.  So the OS decides to swap out some of Firefox's pages.  There are two possibilities here: Either the OS swaps out the empty arenas, or it swaps out something else.

If it swaps out the empty arena, then when we later try to allocate into that arena, we have to swap it back in -- this is really silly, because the arena is empty!  We should be able to allocate into that arena basically for free, but instead we take a page fault.

If otoh the OS swaps out something else, then this is just as bad as any other kind of paging.
(In reply to comment #39)
> You could get a handle on this by changing the allocator to try to pack
> chunks together as tightly as possible.  This might improve the usefulness
> of the visualization, but I don't think it would have practical positive
> effects -- since all the arenas are the same size, it doesn't matter how we
> allocate them within the chunk.

Agreed, at least to a first-order approximation.  Each arena is typically an OS MMU page, and the issue is how many are in the working set, not which ones.  Perhaps there's a minor 2nd or 3rd-order advantage to having runs of dirty pages at the OS level, but not worth worrying about, even if it's true.  The only advantage would be if we decommitted and committed at the page level, and merged commit/decommit operations to reduce overhead (see below).

...
> > Completely empty arenas probably don't matter that much (other than perception) because 
> > they're unlikely to contribute much to paging.
> 
> I don't think completely empty arenas matter quite as much as
> partially-empty arenas, but they do matter.  Suppose we have a bunch of
> empty arenas.  Then the user loads The Big Picture, which eats up a lot of
> RAM.  So the OS decides to swap out some of Firefox's pages.  There are two
> possibilities here: Either the OS swaps out the empty arenas, or it swaps
> out something else.
> 
> If it swaps out the empty arena, then when we later try to allocate into
> that arena, we have to swap it back in -- this is really silly, because the
> arena is empty!  We should be able to allocate into that arena basically for
> free, but instead we take a page fault.

In Windows, there's the concept of MEM_RESET, which basically says "keep this committed, but I don't care about the contents anymore".  In linux I think you'd need to decommit.

jemalloc decommits and commits pages within a chunk as needed, but it also spends considerable effort to minimize system calls when doing so, etc.  Adding this to the JS code would be complex (I believe this is already a separate bug).

jemalloc tries to avoid thrashing -- repeatedly freeing and re-allocating the mmap() allocations -- by keeping a 'spare' chunk per 'arena'.

And note: terminology is important.  jemalloc terminology unfortunately is almost the inverse of the JS GC terminology given in comment 6:

Arena: master object for locking, typically per-CPU (I think we use 1)
Chunk: an allocation from the OS, typically via mmap(), normally of 1MB (or more for huge objects)
Run: part of a chunk (often from a restricted set of allocation sizes) that's managed as a set of allocatable objects, often via a bitmap.  For allocations from 4K to 1MB, each allocation gets its own run from the chunk.

The JS memory code appears to use 'arenas' for roughly jemalloc's 'runs', and 'FreeSpans' are roughly equivalent to 'runs'.
>> You could get a handle on this by changing the allocator to try to pack
>> chunks together as tightly as possible.  This might improve the usefulness
>> of the visualization, but I don't think it would have practical positive
>> effects

> Agreed, at least to the first order approximation.  Each arena is typically an 
> OS MMU page, and the issue is how many are in the working set, not which ones.  
> Perhaps there's a minor 2nd or 3rd-order advantage to having runs of dirty pages
> at the OS level, but not worth worrying about, even if it's true.

Actually, now that you mention it, the OS will fetch more than just one page when you have a page fault.  So in that case, it *would* be helpful for used arenas to be packed tightly.

> In Windows, there's the concept of MEM_RESET, which basically says "keep this 
> committed, but I don't care about the contents anymore".  In linux I think you'd 
> need to decommit.

On Unix, there's madvise MADV_DONTNEED, which allows the OS to lazily decommit a page.  pbiggar has been trying to use this for jemalloc on mac, but it looks like it doesn't work properly on 10.5.

MEM_RESET might be helpful, but all things being equal, I'd prefer to actually decommit, since that actually reduces our memory usage.
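A minimal sketch of the two "contents are disposable" primitives being discussed; the semantics differ (MEM_RESET keeps the pages committed but droppable, while madvise(MADV_DONTNEED) lets the kernel reclaim them and hand back zero-filled pages on the next touch):

    #include <cstddef>
    #ifdef _WIN32
    #include <windows.h>
    #else
    #include <sys/mman.h>
    #endif

    // Tell the OS the contents of the page-aligned range [addr, addr+len)
    // are no longer needed, without unmapping the range.
    void DiscardContents(void *addr, size_t len)
    {
    #ifdef _WIN32
        VirtualAlloc(addr, len, MEM_RESET, PAGE_READWRITE);
    #else
        madvise(addr, len, MADV_DONTNEED);
    #endif
    }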
Decommit is possible; see jemalloc. You do need to take care to coalesce decommits and avoid too much overhead from frothing memory allocations (page table changes, etc) - again see jemalloc, see also its red-black trees, etc.  Note that decommit reduces resident size (I assume) but not VSS - the memory is still allocated/mapped.

It's too bad that the JS GC isn't somehow layered on top of jemalloc instead of having to reinvent the wheel...   Hmmm.
Attached patch patch, v1b (horrid code quality) (obsolete) — Splinter Review
This patch just updates for some minor aboutMemory.js changes.
Attachment #545598 - Attachment is obsolete: true
Attached patch patch, v1b (horrid code quality) (obsolete) — Splinter Review
This patch just updates for some minor aboutMemory.js changes.
Attachment #548703 - Attachment is obsolete: true
Hmm, now I'm getting a div-by-zero crash with patch v1b, because somehow an ArenaHeader has a thingKind of FINALIZE_LIMIT, which is a bogus value, and so the GCThingSizeMap[thingKind] lookup in IterateChunkArenaCells() gets a (bogus) thingKind of zero.
(In reply to comment #45)
> Hmm, now I'm getting a div-by-zero crash with patch v1b, because somehow an
> ArenaHeader has a thingKind of FINALIZE_LIMIT, which is a bogus value, and
> so the GCThingSizeMap[thingKind] lookup in IterateChunkArenaCells() gets a
> (bogus) thingKind of zero.

Ah this changed today. Bug 673760.
This fixes the crash caused when aheader->getThingKind() == FINALIZE_LIMIT.
Attachment #548702 - Attachment is obsolete: true
Whiteboard: [MemShrink:P1] → [MemShrink]
Whiteboard: [MemShrink] → [MemShrink:P2]
I'm tempted to WONTFIX this.  I'm not planning on working on it any more, and I'm hoping that generational GC (bug 619558) will greatly reduce JS heap fragmentation and render visualization unnecessary.
(In reply to Nicholas Nethercote [:njn] from comment #48)
> I'm tempted to WONTFIX this.  I'm not planning on working on it any more,
> and I'm hoping that generational GC (bug 619558) will greatly reduce JS heap
> fragmentation and render visualization unnecessary.

Done.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → WONTFIX