Inspecting a ZODB to find the causes of bloat

Submitted by: ldr
Last Edited: 2005-06-17

Category: Python(Script)


Description:
My ZODB was growing hugely with every transaction, and the cause was proving tricky to find. Inspecting the individual records in the ZODB revealed what was responsible.

Source (Text):
#
# Utility methods to help inspect a ZODB
#
# Laurence Rowe 21/04/2005
#
# See also $SOFTWARE_HOME/bin/analyze.py
# and http://mail.zope.org/pipermail/zodb-dev/2001-August/001309.html
#
# First open the storage (read-only!) and iterate to the transaction you are
# interested in. Take recs = list(txn) and find the size of each record with
# len(rec.data). The target is the rec.oid of the record you are interested in.
#
# In a Zope debug console you can get the object with app._p_jar[rec.oid].
# For some objects (like BTrees.IOBTree.IOBucket) this is pretty useless:
# they represent themselves as their C data structure. Better to find their path.
#
# Build a refmap, a graph of object references
# (not too slow if the Data.fs fits in memory).
# Use doSearch to get a reference path (the beginnings of other paths are
# returned as additionals). With the list of oids you can reconstruct the path
# by using app._p_jar[oid]. When you reach a Python object, something useful
# is shown!
#


from ZODB.referencesf import referencesf
def buildRefmap(fs):
    '''Build a refmap from a FileStorage. Look at every record of every
       transaction and build a dict of oid -> list of referenced oids.
    '''
    refmap = {}
    fsi = fs.iterator()

    for txn in fsi:
        for rec in txn:
            # load the current pickle for this oid (not this record's own rec.data)
            pickle, revid = fs.load(rec.oid, rec.version)
            refs = referencesf(pickle)
            refmap[rec.oid] = refs

    return refmap

def backrefs(target, refmap):
    '''Return a list of the oids in the refmap that reference target.
    '''
    oidlist = []
    for oid, refs in refmap.iteritems():
        if target in refs:
            oidlist.append(oid)
    return oidlist

def doSearch(target, refmap):
    '''For a target oid, find the path of objects that refer to it.
       Stop when there are no more back references or a cycle is found.
    '''
    path = [target]
    additionals = []

    while True:
        target = path[-1]
        brefs = backrefs(target, refmap)
        if not brefs:
            break

        bref = brefs[0]
        if bref in path:
            print 'cyclic', bref
            break

        if len(brefs) == 1:
            path.append(bref)
            print bref
            continue

        additionals.append( (target, brefs[1:]) )
        print bref, brefs[1:]
        path.append(bref)

    return (path, additionals)

Explanation:
I found that a single BTrees.IOBTree.IOBucket was to blame, but what caused it?

These methods help you recreate the path to the suspect object - in my case through a ZCTextIndex.
Just the clue I needed to fix the problem (searchable=1 on Archetypes FileFields).

Laurence Rowe
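
To see what the path reconstruction does without opening a real Data.fs, the sketch below runs the same kind of back-reference walk over a hand-made refmap. It is only an illustration: the oids are made-up labels rather than real 8-byte oids, walk_back is a hypothetical helper that mirrors the logic of doSearch but returns its result instead of printing, and the back references are sorted (an addition not in the recipe) so the result does not depend on dict ordering. It is written for current Python, whereas the recipe itself targets Python 2.

```python
def backrefs(target, refmap):
    """Return the sorted oids in refmap whose reference list contains target."""
    return sorted(oid for oid, refs in refmap.items() if target in refs)


def walk_back(target, refmap):
    """Follow the first back reference from target until none is left or a
    cycle is found. Branches that were not followed go into additionals."""
    path = [target]
    additionals = []
    while True:
        current = path[-1]
        brefs = backrefs(current, refmap)
        if not brefs or brefs[0] in path:
            break
        if len(brefs) > 1:
            additionals.append((current, brefs[1:]))
        path.append(brefs[0])
    return path, additionals


# Toy graph: oid -> list of oids it references (all labels are made up).
refmap = {
    'root': ['catalog'],
    'catalog': ['index'],
    'index': ['btree'],
    'other': ['btree'],
    'btree': ['bucket'],
    'bucket': [],
}

path, additionals = walk_back('bucket', refmap)
print(path)
print(additionals)
```

Here path runs from the suspect bucket back to the root, and additionals records that 'other' also refers to 'btree', which is the branch you would follow next if the first path gave no clue.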


Comments:

example usage by runyaga - 2005-06-16
please give example usage
 

Example Usage by ldr - 2005-06-17
Okay, assume you have saved this module as inspectZodbUtils.py in your var directory.

$ ./bin/zopectl debug
from inspectZodbUtils import *
from ZODB.FileStorage import FileStorage
# path is the filesystem path to your Data.fs
fs = FileStorage(path, read_only=1)
fsi = fs.iterator()

# assume you want to look at the last transaction
for txn in fsi:
    pass

recs = list(txn)
print max([(len(rec.data), rec.oid) for rec in recs])
#
# This prints the size and oid of the largest record, e.g.
# (54341, '\x00\x00\x00\x00\x00\x07\x84\xe9')
#
# take a look at that object
oid = '\x00\x00\x00\x00\x00\x07\x84\xe9'

print app._p_jar[oid]
# IOBucket([(-1920550923, ((), u.....
# ... screenfuls of text ...


# but where is that object?
refmap = buildRefmap(fs) # can take a while
path, additionals = doSearch(oid, refmap)

# take a look at that path (skipping the first oid, as its repr
# is screenfuls of text)
for o in path[1:]:
    print app._p_jar[o]

# <IOBTree object at 0xee6abf0>
# <Catalog instance at f0aad70>
# <CatalogTool instance at f0aa170>
# <TextIndex at Description>

# Now you'll notice that there's a CatalogTool in the path of back
# references. With luck that's enough of a clue as to where the
# problem lies. If not, inspect the objects and look at the reference
# paths starting at the additionals.
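
doSearch follows only the first back reference at each step and sets the others aside. If the first path gives no clue, one option is to invert the refmap once and list every chain of referrers in a single pass. The sketch below is only an illustration under stated assumptions: invert_refmap and all_paths are hypothetical helpers that are not part of the recipe, the oids are made-up labels, and it is written for current Python rather than the Python 2 of the recipe. On a heavily shared object the number of chains can grow very large, so it is best tried on a single suspect oid.

```python
from collections import defaultdict


def invert_refmap(refmap):
    """Build oid -> sorted list of the oids that reference it, in one pass."""
    referrers = defaultdict(set)
    for oid, refs in refmap.items():
        for ref in refs:
            referrers[ref].add(oid)
    return dict((oid, sorted(srcs)) for oid, srcs in referrers.items())


def all_paths(target, referrers):
    """Return every cycle-free chain of referrers that starts at target."""
    finished = []
    pending = [[target]]
    while pending:
        path = pending.pop()
        # skip referrers already on this chain so that cycles terminate
        parents = [p for p in referrers.get(path[-1], []) if p not in path]
        if not parents:
            finished.append(path)
        for parent in parents:
            pending.append(path + [parent])
    return sorted(finished)


# Toy graph: oid -> list of oids it references (all labels are made up).
refmap = {
    'root': ['catalog', 'folder'],
    'catalog': ['index'],
    'folder': ['index'],
    'index': ['bucket'],
    'bucket': [],
}

referrers = invert_refmap(refmap)
paths = all_paths('bucket', referrers)
for p in paths:
    print(p)
```

Inverting the map first also avoids rescanning the whole refmap for every step of the walk, which is what backrefs does.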