Velvet Star Monitor

Standout celebrity highlights with iconic style.

general

How to have a set of sets in Python? [duplicate]

Writer Andrew Henderson

In Python there's no full support for heterogeneous data structures. For example this fails:

set(set(1,2),set(2,3))

What's the best way to treat sets of sets?

3

3 Answers

Use frozenset,

>>> set([frozenset([1,2]), frozenset([2,3])])
set([frozenset([1, 2]), frozenset([2, 3])])

To represent a set of sets, the inner sets must be frozenset objects for the reason that the elements of a set must be hashable (all of Python’s immutable built-in objects are hashable). frozenset is immutable and set is mutable.

You can't have a set of sets in the usual sense but if you can work with frozenset objects then it can work:

>>> set([set([1,2]), set([3,4])])
Traceback (most recent call last): File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'set'

This does work:

>>> set([frozenset([1,2]), frozenset([3,4])])
set([frozenset([1, 2]), frozenset([3, 4])])

There is good reason why a set of sets does not work. Imagine the following (in pseudocode):

a = set(range(5)) # {0, 1, 2, 3, 4}
b = set(range(5, 10)) # {5, 6, 7, 8, 9}
c = set(range(5)) # {0, 1, 2, 3, 4}
# we've now created a set of sets, and then will drop `c` because it's redundant
d = set([a, b, c]) # {{0, 1, 2... }, {5, 6, 7...}}
# now we've changed a value inside the set, suddenly everything changes
a.add(6) # {{0, 1, 2..., 6}, {5, 6, 7...}}
# now we can re-add `c`
d.add(c) # {{0, 1, 2..., 6}, {5, 6, 7...}, {0, 1, 2...}}

Besides the weird behavior of how elements would suddenly disappear, or act differently if they were mutable, there is also the lack of hash-based lookups.

The set implementation is very similar to the dict implementation, and can be found here. This is the implementation if a set contains a given key. Notice how it calculates the hash of the object, finds the first occurrence, and then does a lookup from the hash?

static int
set_contains_key(PySetObject *so, PyObject *key)
{ long hash; setentry *entry; if (!PyString_CheckExact(key) || (hash = ((PyStringObject *) key)->ob_shash) == -1) { hash = PyObject_Hash(key); if (hash == -1) return -1; } entry = (so->lookup)(so, key, hash); if (entry == NULL) return -1; key = entry->key; return key != NULL && key != dummy;
}

Now, if we modify a in the example above, how do we perform a lookup? The only solution would be item-by-item lookups, which would have O(n) time complexity.

Fortunately, there is an easy solution shown above: an immutable set. Fortunately, Python even has this builtin, the frozenset.

With a frozenset, since it is immutable, a hash can be calculated, both preventing unexpected behavior, but also restoring our O(1) lookups.

In this case, we can do the following:

a = frozenset(range(5))
b = frozenset(range(5, 10))
c = frozenset(range(5))
d = set([a, b, c])

And now, d will allow membership lookup on individual items, with solely discrete containers, since the frozenset members are immutable.

1