18.2. Function MAKE-HASH-TABLE

18.2.1. Interaction between HASH-TABLEs and garbage-collection

MAKE-HASH-TABLE accepts two additional keyword arguments :INITIAL-CONTENTS and :WEAK:

(MAKE-HASH-TABLE &KEY :TEST :INITIAL-CONTENTS :SIZE
                 :REHASH-SIZE :REHASH-THRESHOLD
                 :WARN-IF-NEEDS-REHASH-AFTER-GC :WEAK)

The :TEST argument can be, other than one of the symbols EQ, EQL, EQUAL, EQUALP, one of the symbols EXT:FASTHASH-EQ and EXT:STABLEHASH-EQ. Both of these tests use EQ as the comparison function; they differ in their performance characteristics.

EXT:FASTHASH-EQ
This uses the fastest possible hash function. Its drawback is that its hash codes become invalid at every garbage-collection (except if all keys are immediate objects), thus requiring a reorganization of the hash table at the first access after each garbage-collection. Especially when generational garbage-collection is used, which leads to frequent small garbage-collections, large hash table with this test can lead to scalability problems.
EXT:STABLEHASH-EQ
This uses a slower hash function that has the property that its hash codes for instances of the classes SYMBOL, EXT:STANDARD-STABLEHASH (subclass of STANDARD-OBJECT) and EXT:STRUCTURE-STABLEHASH (subclass of STRUCTURE-OBJECT) are stable across GCs. This test can thus avoid the scalability problems if all keys, other than immediate objects, are SYMBOL, EXT:STANDARD-STABLEHASH or EXT:STRUCTURE-STABLEHASH instances.

One can recommend to use EXT:FASTHASH-EQ for short-lived hash tables. For tables with a longer lifespan which can be big or accessed frequently, it is recommended to use EXT:STABLEHASH-EQ, and to modify the objects that are used as its keys to become instances of EXT:STANDARD-STABLEHASH or EXT:STRUCTURE-STABLEHASH.

When the symbol EQ or the function #'eq is used as a :TEST argument, the value of the variable CUSTOM:*EQ-HASHFUNCTION* is used instead. This value must be one of EXT:FASTHASH-EQ, EXT:STABLEHASH-EQ.

Similarly, the :TEST argument can also be one of the symbols EXT:FASTHASH-EQL, EXT:STABLEHASH-EQL, EXT:FASTHASH-EQUAL, EXT:STABLEHASH-EQUAL. The same remarks apply as for EXT:FASTHASH-EQ and EXT:STABLEHASH-EQ. When the symbol EQL or the function #'eql is used as a :TEST argument, the value of the variable CUSTOM:*EQL-HASHFUNCTION* is used instead; this value must be one of EXT:FASTHASH-EQL, EXT:STABLEHASH-EQL. Similarly, when the symbol EQUAL or the function #'equal is used as a :TEST argument, the value of the variable CUSTOM:*EQUAL-HASHFUNCTION* is used instead; this value must be one of EXT:FASTHASH-EQUAL, EXT:STABLEHASH-EQUAL.

The :WARN-IF-NEEDS-REHASH-AFTER-GC argument, if true, causes a WARNING to be SIGNALed when an object is stored into the table which will force table reorganizations at the first access of the table after each garbage-collection. This keyword argument can be used to check whether EXT:STABLEHASH-EQ should be preferred over EXT:FASTHASH-EQ for a particular table. Use HASH-TABLE-WARN-IF-NEEDS-REHASH-AFTER-GC to check and SETF this parameter after the table has been created.

The :INITIAL-CONTENTS argument is an association list that is used to initialize the new hash table.

The :REHASH-THRESHOLD argument is ignored.

The :WEAK argument can take the following values:

NIL (default)
:KEY
:VALUE
:KEY-AND-VALUE
:KEY-OR-VALUE

and specifies whether the HASH-TABLE is weak: if the key, value, either or both are not accessible for the garbage-collection purposes, i.e., if they are only accessible via weak HASH-TABLEs and EXT:WEAK-POINTERs, it is garbage-collected and removed from the weak HASH-TABLE.

The SETFable predicate EXT:HASH-TABLE-WEAK-P checks whether the HASH-TABLE is weak.

Note that the only test that makes sense for weak hash tables are EQ and its variants EXT:FASTHASH-EQ and EXT:STABLEHASH-EQ.

Just like all other weak objects, weak HASH-TABLEs cannot be printed readably.

See also Section 31.7.9, “Weak Hash Tables”.

18.2.1. Interaction between HASH-TABLEs and garbage-collection

When a hash table contains keys to be compared by identity - such as NUMBERs in HASH-TABLEs with the HASH-TABLE-TEST EQ; or CONSes in tables which test with EQ or EQL; or VECTORs in tables which test with EQ, EQL or EQUAL; or STANDARD-OBJECT or STRUCTURE-OBJECT instances in tables which test with EQ, EQL, EQUAL or EQUALP; - the hash code will in general depend on the object's address in memory. Therefore it will in general be invalidated after a garbage-collection, and the hash table's internal structure must be recomputed at the next table access.

While :WARN-IF-NEEDS-REHASH-AFTER-GC can help checking the efficiency of a particular HASH-TABLE, the variable CUSTOM:*WARN-ON-HASHTABLE-NEEDING-REHASH-AFTER-GC* achieves the same effect for all HASH-TABLEs in the system at once: when CUSTOM:*WARN-ON-HASHTABLE-NEEDING-REHASH-AFTER-GC* is true and a HASH-TABLE needs to be rehashed after a garbage-collection, a warning is issued that shows the inefficient HASH-TABLE.

What can be done to avoid the inefficiencies detected by these warnings?

  1. In many cases you can solve the problem by using the STABLEHASH variant of the hash test.
  2. In other cases, namely STANDARD-OBJECT or STRUCTURE-OBJECT instances, you can solve the problem by making the key object classes inherit from EXT:STANDARD-STABLEHASH or EXT:STRUCTURE-STABLEHASH, respectively.
  3. In the remaining cases, you should store a hash key inside the object, of which you can guarantee uniqueness through your application (for example the ID of an object in a database, or the serial number of an object), and use this key as hash key instead of the original object.

These notes document CLISP version 2.49.93+Last modified: 2018-02-19