The default size of bloom filters that are used for fuzzy symbol search.
The default value was chosen based on two criteria: CPU usage and memory usage. Here are the performance results of searching for the query "File" with different bucket sizes.
| Benchmark | (bucketSize) | (query) | Mode | Cnt | Score | Error | Units |
| --- | --- | --- | --- | --- | --- | --- | --- |
| ClasspathFuzzBench.run | 128 | File | ss | 50 | 77.908 | ± 6.572 | ms/op |
| ClasspathFuzzBench.run | 256 | File | ss | 50 | 75.587 | ± 3.173 | ms/op |
| ClasspathFuzzBench.run | 512 | File | ss | 50 | 74.154 | ± 3.751 | ms/op |
| ClasspathFuzzBench.run | 1024 | File | ss | 50 | 81.825 | ± 4.910 | ms/op |
| ClasspathFuzzBench.run | 2048 | File | ss | 50 | 92.891 | ± 5.241 | ms/op |
| ClasspathFuzzBench.run | 4096 | File | ss | 50 | 101.436 | ± 3.746 | ms/op |
| ClasspathFuzzBench.run | 8192 | File | ss | 50 | 106.969 | ± 3.971 | ms/op |
Query performance degrades with larger bucket sizes, presumably because every hit requires walking many redundant classfile names.
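The presumed cause can be illustrated with a minimal Python sketch. It is not the actual implementation: the `Bucket` class, the SHA-256 based hashing, the fixed filter size, and the exact-name matching (instead of fuzzy matching) are all assumptions made for illustration. Each bucket keeps a Bloom filter over the simple names of its members, and any bucket that passes the filter must be walked in full.

```python
import hashlib


def simple_name(path):
    """Return the class name of a classfile path, e.g. 'java/io/File.class' -> 'File'."""
    name = path.rsplit("/", 1)[-1]
    return name[:-len(".class")] if name.endswith(".class") else name


class Bucket:
    """Hypothetical bucket: a Bloom filter plus the classfile paths it covers."""

    def __init__(self, members, bits=1024, hashes=3):
        self.members = list(members)
        self.bits = bits
        self.hashes = hashes
        self.filter = 0  # bitset stored in a Python int
        for path in self.members:
            for pos in self._positions(simple_name(path)):
                self.filter |= 1 << pos

    def _positions(self, key):
        # Derive `hashes` bit positions from salted SHA-256 digests.
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.bits

    def might_contain(self, key):
        # False means "definitely absent"; True may be a false positive.
        return all((self.filter >> pos) & 1 for pos in self._positions(key))


def make_buckets(paths, bucket_size):
    """Split paths into consecutive buckets of at most `bucket_size` members."""
    return [Bucket(paths[i:i + bucket_size]) for i in range(0, len(paths), bucket_size)]


def search(buckets, query):
    """Return (matches, scanned): matching paths and the number of members walked."""
    matches, scanned = [], 0
    for bucket in buckets:
        if not bucket.might_contain(query):
            continue  # definite miss: skip the whole bucket
        for path in bucket.members:
            scanned += 1
            if simple_name(path) == query:
                matches.append(path)
    return matches, scanned
```

In this sketch, placing 65 paths in a single bucket forces a query for `File` to walk all 65 members, while smaller buckets let the filter skip most of them, apart from occasional false positives.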
Here is the memory usage of differently sized bloom filters.
| Size | Total memory | Ratio |
| --- | -------------- | ----------------- |
| 512 | 776 bytes | 1.52 byte/element |
| 1024 | 1.21 kilobytes | 1.18 byte/element |
| 2048 | 2.13 kilobytes | 1.04 byte/element |
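The ratio column can be reproduced by dividing the total memory by the filter size. The short Python check below uses the totals from the table and assumes that a kilobyte means 1000 bytes, since that is the reading under which the reported ratios come out as stated.

```python
# Total memory in bytes per filter size, taken from the table above.
# Assumption: 1 kilobyte = 1000 bytes.
measurements = {512: 776, 1024: 1210, 2048: 2130}


def bytes_per_element(total_bytes, size):
    """Memory cost per element for a filter of the given size."""
    return total_bytes / size


ratios = {
    size: round(bytes_per_element(total, size), 2)
    for size, total in measurements.items()
}

for size, ratio in ratios.items():
    print(f"{size}: {ratio} byte/element")
```

The per-element overhead shrinks as the filter grows, which is the memory side of the trade-off against query time.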
For a classpath of 235 MB, the size of the index is 3.4 MB with bucketSize=512 and 1.8 MB with bucketSize=1024.
Given these observations, both 512 and 1024 seem like reasonable defaults. We go with 512 because it seems to squeeze out slightly better query performance at the cost of slightly higher memory usage.
Returns an index of the classpath elements for fast fuzzy symbol search.
the map from all packages to their member classfile paths.
the maximum number of classpath elements in each returned compressed package index.
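As a rough illustration of how the two parameters interact, the following Python sketch splits a package map into buckets that hold at most a given number of classpath elements. The function name `split_into_buckets`, the sorted package order, and the choice to let a bucket span several packages are assumptions for illustration only. The actual index may group and compress elements differently.

```python
def split_into_buckets(packages, bucket_size):
    """Group classfile paths into buckets of at most `bucket_size` elements.

    `packages` maps a package path (e.g. "java/io/") to the classfile
    names it contains. This is a hypothetical helper, not the real API.
    """
    if bucket_size <= 0:
        raise ValueError("bucket_size must be positive")
    buckets, current = [], []
    for package in sorted(packages):
        for member in packages[package]:
            current.append(package + member)
            if len(current) == bucket_size:
                buckets.append(current)  # bucket is full, start a new one
                current = []
    if current:
        buckets.append(current)  # trailing, partially filled bucket
    return buckets


example = {
    "java/util/": ["List.class", "Map.class"],
    "java/io/": ["File.class", "Reader.class", "Writer.class"],
}
buckets = split_into_buckets(example, 2)
```

With a bucket size of 2, the five classfiles in `example` end up in three buckets, the last of which holds a single element.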