A set intersection oracle (SIO) is a data structure which represents a collection of sets and can quickly answer queries about whether the set intersection of two given sets is non-empty.
The input to the problem is n finite sets. The sum of the sizes of all sets is N (which also means that there are at most N distinct elements). The SIO should quickly answer any query of the form:
Without any pre-processing, a query can be answered by inserting the elements of Si into a temporary hash table and then checking for each element of Sk whether it is in the hash table. The query time is O ( | S i | + | S j | ) = O ( N ) {\displaystyle O(|S_{i}|+|S_{j}|)=O(N)} .
Alternatively, we can pre-process the sets and create an n-by-n table where the intersection information is already entered. Then the query time is O ( 1 ) {\displaystyle O(1)} , but the memory required is O ( n 2 ) {\displaystyle O(n^{2})} .
Define a "large set" as a set with at least N {\displaystyle {\sqrt {N}}} elements. Obviously there are at most N {\displaystyle {\sqrt {N}}} such sets. Create a table of intersection data between every large set to every other large set. This requires O ( N ) {\displaystyle O(N)} memory. Additionally, for each large set, keep a hash table of all its elements. This requires additional O ( N 3 / 2 ) {\displaystyle O(N^{3/2})} memory.
Given two sets, there are three possible cases:
In general, if we define a "large set" as a set with at least N c {\displaystyle N^{c}} elements, then the number of large set is at most N 1 − c {\displaystyle N^{1-c}} so the memory required is O ( N 2 − c ) {\displaystyle O(N^{2-c})} , and the query time is O ( N c ) {\displaystyle O(N^{c})} .
The SIO problem can be reduced to the approximate distance oracle (DO) problem, in the following way.[1]
This graph has the following properties:
So, with a DO whose approximation factor of less than 2, we can solve the SIO problem.
It is believed that the SIO problem does not have a non-trivial solution. I.e., it requires Ω ( n 2 ) {\displaystyle \Omega (n^{2})} space to answer queries in time O ( 1 ) {\displaystyle O(1)} . If this conjecture is true, this implies that there is no DO with an approximation factor of less than 2 and a constant query time.[1]