BinarySearchAutocomplete
implements the Autocompletor
interface, which
means it should, given a list of terms and weights for those terms, be able
to find the top match(es) for a prefix amongst those terms. To do this, you
will write binary search methods which can narrow the list of all terms
down to a list of terms matching the prefix. Once we have a list of
prefix-matching terms, finding the highest weight ones should be trivial.
Within this class, you should:
###The Constructor
The constructor for this class is already written for you. However, you should still quickly read through it to understand what it does. Notice that the list of terms is initialized for you in lexicographic order. This means you should not need to sort it yourself - doing so will slow down your methods unnecessarily. Furthermore, you should maintain the sorted order - if one of your methods changes the array of Terms, or if methods fail because the first method call did not preserve sorted order for the second, you might lose points.
###firstIndexOf and lastIndexOf You should be comfortable with the binary search algorithm already. We will be implementing two more specific versions of it in the methods firstIndexOf and lastIndexOf. firstIndexOf and lastIndexOf are different from typical binary search algorithms in two ways:
As a performance requirement, firstIndexOf and lastIndexOf should use at most 1 + ⌉log2N⌈ compare calls for an N-element array. The methods should return -1 if no matching element exists, including the case where the input array is empty.
###topMatch and topKMatches
Once firstIndexOf and lastIndexOf are written, most of the work for topMatch and topKMatches is done. By using firstIndexOf and lastIndexOf and Term.PrefixOrder, you can easily find the indices of all terms starting with the prefix. Once you have narrowed down the list of terms to m terms starting with the prefix, finding the highest weight term(s) and returning them should be trivial.
If there are m terms starting with the prefix, the best timing for finding a top match is O(log n + m), and the best timing for the top k matches is O(log n + m log m). You should be able to justify for yourself that your methods run in these times.
One may notice that topMatch is simply a special case of topKMatches where k = 1 and be tempted to have topMatch simply call
topKMatches(prefix,1)
Do not do this – the optimal timing for topKMatches is worse than topMatch’s optimal timing, meaning calling topKMatches with k = 1 will never be as fast as properly implementing topMatch if written correctly. If your topMatch method simply calls topKMatches with k = 1 and returns the output for either Autocompletor implementation, you will lose points.
###After BinarySearchAutocomplete Is Written
You can test BinarySearchAutocomplete similar to how you tested Term - set AUTOCOMPLETOR_CLASS_NAME
to BINARY_SEARCH_AUTOCOMPLETE
from AutocompleteMain, and run it using words-333333.txt as the source and “auto” as the query. The results
should be the same as the ones shown on the previous page. In general, BruteAutocomplete
and BinarySearchAutocomplete
(and TrieAutocomplete
) should always produce the same output for topMatch
and topKMatches
.
Keep in mind that if your BruteAutocomplete works but BinarySearchAutocomplete does not, it may be a problem with Term instead of with BinarySearchAutocomplete - BruteAutocomplete only uses Term.WeightOrder, whereas BinarySearchAutocomplete could potentially use all of Term’s Comparable/Comparators.