Problem Description
This problem comes from an interview given to a student of mine (who’s name I never can seem to remember), and seemed particularly interesting as it, at first thought, seems to be a good application of sets.
Minimum Distinct Items Problem: Given a list of
integers1 arr
and an integer, provide the minimum possible amount of unique items after removing at most items from the list.
Let’s run through some examples
Example:
arr=[1,1,4,6,7]
,
Answer
In this case we could remove any two ofNow before we get into the solutions, there is an important thing to notice that will help with our algorithm. Hopefully this is obvious to notice, but the optimal solution is going to use the maximum amount of removals possible. We’re going to prove this, but that should make sense because if we don’t use every removal, we could either do better or the same by using those extra ones.
Lemma: If
the optimal solution is to remove every item. Otherwise there is an optimal solution that will use all removals.
Proof
For the first part, ifIn the event
A Brute Force Solution
As with all coding interview problems, its always a good idea to go through the brute force solution at least to talk through. In this case, the brute force solution is quite bad, so I won’t code it since its not worth it.
In order to brute force, we can iterate through all possible selections of
We wouldn’t need to prove this solution though, as anything returned is guaranteed to be optimal by definition.
Writing this code out anyway, we will use the itertools in Python to provide all combinations of the items, and then will check how many unique items exist in that new list. Note that
from itertools import combinations
def unique_items_brute_force(arr, M):
n = len(arr)
if n <= M:
return 0
elif M == 0:
return len(set(arr))
min_unique = len(set(arr))
for sub_list in combinations(arr, n-M):
unique_items = len(set(sub_list)) # Theta(n) work
if unqiue_items < min_unique:
min_unique = unique_items
return unique_items
The amount of work per loop iteration is
Getting to the Solution
The solution is pretty straightforward to get to in this problem. The thing about removals is that for each removal, we want to get the most “bang for our buck”, which we can achieve by at all points by removing the least common item. Lets consider the example we had before of arr=[1,1,4,6,7]
and
- Count how many each item in the array appears
- Continually remove the lowest appearing item until you cannot remove any more items
- Return the remaining amount of items.
Coding this up we can do it by counting each item occurance, sorting based on the count, and then removing the least common item each iteration will