If you do end up building something internally, for what it's worth, we're happy with similarity-api.com. We use it for periodic deduplication of our whole Salesforce base, and we use the same API for ad-hoc fuzzy matching tasks (e.g. matching a list of company names to Salesforce records when the first list is messy).
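For the ad-hoc case, a rough in-house version of "match a messy list of company names to Salesforce account names" could start like the sketch below. It is not how similarity-api.com works, since its internals aren't described here. It uses Python's standard-library `difflib.SequenceMatcher`, and the suffix list, the 0.85 threshold, and the function names are hypothetical choices for illustration.

```python
import re
from difflib import SequenceMatcher

# Hypothetical set of legal suffixes to ignore when comparing company names.
SUFFIXES = {"inc", "llc", "ltd", "corp", "corporation", "co", "company", "gmbh"}


def normalize(name: str) -> str:
    """Lowercase, replace punctuation with spaces, and drop common legal suffixes."""
    tokens = re.sub(r"[^a-z0-9\s]", " ", name.lower()).split()
    return " ".join(t for t in tokens if t not in SUFFIXES)


def best_match(name: str, candidates: list[str], threshold: float = 0.85):
    """Return (candidate, score) for the closest candidate, or (None, score) if below threshold."""
    target = normalize(name)
    best, best_score = None, 0.0
    for cand in candidates:
        score = SequenceMatcher(None, target, normalize(cand)).ratio()
        if score > best_score:
            best, best_score = cand, score
    return (best, best_score) if best_score >= threshold else (None, best_score)


def match_lists(messy: list[str], clean: list[str], threshold: float = 0.85) -> dict:
    """Map each messy name to its best clean match, or None if nothing is close enough."""
    return {name: best_match(name, clean, threshold)[0] for name in messy}


if __name__ == "__main__":
    messy = ["Acme, Inc.", "Globex Corp", "Initech LLC", "Unknown Startup"]
    salesforce = ["ACME Inc", "Globex Corporation", "Initech", "Umbrella Co"]
    for src, match in match_lists(messy, salesforce).items():
        print(f"{src!r} -> {match!r}")
```

This compares every messy name against every candidate, so it's fine for ad-hoc lists but gets slow for whole-base deduplication, where you'd typically need some form of blocking (only comparing records that share a key) or a dedicated service.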