from fuzzywuzzy import fuzz
from fuzzywuzzy import process
I'm interested in using it to match strings that contain Latin accented characters.
fuzz.ratio('caffè espresso', 'caffe espresso')
fuzz.partial_ratio('caffè espresso', 'caffe espresso')
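One way to make accented and unaccented spellings compare as equal is to strip the diacritics before scoring. The following is only a sketch using the standard library; `strip_accents` is a hypothetical helper name, not something fuzzywuzzy provides.

```python
import unicodedata


def strip_accents(text):
    """Return text with combining diacritical marks removed."""
    # NFKD decomposition splits 'è' into 'e' plus a combining grave accent.
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop the combining marks and keep every other character.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(strip_accents("caffè espresso"))  # caffe espresso
print(strip_accents("Québec"))          # Quebec
```

Both strings could be passed through `strip_accents` before calling `fuzz.ratio`, so the accent no longer counts as a difference.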
It could also help with oddly formatted company names during a duplicate check.
fuzz.token_sort_ratio('01234567 Ontario Inc - Company Name', 'Company Name (01234567 Ontario Inc)')
fuzz.token_sort_ratio('01234567 Ontario Inc - Company Name', 'Company Name (01234567 Ont Inc)')
fuzz.partial_ratio('ABC Corp.', 'ABC Corporation')
fuzz.partial_ratio('ABC Corp.', 'ABC Inc.')
Generally the above wouldn't matter, since you would strip the business entity designator (Corp., Inc.) before the duplicate check. The comparison would then look more like the one below.
fuzz.partial_ratio('Dell Canada', 'Dell')
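Removing the business entity before comparing could be sketched roughly as follows. The `ENTITY_WORDS` list and the `strip_entity` helper are assumptions made for illustration; the list is deliberately short and would need extending for real data.

```python
import re

# Hypothetical, incomplete list of entity designators (assumption).
ENTITY_WORDS = ("corporation", "corp", "incorporated", "inc", "limited", "ltd")

# Match any designator as a whole word, with an optional trailing period.
_ENTITY_RE = re.compile(
    r"\b(?:" + "|".join(ENTITY_WORDS) + r")\b\.?", re.IGNORECASE
)


def strip_entity(name):
    """Remove business-entity designators and collapse leftover whitespace."""
    cleaned = _ENTITY_RE.sub(" ", name)
    return " ".join(cleaned.split())


print(strip_entity("ABC Corp."))        # ABC
print(strip_entity("ABC Corporation"))  # ABC
print(strip_entity("ABC Inc."))         # ABC
```

After this step 'ABC Corp.' and 'ABC Inc.' reduce to the same string, so whether they count as duplicates has to be decided by other fields rather than by the name score.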
It would also be interesting to try it on the provinces and territories of Canada.
provinces = ["Ontario", "Quebec", "Nova Scotia", "New Brunswick", "Manitoba", "British Columbia", "Prince Edward Island", "Saskatchewan", "Alberta", "Newfoundland and Labrador", "Northwest Territories", "Yukon", "Nunavut"]
('Newfoundland and Labrador', 90)
process.extractOne('Québec', provinces) # I'm sorry Francophones but it seems `process` doesn't work with accents.
('Prince Edward Island', 60)
process.extract('BC', provinces, limit=5)
[('Quebec', 50), ('Nova Scotia', 45), ('New Brunswick', 45), ('Manitoba', 45), ('British Columbia', 45)]
It's pretty bad at postal abbreviations, but those would probably be better handled by a separate function.
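A separate function for postal abbreviations might be as simple as a lookup table checked before any fuzzy matching. This is a sketch under that assumption; `POSTAL_ABBREVIATIONS` and `expand_postal_abbreviation` are hypothetical names, and the codes are the standard two-letter Canada Post abbreviations.

```python
# Two-letter Canada Post abbreviations mapped to full names.
POSTAL_ABBREVIATIONS = {
    "ON": "Ontario",
    "QC": "Quebec",
    "NS": "Nova Scotia",
    "NB": "New Brunswick",
    "MB": "Manitoba",
    "BC": "British Columbia",
    "PE": "Prince Edward Island",
    "SK": "Saskatchewan",
    "AB": "Alberta",
    "NL": "Newfoundland and Labrador",
    "NT": "Northwest Territories",
    "YT": "Yukon",
    "NU": "Nunavut",
}


def expand_postal_abbreviation(query):
    """Return the full name for a known two-letter code, else the query unchanged."""
    return POSTAL_ABBREVIATIONS.get(query.strip().upper(), query)


print(expand_postal_abbreviation("BC"))       # British Columbia
print(expand_postal_abbreviation("Ontario"))  # Ontario
```

The result could then be handed to `process.extractOne`, so exact codes are resolved by the table and only free-form input reaches the fuzzy matcher.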