This repository contains two .txt files holding, respectively, lists of over 7000 surnames and first names of Portuguese origin.
NOTE: The Portuguese first and last names were collected to feed a knowledge base for a specialized spell checker for first and last names, which acts as an OCR post-processing step. Alongside the two raw files that feed the knowledge base, the repository includes source code for running and testing the spell checker.
- In each text file, every name is repeated n times, where n is the frequency of occurrence of that name in Portugal.
- The spell checker is based on Norvig's approach: https://norvig.com/spell-correct.html
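
To illustrate how the repeated entries and the Norvig-style correction fit together, here is a minimal sketch. It is not the code shipped in this repository. It assumes one name per line, lowercases everything, and uses a hypothetical alphabet that includes accented characters common in Portuguese. The function names (`load_counts`, `correct`) and the sample data are made up for the example.

```python
from collections import Counter

# Hypothetical alphabet: ASCII letters plus accented characters common in Portuguese.
ALPHABET = "abcdefghijklmnopqrstuvwxyzáàâãéêíóôõúç"


def load_counts(lines):
    """Count how often each name appears (assumes one name per line)."""
    return Counter(line.strip().lower() for line in lines if line.strip())


def edits1(word):
    """All strings one edit (delete, transpose, replace, insert) away from word."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [left + right[1] + right[0] + right[2:]
                  for left, right in splits if len(right) > 1]
    replaces = [left + c + right[1:] for left, right in splits if right for c in ALPHABET]
    inserts = [left + c + right for left, right in splits for c in ALPHABET]
    return set(deletes + transposes + replaces + inserts)


def edits2(word):
    """All strings two edits away from word."""
    return {e2 for e1 in edits1(word) for e2 in edits1(e1)}


def known(words, counts):
    """Keep only the candidates that exist in the knowledge base."""
    return {w for w in words if w in counts}


def correct(word, counts):
    """Return the most frequent known name within two edits, else the input itself."""
    word = word.lower()
    candidates = (known([word], counts)
                  or known(edits1(word), counts)
                  or known(edits2(word), counts)
                  or {word})
    return max(candidates, key=lambda w: counts[w])


# Made-up sample standing in for one of the .txt files.
sample_lines = ["Silva\n"] * 5 + ["Santos\n"] * 3 + ["Ferreira\n"] * 2
counts = load_counts(sample_lines)
print(correct("Sliva", counts))
```

Because each name is repeated according to its frequency, simply counting lines yields the prior that the corrector uses to rank candidates; no separate frequency table is needed. With a real file, `load_counts` could be passed an open file handle instead of the sample list.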

Links:
- https://pt.wikipedia.org/wiki/Lista_de_apelidos_de_fam%C3%ADlia_da_l%C3%ADngua_portuguesa
- https://pt.wikipedia.org/wiki/Lista_dos_cem_apelidos_mais_frequentes_em_Portugal
- https://pt.wikipedia.org/wiki/Antropon%C3%ADmia_da_l%C3%ADngua_portuguesa#Frequ%C3%AAncia
- https://github.com/centraldedados
- https://github.com/centraldedados/nomes_proprios