Title: | Custom 'MetaphoneBR' Phonetic Encoding for Brazilian Names |
Version: | 0.0.4 |
Description: | Simplifies Brazilian names phonetically using a custom 'metaphoneBR' algorithm that preserves ending vowels. Useful for name matching that preserves gender information, which in Portuguese is generally carried by ending vowels. Mation (2025) <doi:10.6082/uchicago.15104>. |
License: | MIT + file LICENSE |
Encoding: | UTF-8 |
RoxygenNote: | 7.3.2 |
Imports: | lifecycle, stringi |
Suggests: | testthat (≥ 3.0.0) |
Config/testthat/edition: | 3 |
URL: | https://github.com/ipeadata-lab/metaphonebr, https://ipeadata-lab.github.io/metaphonebr/ |
BugReports: | https://github.com/ipeadata-lab/metaphonebr/issues |
NeedsCompilation: | no |
Packaged: | 2025-07-14 15:43:16 UTC; B05497153712 |
Author: | Rodrigo Borges |
Maintainer: | Rodrigo Borges <rodrigoesborges@gmail.com> |
Repository: | CRAN |
Date/Publication: | 2025-07-17 20:30:07 UTC |
metaphonebr: Custom 'MetaphoneBR' Phonetic Encoding for Brazilian Names
Description
Simplifies Brazilian names phonetically using a custom 'metaphoneBR' algorithm that preserves ending vowels. Useful for name matching that preserves gender information, which in Portuguese is generally carried by ending vowels. Mation (2025) doi:10.6082/uchicago.15104.
Author(s)
Maintainer: Rodrigo Borges rodrigoesborges@gmail.com (ORCID)
Authors:
Other contributors:
Ipea - Institute for Applied Economic Research [copyright holder, funder]
See Also
Useful links:
Report bugs at https://github.com/ipeadata-lab/metaphonebr/issues
Phonetic preprocessing: removes accents and numbers, and capitalizes
Description
Removes diacritics, capitalizes, and removes characters that are not letters or spaces.
Usage
capitalize_remove_accents(fullname)
Arguments
fullname |
a character vector. |
Value
a preprocessed character vector.
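As a rough illustration of the preprocessing described above, the following base R sketch removes accents, capitalizes, and drops every character that is not a letter or a space. The function name sketch_preprocess and the accent map are assumptions made for this example; the package's own implementation may differ (for instance, it may rely on stringi).

```r
# Illustrative sketch only; NOT the package's implementation.
# The accent map is an assumption covering common Portuguese letters.
sketch_preprocess <- function(fullname) {
  accented <- paste0(
    "\u00e1\u00e0\u00e2\u00e3\u00e9\u00ea\u00ed\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7",
    "\u00c1\u00c0\u00c2\u00c3\u00c9\u00ca\u00cd\u00d3\u00d4\u00d5\u00da\u00dc\u00c7"
  )
  plain <- "aaaaeeiooouucAAAAEEIOOOUUC"
  # Replace each accented letter with its unaccented counterpart.
  out <- chartr(accented, plain, fullname)
  # Capitalize, then keep only the letters A-Z and spaces.
  out <- toupper(out)
  gsub("[^A-Z ]", "", out, perl = TRUE)
}

sketch_preprocess(c("Jo\u00e3o2 Silva", "Concei\u00e7\u00e3o"))
```

Accents are mapped before capitalizing so that the final filter can work on plain ASCII letters only.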
Generates Phonetic Code (adapted Metaphone-BR) for Names in Portuguese
Description
Applies a series of phonetic transformations to a vector of person names to generate a code that represents their approximate pronunciation in Brazilian Portuguese. The objective is to group similar-sounding names, even when they are spelled differently.
Usage
metaphonebr(fullnames, verbose = FALSE)
Arguments
fullnames |
A character vector of names to be processed. |
verbose |
Logical; if TRUE, progress messages are shown. Default is FALSE. |
Details
The treatment process involves:
Preprocessing: removal of accents and numbers, and capitalization.
Removal of silent letters (initial H).
Simplification of common digraphs (LH, NH, CH, SC, QU, etc.).
Simplification of similar sounding consonants (C/K/S, G/J, Z/S, etc.).
Simplification of ending nasal sounds.
Removal of duplicated vowels.
Removal/trim of spaces and duplicated letters.
This is an adaptation that does not strictly follow any published Metaphone algorithm, but was inspired by them and takes the Brazilian Portuguese context into account.
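The ordered steps listed above can be pictured as a pipeline in which each transformation receives the output of the previous one. The sketch below is only a generic illustration of that idea: sketch_pipeline and the three toy steps are hypothetical and do not reproduce the package's actual rules.

```r
# Illustrative sketch only; NOT the package's implementation.
# Applies a list of transformation functions in order (left fold).
sketch_pipeline <- function(fullnames, steps) {
  Reduce(function(acc, step) step(acc), steps, init = fullnames)
}

# Hypothetical toy steps standing in for the real transformations.
toy_steps <- list(
  toupper,                                            # capitalize
  function(x) gsub("[^A-Z ]", "", x, perl = TRUE),    # drop non-letters
  function(x) trimws(gsub(" +", " ", x, perl = TRUE)) # squeeze and trim spaces
)

sketch_pipeline("maria  2 silva", toy_steps)
```

Order matters in such a pipeline: in this toy version, dropping non-letters before capitalizing would also delete every lowercase letter.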
Value
A character vector with the corresponding phonetic representation for each entry.
Examples
example_names <- c("Jo\u00e3o Silva", "Joao da Silva", "Maria", "Marya",
"Helena", "Elena", "Philippe", "Filipe", "Xavier", "Chavier")
phonetic_codes <- metaphonebr(example_names)
print(data.frame(Original = example_names, metaphonebr = phonetic_codes))
# With progress messages
phonetic_codes_verbose <- metaphonebr("Exemplo único", verbose = TRUE)
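Since the stated objective is to group similar-sounding names, one possible follow-up is to split the original names by their code. The sketch below is self-contained: group_by_code and toy_encoder are hypothetical helpers, and toy_encoder is only a stand-in so the example runs without the package. With the package installed, metaphonebr could presumably be passed as the encoder instead.

```r
# Illustrative sketch only. `encoder` is any function that maps a
# character vector of names to a character vector of codes.
group_by_code <- function(fullnames, encoder) {
  codes <- encoder(fullnames)
  split(fullnames, codes)
}

# Hypothetical stand-in encoder: uppercase and drop whitespace.
toy_encoder <- function(x) toupper(gsub("\\s+", "", x, perl = TRUE))

groups <- group_by_code(c("Ana Lima", "ANALIMA", "Bia"), toy_encoder)
print(groups)
```

Each element of the resulting list holds the original spellings that share one code, which makes candidate duplicates easy to inspect.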
Phonetic Removal: duplicated letters and spaces
Description
Removes duplicated letters and duplicated spaces.
Usage
remove_dup_letters_spaces(fullname)
Arguments
fullname |
A character vector. |
Value
A character vector with neither repeated letters nor repeated spaces.
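A minimal base R sketch of this kind of cleanup is shown below. It assumes the input is already capitalized and unaccented; sketch_remove_dups is a hypothetical name, and the package's own rules may differ in detail.

```r
# Illustrative sketch only; NOT the package's implementation.
# Assumes input was already preprocessed to uppercase A-Z and spaces.
sketch_remove_dups <- function(fullname) {
  # Collapse a run of the same letter into a single letter.
  out <- gsub("([A-Z])\\1+", "\\1", fullname, perl = TRUE)
  # Collapse runs of whitespace into one space, then trim both ends.
  out <- gsub("\\s+", " ", out, perl = TRUE)
  trimws(out)
}

sketch_remove_dups("  PHILLIPPE   SILVA ")
```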
Phonetic Removal: Repeated Vowels
Description
Compresses sequences of adjacent identical vowels into a single vowel.
Usage
remove_duplicated_vowels(fullname)
Arguments
fullname |
A character vector. |
Value
A character vector with duplicated vowels removed.
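The compression described here can be approximated with a single back-referencing regular expression. The sketch assumes capitalized, unaccented input; sketch_remove_dup_vowels is a hypothetical name, not a function of the package.

```r
# Illustrative sketch only; NOT the package's implementation.
# Collapses runs of the same vowel; consonants are left untouched.
sketch_remove_dup_vowels <- function(fullname) {
  gsub("([AEIOU])\\1+", "\\1", fullname, perl = TRUE)
}

sketch_remove_dup_vowels(c("ISAAC", "LEE ANN"))
```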
Phonetic Simplification: removal of silent letters
Description
Removes silent 'H' at the beginning of each word.
Usage
remove_silent_letters(fullname)
Arguments
fullname |
a character vector. |
Value
a character vector with silent initial 'H's removed.
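One way to express "initial H of each word" is a word-boundary pattern. The sketch below assumes capitalized input; sketch_remove_initial_h is a hypothetical name, and the package may implement the rule differently.

```r
# Illustrative sketch only; NOT the package's implementation.
# Drops an 'H' only when it starts a word; inner 'H's are kept.
sketch_remove_initial_h <- function(fullname) {
  gsub("\\bH", "", fullname, perl = TRUE)
}

sketch_remove_initial_h(c("HELENA HORTA", "THIAGO"))
```

Under this sketch, "HELENA" and "ELENA" become identical, while the 'H' inside "THIAGO" is preserved for later digraph handling.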
Phonetic Simplification: Similar Consonants
Description
Represents similar-sounding consonants with a single representation.
Usage
simplify_consonants(fullname)
Arguments
fullname |
A character vector. |
Value
A character vector with simplified consonants.
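The exact substitution table used by the package is not listed in this manual. Purely as an illustration of the technique, the sketch below applies a small ordered set of assumed rules (soft C to S, other C to K, soft G to J, Z to S); both the rules and the name sketch_simplify_consonants are hypothetical.

```r
# Illustrative sketch only; the rules below are ASSUMPTIONS,
# not the package's actual substitution table.
sketch_simplify_consonants <- function(fullname) {
  rules <- c(
    "C(?=[EI])" = "S",  # soft C before E or I
    "C"         = "K",  # any remaining C
    "G(?=[EI])" = "J",  # soft G before E or I
    "Z"         = "S"
  )
  out <- fullname
  # Rules are applied in order, so the soft-C rule runs before the generic one.
  for (pattern in names(rules)) {
    out <- gsub(pattern, rules[[pattern]], out, perl = TRUE)
  }
  out
}

sketch_simplify_consonants(c("CECILIA", "CARLOS", "GISELE", "ZICO"))
```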
Phonetic Simplification: similar digraphs
Description
Transforms common digraphs to simplify their phonetic representation.
Usage
simplify_digraphs(fullname)
Arguments
fullname |
a character vector. |
Value
a character vector with simplified representation of digraphs.
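The target symbols the package uses for each digraph are not documented here. The sketch below only illustrates the general approach with three assumed mappings (PH to F, CH to X, QU before E or I to K), loosely suggested by name pairs such as Philippe/Filipe and Xavier/Chavier; the function name and the mappings are hypothetical.

```r
# Illustrative sketch only; the mappings below are ASSUMPTIONS,
# not the package's actual digraph table.
sketch_simplify_digraphs <- function(fullname) {
  rules <- c(
    "PH"         = "F",
    "CH"         = "X",
    "QU(?=[EI])" = "K"  # QU before E or I
  )
  out <- fullname
  for (pattern in names(rules)) {
    out <- gsub(pattern, rules[[pattern]], out, perl = TRUE)
  }
  out
}

sketch_simplify_digraphs(c("PHILIPPE", "CHAVIER", "QUEIROZ"))
```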
Phonetic Simplification: Ending Nasal Sounds
Description
Unifies ending nasal sounds.
Usage
simplify_ending_nasals(fullname)
Arguments
fullname |
A character vector. |
Value
A character vector with simplified nasal sounds.
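Which nasal endings are unified, and into what, is not specified in this manual. As a hedged illustration, the sketch below assumes a single rule, a word-final N written as M; sketch_simplify_ending_nasals and this rule are hypothetical.

```r
# Illustrative sketch only; the single rule below is an ASSUMPTION,
# not the package's actual treatment of nasal endings.
sketch_simplify_ending_nasals <- function(fullname) {
  # Replace an 'N' at the end of a word with 'M'.
  gsub("N\\b", "M", fullname, perl = TRUE)
}

sketch_simplify_ending_nasals(c("MIRIAN", "WILLIAN SILVA", "ANA"))
```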