In general, the character ID (speaker ID) is the name of the character as it appears in the *dramatis personae*, with the following modifications:
- The name is lowercased, and punctuation and special characters are removed
- Spaces are replaced by underscores
- Diacritics are removed; when in doubt, follow what the [Unidecode](https://pypi.org/project/Unidecode/) library does. **Note** that this library does not replace German umlauts with the corresponding vowel + _e_. In other words, _ä_ becomes _a_, not _ae_.
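
The modifications above can be sketched in a few lines of Python. This is only an illustration, not a reference implementation: the function name `character_id` and the sample names are hypothetical, and the standard-library `unicodedata` module stands in for Unidecode, so the two may differ on letters that have no accent decomposition.

```python
import re
import unicodedata


def character_id(name: str) -> str:
    """Derive a character ID from a name (illustrative sketch only)."""
    # Split accented letters into base letter + combining mark
    # (e.g. "ä" -> "a" + diaeresis), then drop the combining marks.
    # Assumption: this approximates Unidecode for accented letters only.
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    lowered = stripped.lower()
    # Remove everything except ASCII letters, digits and whitespace.
    # Letters without a decomposition are dropped here, whereas
    # Unidecode would transliterate them.
    cleaned = re.sub(r"[^a-z0-9\s]", "", lowered)
    # Replace each run of whitespace with a single underscore.
    return re.sub(r"\s+", "_", cleaned.strip())


print(character_id("Doña Inés"))  # dona_ines
print(character_id("Jäger"))      # jager
print(character_id("Don Juan!"))  # don_juan
```

Note that `Jäger` becomes `jager`, not `jaeger`, which matches the umlaut behaviour described in the last rule.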

This general rule need not be applied strictly, however. For instance, dropping a determiner from the name makes no difference.
|
| ... | ... | |