Features
A feature is some aspect of a language, e.g. the size of its phoneme inventory.
In cltoolkit, a Feature is an object bundling some metadata with a Python callable
accepting a cltoolkit.models.Language instance as its sole argument, and returning the
value computed for this language.
A FeatureCollection of predefined features is available in FEATURES.
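To make the idea concrete, here is a minimal sketch of "metadata plus a callable". It does not use cltoolkit itself: `ToyLanguage`, `SketchFeature` and `inventory_size` are hypothetical stand-ins invented for illustration, and the real Feature class may behave differently.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class ToyLanguage:
    """Hypothetical stand-in for cltoolkit.models.Language."""
    name: str
    phonemes: List[str]


@dataclass
class SketchFeature:
    """Hypothetical record: some metadata plus a callable taking one language."""
    id: str
    name: str
    function: Callable[[ToyLanguage], Any]

    def __call__(self, language: ToyLanguage) -> Any:
        # The language is the sole argument passed to the wrapped callable.
        return self.function(language)


def inventory_size(language: ToyLanguage) -> int:
    """Toy feature function: size of the phoneme inventory."""
    return len(language.phonemes)


feature = SketchFeature(
    id="InventorySize",
    name="size of the phoneme inventory",
    function=inventory_size,
)
toy = ToyLanguage(name="Toy", phonemes=["p", "t", "k", "a", "i", "u"])
value = feature(toy)
```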
- class cltoolkit.features.collection.Feature(id, name, function, type=None, note=None, categories=None, requires=None)
- Variables
id – str
name – str
function – callable
See also
- Parameters
function (typing.Union[str, dict, typing.Callable]) –
- class cltoolkit.features.collection.FeatureCollection(items, **kw)
A collection of Feature instances.
- cltoolkit.features.collection.get_callable(s)
A “feature function” can be specified in 3 ways:
as a Python callable object
as a string of dot-separated names, where the part up to the last dot is taken as the Python module spec, and the last name as the symbol to be looked up in this module
as a dict with keys class, args, kwargs, where class is interpreted as above, and args and kwargs are passed into the imported class to initialize an instance, the __call__ method of which will be used as the “feature function”.
- Parameters
s (typing.Union[str, dict, typing.Callable]) –
- Return type
typing.Callable
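The three ways of specifying a feature function could be resolved roughly as follows. This is a sketch written from the description above, not the library's code; `resolve_callable` is a hypothetical name, and the defaults for missing args and kwargs keys are an assumption.

```python
import importlib
from typing import Callable, Union


def resolve_callable(spec: Union[str, dict, Callable]) -> Callable:
    """Hypothetical resolver for the three specification styles."""

    def lookup(dotted: str):
        # Everything up to the last dot names the module, the rest the symbol.
        module_name, _, symbol = dotted.rpartition(".")
        return getattr(importlib.import_module(module_name), symbol)

    if callable(spec):
        # Case 1: already a Python callable.
        return spec
    if isinstance(spec, str):
        # Case 2: dot-separated path to a symbol in an importable module.
        return lookup(spec)
    if isinstance(spec, dict):
        # Case 3: import the class, then instantiate it; the instance
        # (via its __call__ method) serves as the feature function.
        cls = lookup(spec["class"])
        return cls(*spec.get("args", []), **spec.get("kwargs", {}))
    raise TypeError("unsupported feature function spec: {!r}".format(spec))


from_callable = resolve_callable(len)
from_string = resolve_callable("math.sqrt")
from_dict = resolve_callable({"class": "operator.itemgetter", "args": [1]})
```

In this sketch a string without any dot is not handled specially and fails when the empty module name is imported.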
Requirements
Features may have different requirements regarding the kind of data needed to perform the
computation. These requirements can be expressed (and enforced) by decorating the callable
(function or method) with the cltoolkit.features.requires() decorator, parametrized with
the appropriate callables from the cltoolkit.features.reqs module, or with any other callable
that accepts a cltoolkit.models.Language instance as argument and returns True if the
requirement is met.
- exception cltoolkit.features.reqs.MissingRequirement
Exception raised by requires() (before calling the decorated function) when a requirement is not met.
- cltoolkit.features.reqs.inventory(language)
Make sure a language has a precomputed sound inventory.
- cltoolkit.features.reqs.graphemes(language)
Make sure a language has segmented forms, i.e. lists of graphemes for each form.
- cltoolkit.features.reqs.concepts(language)
Make sure a language has forms linked to concepts, i.e. senses with Concepticon mapping.
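The requirement mechanism described in this section can be pictured with the sketch below. It is an illustration of the decorator pattern only: `requires_sketch`, `MissingRequirementSketch` and `has_inventory` are hypothetical names, languages are modelled as plain dicts, and unlike the real decorator this version only wraps plain functions taking a single language argument.

```python
import functools


class MissingRequirementSketch(Exception):
    """Hypothetical stand-in for cltoolkit.features.reqs.MissingRequirement."""


def requires_sketch(*requirements):
    """Check every requirement before calling the decorated function."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(language):
            for requirement in requirements:
                # A requirement is any callable returning True if it is met.
                if not requirement(language):
                    raise MissingRequirementSketch(requirement.__name__)
            return func(language)

        return wrapper

    return decorator


def has_inventory(language):
    # Assumption: a toy language is a dict with an optional "inventory" list.
    return bool(language.get("inventory"))


@requires_sketch(has_inventory)
def inventory_size(language):
    return len(language["inventory"])


size = inventory_size({"inventory": ["p", "a"]})
```

Calling `inventory_size({})` in this sketch raises `MissingRequirementSketch` before the function body runs.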
Phonological features
Miscellaneous phonological features found in typological databases.
- class cltoolkit.features.phonology.WithInventory(*args, **kw)
Base class for feature callables requiring access to a phoneme inventory.
- class cltoolkit.features.phonology.InventoryQuery(attr)
Compute the length/size of some attribute of a sound inventory.
number_of_consonants = InventoryQuery('consonants')
- class cltoolkit.features.phonology.YesNoQuery(attr)
Compute whether an inventory has some property.
has_tones = YesNoQuery('tones')
- class cltoolkit.features.phonology.Ratio(attr1, attr2)
Computes the ratio between sizes of two properties of an inventory.
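The three inventory-based classes above share one pattern: a class is parametrized with attribute names and its instances are then called with a language. A minimal sketch of that pattern, with hypothetical class names (`SizeOf`, `HasAny`, `SizeRatio`) and a toy language whose `inventory` attribute is an assumption, might look like this:

```python
from types import SimpleNamespace


class SizeOf:
    """Hypothetical analogue of a size query on an inventory attribute."""

    def __init__(self, attr):
        self.attr = attr

    def __call__(self, language):
        return len(getattr(language.inventory, self.attr))


class HasAny:
    """Hypothetical analogue of a yes/no query on an inventory attribute."""

    def __init__(self, attr):
        self.attr = attr

    def __call__(self, language):
        return len(getattr(language.inventory, self.attr)) > 0


class SizeRatio:
    """Hypothetical analogue of a ratio between two attribute sizes."""

    def __init__(self, attr1, attr2):
        self.attr1, self.attr2 = attr1, attr2

    def __call__(self, language):
        # No guard against an empty second attribute in this sketch.
        first = len(getattr(language.inventory, self.attr1))
        second = len(getattr(language.inventory, self.attr2))
        return first / second


toy = SimpleNamespace(
    inventory=SimpleNamespace(
        consonants=["p", "t", "k", "m"], vowels=["a", "i"], tones=[]
    )
)
number_of_consonants = SizeOf("consonants")(toy)
has_tones = HasAny("tones")(toy)
cv_ratio = SizeRatio("consonants", "vowels")(toy)
```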
- class cltoolkit.features.phonology.StartsWithSound(concepts, features, concept_label=None, sound_label=None)
Check if a language has a form for {} starting with {}.
Note
Parametrized instances of this class can be used to check for certain cases of sound symbolism, or for geographic / areal tendencies of languages to have word forms for certain concepts starting with certain sounds.
See also
mother_with_m = StartsWithSound(["MOTHER"], [["bilabial", "nasal"]], sound_label='[m]')
- Parameters
concepts (typing.List[str]) –
features (typing.List[typing.List[str]]) –
concept_label (typing.Optional[str]) –
sound_label (typing.Optional[str]) –
- cltoolkit.features.phonology.sound_match(sound, features)
Match a sound by a subset of features.
Note
The major idea of this function is to allow for the convenient matching of some sounds by defining them in terms of a part of their features alone. E.g., [m] and its variants can be defined as [“bilabial”, “nasal”], since we do not care about the rest of the features.
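Matching by a subset of features can be sketched with plain sets. The function name `matches_any` is hypothetical, a sound is modelled here simply as a set of feature names, and reading the outer list as a list of alternatives is an assumption based on the nested-list examples on this page.

```python
from typing import List, Set


def matches_any(sound_features: Set[str], features: List[List[str]]) -> bool:
    """Return True if any listed feature bundle is fully present in the sound."""
    # Each inner list is one bundle; all of its features must be present,
    # while any further features of the sound are ignored.
    return any(set(bundle) <= sound_features for bundle in features)


m = {"voiced", "bilabial", "nasal", "consonant"}
n = {"voiced", "alveolar", "nasal", "consonant"}

m_matches = matches_any(m, [["bilabial", "nasal"]])
n_matches = matches_any(n, [["bilabial", "nasal"]])
```

Here `m` matches because both requested features are present, while `n` fails on the missing bilabial feature.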
- cltoolkit.features.phonology.is_implosive(sound)
This groups stops and affricates into a group of sounds.
- cltoolkit.features.phonology.stop_like(sound)
This groups stops and affricates into a group of sounds.
- class cltoolkit.features.phonology.HasSoundsWithFeature(attr, features)
Does the inventory contain at least one {}.
prenasalized_consonants = phonology.HasSoundsWithFeature("consonants", [["pre-nasalized"]])
- cltoolkit.features.phonology.syllable_complexity(forms_with_sounds)
Compute the major syllabic patterns for a language.
Note
The computation follows the automated syllabification process described in List (2014) based on sonority. Based on this syllabification, we calculate the number of consonants preceding the syllable nucleus and those following it. For a given syllable, we store the form, the consonantal sounds, and the index of the syllable in the word. These values are returned in the form of two dictionaries, in which the number of sounds is the key.
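The bookkeeping described in the note can be sketched as follows. The sonority-based syllabification itself is not reproduced: the input is assumed to be already syllabified, the vowel set is a toy assumption, and `onset_coda_patterns` is a hypothetical name rather than part of cltoolkit.

```python
from collections import defaultdict

TOY_VOWELS = set("aeiou")  # assumption: a toy nucleus test, not real sound data


def onset_coda_patterns(words):
    """Group syllables by the number of consonants before and after the nucleus."""
    preceding = defaultdict(list)
    following = defaultdict(list)
    for form, syllables in words.items():
        for index, syllable in enumerate(syllables):
            nucleus = [i for i, s in enumerate(syllable) if s in TOY_VOWELS]
            if not nucleus:
                continue  # skip syllables without a vowel in this sketch
            onset = syllable[: nucleus[0]]
            coda = syllable[nucleus[-1] + 1 :]
            # Store the form, the consonantal sounds and the syllable index,
            # keyed by the number of sounds.
            preceding[len(onset)].append((form, onset, index))
            following[len(coda)].append((form, coda, index))
    return dict(preceding), dict(following)


words = {"strakan": [["s", "t", "r", "a"], ["k", "a", "n"]]}
preceding, following = onset_coda_patterns(words)
```

The largest key in each dictionary then gives the most complex onset or coda observed in the toy data.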
Lexical features
Miscellaneous features for lexical data.
- class cltoolkit.features.lexicon.ConceptComparison(alist, blist, ablist=None, alabel=None, blabel=None)
Virtual base class for features comparing lexical data via concepts.
- Parameters
alist (typing.List[str]) –
blist (typing.List[str]) –
ablist (typing.Optional[typing.List[str]]) –
alabel (typing.Optional[str]) –
blabel (typing.Optional[str]) –
- class cltoolkit.features.lexicon.Colexification(*args, **kw)
Computes if two concepts are expressed with the same form in a language (i.e. if they are colexified).
- class cltoolkit.features.lexicon.PartialColexification(*args, **kw)
Computes if two concepts are partially colexified, i.e. if a form for the first concept is contained in a form for the second concept.
Computes if forms for the two concepts share a substring (of length >= 3).
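The three comparisons described above can be illustrated on plain strings. This sketch is not the library's implementation: the function names are hypothetical, forms are modelled as strings rather than segmented forms, and excluding identical forms from the partial case is an assumption.

```python
def colexified(forms_a, forms_b):
    """True if at least one form expresses both concepts."""
    return bool(set(forms_a) & set(forms_b))


def partially_colexified(forms_a, forms_b):
    """True if a form for the first concept is contained in a different
    form for the second concept."""
    return any(a in b and a != b for a in forms_a for b in forms_b)


def share_substring(forms_a, forms_b, min_length=3):
    """True if some pair of forms shares a substring of at least min_length."""
    for a in forms_a:
        # Any shared substring of length >= min_length contains a shared
        # chunk of exactly min_length, so checking those chunks suffices.
        chunks = {a[i : i + min_length] for i in range(len(a) - min_length + 1)}
        if any(chunk in b for chunk in chunks for b in forms_b):
            return True
    return False


full = colexified(["mano"], ["mano", "brazo"])
partial = partially_colexified(["tree"], ["treebark"])
shared = share_substring(["wasser"], ["wassa"])
```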