cltoolkit API documentation
For installation instructions and an overview, see the README.
Follow the links below for documentation of the cltoolkit Python API.
Data models
Basic models.
- class cltoolkit.models.CLCore(id, wordlist=None, data=None)
Base class to represent data in a wordlist.
- class cltoolkit.models.WithForms(forms=None)
Mixin to represent data in a wordlist that contains forms.
- class cltoolkit.models.WithDataset(obj=None, dataset=None)
Mixin to represent data in a wordlist from a specific dataset.
- class cltoolkit.models.Language(id, wordlist=None, data=None, forms=None, obj=None, dataset=None, senses=None, concepts=None)
Base class for handling languages.
- Variables
senses – DictTuple of senses, i.e. glosses for forms.
concepts – DictTuple of senses with explicit Concepticon mapping.
glottocode – str, Glottocode for the language.
Note
A language variety is defined for a specific dataset only.
- class cltoolkit.models.Sense(id, wordlist=None, data=None, forms=None, obj=None, dataset=None, language=None)
A sense description (concept in source) which does not need to be linked to the Concepticon.
- Variables
language – Language instance
name – str, the gloss
Note
Unlike senses in a wordlist, which are dataset-specific, concepts in a wordlist are defined for all datasets.
- class cltoolkit.models.Concept(id, wordlist=None, data=None, forms=None, language=None, senses=None, name=None, concepticon_id=None, concepticon_gloss=None)
Base class for the concepts in a dataset.
- Variables
language – Language instance
name – str, the gloss
senses – iterable of senses mapped to this concept
concepticon_id – str ID of the Concepticon concept set the concept is mapped to.
concepticon_gloss – str gloss of the Concepticon concept set the concept is mapped to.
Note
Unlike senses in a wordlist, which are dataset-specific, concepts in a wordlist are defined for all datasets. As a result, they lack a reference to the original dataset in which they occur, but they have an attribute senses which is a reference to the original senses as they occur in different datasets.
- class cltoolkit.models.Form(id, wordlist=None, data=None, obj=None, dataset=None, concept=None, language=None, sense=None, sounds=NOTHING, cognates=NOTHING)
Base class for handling the form part of linguistic signs.
- Variables
concept – The concept (if any) expressed by the form.
language – The language in which the form occurs.
sense – The meaning expressed by the form.
sounds – The segmented strings defined by the B(road) IPA.
graphemes – The segmented graphemes (possibly not BIPA conform).
- sounds
Sounds (graphemes recognized in the specified transcription system) in the segmented form:
- graphemes
Graphemes in the segmented form:
- class cltoolkit.models.Cognate(id, wordlist=None, data=None, obj=None, dataset=None, form=None, contribution=None)
Wordlists
- class cltoolkit.wordlist.Wordlist(datasets, ts=None, concept_id_factory=<function Wordlist.<lambda>>)[source]¶
A collection of one or more lexibank datasets, aligned by concept.
- Parameters
datasets (typing.List[pycldf.dataset.Dataset]) – The datasets you want to load, provided as list of pycldf.Dataset.
ts (typing.Optional[pyclts.transcriptionsystem.TranscriptionSystem]) – A TranscriptionSystem (as provided by pyclts), if you want to work with phonological features from CLTS.
concept_id_factory (typing.Callable[[dict], str]) –
- Variables
datasets –
languages – DictTuple
senses – DictTuple
concepts – DictTuple
forms – DictTuple
graphemes – DictTuple
sounds – DictTuple
- iter_forms_by_concepts(concepts=None, languages=None, aspect=None, filter_by=None, flat=False)
Iterate over the concepts in the data and return the forms for the selected languages.
- Parameters
concepts – List of concept identifiers, all concepts if not specified.
languages – List of language identifiers, all languages if not specified.
aspect – Select an attribute of the Form object instead of the Form object itself.
filter_by – Use a function to filter the data to be output.
flat – Return a one-dimensional array of the data.
Note
The function returns for each concept (selected by ID) the form for each language, or the specific aspect (attribute) of the form, provided this exists.
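The selection logic described in the note can be pictured with a small stand-in. The sketch below is not cltoolkit's implementation: `ToyForm`, `toy_iter_forms_by_concepts`, and the shape of the yielded values are assumptions made for illustration, with plain named tuples standing in for Form objects.

```python
from collections import namedtuple

# Hypothetical stand-in for cltoolkit.models.Form (illustration only).
ToyForm = namedtuple("ToyForm", ["concept", "language", "value"])


def toy_iter_forms_by_concepts(forms, concepts=None, languages=None,
                               aspect=None, filter_by=None, flat=False):
    """Yield (concept, per-language results) pairs from a list of ToyForm."""
    all_concepts = sorted({f.concept for f in forms})
    all_languages = sorted({f.language for f in forms})
    for concept in concepts or all_concepts:
        row = []
        for language in languages or all_languages:
            selected = [
                f for f in forms
                if f.concept == concept and f.language == language
                and (filter_by is None or filter_by(f))
            ]
            # Return an attribute of each form instead of the form itself.
            if aspect is not None:
                selected = [getattr(f, aspect) for f in selected]
            row.append(selected)
        if flat:
            # Collapse the per-language lists into one list.
            row = [item for cell in row for item in cell]
        yield concept, row


forms = [
    ToyForm("HAND", "deu", "hand"),
    ToyForm("HAND", "fra", "main"),
    ToyForm("ARM", "deu", "arm"),
]
result = dict(toy_iter_forms_by_concepts(forms, aspect="value", flat=True))
```

With `flat=False` the toy version keeps one list per language, so missing forms show up as empty lists instead of disappearing.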
Features
A feature is some aspect of language, e.g. the size of its phoneme inventory.
In cltoolkit, a Feature is an object bundling some metadata with a Python callable accepting a cltoolkit.models.Language instance as its sole argument and returning the value computed for this language.
A FeatureCollection of predefined features is available in FEATURES.
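The idea of bundling metadata with a callable can be sketched in a few lines. `ToyLanguage` and `ToyFeature` below are hypothetical stand-ins written for this illustration, not the cltoolkit classes, and the inventory-size function is an invented example.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class ToyLanguage:
    """Hypothetical stand-in for cltoolkit.models.Language."""
    id: str
    sounds: list


@dataclass
class ToyFeature:
    """Hypothetical stand-in for a Feature: metadata plus a callable."""
    id: str
    name: str
    function: Callable[[ToyLanguage], Any]

    def __call__(self, language):
        # Delegate to the wrapped callable, passing the language as sole argument.
        return self.function(language)


inventory_size = ToyFeature(
    id="InventorySize",
    name="size of the sound inventory",
    function=lambda language: len(set(language.sounds)),
)
value = inventory_size(ToyLanguage(id="toy", sounds=["a", "m", "a", "t"]))
```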
- class cltoolkit.features.collection.Feature(id, name, function, type=None, note=None, categories=None, requires=None)
- Variables
id – str
name – str
function – callable
- Parameters
function (typing.Union[str, dict, typing.Callable]) –
- class cltoolkit.features.collection.FeatureCollection(items, **kw)
A collection of Feature instances.
- cltoolkit.features.collection.get_callable(s)
A “feature function” can be specified in 3 ways:
as a Python callable object
as a string of dot-separated names, where the part up to the last dot is taken as the Python module spec, and the last name as the symbol to be looked up in this module
as a dict with keys class, args, kwargs, where class is interpreted as above, and args and kwargs are passed into the imported class to initialize an instance, the __call__ method of which will be used as “feature function”.
- Parameters
s (typing.Union[str, dict, typing.Callable]) –
- Return type
typing.Callable
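The three ways of specifying a feature function can be sketched with the standard library alone. `resolve_callable` is a hypothetical name and this is not the code of get_callable; it only follows the rules listed above, treating missing `args` and `kwargs` keys as empty (an assumption).

```python
import importlib


def resolve_callable(spec):
    """Resolve a callable, a dotted-name string, or a class/args/kwargs dict."""
    if callable(spec):
        return spec
    if isinstance(spec, str):
        # Everything up to the last dot is the module, the rest is the symbol.
        module_name, _, symbol = spec.rpartition(".")
        return getattr(importlib.import_module(module_name), symbol)
    if isinstance(spec, dict):
        cls = resolve_callable(spec["class"])
        # The instance itself is callable through its __call__ method.
        return cls(*spec.get("args", []), **spec.get("kwargs", {}))
    raise TypeError("unsupported feature function spec: {!r}".format(spec))


square_root = resolve_callable("math.sqrt")
second_item = resolve_callable({"class": "operator.itemgetter", "args": [1]})
```

The examples use `math.sqrt` and `operator.itemgetter` only because they ship with Python; in practice the string or dict would name a feature function or class.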
Requirements
Features may have different requirements regarding the kind of data needed to perform the computation. These requirements can be expressed (and enforced) by decorating the callable (function or method) using the cltoolkit.features.requires() decorator, parametrized with the appropriate callables from the cltoolkit.features.reqs module, or any other callable accepting a cltoolkit.models.Language instance as argument and returning True if the requirement is met.
- exception cltoolkit.features.reqs.MissingRequirement
Exception raised by requires() (before calling the decorated function) when a requirement is not met.
- cltoolkit.features.reqs.inventory(language)
Make sure a language has a precomputed sound inventory.
- cltoolkit.features.reqs.graphemes(language)
Make sure a language has segmented forms, i.e. lists of graphemes for each form.
- cltoolkit.features.reqs.concepts(language)
Make sure a language has forms linked to concepts, i.e. senses with Concepticon mapping.
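The decorator pattern described in this section can be illustrated with a minimal sketch. `toy_requires`, `ToyMissingRequirement`, and `has_sounds` are hypothetical names, a plain dict stands in for a Language instance, and only plain functions are handled (methods are left out for brevity); none of this is the cltoolkit code.

```python
import functools


class ToyMissingRequirement(Exception):
    """Hypothetical stand-in for cltoolkit.features.reqs.MissingRequirement."""


def toy_requires(*requirements):
    """Decorator factory: check every requirement before calling the function."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(language, *args, **kwargs):
            for requirement in requirements:
                if not requirement(language):
                    raise ToyMissingRequirement(requirement.__name__)
            return func(language, *args, **kwargs)
        return wrapper
    return decorator


def has_sounds(language):
    # Requirement callable: True if the toy language lists any sounds.
    return bool(language.get("sounds"))


@toy_requires(has_sounds)
def number_of_sounds(language):
    return len(language["sounds"])


size = number_of_sounds({"sounds": ["a", "m"]})
try:
    number_of_sounds({"sounds": []})
    raised = False
except ToyMissingRequirement:
    raised = True
```

Raising before the decorated function runs keeps the feature code free of defensive checks, which matches the behaviour described for MissingRequirement.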
Phonological features
Miscellaneous phonological features found in typological databases.
- class cltoolkit.features.phonology.WithInventory(*args, **kw)
Base class for feature callables requiring access to a phoneme inventory.
- class cltoolkit.features.phonology.InventoryQuery(attr)
Compute the length/size of some attribute of a sound inventory.
number_of_consonants = InventoryQuery('consonants')
- class cltoolkit.features.phonology.YesNoQuery(attr)
Compute whether an inventory has some property.
has_tones = YesNoQuery('tones')
- class cltoolkit.features.phonology.Ratio(attr1, attr2)
Computes the ratio between sizes of two properties of an inventory.
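As a rough illustration of these three query classes, the sketch below applies the same ideas to a hand-made inventory object. The `Toy*` classes are hypothetical and take the inventory directly, whereas the documented classes are feature callables that receive a Language; the attribute names `consonants`, `vowels`, and `tones` are taken from the examples above.

```python
from types import SimpleNamespace


class ToyInventoryQuery:
    """Size of one attribute of an inventory (illustrative stand-in)."""
    def __init__(self, attr):
        self.attr = attr

    def __call__(self, inventory):
        return len(getattr(inventory, self.attr))


class ToyYesNoQuery(ToyInventoryQuery):
    """True if the attribute is non-empty."""
    def __call__(self, inventory):
        return super().__call__(inventory) > 0


class ToyRatio:
    """Ratio between the sizes of two attributes (no guard against empty attr2)."""
    def __init__(self, attr1, attr2):
        self.attr1, self.attr2 = attr1, attr2

    def __call__(self, inventory):
        return len(getattr(inventory, self.attr1)) / len(getattr(inventory, self.attr2))


inventory = SimpleNamespace(consonants=["p", "t", "k", "m"], vowels=["a", "i"], tones=[])
number_of_consonants = ToyInventoryQuery("consonants")(inventory)
has_tones = ToyYesNoQuery("tones")(inventory)
cv_ratio = ToyRatio("consonants", "vowels")(inventory)
```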
- class cltoolkit.features.phonology.StartsWithSound(concepts, features, concept_label=None, sound_label=None)
Check if a language has a form for {} starting with {}.
Note
Parametrized instances of this class can be used to check for certain cases of sound symbolism, or for geographic / areal trends in which word forms for certain concepts start with certain sounds.
mother_with_m = StartsWithSound(["MOTHER"], [["bilabial", "nasal"]], sound_label='[m]')
- Parameters
concepts (typing.List[str]) –
features (typing.List[typing.List[str]]) –
concept_label (typing.Optional[str]) –
sound_label (typing.Optional[str]) –
- cltoolkit.features.phonology.sound_match(sound, features)
Match a sound by a subset of features.
Note
The major idea of this function is to allow for the convenient matching of some sounds by defining them in terms of a part of their features alone. E.g., [m] and its variants can be defined as ["bilabial", "nasal"], since we do not care about the rest of the features.
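Subset matching of this kind can be sketched as follows. `toy_sound_match` is a hypothetical function operating on hand-written feature lists, not on CLTS sound objects, and it assumes (based on the StartsWithSound example above) that the features argument is a list of alternative feature lists.

```python
def toy_sound_match(sound_features, alternatives):
    """True if any feature list in `alternatives` is a subset of the sound's features."""
    present = set(sound_features)
    return any(set(features) <= present for features in alternatives)


# Feature sets written by hand for illustration.
m = ["voiced", "bilabial", "nasal", "consonant"]
n = ["voiced", "alveolar", "nasal", "consonant"]

m_matches = toy_sound_match(m, [["bilabial", "nasal"]])
n_matches = toy_sound_match(n, [["bilabial", "nasal"]])
```

Because only the listed features are compared, variants of [m] that differ in other features would still match.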
- cltoolkit.features.phonology.is_implosive(sound)
Check whether a sound is an implosive.
- cltoolkit.features.phonology.stop_like(sound)
This groups stops and affricates into a group of sounds.
- class cltoolkit.features.phonology.HasSoundsWithFeature(attr, features)
Does the inventory contain at least one {}.
prenasalized_consonants = phonology.HasSoundsWithFeature("consonants", [["pre-nasalized"]])
- cltoolkit.features.phonology.syllable_complexity(forms_with_sounds)
Compute the major syllabic patterns for a language.
Note
The computation follows the automated syllabification process described in List (2014) based on sonority. Based on this syllabification, we calculate the number of consonants preceding the syllable nucleus and those following it. For a given syllable, we store the form, the consonantal sounds, and the index of the syllable in the word. These values are returned in the form of two dictionaries, in which the number of sounds is the key.
Lexical features
Miscellaneous features for lexical data.
- class cltoolkit.features.lexicon.ConceptComparison(alist, blist, ablist=None, alabel=None, blabel=None)
Virtual base class for features comparing lexical data via concepts.
- Parameters
alist (typing.List[str]) –
blist (typing.List[str]) –
ablist (typing.Optional[typing.List[str]]) –
alabel (typing.Optional[str]) –
blabel (typing.Optional[str]) –
- class cltoolkit.features.lexicon.Colexification(*args, **kw)
Computes if two concepts are expressed with the same form in a language (i.e. if they are colexified).
- class cltoolkit.features.lexicon.PartialColexification(*args, **kw)
Computes if two concepts are partially colexified, i.e. if a form for the first concept is contained in a form for the second concept.
Computes if forms for the two concepts share a substring (of length >= 3).
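The three comparisons described in this section can be illustrated on plain strings. The functions below are hypothetical sketches, not the cltoolkit classes: they take lists of form strings for two concepts, and excluding identical forms from the partial case is an assumption made here to keep it distinct from full colexification.

```python
def colexified(forms_a, forms_b):
    """True if some form is shared between the two concepts."""
    return bool(set(forms_a) & set(forms_b))


def partially_colexified(forms_a, forms_b):
    """True if a form for the first concept occurs inside a longer form for the second."""
    return any(a in b and a != b for a in forms_a for b in forms_b)


def share_substring(forms_a, forms_b, min_length=3):
    """True if any pair of forms has a common substring of at least min_length."""
    for a in forms_a:
        # Any longer shared substring contains a shared one of exactly min_length.
        chunks = {a[i:i + min_length] for i in range(len(a) - min_length + 1)}
        if any(chunk in b for chunk in chunks for b in forms_b):
            return True
    return False


same_form = colexified(["ruka"], ["ruka"])
contained = partially_colexified(["baum"], ["baumrinde"])
overlap = share_substring(["finger"], ["ring"])
no_overlap = share_substring(["arm"], ["bein"])
```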