cltoolkit API documentation

For installation instructions and an overview, see the README.

Follow the links below for documentation of the cltoolkit Python API.

Data models

Basic models.

class cltoolkit.models.CLCore(id, wordlist=None, data=None)

Base class to represent data in a wordlist.

class cltoolkit.models.WithForms(forms=None)

Mixin to represent data in a wordlist that contains forms.

class cltoolkit.models.WithDataset(obj=None, dataset=None)

Mixin to represent data in a wordlist from a specific dataset.

class cltoolkit.models.Language(id, wordlist=None, data=None, forms=None, obj=None, dataset=None, senses=None, concepts=None)

Base class for handling languages.

Variables
  • senses – DictTuple of senses, i.e. glosses for forms.

  • concepts – DictTuple of senses with explicit Concepticon mapping.

  • glottocode – str, Glottocode for the language.

Note

A language variety is defined for a specific dataset only.

class cltoolkit.models.Sense(id, wordlist=None, data=None, forms=None, obj=None, dataset=None, language=None)

A sense description (concept in source) which does not need to be linked to the Concepticon.

Variables
  • language – Language instance

  • name – str, the gloss

Note

Unlike senses in a wordlist, which are dataset-specific, concepts in a wordlist are defined for all datasets.

class cltoolkit.models.Concept(id, wordlist=None, data=None, forms=None, language=None, senses=None, name=None, concepticon_id=None, concepticon_gloss=None)

Base class for the concepts in a dataset.

Variables
  • language – Language instance

  • name – str, the gloss

  • senses – iterable of senses mapped to this concept

  • concepticon_id – str, ID of the Concepticon concept set the concept is mapped to.

  • concepticon_gloss – str, gloss of the Concepticon concept set the concept is mapped to.

Note

Unlike senses in a wordlist, which are dataset-specific, concepts in a wordlist are defined for all datasets. As a result, they lack a reference to the original dataset in which they occur, but they have an attribute senses which is a reference to the original senses as they occur in different datasets.
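To illustrate the relation described in this note, here is a minimal, self-contained sketch that groups dataset-specific senses into cross-dataset concepts by their Concepticon gloss. The SenseRecord type and the group_senses_by_concept function are hypothetical stand-ins, not part of cltoolkit; the sketch assumes that senses without a Concepticon mapping simply do not contribute to any concept.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class SenseRecord:
    # Hypothetical stand-in for a dataset-specific sense (not a cltoolkit class).
    dataset: str
    name: str
    concepticon_gloss: Optional[str] = None


def group_senses_by_concept(senses: List[SenseRecord]) -> Dict[str, List[SenseRecord]]:
    """Map each Concepticon gloss to the senses (from any dataset) mapped to it."""
    concepts: Dict[str, List[SenseRecord]] = defaultdict(list)
    for sense in senses:
        if sense.concepticon_gloss is None:
            # Unmapped senses remain dataset-specific and yield no concept.
            continue
        concepts[sense.concepticon_gloss].append(sense)
    return dict(concepts)


senses = [
    SenseRecord("dataset-a", "hand", "HAND"),
    SenseRecord("dataset-b", "the hand", "HAND"),
    SenseRecord("dataset-b", "thing", None),
]
concepts = group_senses_by_concept(senses)
print({gloss: [s.dataset for s in group] for gloss, group in concepts.items()})
# {'HAND': ['dataset-a', 'dataset-b']}
```

Each resulting concept keeps references to the original senses, so the dataset each sense came from remains recoverable even though the concept itself is dataset-independent.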

class cltoolkit.models.Form(id, wordlist=None, data=None, obj=None, dataset=None, concept=None, language=None, sense=None, sounds=NOTHING, cognates=NOTHING)

Base class for handling the form part of linguistic signs.

Variables
  • concept – The concept (if any) expressed by the form.

  • language – The language in which the form occurs.

  • sense – The meaning expressed by the form.

  • sounds – The segmented strings defined by the B(road) IPA.

  • graphemes – The segmented graphemes (not necessarily BIPA-conformant).

sounds

Sounds (graphemes recognized in the specified transcription system) in the segmented form.

graphemes

Graphemes in the segmented form.

class cltoolkit.models.Cognate(id, wordlist=None, data=None, obj=None, dataset=None, form=None, contribution=None)
class cltoolkit.models.Grapheme(id, wordlist=None, data=None, obj=None, dataset=None, forms=None, grapheme=None, occurrences=None, language=None)
class cltoolkit.models.Sound(id, wordlist=None, data=None, forms=None, grapheme=None, occurrences=None, graphemes_in_source=None, language=None, obj=None)

All sounds in a dataset.

Wordlists

class cltoolkit.wordlist.Wordlist(datasets, ts=None, concept_id_factory=<function Wordlist.<lambda>>)

A collection of one or more lexibank datasets, aligned by concept.

Parameters
  • datasets (typing.List[pycldf.dataset.Dataset]) – The datasets you want to load, provided as a list of pycldf.Dataset objects.

  • ts (typing.Optional[pyclts.transcriptionsystem.TranscriptionSystem]) – A TranscriptionSystem (as provided by pyclts), if you want to work with phonological features from CLTS.

  • concept_id_factory (typing.Callable[[dict], str]) –

Variables
  • datasets

  • languages – DictTuple

  • senses – DictTuple

  • concepts – DictTuple

  • forms – DictTuple

  • graphemes – DictTuple

  • sounds – DictTuple

iter_forms_by_concepts(concepts=None, languages=None, aspect=None, filter_by=None, flat=False)

Iterate over the concepts in the data and return the forms for the selected languages.

Parameters
  • concepts – List of concept identifiers, all concepts if not specified.

  • languages – List of language identifiers, all languages if not specified.

  • aspect – Name of an attribute of the Form object to return instead of the Form object itself.

  • filter_by – A function used to filter the data to be returned.

  • flat – Return a one-dimensional array of the data.

Note

The function returns for each concept (selected by ID) the form for each language, or the specific aspect (attribute) of the form, provided this exists.
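The behaviour described in this note can be pictured with the following self-contained sketch. It is not cltoolkit's implementation: SimpleForm and forms_by_concepts are hypothetical, forms are held in a plain list, and the exact shape produced by flat in cltoolkit is an assumption here (the sketch simply merges the per-language lists for each concept). A language without a matching form contributes an empty list.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterator, List, Optional


@dataclass
class SimpleForm:
    # Hypothetical stand-in for cltoolkit.models.Form.
    concept: str
    language: str
    value: str


def forms_by_concepts(
    forms: List[SimpleForm],
    concepts: List[str],
    languages: List[str],
    aspect: Optional[str] = None,
    filter_by: Optional[Callable[[SimpleForm], bool]] = None,
    flat: bool = False,
) -> Iterator[list]:
    """Yield, per concept, the forms (or one of their attributes) per language."""
    for concept in concepts:
        rows: List[List[Any]] = []
        for language in languages:
            selected: List[Any] = [
                f for f in forms
                if f.concept == concept and f.language == language
                and (filter_by is None or filter_by(f))
            ]
            # Return an attribute of each form instead of the form itself.
            if aspect is not None:
                selected = [getattr(f, aspect) for f in selected]
            rows.append(selected)
        if flat:
            # Merge the per-language lists into one list for this concept.
            yield [item for row in rows for item in row]
        else:
            yield rows


forms = [
    SimpleForm("HAND", "l1", "mano"),
    SimpleForm("HAND", "l2", "hand"),
    SimpleForm("EYE", "l1", "ojo"),
]
print(list(forms_by_concepts(forms, ["HAND", "EYE"], ["l1", "l2"], aspect="value")))
# [[['mano'], ['hand']], [['ojo'], []]]
```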

class cltoolkit.util.DictTuple(items, **kw)

An object allowing access to items of a tuple as if it were a dict keyed with the id attribute of the contained objects.
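A minimal sketch of this idea, assuming items expose a string id attribute: a tuple subclass that additionally indexes its items by id. IdTuple is a hypothetical name and does not reproduce DictTuple's actual code or its **kw handling; string keys are treated as ids so that integer indices and slices keep their normal tuple meaning.

```python
from types import SimpleNamespace
from typing import Any, Iterable


class IdTuple(tuple):
    """Hypothetical sketch: a tuple whose items can also be looked up by their id."""

    def __init__(self, items: Iterable[Any]):
        # tuple.__new__ has already stored the items; index them once by id.
        self._index = {item.id: pos for pos, item in enumerate(self)}

    def __getitem__(self, key):
        # String keys are ids; ints and slices keep normal tuple semantics.
        if isinstance(key, str):
            return super().__getitem__(self._index[key])
        return super().__getitem__(key)

    def __contains__(self, key):
        if isinstance(key, str):
            return key in self._index
        return super().__contains__(key)


languages = IdTuple([SimpleNamespace(id="lang-a"), SimpleNamespace(id="lang-b")])
print(languages["lang-b"] is languages[1])  # True
print("lang-a" in languages, len(languages))  # True 2
```

Because the underlying data is an immutable tuple, the id index can be built once at construction time; an unknown id raises KeyError, as a dict lookup would.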

Features

A feature is some aspect of language, e.g. the size of its phoneme inventory.

In cltoolkit, a Feature is an object bundling some metadata with a Python callable accepting a cltoolkit.models.Language instance as its sole argument, and returning the value computed for this language.

A FeatureCollection of predefined features is available in FEATURES.
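The following sketch shows the "metadata plus callable" idea in isolation. SimpleFeature, its compute method and the inventory_size function are hypothetical; the language is a SimpleNamespace stand-in whose forms carry a sounds list, rather than a real cltoolkit.models.Language, and the sketch counts sounds across forms instead of using a precomputed inventory.

```python
from dataclasses import dataclass
from types import SimpleNamespace
from typing import Any, Callable, Optional


@dataclass
class SimpleFeature:
    """Hypothetical stand-in for a Feature: metadata plus a callable."""
    id: str
    name: str
    function: Callable[[Any], Any]
    note: Optional[str] = None

    def compute(self, language: Any) -> Any:
        # The callable receives the language as its sole argument.
        return self.function(language)


def inventory_size(language: Any) -> int:
    """Feature function: number of distinct sounds across the language's forms."""
    return len({sound for form in language.forms for sound in form.sounds})


lang = SimpleNamespace(forms=[
    SimpleNamespace(sounds=["m", "a"]),
    SimpleNamespace(sounds=["p", "a"]),
])
feature = SimpleFeature("InventorySize", "Size of the sound inventory", inventory_size)
print(feature.compute(lang))  # 3
```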

class cltoolkit.features.collection.Feature(id, name, function, type=None, note=None, categories=None, requires=None)

Variables
  • id – str

  • name – str

  • function – callable

See also

get_callable()

Parameters

function (typing.Union[str, dict, typing.Callable]) –

class cltoolkit.features.collection.FeatureCollection(items, **kw)

A collection of Feature instances.

dump(path)

Dump feature specifications to a JSON file.

classmethod load(path)

Load feature specifications from a JSON file (e.g. as created with FeatureCollection.dump).

cltoolkit.features.collection.get_callable(s)

A “feature function” can be specified in 3 ways:

  • as a Python callable object

  • as a string of dot-separated names, where the part up to the last dot is taken as Python module spec, and the last name as symbol to be looked up in this module

  • as a dict with keys class, args, kwargs, where class is interpreted as above, and args and kwargs are passed into the imported class to initialize an instance, the __call__ method of which will be used as “feature function”.

Parameters

s (typing.Union[str, dict, typing.Callable]) –

Return type

typing.Callable
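The three specification forms listed above can be resolved with the standard library's importlib. The following is a sketch under that assumption, not the body of get_callable; resolve_callable is a hypothetical name, and the usage relies only on standard-library targets (math.sqrt, operator.itemgetter) so that it runs without cltoolkit installed.

```python
import importlib
from typing import Callable, Union


def resolve_callable(spec: Union[str, dict, Callable]) -> Callable:
    """Hypothetical sketch of the three ways to specify a feature function."""
    if callable(spec):
        # 1. Already a Python callable: use it as is.
        return spec
    if isinstance(spec, str):
        # 2. Dotted name: module spec up to the last dot, symbol after it.
        module_name, _, symbol = spec.rpartition(".")
        return getattr(importlib.import_module(module_name), symbol)
    if isinstance(spec, dict):
        # 3. Dict: resolve "class" as in case 2, instantiate it with
        #    args/kwargs, and use the instance's __call__ method.
        cls = resolve_callable(spec["class"])
        instance = cls(*spec.get("args", []), **spec.get("kwargs", {}))
        return instance.__call__
    raise TypeError(f"Cannot resolve a callable from {spec!r}")


print(resolve_callable("math.sqrt")(16.0))  # 4.0
getter = resolve_callable({"class": "operator.itemgetter", "args": [1]})
print(getter(["a", "b", "c"]))  # b
```

The dict form is what makes parametrized feature classes (such as the query classes in cltoolkit.features.phonology) serializable: a JSON file only needs to store a class name plus its constructor arguments.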

Requirements

Features may have different requirements regarding the kind of data needed to perform the computation. These requirements can be expressed (and enforced) by decorating the callable (function or method) with the requires() decorator from cltoolkit.features.reqs, parametrized with the appropriate callables from the cltoolkit.features.reqs module, or with any other callable that accepts a cltoolkit.models.Language instance as argument and returns True if the requirement is met.

exception cltoolkit.features.reqs.MissingRequirement

Exception raised by requires() (before calling the decorated function) when a requirement is not met.

cltoolkit.features.reqs.inventory(language)

Make sure a language has a precomputed sound inventory.

cltoolkit.features.reqs.graphemes(language)

Make sure a language has segmented forms, i.e. lists of graphemes for each form.

cltoolkit.features.reqs.concepts(language)

Make sure a language has forms linked to concepts, i.e. senses with Concepticon mapping.

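cltoolkit.features.reqs.requires(*what)

Decorator to specify requirements of a feature callable.

from cltoolkit.features.reqs import graphemes, requires

@requires(graphemes)
def count_tokens(language):
    # Count the grapheme tokens over all segmented forms of the language.
    return sum(len(form.graphemes) for form in language.forms)

cltoolkit.features.reqs.inventory_with_occurrences(language)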

Make sure a language has a precomputed sound inventory with occurrence lists per sound.
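The requirement mechanism described in this section can be sketched as a decorator that runs every requirement check before the decorated function and raises an exception on the first failure. This is a self-contained illustration, not cltoolkit's code: MissingRequirement, requires and graphemes below are local stand-ins mirroring the documented names, the sketch only handles plain functions taking the language as their first argument (not methods), and the language is a SimpleNamespace stand-in.

```python
import functools
from types import SimpleNamespace
from typing import Any, Callable


class MissingRequirement(Exception):
    """Local stand-in for cltoolkit.features.reqs.MissingRequirement."""


def requires(*what: Callable[[Any], bool]):
    """Sketch: check every requirement before calling the decorated feature."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(language, *args, **kw):
            for requirement in what:
                if not requirement(language):
                    name = getattr(requirement, "__name__", repr(requirement))
                    raise MissingRequirement(name)
            return func(language, *args, **kw)
        return wrapper
    return decorator


def graphemes(language) -> bool:
    # Hypothetical check: every form carries a non-empty list of graphemes.
    return all(getattr(form, "graphemes", None) for form in language.forms)


@requires(graphemes)
def count_tokens(language) -> int:
    return sum(len(form.graphemes) for form in language.forms)


segmented = SimpleNamespace(forms=[
    SimpleNamespace(graphemes=["m", "a"]),
    SimpleNamespace(graphemes=["p", "a", "p", "a"]),
])
print(count_tokens(segmented))  # 6
```

Checking requirements up front means a feature computation fails early with a descriptive exception instead of producing an AttributeError or a misleading value deep inside the feature function.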

Phonological features

Miscellaneous phonological features found in typological databases.

class cltoolkit.features.phonology.WithInventory(*args, **kw)

Base class for feature callables requiring access to a phoneme inventory.

class cltoolkit.features.phonology.InventoryQuery(attr)

Compute the length (size) of some attribute of a sound inventory.

from cltoolkit.features.phonology import InventoryQuery
number_of_consonants = InventoryQuery('consonants')

class cltoolkit.features.phonology.YesNoQuery(attr)

Compute whether an inventory has some property.

from cltoolkit.features.phonology import YesNoQuery
has_tones = YesNoQuery('tones')

class cltoolkit.features.phonology.Ratio(attr1, attr2)

Computes the ratio between sizes of two properties of an inventory.
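The three query classes above differ only in what they do with the size of an inventory attribute. The sketch below captures that pattern with hypothetical classes (SizeQuery, HasQuery, RatioQuery) that take an inventory object directly; the real classes receive a Language via WithInventory, and the inventory here is a SimpleNamespace stand-in with consonants, vowels and tones lists.

```python
from types import SimpleNamespace
from typing import Any


class SizeQuery:
    """Hypothetical sketch in the spirit of InventoryQuery: size of one attribute."""

    def __init__(self, attr: str):
        self.attr = attr

    def __call__(self, inventory: Any) -> int:
        return len(getattr(inventory, self.attr))


class HasQuery(SizeQuery):
    """In the spirit of YesNoQuery: is the attribute non-empty?"""

    def __call__(self, inventory: Any) -> bool:
        return super().__call__(inventory) > 0


class RatioQuery:
    """In the spirit of Ratio: size of attr1 divided by size of attr2."""

    def __init__(self, attr1: str, attr2: str):
        self.size1, self.size2 = SizeQuery(attr1), SizeQuery(attr2)

    def __call__(self, inventory: Any) -> float:
        return self.size1(inventory) / self.size2(inventory)


inventory = SimpleNamespace(
    consonants=["p", "t", "k", "m", "n"], vowels=["a", "i", "u"], tones=[]
)
print(SizeQuery("consonants")(inventory))  # 5
print(HasQuery("tones")(inventory))  # False
```

In this sketch RatioQuery raises ZeroDivisionError when attr2 is empty; how cltoolkit's Ratio handles that case is not stated in this documentation.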

class cltoolkit.features.phonology.StartsWithSound(concepts, features, concept_label=None, sound_label=None)

Check if a language has a form for one of the given concepts starting with a sound that matches the given features.

Note

Parametrized instances of this class can be used to check for certain cases of sound symbolism, or for geographic or areal tendencies of languages to have word forms for certain concepts starting with certain sounds.

See also

sound_match()

from cltoolkit.features.phonology import StartsWithSound
mother_with_m = StartsWithSound(["MOTHER"], [["bilabial", "nasal"]], sound_label='[m]')

Parameters
  • concepts (typing.List[str]) –

  • features (typing.List[typing.List[str]]) –

  • concept_label (typing.Optional[str]) –

  • sound_label (typing.Optional[str]) –

cltoolkit.features.phonology.sound_match(sound, features)

Match a sound by a subset of features.

Note

The major idea of this function is to allow for the convenient matching of some sounds by defining them in terms of a part of their features alone. E.g., [m] and its variants can be defined as [“bilabial”, “nasal”], since we do not care about the rest of the features.
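Partial feature matching, and its use for checking word-initial sounds as StartsWithSound does, can be sketched with plain sets. The sketch assumes each sound is represented by the set of its feature names (a stand-in for a CLTS sound object) and that the outer list of features holds alternatives, any one of which suffices; matches and starts_with are hypothetical helpers, not cltoolkit's sound_match or StartsWithSound.

```python
from typing import Dict, Iterable, List, Set


def matches(sound: Set[str], features: List[List[str]]) -> bool:
    """True if the sound has all features of at least one alternative."""
    return any(set(alternative) <= sound for alternative in features)


def starts_with(forms_by_concept: Dict[str, List[List[Set[str]]]],
                concepts: Iterable[str],
                features: List[List[str]]) -> bool:
    """True if any form for the given concepts starts with a matching sound."""
    for concept in concepts:
        for form in forms_by_concept.get(concept, []):
            if form and matches(form[0], features):
                return True
    return False


m = {"voiced", "bilabial", "nasal", "consonant"}
p = {"voiceless", "bilabial", "stop", "consonant"}
a = {"unrounded", "open", "front", "vowel"}
forms = {"MOTHER": [[m, a, m, a]], "FATHER": [[p, a, p, a]]}
print(starts_with(forms, ["MOTHER"], [["bilabial", "nasal"]]))  # True
print(starts_with(forms, ["FATHER"], [["bilabial", "nasal"]]))  # False
```

Matching on a feature subset means that variants such as a long or palatalized [m] would still satisfy [["bilabial", "nasal"]], which is exactly the convenience the note describes.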

cltoolkit.features.phonology.is_voiced(sound)

Check if a sound is voiced or not.

cltoolkit.features.phonology.is_glide(sound)

Check if a sound is a glide or a liquid.

cltoolkit.features.phonology.is_implosive(sound)

Check if a sound is implosive or not.

cltoolkit.features.phonology.stop_like(sound)

Group stops and affricates together into one class of sounds.

cltoolkit.features.phonology.is_uvular(sound)

Check if a sound is uvular or not.

class cltoolkit.features.phonology.PlosiveFricativeVoicing(*args, **kw)
class cltoolkit.features.phonology.HasPtk(*args, **kw)
class cltoolkit.features.phonology.HasUvular(*args, **kw)
class cltoolkit.features.phonology.HasGlottalized(*args, **kw)
class cltoolkit.features.phonology.HasLaterals(*args, **kw)
class cltoolkit.features.phonology.HasEngma(*args, **kw)
class cltoolkit.features.phonology.HasSoundsWithFeature(attr, features)

Check if the inventory contains at least one sound of the kind given by attr that has the given features.

from cltoolkit.features import phonology
prenasalized_consonants = phonology.HasSoundsWithFeature("consonants", [["pre-nasalized"]])

class cltoolkit.features.phonology.HasRoundedVowels(*args, **kw)
cltoolkit.features.phonology.syllable_complexity(forms_with_sounds)

Compute the major syllabic patterns for a language.

Note

The computation follows the automated syllabification process described in List (2014) based on sonority. Based on this syllabification, we calculate the number of consonants preceding the syllable nucleus and those following it. For a given syllable, we store the form, the consonantal sounds, and the index of the syllable in the word. These values are returned in the form of two dictionaries, in which the number of sounds is the key.
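The counting step of this note can be sketched as follows. This simplified sketch does not implement the sonority-based syllabification of List (2014): it assumes forms are already split into syllables, treats the span from the first to the last vowel as the nucleus, and counts everything before and after it as consonantal. As in the note, each record stores the form, the consonantal sounds and the syllable index, grouped in two dictionaries keyed by the number of sounds. The function name onset_coda_counts is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# (form ID, consonantal sounds, index of the syllable in the word)
Record = Tuple[str, Tuple[str, ...], int]


def onset_coda_counts(
    forms: Dict[str, List[List[str]]], vowels: Set[str]
) -> Tuple[Dict[int, List[Record]], Dict[int, List[Record]]]:
    """Group syllables by the number of consonants before/after the nucleus."""
    onsets: Dict[int, List[Record]] = defaultdict(list)
    codas: Dict[int, List[Record]] = defaultdict(list)
    for form_id, syllables in forms.items():
        for idx, syllable in enumerate(syllables):
            nucleus = [i for i, s in enumerate(syllable) if s in vowels]
            if not nucleus:
                continue  # no vowel found: skipped in this simplified sketch
            first, last = nucleus[0], nucleus[-1]
            onset = tuple(syllable[:first])
            coda = tuple(syllable[last + 1:])
            onsets[len(onset)].append((form_id, onset, idx))
            codas[len(coda)].append((form_id, coda, idx))
    return dict(onsets), dict(codas)


forms = {"f1": [["s", "t", "a"], ["r", "a", "k"]], "f2": [["a", "m"]]}
onsets, codas = onset_coda_counts(forms, {"a"})
print(max(onsets))  # 2
```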

class cltoolkit.features.phonology.WithSyllableComplexity(*args, **kw)
class cltoolkit.features.phonology.SyllableStructure(*args, **kw)
class cltoolkit.features.phonology.SyllableOnset(*args, **kw)
class cltoolkit.features.phonology.SyllableOffset(*args, **kw)
class cltoolkit.features.phonology.LacksCommonConsonants(*args, **kw)
class cltoolkit.features.phonology.HasUncommonConsonants(*args, **kw)

Lexical features

Miscellaneous features for lexical data.

class cltoolkit.features.lexicon.ConceptComparison(alist, blist, ablist=None, alabel=None, blabel=None)

Virtual base class for features comparing lexical data via concepts.

Parameters
  • alist (typing.List[str]) –

  • blist (typing.List[str]) –

  • ablist (typing.Optional[typing.List[str]]) –

  • alabel (typing.Optional[str]) –

  • blabel (typing.Optional[str]) –

class cltoolkit.features.lexicon.Colexification(*args, **kw)

Computes if two concepts are expressed with the same form in a language (i.e. if they are colexified).
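A minimal sketch of the colexification check, assuming a language's lexicon is given as a hypothetical mapping from concept to a list of form strings. Unlike ConceptComparison, which takes lists of concepts (alist, blist), this sketch compares just two single concepts; the data below is made up for illustration.

```python
from typing import Dict, List


def colexified(forms: Dict[str, List[str]], concept_a: str, concept_b: str) -> bool:
    """True if at least one form expresses both concepts in this language."""
    return bool(set(forms.get(concept_a, [])) & set(forms.get(concept_b, [])))


# Toy data: one form shared by HAND and ARM, none shared with FINGER.
forms = {"HAND": ["ruka"], "ARM": ["ruka"], "FINGER": ["palec"]}
print(colexified(forms, "HAND", "ARM"))  # True
print(colexified(forms, "HAND", "FINGER"))  # False
```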

class cltoolkit.features.lexicon.PartialColexification(*args, **kw)

Computes if two concepts are partially colexified, i.e. if a form for the first concept is contained in a form for the second concept.

class cltoolkit.features.lexicon.SharedSubstring(*args, **kw)

Computes if forms for the two concepts share a substring (of length >= 3).
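The partial colexification and shared-substring checks above can be sketched on plain strings. Both helpers are hypothetical and use the same toy concept-to-forms mapping as a stand-in for a language's lexicon. Two assumptions are worth flagging: identical forms are not counted as partial colexification (that case is full colexification), and forms are compared as character strings rather than as segmented sound sequences.

```python
from typing import Dict, List


def partially_colexified(forms: Dict[str, List[str]], concept_a: str, concept_b: str) -> bool:
    """True if a form for concept_a occurs inside a different form for concept_b."""
    return any(
        a in b and a != b  # assumption: identical forms count as full colexification
        for a in forms.get(concept_a, [])
        for b in forms.get(concept_b, [])
    )


def shares_substring(forms: Dict[str, List[str]], concept_a: str, concept_b: str,
                     min_length: int = 3) -> bool:
    """True if forms for the two concepts share a substring of >= min_length."""
    # Any shared substring of length >= min_length contains a shared
    # substring of exactly min_length, so comparing fixed windows suffices.
    for a in forms.get(concept_a, []):
        windows = {a[i:i + min_length] for i in range(len(a) - min_length + 1)}
        for b in forms.get(concept_b, []):
            if any(b[j:j + min_length] in windows
                   for j in range(len(b) - min_length + 1)):
                return True
    return False


forms = {"MOON": ["kala"], "MONTH": ["kalamo"], "SUN": ["tuki"],
         "DAY": ["motuk"], "STAR": ["pir"]}
print(partially_colexified(forms, "MOON", "MONTH"))  # True
print(shares_substring(forms, "SUN", "DAY"))  # True
```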
