Features

A feature is some aspect of a language, e.g. the size of its phoneme inventory.

In cltoolkit, a Feature is an object that bundles some metadata with a Python callable. The callable accepts a cltoolkit.models.Language instance as its sole argument and returns the value computed for this language.
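The idea of "metadata plus callable" can be illustrated with a minimal, self-contained sketch. The classes ToyLanguage and ToyFeature below are hypothetical stand-ins written for this example. They are not the actual cltoolkit.models.Language or Feature classes, and the real Feature accepts more arguments than shown here.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class ToyLanguage:
    """Hypothetical stand-in for cltoolkit.models.Language."""
    name: str
    sounds: List[str]


@dataclass
class ToyFeature:
    """Hypothetical stand-in for Feature: metadata bundled with a callable."""
    id: str
    name: str
    function: Callable[[ToyLanguage], Any]

    def __call__(self, language: ToyLanguage) -> Any:
        # Pass the language as the sole argument to the wrapped callable.
        return self.function(language)


def inventory_size(language: ToyLanguage) -> int:
    # The value computed for this language: the number of distinct sounds.
    return len(set(language.sounds))


feature = ToyFeature(
    id="InventorySize",
    name="size of the sound inventory",
    function=inventory_size,
)
value = feature(ToyLanguage(name="Toy", sounds=["p", "t", "k", "a", "i", "a"]))
```

In this sketch, value is 5, because the repeated "a" is counted once.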

A FeatureCollection of predefined features is available in FEATURES.

class cltoolkit.features.collection.Feature(id, name, function, type=None, note=None, categories=None, requires=None)[source]
Variables
  • id (str)

  • name (str)

  • function (callable)

See also

get_callable()

Parameters

function (typing.Union[str, dict, typing.Callable]) –

class cltoolkit.features.collection.FeatureCollection(items, **kw)[source]

A collection of Feature instances.

dump(path)[source]

Dump feature specifications as JSON file.

classmethod load(path)[source]

Load feature specifications from a JSON file (e.g. as created with FeatureCollection.dump)

cltoolkit.features.collection.get_callable(s)[source]

A “feature function” can be specified in 3 ways:

  • as Python callable object

  • as string of dot-separated names, where the part up to the last dot is taken as Python module spec, and the last name as symbol to be looked up in this module

  • as dict with keys class, args, kwargs, where class is interpreted as above, and args and kwargs are passed into the imported class to initialize an instance, the __call__ method of which will be used as “feature function”.

Parameters

s (typing.Union[str, dict, typing.Callable]) –

Return type

typing.Callable
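The three ways of specifying a feature function can be sketched as follows. This is an illustrative re-implementation under the description above, not the library's actual code. The name resolve_callable is hypothetical, and the examples deliberately use standard-library targets (math.sqrt, operator.methodcaller) instead of cltoolkit classes.

```python
import importlib
from typing import Callable, Union


def resolve_callable(spec: Union[str, dict, Callable]) -> Callable:
    """Hypothetical sketch of resolving a feature function specification."""
    if callable(spec):
        # 1. Already a Python callable: use it as is.
        return spec
    if isinstance(spec, str):
        # 2. Dotted name: everything up to the last dot is the module,
        #    the last name is the symbol looked up in that module.
        module_name, _, symbol = spec.rpartition(".")
        return getattr(importlib.import_module(module_name), symbol)
    if isinstance(spec, dict):
        # 3. Dict with class/args/kwargs: import the class and instantiate it;
        #    the instance is then called via its __call__ method.
        cls = resolve_callable(spec["class"])
        return cls(*spec.get("args", []), **spec.get("kwargs", {}))
    raise TypeError("unsupported specification: {!r}".format(spec))


from_callable = resolve_callable(len)
from_string = resolve_callable("math.sqrt")
from_dict = resolve_callable(
    {"class": "operator.methodcaller", "args": ["split"], "kwargs": {"sep": ","}}
)
```

The string and dict forms consist only of JSON-serializable values, which is what makes a specification storable in a file.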

Requirements

Features may differ in the kind of data they need to perform their computation. These requirements can be expressed (and enforced) by decorating the callable (function or method) with the cltoolkit.features.requires() decorator, parametrized with the appropriate callables from the cltoolkit.features.reqs module, or with any other callable that accepts a cltoolkit.models.Language instance as argument and returns True if the requirement is met.

exception cltoolkit.features.reqs.MissingRequirement[source]

Exception raised by requires() (before calling the decorated function) when a requirement is not met.

cltoolkit.features.reqs.inventory(language)[source]

Make sure a language has a precomputed sound inventory.

cltoolkit.features.reqs.graphemes(language)[source]

Make sure a language has segmented forms, i.e. lists of graphemes for each form.

cltoolkit.features.reqs.concepts(language)[source]

Make sure a language has forms linked to concepts, i.e. senses with Concepticon mapping.

cltoolkit.features.reqs.requires(*what)[source]

Decorator to specify requirements of a feature callable.

@requires(graphemes)
def count_tokens(language):
    return 5

cltoolkit.features.reqs.inventory_with_occurrences(language)[source]

Make sure a language has a precomputed sound inventory with occurrence lists per sound.
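The requirement mechanism described in this section can be sketched as a plain decorator. Everything below is a simplified illustration: requires_sketch, MissingRequirementSketch and has_graphemes are hypothetical names, languages are represented as dicts instead of cltoolkit.models.Language instances, and only plain functions (not methods) are handled.

```python
import functools
from typing import Callable


class MissingRequirementSketch(ValueError):
    """Hypothetical stand-in for cltoolkit.features.reqs.MissingRequirement."""


def requires_sketch(*what: Callable) -> Callable:
    """Hypothetical decorator: check all requirements before calling func."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(language, *args, **kwargs):
            for requirement in what:
                # Each requirement returns True if it is met for this language.
                if not requirement(language):
                    raise MissingRequirementSketch(requirement.__name__)
            return func(language, *args, **kwargs)
        return wrapper
    return decorator


def has_graphemes(language) -> bool:
    # Toy requirement: there are forms, and each is a non-empty grapheme list.
    forms = language.get("forms", [])
    return bool(forms) and all(isinstance(f, list) and f for f in forms)


@requires_sketch(has_graphemes)
def count_tokens(language) -> int:
    return sum(len(form) for form in language["forms"])


segmented = {"forms": [["m", "a", "m", "a"], ["p", "a"]]}
unsegmented = {"forms": []}

total = count_tokens(segmented)
try:
    count_tokens(unsegmented)
    raised = False
except MissingRequirementSketch:
    raised = True
```

Because the check runs before the decorated function, count_tokens never sees a language without segmented forms; here total is 6 and the second call raises.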

Phonological features

Miscellaneous phonological features found in typological databases.

class cltoolkit.features.phonology.WithInventory(*args, **kw)[source]

Base class for feature callables requiring access to a phoneme inventory.

class cltoolkit.features.phonology.InventoryQuery(attr)[source]

Compute the length/size of some attribute of a sound inventory.

number_of_consonants = InventoryQuery('consonants')

class cltoolkit.features.phonology.YesNoQuery(attr)[source]

Compute whether an inventory has some property.

has_tones = YesNoQuery('tones')

class cltoolkit.features.phonology.Ratio(attr1, attr2)[source]

Computes the ratio between sizes of two properties of an inventory.
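The three parametrized query classes above share one pattern: the constructor stores attribute names, and calling the instance looks them up on the inventory. The sketch below illustrates that pattern only. The class names ending in Sketch are hypothetical, and a SimpleNamespace with an inventory attribute stands in for a real language object, which is an assumption about how the inventory is reached.

```python
from types import SimpleNamespace


class InventoryQuerySketch:
    """Hypothetical sketch: size of one attribute of the inventory."""
    def __init__(self, attr):
        self.attr = attr

    def __call__(self, language):
        return len(getattr(language.inventory, self.attr))


class YesNoQuerySketch(InventoryQuerySketch):
    """Hypothetical sketch: does the inventory have any such sounds?"""
    def __call__(self, language):
        return bool(getattr(language.inventory, self.attr))


class RatioSketch:
    """Hypothetical sketch: ratio between the sizes of two attributes."""
    def __init__(self, attr1, attr2):
        self.attr1, self.attr2 = attr1, attr2

    def __call__(self, language):
        # Note: raises ZeroDivisionError if the second attribute is empty.
        inventory = language.inventory
        return len(getattr(inventory, self.attr1)) / len(getattr(inventory, self.attr2))


toy_language = SimpleNamespace(
    inventory=SimpleNamespace(consonants=["p", "t", "k"], vowels=["a", "i"], tones=[])
)

number_of_consonants = InventoryQuerySketch("consonants")
has_tones = YesNoQuerySketch("tones")
cv_ratio = RatioSketch("consonants", "vowels")
```

For the toy language, the three callables yield 3, False and 1.5 respectively.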

class cltoolkit.features.phonology.StartsWithSound(concepts, features, concept_label=None, sound_label=None)[source]

Check if a language has a form for {} starting with {}.

Note

Parametrized instances of this class can be used to check for certain cases of sound symbolism, or for geographic / areal trends in languages to have word forms for certain concepts starting with certain sounds.

See also

sound_match()

mother_with_m = StartsWithSound(["MOTHER"], [["bilabial", "nasal"]], sound_label='[m]')

Parameters
  • concepts (typing.List[str]) –

  • features (typing.List[typing.List[str]]) –

  • concept_label (typing.Optional[str]) –

  • sound_label (typing.Optional[str]) –

cltoolkit.features.phonology.sound_match(sound, features)[source]

Match a sound by a subset of features.

Note

The main idea of this function is to allow convenient matching of sounds by defining them in terms of a subset of their features. E.g., [m] and its variants can be defined as [“bilabial”, “nasal”], since we do not care about the rest of the features.
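Matching by a subset of features, and its use for checking the first sound of a form, can be sketched as follows. This is an illustration under stated assumptions, not the library's code: sounds are represented as plain frozensets of feature names, forms as lists of such sets, and the outer list in features is treated as a list of alternatives, any one of which may match. The function names are hypothetical.

```python
from typing import Dict, FrozenSet, List

Sound = FrozenSet[str]


def sound_match_sketch(sound: Sound, features: List[List[str]]) -> bool:
    """True if any of the feature lists is fully contained in the sound."""
    return any(set(wanted) <= sound for wanted in features)


def has_form_starting_with(
    forms_by_concept: Dict[str, List[List[Sound]]],
    concepts: List[str],
    features: List[List[str]],
) -> bool:
    """True if some form for one of the concepts starts with a matching sound."""
    return any(
        bool(form) and sound_match_sketch(form[0], features)
        for concept in concepts
        for form in forms_by_concept.get(concept, [])
    )


m = frozenset({"voiced", "bilabial", "nasal", "consonant"})
long_m = m | {"long"}  # a variant of [m] with one extra feature
n = frozenset({"voiced", "alveolar", "nasal", "consonant"})
a = frozenset({"open", "front", "unrounded", "vowel"})

bilabial_nasal = [["bilabial", "nasal"]]
toy_lexicon = {"MOTHER": [[m, a, m, a]], "FATHER": [[n, a]]}

mother_with_m = has_form_starting_with(toy_lexicon, ["MOTHER"], bilabial_nasal)
father_with_m = has_form_starting_with(toy_lexicon, ["FATHER"], bilabial_nasal)
```

Both m and long_m match the feature list, since the extra "long" feature is ignored, while n does not because it lacks "bilabial".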

cltoolkit.features.phonology.is_voiced(sound)[source]

Check if a sound is voiced or not.

cltoolkit.features.phonology.is_glide(sound)[source]

Check if a sound is a glide or a liquid.

cltoolkit.features.phonology.is_implosive(sound)[source]

Check if a sound is implosive or not.

cltoolkit.features.phonology.stop_like(sound)[source]

Group stops and affricates into a single class of stop-like sounds.

cltoolkit.features.phonology.is_uvular(sound)[source]

Check if a sound is uvular or not.

class cltoolkit.features.phonology.PlosiveFricativeVoicing(*args, **kw)[source]
class cltoolkit.features.phonology.HasPtk(*args, **kw)[source]
class cltoolkit.features.phonology.HasUvular(*args, **kw)[source]
class cltoolkit.features.phonology.HasGlottalized(*args, **kw)[source]
class cltoolkit.features.phonology.HasLaterals(*args, **kw)[source]
class cltoolkit.features.phonology.HasEngma(*args, **kw)[source]
class cltoolkit.features.phonology.HasSoundsWithFeature(attr, features)[source]

Does the inventory contain at least one {}.

prenasalized_consonants = phonology.HasSoundsWithFeature("consonants", [["pre-nasalized"]])

class cltoolkit.features.phonology.HasRoundedVowels(*args, **kw)[source]
cltoolkit.features.phonology.syllable_complexity(forms_with_sounds)[source]

Compute the major syllabic patterns for a language.

Note

The computation follows the automated syllabification process described in List (2014) based on sonority. Based on this syllabification, we calculate the number of consonants preceding the syllable nucleus and those following it. For a given syllable, we store the form, the consonantal sounds, and the index of the syllable in the word. These values are returned in the form of two dictionaries, in which the number of sounds is the key.
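The counting step described in the note can be sketched as follows. The sketch skips the sonority-based syllabification entirely and assumes words arrive already split into syllables. It also uses a toy vowel set to find the nucleus. The function name, the input layout, and the exact shape of the stored tuples are assumptions made for this example.

```python
from collections import defaultdict

TOY_VOWELS = {"a", "e", "i", "o", "u"}  # assumption: toy nucleus detection


def onset_coda_counts(words):
    """Hypothetical sketch: group syllables by number of consonants
    before (preceding) and after (following) the syllable nucleus."""
    preceding = defaultdict(list)
    following = defaultdict(list)
    for form, syllables in words:
        for index, syllable in enumerate(syllables):
            nucleus = [i for i, s in enumerate(syllable) if s in TOY_VOWELS]
            if not nucleus:
                continue  # no vowel found; skipped in this sketch
            onset = syllable[: nucleus[0]]
            coda = syllable[nucleus[-1] + 1 :]
            # Store the form, the consonantal sounds, and the syllable index,
            # keyed by the number of sounds.
            preceding[len(onset)].append((form, onset, index))
            following[len(coda)].append((form, coda, index))
    return preceding, following


toy_words = [
    ("strand", [["s", "t", "r", "a", "n", "d"]]),
    ("mama", [["m", "a"], ["m", "a"]]),
]
preceding, following = onset_coda_counts(toy_words)
```

With these toy words, the largest key in preceding is 3 (the onset "str") and the largest key in following is 2 (the coda "nd").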

class cltoolkit.features.phonology.WithSyllableComplexity(*args, **kw)[source]
class cltoolkit.features.phonology.SyllableStructure(*args, **kw)[source]
class cltoolkit.features.phonology.SyllableOnset(*args, **kw)[source]
class cltoolkit.features.phonology.SyllableOffset(*args, **kw)[source]
class cltoolkit.features.phonology.LacksCommonConsonants(*args, **kw)[source]
class cltoolkit.features.phonology.HasUncommonConsonants(*args, **kw)[source]

Lexical features

Miscellaneous features for lexical data.

class cltoolkit.features.lexicon.ConceptComparison(alist, blist, ablist=None, alabel=None, blabel=None)[source]

Virtual base class for features comparing lexical data via concepts.

Parameters
  • alist (typing.List[str]) –

  • blist (typing.List[str]) –

  • ablist (typing.Optional[typing.List[str]]) –

  • alabel (typing.Optional[str]) –

  • blabel (typing.Optional[str]) –

class cltoolkit.features.lexicon.Colexification(*args, **kw)[source]

Computes if two concepts are expressed with the same form in a language (i.e. if they are colexified).

class cltoolkit.features.lexicon.PartialColexification(*args, **kw)[source]

Computes if two concepts are partially colexified, i.e. if a form for the first concept is contained in a form for the second concept.

class cltoolkit.features.lexicon.SharedSubstring(*args, **kw)[source]

Computes if forms for the two concepts share a substring (of length >= 3).
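The three comparisons above can be sketched on toy data as follows. This is an illustration only: forms are represented as plain strings (the library may well compare segmented forms), the function names are hypothetical, and excluding identical forms from partial colexification is an assumption of this sketch.

```python
from typing import List


def colexified(forms_a: List[str], forms_b: List[str]) -> bool:
    """True if the two concepts share at least one identical form."""
    return bool(set(forms_a) & set(forms_b))


def partially_colexified(forms_a: List[str], forms_b: List[str]) -> bool:
    """True if a form for the first concept is contained in a form for the
    second concept (identical forms are excluded here by assumption)."""
    return any(a in b and a != b for a in forms_a for b in forms_b)


def shares_substring(forms_a: List[str], forms_b: List[str], min_length: int = 3) -> bool:
    """True if some pair of forms shares a substring of at least min_length."""
    for a in forms_a:
        # Any shared substring of length >= min_length implies a shared
        # substring of exactly min_length, so checking those is sufficient.
        chunks = {a[i : i + min_length] for i in range(len(a) - min_length + 1)}
        if any(chunk in b for chunk in chunks for b in forms_b):
            return True
    return False


# Invented toy forms, not data from any real language.
full = colexified(["mano"], ["mano", "brasu"])
partial = partially_colexified(["tree"], ["treebark"])
partial_reversed = partially_colexified(["treebark"], ["tree"])
shared = shares_substring(["kantu"], ["wakant"])
not_shared = shares_substring(["kantu"], ["tuka"])
```

Note that partial colexification is directional in this sketch: "tree" is contained in "treebark", but not the other way round.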