Once you have set up tfsl, you can begin using it to work with lexemes.
To start, import the package:
import tfsl
In tfsl there is a construct called the Language, which pairs a language code with a Qid.
Wherever a language is needed in a tfsl script, a Language is used in that place.
A Language is selected whenever a language code (in the case of lemmata, sense glosses, form representations, and monolingual text) or language item (in the case of lexeme languages) is encountered in a lexeme retrieved by tfsl, and this same Language is likewise resolved to a language code or Qid at edit time.
A list of languages present among Wikidata labels/descriptions/aliases is provided as tfsl.langs, which maps language codes (after replacing hyphens with underscores and appending a trailing underscore) to particular items:
tfsl.langs.de_ # maps "de" to "Q188"
tfsl.langs.dag_ # maps "dag" to "Q32238"
tfsl.langs.sms_ # maps "sms" to "Q13271"
Note that the mapping to items in this list typically prefers the language item over an item combining a language and anything else. There are other deviations as well; looking up your desired code in tfsl/languages.py will show which item it maps to:
tfsl.langs.be_tarask_ # maps "be-tarask" to "Q9091" and NOT "Q2087886"!
tfsl.langs.ms_arab_ # maps "ms-arab" to "Q9237" and NOT "Q107526440"!
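The naming rule described above can be sketched as a small helper. This only illustrates the rule as stated (hyphens become underscores, then a trailing underscore is added); the function name langs_attribute_name is hypothetical and not part of tfsl.

```python
def langs_attribute_name(language_code: str) -> str:
    """Return the attribute name a language code would have in tfsl.langs.

    Hypothetical helper, not part of tfsl: it only applies the rule stated
    above (hyphens become underscores, then a trailing underscore is added).
    """
    return language_code.replace("-", "_") + "_"


print(langs_attribute_name("de"))         # de_
print(langs_attribute_name("be-tarask"))  # be_tarask_
```

With tfsl imported, getattr(tfsl.langs, langs_attribute_name("be-tarask")) should then refer to the same Language as tfsl.langs.be_tarask_, assuming the code is present in the list.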
You can make your own Language objects for custom code-item pairs, and you can use them when modifying lexemes, but in order for these to be automatically resolved the next time a lexeme is retrieved, you must assign them to an attribute in tfsl.langs:
my_custom_language = tfsl.Language("gsg", "Q33282")
gsg_lexeme = tfsl.L(29237) # will error here as gsg is not part of tfsl.langs
tfsl.langs.gsg_ = my_custom_language
gsg_lexeme = tfsl.L(29237) # will not error here as gsg is now part of tfsl.langs
Retrieval of existing lexemes to modify is as simple as calling tfsl.L with the Lid:
renne_lexeme = tfsl.L(351)
chien_lexeme = tfsl.L('L241')
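The two calls above pass the Lid once as a bare integer and once as a string. If your own scripts collect lexeme IDs from mixed sources, a small normalizer can make them uniform before any lookup. The sketch below is a hypothetical helper (normalize_lid is not a tfsl function) and says nothing about how tfsl.L handles its argument internally.

```python
def normalize_lid(lid) -> str:
    """Return a lexeme ID in the form "L<number>".

    Hypothetical helper, not part of tfsl: accepts an int such as 351 or a
    string such as "L241", "l241", or "241".
    """
    if isinstance(lid, bool):
        raise TypeError("a lexeme ID must be an int or a str")
    if isinstance(lid, int):
        number = lid
    elif isinstance(lid, str):
        text = lid.strip().upper()
        digits = text[1:] if text.startswith("L") else text
        if not (digits.isascii() and digits.isdigit()):
            raise ValueError(f"not a lexeme ID: {lid!r}")
        number = int(digits)
    else:
        raise TypeError("a lexeme ID must be an int or a str")
    if number <= 0:
        raise ValueError(f"not a lexeme ID: {lid!r}")
    return f"L{number}"


print(normalize_lid(351))     # L351
print(normalize_lid("L241"))  # L241
```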
Once you have retrieved a lexeme, you can explore each of its many parts.
To retrieve a lemma in a given language, index into the lexeme with that language:
dog_lexeme = tfsl.L("L524153")
dog_lemma_hi = dog_lexeme[tfsl.langs.hi_]
dog_lemma_ur = dog_lexeme[tfsl.langs.ur_]
dog_lemma_hi and dog_lemma_ur are instances of tfsl.MonolingualText (see "Value types: MonolingualText" below):
print(dog_lemma_hi.text) # should print "कुत्ता"
print(dog_lemma_hi.language) # should print something like 'Language("hi", "Q11051")'
If you know the text of a lemma but not its language, you can also retrieve the lemma by providing any MonolingualText with the text of that lemma:
dog_lemma_fr = tfsl.MonolingualText("chien", tfsl.langs.fr_)
dog_lemma_fr = "chien" @ tfsl.langs.fr_ # does the same thing as the above, with syntax reminiscent of SPARQL
dog_lexeme = tfsl.L("L313030")
dog_lemma_fro = dog_lexeme[dog_lemma_fr]
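For readers curious how a string can appear on the left of @, here is a minimal sketch using Python's reflected matrix-multiplication hook. LanguageSketch and MonolingualTextSketch are stand-ins invented for this illustration; they are not tfsl's classes, and tfsl's actual implementation may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LanguageSketch:
    """Stand-in for a code-item pair (not tfsl.Language)."""
    code: str
    item: str

    def __rmatmul__(self, text):
        # Python calls this for `text @ language` because str defines no __matmul__.
        if not isinstance(text, str):
            return NotImplemented
        return MonolingualTextSketch(text, self)


@dataclass(frozen=True)
class MonolingualTextSketch:
    """Stand-in for a text-language pair (not tfsl.MonolingualText)."""
    text: str
    language: LanguageSketch


fr = LanguageSketch("fr", "Q150")
chien = "chien" @ fr
print(chien.text, chien.language.code)  # chien fr
```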
The lexeme language is a Language, and the lexeme category is just a Qid string:
print(dog_lexeme.language) # should print something like 'Language("fro", "Q35222")'
print(dog_lexeme.category) # should print "Q1084"
To access the statements on a Lexeme, index into it with the desired Pid.
If a valid Pid is provided, then a list of tfsl.Statement objects will be returned (which may be empty if the property is not present on the lexeme):
tour_lexeme = tfsl.L("L2330")
describers_of_tour = tour_lexeme["P1343"] # should be a list of two statements
origins_of_tour = tour_lexeme["P5191"] # should be a list with one statement
examples_of_tour = tour_lexeme["P5831"] # should be an empty list
if tour_lexeme.haswbstatement("P5831"): # syntax reminiscent of the keyword used in Wikidata searching
    print("This lexeme has a usage example!")
if tour_lexeme.haswbstatement("P1343", tfsl.ItemValue("Q1935308")):
    print("This lexeme is described by the SAOB!")
Each Statement consists of a number of parts:
klimatforandring = tfsl.L("L242121")
usage_examples = klimatforandring["P5831"]
first_usage_example = usage_examples[0]
print(first_usage_example.id) # should be something like "L242121$4e8b1dbe-412c-cf34-55be-964249290213"
print(first_usage_example.property) # should be "P5831"
print(first_usage_example.value) # should be a quote, returned as a MonolingualText
print(first_usage_example.rank) # should be something like "Rank.Normal"
first_usage_example_qualifiers = first_usage_example.qualifiers
first_usage_example_references = first_usage_example.references
Any property whose value is underlyingly a string type will have a string value for tfsl.Statement.value:
saob_id = klimatforandring["P9963"][0]
print(type(saob_id.value)) # should be "<class 'str'>"
As noted in the Preliminary on Languages above, any combination of text with a language code is represented alongside a Language.
This also includes the values of Statements with monolingual text datatype, which pair the string with the Language in the form of a MonolingualText:
first_usage_example_value = first_usage_example.value
print(first_usage_example_value.text) # should print a sentence in Swedish
print(first_usage_example_value.language) # should print something like 'Language("sv", "Q9027")'
Any property whose value is some Wikibase entity (be it an item, property, lexeme, form, or sense) will have a tfsl.ItemValue as the value for tfsl.Statement.value:
sv_noun_gender = klimatforandring["P5185"][0].value # the ItemValue held by the first statement
print(sv_noun_gender.id) # should be "Q1305037"
print(sv_noun_gender.type) # should be "item"
"novalue" values are stored as the boolean False.
"somevalue" ("unknown value") values are stored as the boolean True.
To list the qualifiers of a statement, index into it with the Pid of the qualifying property.
If a valid Pid is provided, a list of tfsl.Claim objects will be returned (which, as with statements on a lexeme, may be empty if the provided property does not qualify the statement):
subject_forms_of_example = first_usage_example["P5830"] # should be a list with one Claim
subject_senses_of_example = first_usage_example["P6072"] # ditto
language_styles_of_example = first_usage_example["P6191"] # should be an empty list
Each Claim, like each Statement, has a property and a value (but, unlike a Statement, lacks a rank, qualifiers, and references of its own):
first_subject_form = subject_forms_of_example[0]
print(first_subject_form.property) # should be "P5830"
print(first_subject_form.value) # should be something like 'ItemValue("L242121-F1")'
The list of references is a list of tfsl.Reference objects, each of which may be indexed in the same way as qualifiers, with the same behavior:
usage_example_reference = first_usage_example_references[0]
stated_ins_of_reference = usage_example_reference["P248"] # should be a list with one Claim
pages_of_reference = usage_example_reference["P304"] # should be an empty list
To access a Sense on an existing lexeme, index into it with its Sid:
first_sense = klimatforandring["S1"] # should return a LexemeSense
first_sense = klimatforandring["L242121-S1"] # ditto
Similarly to indexing Lexemes to obtain lemmata, tfsl.LexemeSenses may be indexed to obtain glosses.
The same statement indexing behavior that applies to Lexemes also applies to Senses.
To access a Form on an existing lexeme, index into it with its Fid:
nom_sing_indef = klimatforandring["F1"] # should return a LexemeForm
nom_sing_indef = klimatforandring["L242121-F1"] # ditto
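Both the short form ("S1", "F1") and the full form ("L242121-S1", "L242121-F1") are shown above as valid index keys. If you need one consistent form in your own bookkeeping, the expansion can be sketched as below. full_subentity_id is a hypothetical helper written for this illustration, not a tfsl function.

```python
def full_subentity_id(lexeme_id: str, key: str) -> str:
    """Expand a short sense/form ID such as "S1" or "F1" to its full form.

    Hypothetical helper, not part of tfsl. Full IDs are returned unchanged
    after checking that they belong to the given lexeme.
    """
    prefix, dash, suffix = key.partition("-")
    if not dash:
        # Short form: the whole key is the suffix.
        prefix, suffix = lexeme_id, key
    if prefix != lexeme_id:
        raise ValueError(f"{key!r} does not belong to {lexeme_id!r}")
    if suffix[:1] not in ("S", "F") or not suffix[1:].isdigit():
        raise ValueError(f"not a sense or form ID: {key!r}")
    return f"{lexeme_id}-{suffix}"


print(full_subentity_id("L242121", "S1"))          # L242121-S1
print(full_subentity_id("L242121", "L242121-F1"))  # L242121-F1
```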
Similarly to indexing Lexemes to obtain lemmata, tfsl.LexemeForms may be indexed to obtain representations:
print(nom_sing_indef[tfsl.langs.sv_]) # should return a MonolingualText
print(nom_sing_indef["klimatförändring" @ tfsl.langs.en_]) # should return the same thing
The grammatical features on a LexemeForm are just a set of Qids:
print(nom_sing_indef.features) # should return {"Q131105", "Q110786", "Q53997857"}
The same statement indexing behavior that applies to Lexemes also applies to Forms.
To create a Lexeme, only the lemma, language, and lexical category are required, provided in that order:
newlexeme = tfsl.Lexeme("hello" @ tfsl.langs.en_, tfsl.langs.en_, "Q83034")
If you have a list of statements, forms, or senses to add, you can also optionally add any of those after the other arguments:
newlexeme = tfsl.Lexeme("hello" @ tfsl.langs.en_, tfsl.langs.en_, "Q83034",
                        statements=statementlist,
                        forms=formlist,
                        senses=senselist)
The contents of statementlist, formlist, and senselist are described below.
Given a Pid and an appropriate property value, a statement may be created as follows:
newstatement = tfsl.Statement("P1343", tfsl.ItemValue("Q464886"))
newstatement2 = tfsl.Statement("P5187", "hello" @ tfsl.langs.en_)
If you want to create a lexeme with those two statements, then they can be added to a Python list and provided at lexeme creation time with the statements argument to Lexeme:
statementlist = [newstatement, newstatement2]
newlexeme = tfsl.Lexeme("hello" @ tfsl.langs.en_, tfsl.langs.en_, "Q83034",
                        statements=statementlist)
Alternatively, these statements may be added directly to the newly created lexeme as follows:
newlexeme = newlexeme + newstatement
newlexeme = newlexeme + newstatement2
Given a form representation, a new LexemeForm may be created as follows:
newform_representation = "hello" @ tfsl.langs.en_
newform = tfsl.LexemeForm([newform_representation])
(The representation is in a list because there may be multiple form representations on a single LexemeForm.)
Adding a new form representation to a lexeme form is as simple as:
another_newform_rep = "hallo" @ tfsl.langs.en_gb_
newform = newform + another_newform_rep
(Note that if there were already a British English form representation on newform, that representation would be overwritten.)
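The overwrite rule can be pictured as a mapping keyed by language code, where adding a representation in a language that is already present replaces the old text. The sketch below uses a stand-in class, FormSketch, with plain (text, language code) tuples. It is not tfsl.LexemeForm, and it assumes that + returns a new object rather than changing the original, which the reassignment pattern above suggests but does not confirm.

```python
class FormSketch:
    """Stand-in for a lexeme form: at most one representation per language code."""

    def __init__(self, representations):
        # representations: iterable of (text, language_code) pairs
        self.representations = {code: text for text, code in representations}

    def __add__(self, representation):
        text, code = representation
        merged = dict(self.representations)
        merged[code] = text  # an existing entry for this code is replaced
        return FormSketch((t, c) for c, t in merged.items())


form = FormSketch([("hello", "en")])
form_with_gb = form + ("hallo", "en-gb")
form_replaced = form_with_gb + ("hullo", "en-gb")

print(form_with_gb.representations)   # {'en': 'hello', 'en-gb': 'hallo'}
print(form_replaced.representations)  # {'en': 'hello', 'en-gb': 'hullo'}
```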
If features are to be added, they can be put in a Python list and provided as the features argument to LexemeForm:
newform_features = ["Q2339337", "Q77768943"]
newform = tfsl.LexemeForm([newform_representation], features=newform_features)
Adding a new feature to a lexeme form proceeds similarly to adding a representation:
newform = newform + "Q901711"
Much of the content of "Creating statements" above also applies here; LexemeForm also takes a statements= argument.
If you want to create a lexeme with this form, then it can be added to a Python list and provided at lexeme creation time with the forms argument to Lexeme:
formlist = [newform]
newlexeme = tfsl.Lexeme("hello" @ tfsl.langs.en_, tfsl.langs.en_, "Q83034",
                        forms=formlist)
Alternatively, adding the new form to the new lexeme is as simple as:
newlexeme = newlexeme + newform
Given a sense gloss, a new LexemeSense may be created as follows:
newsense_gloss = "greeting" @ tfsl.langs.en_
newsense = tfsl.LexemeSense([newsense_gloss])
(The gloss is in a list because there may be multiple sense glosses on a single LexemeSense.)
Adding a new sense gloss to a lexeme sense is as simple as:
another_newsense_gloss = "hälsning" @ tfsl.langs.sv_
newsense = newsense + another_newsense_gloss
(Note that if there were already a Swedish sense gloss on newsense, that gloss would be overwritten.)
Much of the content of "Creating statements" above also applies here; LexemeSense also takes a statements= argument.
If you want to create a lexeme with this sense, then it can be added to a Python list and provided at lexeme creation time with the senses argument to Lexeme:
senselist = [newsense]
newlexeme = tfsl.Lexeme("hello" @ tfsl.langs.en_, tfsl.langs.en_, "Q83034",
                        senses=senselist)
Alternatively, adding the new sense to the new lexeme is as simple as:
newlexeme = newlexeme + newsense
If you want to edit lexemes, you must first ensure that you are logged in:
my_username = 'Mahirtwofivesix'
current_session = tfsl.WikibaseSession(my_username)
This will prompt you for a password when it is run. You may optionally provide this password as the second argument to WikibaseSession:
my_username = 'Mahirtwofivesix'
my_password = '' # your password here
current_session = tfsl.WikibaseSession(my_username, my_password)
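If you pass the password as the second argument, you may prefer not to keep it in the script itself. One possible approach, sketched below, reads it from an environment variable and falls back to an interactive prompt. The helper load_password and the variable name TFSL_PASSWORD are assumptions made for this example; neither is defined by tfsl.

```python
import getpass
import os


def load_password(env=os.environ, prompt=getpass.getpass, variable="TFSL_PASSWORD"):
    """Return the password from an environment variable, else ask for it.

    Hypothetical helper, not part of tfsl. `env` and `prompt` are parameters
    so that the function can be exercised without a terminal.
    """
    password = env.get(variable)
    if password:
        return password
    return prompt("Password: ")


# Demonstration with stand-in inputs instead of the real environment and terminal.
print(load_password(env={"TFSL_PASSWORD": "from-env"}, prompt=lambda _: "typed"))  # from-env
print(load_password(env={}, prompt=lambda _: "typed"))                             # typed
```

The result could then be passed along as tfsl.WikibaseSession(my_username, load_password()).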
To submit edits to a particular lexeme, or to create an entirely new one, use WikibaseSession.push and provide the Lexeme in question and an optional edit summary:
current_session.push(renne_lexeme, "lexème modifiée")
current_session.push(newlexeme, "nouveau lexème")