University of Essex
Department of Language and Linguistics

Verb-initial grammars: A multilingual/parallel perspective

The Welsh PARGRAM Project
ESRC Project RES-000-23-0505

The Welsh morphological analyser is implemented using the Xerox finite-state tools LEXC and XFST. General orthographic rules are written in XFST, while the LEXC lexicon takes care of morphotactic constraints. We provide an account of initial consonant mutation using XFST and the multiword tokeniser.
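
For readers unfamiliar with initial consonant mutation, the following Python sketch illustrates the soft mutation, the most common of the Welsh mutations. It is an illustration only and is not taken from the project's XFST rules; the names `SOFT_MUTATION`, `NON_MUTATING` and `soft_mutate` are hypothetical, and the sketch handles lower-case input only.

```python
# Illustrative sketch only: NOT the project's XFST implementation.
# Soft mutation replaces a word-initial radical consonant as follows.
# Digraphs (ll, rh) are listed first so they are matched before single letters.
SOFT_MUTATION = [
    ("ll", "l"),
    ("rh", "r"),
    ("p", "b"),
    ("t", "d"),
    ("c", "g"),
    ("b", "f"),
    ("d", "dd"),
    ("g", ""),   # initial g is deleted
    ("m", "f"),
]

# Initial digraphs that must not be mistaken for a mutable single letter.
NON_MUTATING = ("ch", "dd", "th", "ph", "ff", "ng")


def soft_mutate(word: str) -> str:
    """Return the soft-mutated form of a lower-case Welsh word."""
    if word.startswith(NON_MUTATING):
        return word
    for radical, mutated in SOFT_MUTATION:
        if word.startswith(radical):
            return mutated + word[len(radical):]
    return word  # vowels and other consonants are unaffected


for w in ["cath", "gardd", "llyfr", "chwaer"]:
    print(w, "->", soft_mutate(w))
```

A finite-state treatment would express the same alternations as rewrite rules applied at the word boundary, which is what makes them reversible for analysis as well as generation.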

For downloads and information on LEXC and XFST, please see http://www.fsmbook.com.

Final Version: The source files will be made available for research use as a tar file. Please contact Louisa Sadler.

Morphology files

Fst files directly called by XLE (file: welsh-std-morphconfig) are marked with an asterisk.

  1. Tokenizer
    • *diacrit-exact-lower.fst and
    • *diacrit-exact-upper.fst both compiled from:
    • diacrit-exact.xfst
    • *diacrit-sloppy.fst compiled from:
    • diacrit-sloppy.xfst
    • *decaps.fst compiled from:
    • decaps.xfst
    • *cymtok-whitespace.fst compiled from:
    • cymtok-whitespace.xfst
    • *cymtok.fst compiled from:
    • cymtok-sandhi.xfst and
    • cymtok-tb.xfst also calling:
      • cymtok-quasivoc.txt
      • cymtok-sandhi-prepI.txt

  2. Morphological analyser proper (for source files see below):
    • *welsh-morph-adj.fst
    • *welsh-morph-adv.fst
    • *welsh-morph-noun.fst
    • *welsh-morph-number.fst
    • *welsh-morph-othercats.fst
    • *welsh-morph-prep.fst
    • *welsh-morph-verb.fst
    • *welsh-morph-verb-bod.fst
    • All in one: welsh-morph-all.fst
    The final version of the morphological analyser is available as a binary file.

  3. Multiword transducer:
    • *welsh-multiword.fst compiled from:
    • welsh-multiword.xfst

Source files for the morphological analyser

File types:

  • welsh-script-category.xfst: the central script files that call the relevant lexc, xfst and fst files and compile the binary welsh-morph-category.fst files listed above.
  • welsh-lexc-category.txt: the lexc files (the "lexicons").
  • welsh-morphalt-category.xfst: files dealing with morphological and orthographical alternations; they are usually not called directly but as derived binary files (welsh-morphalt-category.fst).
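
The naming patterns above can be spelled out for a given category with a short Python sketch. The function name `category_files` and the dictionary keys are hypothetical, introduced here for illustration; the file-name patterns are the ones given in the list. Note that not every category has a morphalt file (see the per-category lists below), so the morphalt entries are patterns rather than guarantees.

```python
def category_files(category: str) -> dict[str, str]:
    """Expand the 'category' placeholder in the file-name patterns listed above."""
    return {
        # central script that compiles the binary analyser for this category
        "script": f"welsh-script-{category}.xfst",
        # the lexc lexicon
        "lexc": f"welsh-lexc-{category}.txt",
        # morphological/orthographical alternations (not present for every category)
        "morphalt_source": f"welsh-morphalt-{category}.xfst",
        "morphalt_binary": f"welsh-morphalt-{category}.fst",
        # the compiled analyser
        "output": f"welsh-morph-{category}.fst",
    }


# Categories as they appear in the file names of the compiled analysers.
CATEGORIES = ["adj", "adv", "noun", "number", "othercats", "prep", "verb", "verb-bod"]

for cat in CATEGORIES:
    files = category_files(cat)
    print(files["script"], "->", files["output"])
```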

The following fst files are called by all or almost all scripts:

  • mut.fst compiled from:
  • mut.xfst
  • diffmut.fst compiled from:
  • diffmut.xfst
  • diacrit-exact-upper.fst and
  • diacrit-exact-lower.fst both compiled from:
  • diacrit-exact.xfst
  1. Source files for welsh-morph-adj.fst:
    • welsh-script-adj.xfst
    • welsh-lexc-adj.txt
    • welsh-lexc-adj-override.txt
    • welsh-morphalt-adj.xfst
    • And:
      • mut.fst
      • diffmut.fst
      • diacrit-exact-upper.fst
      • diacrit-exact-lower.fst
  2. Source files for welsh-morph-adv.fst:
    • welsh-script-adv.xfst
    • welsh-lexc-adv.txt
    • welsh-lexc-adv-override.txt
    • And:
      • mut.fst
      • diffmut.fst
      • diacrit-exact-upper.fst
      • diacrit-exact-lower.fst
  3. Source files for welsh-morph-noun.fst:
    • welsh-script-noun.xfst
    • welsh-lexc-noun.txt
    • welsh-lexc-noun-override.txt
    • welsh-morphalt-noun-ynLgL.fst and
    • welsh-morphalt-noun-ynLgU.fst both compiled from:
    • welsh-morphalt-noun-ynLg.xfst
    • welsh-morphalt-noun-U.fst compiled from:
    • welsh-morphalt-noun-U.xfst
    • welsh-morphalt-noun.fst compiled from:
    • welsh-morphalt-noun.xfst
    • And:
      • mut.fst
      • diffmut.fst
      • diacrit-exact-upper.fst
      • diacrit-exact-lower.fst
  4. Source files for welsh-morph-number.fst:
    • welsh-script-number.xfst
    • welsh-lexc-number.txt
    • numconv2.xfst
    • And:
      • mut.fst
      • diffmut.fst
      • diacrit-exact-upper.fst
      • diacrit-exact-lower.fst
  5. Source files for welsh-morph-othercats.fst:
    • welsh-script-othercats.xfst
    • welsh-lexc-othercats.txt
    • And:
      • mut.fst
      • diffmut.fst
      • diacrit-exact-upper.fst
      • diacrit-exact-lower.fst
  6. Source files for welsh-morph-prep.fst:
    • welsh-script-prep.xfst
    • welsh-lexc-prep.txt
    • And:
      • mut.fst
      • diffmut.fst
      • diacrit-exact-upper.fst
      • diacrit-exact-lower.fst
  7. Source files for welsh-morph-verb.fst:
    • welsh-script-verb.xfst
    • welsh-lexc-verb.txt
    • welsh-lexc-verb-irreg.txt
    • welsh-lexc-verb-override.txt
    • welsh-lexc-verb-lsp.txt
    • welsh-verb-preverb.xfst
    • welsh-morphalt-verb.fst compiled from:
    • welsh-morphalt-verb.xfst
    • welsh-morphalt-verb-U.fst compiled from:
    • welsh-morphalt-verb-U.xfst
    • mut-x-verb.fst compiled from:
    • mut-x-verb.xfst
    • And:
      • mut.fst
      • diffmut.fst
      • diacrit-exact-upper.fst
      • diacrit-exact-lower.fst
  8. Source files for welsh-morph-verb-bod.fst:
    • welsh-script-verb-bod.xfst
    • welsh-lexc-verb-bod.txt
    • And:
      • diacrit-exact-upper.fst
      • diacrit-exact-lower.fst
  9. Source files for welsh-morph-all.fst:
    Either:
    • welsh-script-all.xfst
      (runs all welsh-script-category.xfst files and combines the resulting welsh-morph-category.fst files)
    or:
    • welsh-script-all-fst.xfst
      (combines all welsh-morph-category.fst files directly)
Last updated Louisa Sadler 8/6/07