Random Syntax

Well, since I seem to be on something of a roll with this whole random-word-generator thing, I thought I’d write another post about it, detailing my current progress.

I took the algorithm that I used in my previous post and tricked it out with a random-syntax generator. In a nutshell, here’s how the new algorithm works:

  1. Creates a list of phonemes using both consonant-vowel and vowel-vowel pairs.
  2. Adds a very few three-letter phonemes, isolated consonants, and random accented characters for flavor.
  3. Creates a “syntax matrix” with dimensions equal to the size of the phoneme “dictionary,” and populates this matrix with random zeros and ones. This forms the syntax that determines how words may be constructed.
  4. Builds words by adding phonemes to the current word, but only if the new phoneme is allowed (by the syntax) to come after the previous one.
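The four steps above can be sketched roughly as follows. This is a minimal illustration, not the actual generator: the function names, the letter sets, the accented characters, the proportions of each phoneme type, and the consonant-vowel-consonant shape of the three-letter phonemes are all assumptions made for the example.

```python
import random

CONSONANTS = "BCDFGHJKLMNPQRSTVWXZ"
VOWELS = "AEIOU"
ACCENTED = "ÔÏÉ"  # assumed set of accented characters, for illustration only


def make_phonemes(rng, size=12):
    """Steps 1-2: mostly two-letter pairs, plus a few extras for flavor."""
    phonemes = []
    while len(phonemes) < size:
        roll = rng.random()
        if roll < 0.70:    # consonant-vowel pair
            p = rng.choice(CONSONANTS) + rng.choice(VOWELS)
        elif roll < 0.85:  # vowel-vowel pair
            p = rng.choice(VOWELS) + rng.choice(VOWELS)
        elif roll < 0.90:  # three-letter phoneme (assumed consonant-vowel-consonant)
            p = rng.choice(CONSONANTS) + rng.choice(VOWELS) + rng.choice(CONSONANTS)
        elif roll < 0.95:  # isolated consonant
            p = rng.choice(CONSONANTS)
        else:              # accented character
            p = rng.choice(ACCENTED)
        phonemes.append(p)  # duplicates are not filtered out
    return phonemes


def make_syntax(rng, size):
    """Step 3: a size-by-size matrix of random zeros and ones."""
    return [[rng.randint(0, 1) for _ in range(size)] for _ in range(size)]


def make_word(rng, phonemes, syntax, attempts=4):
    """Step 4: add a random phoneme only if it may follow the previous one."""
    current = rng.randrange(len(phonemes))
    word = [phonemes[current]]
    for _ in range(attempts):
        candidate = rng.randrange(len(phonemes))
        if syntax[current][candidate] == 1:
            word.append(phonemes[candidate])
            current = candidate
        # a disallowed candidate is simply skipped in this sketch
    return "".join(word)


rng = random.Random()
phonemes = make_phonemes(rng)
syntax = make_syntax(rng, len(phonemes))
print(phonemes)
print([make_word(rng, phonemes, syntax) for _ in range(5)])
```

Passing a `random.Random` instance around, rather than calling the module-level functions, makes it easy to seed the generator and reproduce a particular "language."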

The syntax matrix probably needs some more explanation, so I’ll explain by example. Here’s an output sample from the algorithm:

['MU', 'Ô', 'TA', 'TO', 'CU', 'VE', 'GA', 'QO', 'PU', 'RU', 'FIP', 'XA']
[0, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 0]
[0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1]
[0, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1, 1]
[1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 0, 0]
[0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1]
[1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 1]
[0, 1, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0]
[0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 1, 1]
[0, 0, 1, 1, 0, 1, 1, 0, 0, 0, 0, 0]
[1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 0, 1]
[0, 0, 0, 1, 0, 1, 1, 1, 0, 1, 1, 0]
[0, 1, 1, 1, 0, 0, 1, 0, 1, 0, 0, 0]
VEQOXA
XAÔ
MUVECU
MU
ÔFIPVEFIP
CURU
QOXATAGARU
MUCUTA
CU
TATOÔRU
PU
GACUXA
MUFIP

The list at the very top is the phoneme dictionary, and the two-dimensional array that follows is the syntax matrix. As I said before, the matrix determines whether the phoneme combination XY is allowed. Please note that XY and YX are treated as completely different combinations, and one may be allowed while the other is not.

For a more concrete example, the first word in the list is VEQOXA. Its first two phonemes are VE and QO. VE is the sixth entry in the dictionary (index 5, because Python lists are indexed from zero), and QO is the eighth (index 7). To determine whether this combination is allowed, the algorithm goes to the sixth row, and then to the eighth entry in that row. Since that entry is a 1, the combination VE-QO is allowed. This is very much like the fact that, in English, we allow the letter combinations ABA and ABR, but not PQK. (Note: the asymmetry of the syntax matrix is actually demonstrated by this word. The combination VE-QO is allowed, but if you take the time to look it up, the sixth entry of the eighth row is a 0, so QO-VE is not allowed.)
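The lookup can be sketched in a few lines of Python. The dictionary and matrix below are copied from the sample output; the names `PHONEMES`, `SYNTAX`, `is_allowed`, and `word_is_valid` are made up for this illustration and are not taken from the actual generator.

```python
PHONEMES = ['MU', 'Ô', 'TA', 'TO', 'CU', 'VE', 'GA', 'QO', 'PU', 'RU', 'FIP', 'XA']

# Row = the phoneme that comes first, column = the phoneme that follows it.
SYNTAX = [
    [0, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 0],  # MU
    [0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1],  # Ô
    [0, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1, 1],  # TA
    [1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 0, 0],  # TO
    [0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1],  # CU
    [1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 1],  # VE
    [0, 1, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0],  # GA
    [0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 1, 1],  # QO
    [0, 0, 1, 1, 0, 1, 1, 0, 0, 0, 0, 0],  # PU
    [1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 0, 1],  # RU
    [0, 0, 0, 1, 0, 1, 1, 1, 0, 1, 1, 0],  # FIP
    [0, 1, 1, 1, 0, 0, 1, 0, 1, 0, 0, 0],  # XA
]


def is_allowed(first, second):
    """Return True if `second` may come directly after `first`."""
    row = PHONEMES.index(first)
    col = PHONEMES.index(second)
    return SYNTAX[row][col] == 1


def word_is_valid(parts):
    """Check every adjacent pair of phonemes in a word."""
    return all(is_allowed(a, b) for a, b in zip(parts, parts[1:]))


print(is_allowed('VE', 'QO'))             # True
print(is_allowed('QO', 'VE'))             # False
print(word_is_valid(['VE', 'QO', 'XA']))  # True
```

Using `PHONEMES.index` only works cleanly here because this particular dictionary has no duplicate phonemes; with duplicates, it would always find the first copy.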

I had a lot of trouble getting this fiddly little bastard of an algorithm to work, so there are some peculiarities. For example, since Python doesn’t have anything resembling “goto”, if the randomly-chosen phoneme is not syntactically allowed to follow the previous one, then instead of going back to select a different one, the algorithm simply gives up, adds nothing, and starts the whole procedure again. The result is that some of the words are much shorter than they should be. Hopefully, my Python skills will someday improve to the point that I can solve this irritating problem. I’d also like to modify the procedure so that no duplicate phonemes are allowed, since two copies of the same phoneme would likely have different syntactical relationships to the other phonemes, and so would build words that appear to violate their own syntax. (Actually, duplicate phonemes might create some interesting little idiosyncrasies that would make the words even closer to real language. So I guess I’ll leave duplicate phonemes alone.)
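For what it’s worth, one possible way around the missing “goto” is a `while` loop that only ever picks from the phonemes the syntax allows next, so there is nothing to retry. The following is a minimal sketch under that assumption, not the generator’s actual code; `build_word`, the three-phoneme dictionary, and its tiny matrix are all hypothetical.

```python
import random

# Hypothetical toy dictionary and syntax matrix, just for this example.
PHONEMES = ["PA", "XI", "EO"]
SYNTAX = [
    [0, 1, 0],  # PA may be followed only by XI
    [0, 0, 1],  # XI may be followed only by EO
    [0, 0, 0],  # EO may be followed by nothing (a dead end)
]


def build_word(phonemes, syntax, length, rng):
    """Build a word of up to `length` phonemes, choosing only allowed successors."""
    current = rng.randrange(len(phonemes))
    word = [phonemes[current]]
    while len(word) < length:
        # Collect the indices of every phoneme allowed after the current one.
        allowed = [j for j, flag in enumerate(syntax[current]) if flag == 1]
        if not allowed:  # dead end: no phoneme may follow this one
            break
        current = rng.choice(allowed)
        word.append(phonemes[current])
    return word


rng = random.Random()
print("".join(build_word(PHONEMES, SYNTAX, 5, rng)))
```

With this approach a word only comes out short when it hits a phoneme whose row is all zeros, rather than whenever an unlucky candidate is drawn.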

Here’s a larger sample of what the current version of the random word generator is producing, conveniently formatted for your viewing pleasure:

PA, XIXEPO, XEPO, EO, DIÏAIPO, EODIEO, FI, XIAIXI, DIUOPA, Ï, PAW, PADI, EOAI, XI, DIPAXE, FI, EO, WÏ, XE, W, EODIPA, DIUO, W, WXIFIWXE, XIAIXI, UOPA, EO, XEW, XI, UOXIPO, FIPO, PA, AIXIEO, DIUOPA, XIPODIPA, PAPODI, FI, DIÏ, PO, UOWFI, PAWPOAIDI, EOAIDI, XIUO, EO, DIUOXI, ÏPA, FIAI, PAXEDIPAXEDI, ÏAIPO, EOPO, PADIPA, UOW, POXIXIXE, ÏXIFIPODI, UOPA, XIAIPO, UO, XEWÏPAPO, W, XIXI, PADIEOPO, XEEOFIPOXIDIÏPA, DIPA, POAI, PO, DI, XE, DI, EOPO, AI

P.S.: A thousand points to anybody who can pronounce all of these!
