Random Syntax

Well, since I seem to be on something of a roll with this whole random-word-generator thing, I thought I’d write another post about it, detailing my current progress.

I took the algorithm that I used in my previous post and tricked it out with a random-syntax generator. In a nutshell, here’s how the new algorithm works:

  1. Creates a list of phonemes using both consonant-vowel and vowel-vowel pairs.
  2. Adds a small number of three-letter phonemes, isolated consonants, and random accented characters for flavor.
  3. Creates a “syntax matrix,” a square matrix whose side length equals the number of phonemes in the “dictionary,” and populates this matrix with random zeros and ones. This forms the syntax that determines how words may be constructed.
  4. Builds words by adding phonemes to the current word, but only if the new phoneme is allowed (by the syntax) to come after the previous one.
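For anyone who wants to tinker, here is a minimal Python sketch of the four steps above. It is not the generator's actual source. The letter inventories, the 80/20 split between consonant-vowel and vowel-vowel pairs, and the function names (`make_phonemes`, `make_syntax`, `build_word`) are all assumptions made for illustration.

```python
import random

# Assumed letter inventories; the post does not list the exact sets used.
CONSONANTS = "BCDFGHKLMNPQRSTVWXZ"
VOWELS = "AEIOU"
ACCENTED = "ÔÏÉ"


def make_phonemes(n_pairs, rng):
    """Steps 1 and 2: build the phoneme 'dictionary' (duplicates are possible)."""
    phonemes = []
    for _ in range(n_pairs):
        if rng.random() < 0.8:
            phonemes.append(rng.choice(CONSONANTS) + rng.choice(VOWELS))  # e.g. 'MU'
        else:
            phonemes.append(rng.choice(VOWELS) + rng.choice(VOWELS))  # e.g. 'EO'
    # A few extras for flavor: one three-letter phoneme, one lone consonant,
    # and one accented character.
    phonemes.append(rng.choice(CONSONANTS) + rng.choice(VOWELS) + rng.choice(CONSONANTS))
    phonemes.append(rng.choice(CONSONANTS))
    phonemes.append(rng.choice(ACCENTED))
    return phonemes


def make_syntax(size, rng):
    """Step 3: a size-by-size matrix of random 0s and 1s.
    syntax[i][j] == 1 means phoneme j may directly follow phoneme i."""
    return [[rng.randint(0, 1) for _ in range(size)] for _ in range(size)]


def build_word(phonemes, syntax, max_parts, rng):
    """Step 4: keep appending random phonemes while the syntax allows them.
    A disallowed pick ends the word, so some words come out short."""
    current = rng.randrange(len(phonemes))
    word = phonemes[current]
    for _ in range(max_parts - 1):
        candidate = rng.randrange(len(phonemes))
        if not syntax[current][candidate]:
            break
        word += phonemes[candidate]
        current = candidate
    return word


if __name__ == "__main__":
    rng = random.Random(2024)  # fixed seed so runs are repeatable
    phonemes = make_phonemes(9, rng)
    syntax = make_syntax(len(phonemes), rng)
    print(phonemes)
    for row in syntax:
        print(row)
    for _ in range(5):
        print(build_word(phonemes, syntax, 4, rng))
```

Running it prints a dictionary, a matrix, and a handful of words in the same shape as the sample output that follows, though the actual letters will differ.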

The syntax matrix probably needs some more explanation, so I’ll explain by example. Here’s an output sample from the algorithm:

[‘MU’, ‘Ô’, ‘TA’, ‘TO’, ‘CU’, ‘VE’, ‘GA’, ‘QO’, ‘PU’, ‘RU’, ‘FIP’, ‘XA’]
[0, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 0]
[0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1]
[0, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1, 1]
[1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 0, 0]
[0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1]
[1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 1]
[0, 1, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0]
[0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 1, 1]
[0, 0, 1, 1, 0, 1, 1, 0, 0, 0, 0, 0]
[1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 0, 1]
[0, 0, 0, 1, 0, 1, 1, 1, 0, 1, 1, 0]
[0, 1, 1, 1, 0, 0, 1, 0, 1, 0, 0, 0]
VEQOXA
XAÔ
MUVECU
MU
ÔFIPVEFIP
CURU
QOXATAGARU
MUCUTA
CU
TATOÔRU
PU
GACUXA
MUFIP

The list at the very top is the phoneme dictionary, and the two-dimensional array that follows is the syntax matrix. As I said before, the matrix determines whether the phoneme combination XY is allowed. Note that XY and YX are treated as completely different combinations, so one may be allowed while the other is not. For a more concrete example, take the first word in the list, VEQOXA. Its first two phonemes are VE and QO. VE is the sixth entry in the dictionary (index 5, because Python lists are indexed from zero), and QO is the eighth (index 7). To determine whether this combination is allowed, the algorithm goes to the sixth row and then to the eighth entry in that row. Since that entry is a 1, the combination VE-QO is allowed. This is very much like the fact that, in English, we allow the letter combinations ABA and ABR, but not PQK. (Note: the asymmetry of the syntax matrix is actually demonstrated by this word. The combination VE-QO is allowed, but if you take the time to look it up, you will find that QO-VE is not allowed.)
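The lookup described above is easy to check mechanically. This small sketch copies the dictionary and matrix from the sample output verbatim; the helper names `allowed` and `word_is_valid` are hypothetical. Words are passed as lists of phonemes because splitting a raw string back into phonemes can be ambiguous.

```python
# Phoneme dictionary and syntax matrix copied from the sample output above.
PHONEMES = ['MU', 'Ô', 'TA', 'TO', 'CU', 'VE', 'GA', 'QO', 'PU', 'RU', 'FIP', 'XA']
SYNTAX = [
    [0, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1],
    [0, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1, 1],
    [1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 0, 0],
    [0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1],
    [1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 1],
    [0, 1, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0],
    [0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 1, 1],
    [0, 0, 1, 1, 0, 1, 1, 0, 0, 0, 0, 0],
    [1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 0, 1],
    [0, 0, 0, 1, 0, 1, 1, 1, 0, 1, 1, 0],
    [0, 1, 1, 1, 0, 0, 1, 0, 1, 0, 0, 0],
]


def allowed(first, second):
    """True if `second` may directly follow `first` (row = first, column = second)."""
    return SYNTAX[PHONEMES.index(first)][PHONEMES.index(second)] == 1


def word_is_valid(parts):
    """Check every adjacent pair of phonemes in a word."""
    return all(allowed(a, b) for a, b in zip(parts, parts[1:]))


print(PHONEMES.index('VE'), PHONEMES.index('QO'))  # 5 7
print(allowed('VE', 'QO'))  # True
print(allowed('QO', 'VE'))  # False
print(word_is_valid(['VE', 'QO', 'XA']))  # True
```

The same check passes for every multi-phoneme word in the sample list, such as QO-XA-TA-GA-RU and TA-TO-Ô-RU.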

I had a lot of trouble getting this fiddly little bastard of an algorithm to work, so there are some peculiarities. For example, since Python doesn’t have anything resembling “goto,” if the randomly chosen phoneme is not syntactically allowed to be added to the new word, the algorithm doesn’t go back and select a different one; it simply gives up, adds nothing, and starts the whole procedure again. The result is that some of the words are much shorter than they should be. Hopefully, my Python skills will someday improve to the point that I can solve this irritating problem. I’d also like to modify the procedure so that no duplicate phonemes are allowed, since two copies of the same phoneme would likely have different syntactical relationships to the other phonemes, and so would build words that appear to violate their own syntax. (Actually, duplicate phonemes might create some interesting little idiosyncrasies that would make the words even closer to real language. So I guess I’ll leave duplicate phonemes alone.)
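Python has no goto, but a loop can do the same job here. One possible fix, offered as a suggestion rather than what the generator currently does, is to filter the current row of the matrix down to the phonemes that are allowed to follow and choose only among those. That way a disallowed pick never happens. The function name `build_word_retry` is hypothetical, and the sketch assumes the dictionary is a list of strings and the matrix a list of 0/1 rows, as in the sample output.

```python
import random


def build_word_retry(phonemes, syntax, n_parts, rng):
    """Build a word of up to n_parts phonemes, drawing each new phoneme only
    from those the syntax matrix allows after the current one."""
    current = rng.randrange(len(phonemes))
    parts = [phonemes[current]]
    while len(parts) < n_parts:
        # Every column j holding a 1 in the current row is a legal successor.
        successors = [j for j, ok in enumerate(syntax[current]) if ok]
        if not successors:
            break  # dead end: no phoneme may follow this one
        current = rng.choice(successors)
        parts.append(phonemes[current])
    return "".join(parts)


rng = random.Random(1)
print(build_word_retry(["A", "B"], [[0, 1], [1, 0]], 4, rng))  # ABAB or BABA
```

A word can still end early if it reaches a phoneme whose row is all zeros, but then the shortness comes from the syntax itself rather than from an unlucky draw.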

Here’s a larger sample of what the current version of the random word generator is producing, conveniently formatted for your viewing pleasure:

PA, XIXEPO, XEPO, EO, DIÏAIPO, EODIEO, FI, XIAIXI, DIUOPA, Ï, PAW, PADI, EOAI, XI, DIPAXE, FI, EO, WÏ, XE, W, EODIPA, DIUO, W, WXIFIWXE, XIAIXI, UOPA, EO, XEW, XI, UOXIPO, FIPO, PA, AIXIEO, DIUOPA, XIPODIPA, PAPODI, FI, DIÏ, PO, UOWFI, PAWPOAIDI, EOAIDI, XIUO, EO, DIUOXI, ÏPA, FIAI, PAXEDIPAXEDI, ÏAIPO, EOPO, PADIPA, UOW, POXIXIXE, ÏXIFIPODI, UOPA, XIAIPO, UO, XEWÏPAPO, W, XIXI, PADIEOPO, XEEOFIPOXIDIÏPA, DIPA, POAI, PO, DI, XE, DI, EOPO, AI

P.S.: A thousand points to anybody who can pronounce all of these!

More Random Words

Yesterday, I wrote a post about generating completely random names/words from a character set. Well, out of boredom, I’ve refined my algorithm, and, somewhat to my surprise, it now produces names/words which are not only interestingly alien, but actually phonetically consistent:

SEMAREMA
SALISEMA
HUTEÉMA
MATEMO
REMOLIRENA
REÉNANARE
SAHUMAÉ
ÉSEMOTERE
TETETE
LILI
MALINASASA
SEHURELI
SEHULITE
NAÉSALIMO
HUÉHU
REMO
LILINARE
HUÉ
RELIHUSASE
TEHUMOHULI
REÉMO
LISAMOMAMA
TELIMOMA
SEMASASARE

What I did was modify the original algorithm so that, rather than just sticking together random letters cushioned by a vowel every other letter, it creates a list of possible two-letter phonemes (plus an occasional sprinkling of extra characters) and builds the words from those phonemes. This is good in that it produces pseudo-consistency in the words I generate. It’s not real consistency like you find in language, since there are no rules (not yet, anyway) for how phonemes can and cannot fit together. The only downsides are that 1) the generator now very rarely produces three-letter words, and 2) it never produces inanely amusing things like “FUKER” and “CUJO” from the previous list.
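Here is a rough sketch of this phoneme-based approach, with no syntax rules yet. The consonant set is a guess based on the sample words above (S, M, R, L, H, T, N), the single extra character is an assumption, and `make_phoneme_list` and `make_word` are hypothetical names, not the actual source.

```python
import random

# Assumed inventories, guessed from the sample words above.
CONSONANTS = "HLMNRST"
VOWELS = "AEIOU"
EXTRAS = "É"


def make_phoneme_list(n_pairs, rng):
    """Pick n_pairs random consonant-vowel phonemes, plus one extra character."""
    phonemes = [rng.choice(CONSONANTS) + rng.choice(VOWELS) for _ in range(n_pairs)]
    phonemes.append(rng.choice(EXTRAS))  # the occasional sprinkling of extras
    return phonemes


def make_word(phonemes, rng, min_parts=1, max_parts=5):
    """Glue together random phonemes; with no rules, any phoneme may follow any other."""
    count = rng.randint(min_parts, max_parts)
    return "".join(rng.choice(phonemes) for _ in range(count))


if __name__ == "__main__":
    rng = random.Random(7)
    phonemes = make_phoneme_list(9, rng)
    print(phonemes)
    for _ in range(5):
        print(make_word(phonemes, rng))
```

Every regular phoneme is two letters long, so a word made only of them always has an even length. An odd length such as three letters needs one of the single-character extras (as in HUÉ above), which is one plausible reason three-letter words are now rare.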

If I ever optimize it to a degree with which I’m satisfied, I’ll be sure to post the Python source code.

P.S.: Sorry about the lack of real content lately. The holiday season’s made me sluggishly overfed and rather lazy.

Messing With the Mind: Linguistic Experiments

I’ve always been fascinated by the enormous impact that language has on the mind. I’ve discovered that a change in writing style can temporarily produce a dramatic change in the writer’s mood, attitudes, and even perception. (I’m certainly not the first to discover this; the Internet is rife with other examples.) So, I thought it might be informative (and amusing) to make myself a guinea pig for some linguistic experiments.

  1. “Question your data”: If the thing-I-call-myself performs the action-I-have-labeled-writing in a manner such that the-thing-I-call-myself makes no assumptions about the items-I-call-facts, then perhaps the thing-I-call-myself will begin to doubt these things-called-facts, or perhaps enter the state-I-call-openmindedness.
  2. “Efface the self”: Write without ever referring to “yourself.” Do not use the words “I,” “my,” “our,” or any other such words that imply that the writer has a “self.” It will very rapidly be discovered that one who adopts such a writing style will begin to feel very strange, and to lose their sense of the thing they formerly called “themselves.” It’s a very Buddhist way to write.
  3. “What nouns?”: In reality-ing, us-process discover that thing-processes are impermanent, and thus probably do not deserve such constant, stable label-entitydecays as noun-namings. Although writing in this manner-labeling can be rather confusing and disorienting, it-entitydecays is at least an interesting exercise.
  4. “The direct approach”: Do not use Adjectives. Capitalize Nouns. Do not clutter the Sentence with Words. Replace Adjectives with Verbs. Minimize Sentences. Lose Mind.
  5. “Do like the Germans”: Germanpeople are unafraid of wordcombinations. They willingly wordcombine separate rootwords unfearfully. Although admittedly the practicalresults of this in the Englishlanguage are somewhat confusingdisorienting, it makes an interestingexperiment.
  6. “Don’t be specific”: Another Buddhist experiment in language. Never refer to a specific “object.” All is one. All is all.
  7. “Be way, way too specific”: Use as many descriptive adjectives as your descriptive task requires. Be as mechanically and clinically precise as the current descriptive task warrants. Be coldly and exactly logical in your writing, whether descriptive or otherwise. Leave no room for linguistic ambiguities to enter into your descriptions. Although this particular method may be somewhat difficult to read quickly and easily, it at least leaves no room for errors to be inserted.

Try it! You might be surprised at just how dizzy, disoriented, dissociated, discombobulated, or deharmonized you can get.

English and I Don’t Get Along

I may have mentioned this before, but I have a problem with the English language. Ever since I was a child in elementary school, I’ve had in the back of my mind a list of contradictions and problems with the language. I was always harshly criticized by my friends and teachers when I attempted to fit the irregular verbs into the regular framework, and as a result, I was forcibly taught the language properly. But my discontent at its irregularities, contradictions, and problems remained, and to this day, they still annoy me.

These thoughts were forced back to the forefront when I began taking German in high school. So much of German made sense, all fitting into the established grammatical rules, that I began to see English the way non-native speakers see it: an overly complicated and contrived bundle of words and makeshift, jury-rigged rules. And having seen this in English, I began to see it in German as well. German may be much more sensible than English, but it, too, has its irregular verbs, verbs that, for whatever reason, don’t conjugate according to the rules. So my search for a less-confusing language continued.

I then learned of Esperanto, and after perusing some introductory material on the Internet, began an attempt to teach it to myself. This attempt turned into a whole series, each one punctuated by my losing interest for a few months, forgetting what I’d learned, and then going back and having to re-do the online course.

At some point during all this, I learned of Lojban. Lojban is now, in my mind, the best artificial language on the planet. It may be the best overall. It is, from the material I’ve read and from what little experience I’ve had with it, completely unambiguous, logical, and sensible. The only downside is that, after many years of English, my brain is apparently quite averse to a language that makes perfect sense. I simply have trouble wrapping my head around it. That (coupled, probably, with my utter impatience) has made Lojban the most difficult language I’ve yet tried to learn. But, if nothing distracts me, perhaps I’ll be able to give it another go.

I have just one more thing to say. To the English language: Curse you, English, curse you.