An Evening of Fun, Testing Peltarion’s “Author Style Predictor” AI

ADDENDUM

I ran some additional tests using text from Leo Tolstoy and Oscar Wilde to see how well the application identifies an author’s own work. The results were not good. Here is a passage from “Anna Karenina,” by Leo Tolstoy, translated from Russian, which I fed into the Peltarion AI in full:

Stepan Arkadyevitch’s eyes twinkled gaily, and he pondered with a smile. “Yes, it was nice, very nice. There was a great deal more that was delightful, only there’s no putting it into words, or even expressing it in one’s thoughts awake.” And noticing a gleam of light peeping in beside one of the serge curtains, he cheerfully dropped his feet over the edge of the sofa, and felt about with them for his slippers, a present on his last birthday, worked for him by his wife on gold-colored morocco.

The AI incorrectly identified this cut-and-paste text from Leo Tolstoy as being the work of Fyodor Dostoyevsky. This determination comes from the same application that claimed the text “a b c d e f g h i” stylistically matched the work of Tolstoy (Tolstoy being the author of “War and Peace”). I’m not sure what to make of this: maybe the novel “Anna Karenina” wasn’t part of the original “Deep Learning” training set? It’s a surprisingly shallow lapse if that’s the case. After all, if an AI literary application by a mainstream AI company is unable to properly identify the work of an author it has already analyzed, from text it should have consumed as part of its training process, what does that say about the tech’s predictive abilities in general?

The AI appears to have marginally better luck with Oscar Wilde in some limited ad hoc tests I ran. Maybe the algorithm was confused by Tolstoy because both an English and a German version exist on Project Gutenberg? Peltarion says that only English-language texts were used, so that hypothesis is out. Could it be that the AI acquired such an understanding of these authors through its “deep learning” methods that it has discovered stylistic patterns and influences that we shallow-learning humans may be incapable of seeing or understanding? Could it be that the humans are wrong here and the machines are right? Did this AI just shockingly reveal that Leo Tolstoy stole his literary style from Fyodor Dostoyevsky?

A simple experiment I did, testing different lengths of a passage from Oscar Wilde’s “The Picture of Dorian Gray,” provided some clarity and also highlighted key shortcomings in the accuracy of AI in general. The following are the results from the Peltarion AI, run on selected sections of the same paragraph. Again, all of these passages are cut and pasted without modification from that one paragraph, and the author is Oscar Wilde in every case. Only in the full passage was the author identified correctly:

The girl laughed again. The joy of a caged bird was in her voice.

– Kate Chopin

Her eyes caught the melody and echoed it in radiance, then closed for a moment, as though to hide their secret.

– Kate Chopin

The joy of a caged bird was in her voice. Her eyes caught the melody and echoed it in radiance, then closed for a moment, as though to hide their secret.

– Kate Chopin

The girl laughed again. The joy of a caged bird was in her voice. Her eyes caught the melody and echoed it in radiance, then closed for a moment, as though to hide their secret. When they opened, the mist of a dream had passed across them.

– Oscar Wilde

A quick look at the publicly available training dataset reveals what happened in both the Wilde and Tolstoy cases: the sentence “When they opened, the mist of a dream had passed across them” is included in the training dataset and is associated with Oscar Wilde, but the preceding sentences are not. The Tolstoy passage I quoted above is similarly absent from the AI’s training dataset.
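For anyone who wants to repeat that lookup, here is a minimal sketch of the check in Python. It assumes the dataset has already been loaded as (sentence, author) pairs; the `training_rows` list below is a hypothetical one-row stand-in, not the real file, and the function names and the naive sentence splitter are my own illustration rather than anything Peltarion provides.

```python
import re

def normalize(text):
    """Unify curly quotes, collapse whitespace, and lowercase."""
    text = text.replace("\u2019", "'").replace("\u201c", '"').replace("\u201d", '"')
    return re.sub(r"\s+", " ", text).strip().lower()

def split_sentences(passage):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", passage.strip()) if s]

def coverage(passage, training_rows):
    """Pair each sentence of the passage with the author it is labeled with
    in training_rows, or None if the sentence is not in the data at all."""
    index = {normalize(text): author for text, author in training_rows}
    return [(s, index.get(normalize(s))) for s in split_sentences(passage)]

# Hypothetical stand-in for the real dataset: one row, matching what the
# lookup described above found for this paragraph.
training_rows = [
    ("When they opened, the mist of a dream had passed across them.", "Oscar Wilde"),
]

passage = (
    "The girl laughed again. The joy of a caged bird was in her voice. "
    "Her eyes caught the melody and echoed it in radiance, then closed "
    "for a moment, as though to hide their secret. "
    "When they opened, the mist of a dream had passed across them."
)

result = coverage(passage, training_rows)
for sentence, author in result:
    print(author or "NOT IN TRAINING SET", "|", sentence)
```

With the stand-in data, only the last sentence comes back with an author label, which mirrors the pattern in the predictions above.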

All of the Oscar Wilde sentences above that were not included in the training set were wrongly identified, suggesting that the algorithm is incapable of assessing an author’s “style” in any meaningful manner. In their description of how the demo was assembled, Peltarion also disclosed that the full text of the authors’ books was not included in the demo, for reasons that were not explained (https://peltarion.com/knowledge-center/tutorials/author-style-predictor).

What this inadvertently reveals is that the AI isn’t conducting any kind of sophisticated or intelligent “style” assessment here, but rather is functioning as an opaque word-usage and punctuation-usage matching algorithm, producing matches from only a small database of samples. But again, the actual techniques used by the algorithm in making its determinations are a complete black box, so we don’t really know what the AI is doing or why.
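To make concrete what “word-usage and punctuation-usage matching” could look like, here is a deliberately crude sketch in Python. To be clear, this is not Peltarion’s algorithm, which remains a black box to me; the feature lists, function names, and the two one-sentence samples (both quoted earlier in this post) are assumptions chosen purely for illustration.

```python
import math
import re
from collections import Counter

# Illustrative feature lists (assumptions, not taken from any real system).
FUNCTION_WORDS = ["the", "and", "of", "a", "in", "was", "it", "her", "his", "with"]
PUNCTUATION = [",", ";", ":", ".", "!", "?", "-", '"']

def profile(text):
    """Relative frequency of each function word and punctuation mark."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    n_words = max(len(words), 1)
    n_chars = max(len(text), 1)
    vec = [counts[w] / n_words for w in FUNCTION_WORDS]
    vec += [text.count(p) / n_chars for p in PUNCTUATION]
    return vec

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def nearest_author(text, samples):
    """Return the author whose sample profile is closest to the text's."""
    target = profile(text)
    return max(samples, key=lambda author: cosine(target, profile(samples[author])))

# A tiny "database" of samples: one sentence per author.
samples = {
    "Oscar Wilde": "When they opened, the mist of a dream had passed across them.",
    "Leo Tolstoy": "Stepan Arkadyevitch's eyes twinkled gaily, and he pondered with a smile.",
}

# A sentence that is already in the database trivially matches itself.
print(nearest_author(samples["Oscar Wilde"], samples))
```

A matcher like this will always return some author, no matter how little the input resembles anything in its samples, and a text that is already in the database matches itself perfectly. Neither behavior says anything about “style.”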

With all this said, it’s safe to say that when the AI claimed that a passage from Leo Tolstoy’s “Anna Karenina” was written in the style of Fyodor Dostoyevsky, the result was not a breathtaking literary discovery: it was simply a bad call.

And while these results are definitely discouraging, better approaches to this same kind of literary analysis are publicly available. They are relatively easy to derive and test with a reasonable understanding of the subject matter, and without the use of so-called Artificial Intelligence.

Stay tuned…