Scott Alexander in Slate Star Codex:
Gwern has answered my prayers and taught GPT-2 poetry.
GPT-2 is the language processing system that OpenAI announced a few weeks ago. They are keeping the full version secret, but have released a smaller prototype version. Gwern retrained it on the Gutenberg Poetry Corpus, a 117 MB collection of pre-1923 English poetry, to create a specialized poetry AI.
I previously tested the out-of-the-box version of GPT-2 and couldn’t make it understand rhyme and meter. I wrongly assumed this was a fundamental limitation: “obviously something that has never heard sound can’t derive these complex rhythms just from meaningless strings of letters.” I was wrong; it just didn’t have enough training data. Gwern’s retrained version gets both of these right, and more too. For example:
Thou know’st how Menoetiades the swift
Was dragged, of Hector and the fierce compeers
And Phrygian warriors. So, we will dispatch
Your bodies, then, yourselves to burn the ships
In sacrifice; with torches and with bells
To burn them, and with oxen to replace
Your gallant friends for ever. But I wish
That no man living has so long endured
The onset of his foes, as I have power
To burn or storm; for mighty Hector erst
Was slain, and now returns his safe return
This is all perfect iambic pentameter. I know AP English students who can’t write iambic pentameter as competently as this.