An academic paper by Brian Porter and Edouard Machery has claimed that ‘AI-generated poetry is indistinguishable from human-written poetry and is rated more favourably’. This has prompted lots of comment, including this widely-shared article in the Guardian.
The headline (like the paper’s title) is striking. It appears, depending on your viewpoint, either to show that writing poetry is a technologically ‘solved’ problem, that human poetry is meaningless (or at least less meaningful than so-called ‘spicy autocomplete’), that the poetry we call great is only elevated by snobbishness, or that there are no longer any secret realms of the human soul into which our new robot overlords cannot encroach. But, like much poetry, I think what is going on here is more complex, strange and specific: revealing about AI and poetry, but also about how people use and understand research.
On the findings themselves, rather than their implications, a few things should be noted. People thought the AI-generated poetry was better partly because they thought it was more likely to be written by a human. They thought this because of various preconceptions: about what computers were capable of, what ‘good poetry’ looked like, and which bits of poetry were ‘hard’ to produce. For example, ‘good’ rhyme (presumably meaning ‘exact’ or, less generously, ‘obvious’ rhyme) was seen as evidence of human authorship (like ‘I’m a poet / and I didn’t know it’, perhaps). What room this leaves for more adventurous and interesting rhyme is unclear (e.g. that of Paul Muldoon who, Michael Longley quipped, “…could rhyme ‘cat’ with ‘dog’”). These sorts of presuppositions inform how people read and interpret poems. We need to understand them in order to understand how people understand culture. But they’re also evident in the research and in the responses to it.
And so, to another famous quip (perhaps by Theodor Reik): ‘history doesn’t repeat itself, but it rhymes’. This story, and the reaction to it, is reminiscent of the experiments a century ago at the heart of I A Richards’ influential book Practical Criticism (1929). This was a foundational text for New Criticism and, through it, for the whole development of the academic study of literature. Richards gave unattributed poems to his (Cambridge undergraduate) students and asked them to write interpretations. The result was that they often preferred bad poems by mediocre poets to good poems by great poets (I simplify deliberately).
Richards took this to indicate that his students’ ability to interpret poems was inadequate, and set about developing an approach to understanding poetry based on detailed technical analysis of the words on the page. Responses to the AI poetry article often take its technical results at face value, to indicate that the AI poetry is ‘better’; or (as the study authors do) they suggest that human-written poetry has attributes (e.g. complexity, or ‘humanity’) which the approach isn’t able to capture, and which make it in fact ‘better’, despite scoring lower on these measures.
Neither of these examples, however, provides an adequately complete picture of writing and reading poetry: both treat them as laboratory experiments that can be viewed in isolation. In literary criticism, the challenge to New Criticism (placing emphasis on reader response, or on historical and theoretical contexts) is itself old news. Something similar would be useful here, too: thinking about what’s going on within the reading process, as well as about those wider contexts.
Some categories used (e.g. ‘innovation’) only make sense in relation to an external point of reference (innovation compared to what?). This affects which respondents they are meaningful for, or who can address them (this is an issue that The Insight and Impact Toolkit has had to confront, as well). Since most of the respondents didn’t read poetry, was it like asking non-drivers for car reviews? Or, given the historic samples used, like asking motorists to review chariots? In any case, an under-discussed finding is that the ‘poor agreement [between responses] suggests that, as expected, participants found the task very difficult, and were at least in part answering randomly’. It’s worth remembering that the main result was noise, before focusing on the traces of signal that were found.
It’s not that art isn’t suitable for empirical study: it’s a fascinating and growing area (e.g. the work of the Max Planck Institute for Empirical Aesthetics). Our audience research is part of this same tradition. But it requires a sense of when and how the whole isn’t wholly explicable from its parts (i.e. the ideas of ‘gestalt’, or ‘emergence’), as well as of when and how understanding benefits from greater awareness of differentiation within that whole. There’s a reason we talk a lot about segmentation.
Porter and Machery include rating scales for the properties that create quality (and their factor reduction showing how those properties group together is a particularly interesting part of their article). But their handling of them feels out of touch with how creative works work. You can’t just ‘dial up’ the quality of a poem by ‘dialling up’ a particular attribute (say, ‘beauty’, or ‘meaningfulness’). Likewise, poems can’t be completely separated from context. Even the words they are made of exist in a network of uses and meanings, and in a particular setting. And poetry – and poetry readers – aren’t monolithic. People read for many reasons, and many different things can be poetry, as was discussed at length this last week during the consultation around the proposed National Poetry Centre in Leeds.
‘How good is it?’ (or ‘overall quality on a seven-point scale’) is a destructively reductive question to start with (this is also one of the dangers of ‘prize culture’ in poetry). ‘What sort of thing does it do for me when I read it?’ is a more useful one. Or ‘what do I think and feel in relation to it?’ (and, if you wish, ‘how and why is that happening?’). Poetry exists in different modes, for different types of readers. As Nick Bailey from the National Poetry Centre said, there is too often an absurd tendency to police the question of ‘is it poetry?’ rather than recognising that different ‘poetries’ can coexist (just as you can like or not like jazz without thinking it’s ‘not music’). But that doesn’t mean it makes sense to judge it all on linear scales (with ‘good rhyme’ taking the place of ‘good [as in exact? obvious?] harmonies’).
Poetry is a difficult case for AI not, as the researchers were testing, because it’s uniquely complex to (plausibly) write, but because it’s uniquely complex to read. And not because it’s necessarily ‘difficult’ in the sense of ‘difficult poetry’ (though we might smile at giving a cross-section of people mediaeval, modernist, and other quite tricky poetry, measuring their responses, and finding those responses to be in substantial part random…). “As expected, participants found the task difficult”. Indeed. But it’s ‘difficult’ because “people + culture + context” doesn’t have a simple answer. This is true even if a poem itself is formally simple. Whatever type of poems or poetries they are, they are created in the minds of readers. And people are complex. Poetry can be written by AI, but it can’t be read by it. How people read it is a much more difficult challenge for the researchers than writing it is for the AI.
As I said before: ‘I simplify deliberately’. That’s how humans think. It’s also one of the things that poetry does (though sometimes in ways that make it more complex). But it’s also part of the process of reasoning, of research, and of communication more broadly. As a result, we can see it in operation, for good and ill, in many different aspects of this story: in the researchers, in the readers, and in the responses. This also means we can distil some general lessons that cut across the production and interpretation of research, poetry and artificial intelligence:
- Qualitative approaches to understanding matter
- So do specifics, of content and method
- Sometimes it’s important to leave the complexity in (or to recognise where and why it’s been deliberately left out)
- It isn’t possible to be purely empirical (theoretical frames, historical context and 'knowing what you’re talking about' are essential)
- It’s very easy to read into things based on presuppositions
- What AI can do is remarkable, and better than many people realise (expectations will likely have to adjust), although the same could be said of poetry and human intelligence too
- To understand what an article, poem or piece of research says, you need to read it, carefully (inc. the footnotes)
- Things as they are / are changed upon the blue guitar.