I’ve been reanalysing a pioneering experiment into radio voices:
“In his book Voice and Personality, [Professor] Pear explained what inspired his experiment. He recounted listening to a radio play over headphones one day in a gloomy room lit only by the glow from his fire. Engrossed in the play, he conjured up in his mind what the protagonists might look like, and he began to wonder whether other listeners did the same.” (Trevor Cox, Now You’re Talking)
In Pear’s experiment, radio listeners heard nine people reading a short passage from Dickens. They sound like they should be characters from an Agatha Christie whodunnit – they included Detective Sergeant F. R. Williams, Miss Madeleine Rée and the Reverend Victor Dams. The audience filled out a questionnaire in The Radio Times, with some also providing ‘general remarks’. It is these 632 bits of prose that I’ve been analysing using modern text-mining tools. (There is more detail on the experiment in two previous blogs.) It’s been fascinating to find out what the tools can (and cannot) do, and also to understand more about what the audience thought of the voices.
Main part of the questionnaire used by Pear
One of the challenges of free-text analysis is the time it takes to read, digest and analyse all the responses. Cluster analysis helps speed up any detailed exploration by grouping speakers according to which of them attract the most similar words in the responses.
The first step is to identify the most important words being used to describe the speakers. The text is cleaned up and broken into tokens (words): numbers, most punctuation, short words and common words like “the” are removed, and everything is converted to lowercase. I also had to hand-craft some rules to deal with places where English has changed over the last century, for example changing “writing-desk” to “writing desk”. Finally, lemmatisation brings inflected forms of the same word together, so that ‘nervous’, ‘nervousness’ and ‘nervously’ are all analysed as the word ‘nervous’.
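The cleaning steps above can be sketched in a few lines. This is a minimal illustration, not the pipeline actually used for the analysis: the stopword list, the hand-crafted rule and the lemma map here are tiny stand-ins for the real ones.

```python
import re

# Illustrative stand-ins for the real stopword list and lemma map.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "is", "was"}
LEMMAS = {"nervousness": "nervous", "nervously": "nervous", "voices": "voice"}

def preprocess(text):
    text = text.lower()
    text = text.replace("writing-desk", "writing desk")  # hand-crafted rule
    tokens = re.findall(r"[a-z]+", text)                 # drops numbers and punctuation
    tokens = [t for t in tokens if len(t) > 2 and t not in STOPWORDS]
    return [LEMMAS.get(t, t) for t in tokens]            # lemmatise via lookup

print(preprocess("He read The passage nervously, with 2 voices."))
# → ['read', 'passage', 'nervous', 'with', 'voice']
```

A dictionary lookup is enough for a sketch; a real analysis would use a proper lemmatiser so that unseen inflections are handled too.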
From the tokens, a Document Term Matrix (DTM) is formed. This gives the frequency with which each token occurs for each speaker. The table below shows part of the matrix; the full one has seventy-one columns and doesn’t fit on the page easily!
| Miss M. Pear | 6 | 7 | 19 | 1 | 2 |
Table 1. The first 5 columns from the document term matrix, giving frequency of each token for each speaker
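Building a DTM amounts to counting tokens against a shared vocabulary. Here is a small sketch with made-up speakers and tokens (not the experiment’s data):

```python
from collections import Counter

# Toy tokenised responses, one entry per speaker (illustrative only).
responses = {
    "Speaker A": ["clear", "pleasant", "voice", "voice"],
    "Speaker B": ["nervous", "voice", "gruff"],
}

# The vocabulary is the sorted set of all tokens; each DTM row gives
# how often each vocabulary word occurs in that speaker's responses.
vocab = sorted({t for toks in responses.values() for t in toks})
dtm = {s: [Counter(toks)[w] for w in vocab] for s, toks in responses.items()}

print(vocab)                 # ['clear', 'gruff', 'nervous', 'pleasant', 'voice']
print(dtm["Speaker A"])      # [1, 0, 0, 1, 2]
```

In practice a library routine (for example scikit-learn’s `CountVectorizer`) does this in one call, but the counting logic is exactly this.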
Applying a hierarchical clustering algorithm then groups together speakers whose frequency of tokens follows a similar pattern. The figure below shows the outcome as a dendrogram. On the right-hand side are the speakers, and the lines indicate how they split into different groups. The word(s) attached to each branch (e.g. ‘male’, ‘female’) are the most commonly occurring tokens in that group relative to the terms used for the other group.
The first split on the left is into two groups according to gender, i.e. male and female. For each of the speakers, the most common term used in the responses describes the gender of the person. The difference in pitch between male and female voices is an example of sexual dimorphism, i.e. a characteristic designed to signal your gender. Sexual dimorphism is strongly signalled by the voice, with female voices typically being an octave higher in pitch than male voices. Consequently, this split into gender groups is to be expected.
The female group then splits according to age, with Miss Marjorie Pear being the only child speaking in the experiment. The male group splits according to how well the passage was read. As noted in a previous blog using sentiment analysis, nervousness was an important differentiator between speakers.
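The clustering itself works bottom-up: every speaker starts in its own cluster, and the closest pair of clusters is repeatedly merged until one remains; the merge order is what the dendrogram draws. Below is a toy sketch of that idea using centroid linkage and Euclidean distance over DTM rows. The speakers and counts are invented for illustration, and this is not the specific algorithm or distance used in the analysis.

```python
import math

# Hypothetical DTM rows: counts of three tokens per speaker.
dtm = {
    "Mr A":   [9, 1, 0],
    "Mr B":   [8, 0, 1],
    "Miss C": [1, 9, 0],
    "Miss D": [0, 8, 2],
}

def dist(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def centroid(members):
    cols = zip(*(dtm[m] for m in members))
    return [sum(c) / len(members) for c in cols]

# Start with singleton clusters; repeatedly merge the closest pair
# and record each merge (this sequence is what a dendrogram shows).
clusters = [(name,) for name in dtm]
merges = []
while len(clusters) > 1:
    i, j = min(
        ((a, b) for a in range(len(clusters)) for b in range(a + 1, len(clusters))),
        key=lambda ab: dist(centroid(clusters[ab[0]]), centroid(clusters[ab[1]])),
    )
    merged = clusters[i] + clusters[j]
    merges.append(merged)
    clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]

print(merges)
# First the two similar "Mr" rows merge, then the two "Miss" rows,
# then the two groups join at the top of the tree.
```

A real analysis would use a library implementation such as `scipy.cluster.hierarchy.linkage`, which also records merge heights for plotting the dendrogram.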
Cluster analysis is straightforward and quick to apply, and it divides the speakers into groups with reduced bias from the experimenter. However, for this dataset it is hard to take the cluster analysis much further. The technique also represents each response as a bag of words, and that has limitations: the ordering of the words is lost, and ordering matters, e.g. “This is good” is not the same as “Is this good”. This can be partly solved by treating runs of multiple words as tokens, so “this-is-good” is one token and “is-this-good” another. Even such an n-gram approach is still limited, because the semantics and meaning of language are never fully captured by these methods.
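The n-gram idea is simple to demonstrate: sliding a window of n words over the token list produces tokens that keep local word order, so the two example sentences above no longer look identical.

```python
# Build n-gram tokens from a word list (bigrams by default).
def ngrams(tokens, n=2):
    return ["-".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

a = ngrams("this is good".split())
b = ngrams("is this good".split())
print(a)                 # ['this-is', 'is-good']
print(b)                 # ['is-this', 'this-good']
print(set(a) == set(b))  # False: as bags of bigrams the sentences differ
```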
What do you think about cluster analysis? Let me know below.
There is more about voice and personality in my book Now You’re Talking. You might also be interested in this earlier blog that did a detailed keyword analysis of the data.