Metadata>paradata. Discuss.

Today’s S3RI methodology seminar was given by Gabi Durrant and Julia D’Arrigo.  The subject was paradata, which I’m pretty sure is data that describes the process of survey data collection.  It might be data about the person conducting the interviews, or ‘extra’ data collected by them (for example, information about the condition of the interviewee’s house or garden), or maybe information about what happened when the interviewer asked whether the householder would take part.  Paradata is a kind of metadata (I’m perfectly willing to be told that I’ve got that the wrong way round, by the way).  Metadata is, I think, data about data, while paradata is data about data collection.  Anyway.  Durrant and D’Arrigo have a lovely, complicated model predicting householder response at each contact.  It was tough for me to follow, but there were a couple of stand-out points.

Firstly, inference from the model relies on the assumption that, for the first household contact, calling time is independent of household characteristics.  This is because the model links calling times to the probability of getting cooperation: if interviewers choose when to call according to the kind of household they are visiting, the apparent effect of calling time gets mixed up with the effect of household type.  The assumption is extremely unlikely to be true, but it’s not obvious to me how much of a problem that is.  To see what I mean, think about blocks of flats.  They have intercoms.  Intercoms are associated with higher refusal rates.  However, most blocks have a ‘service’ or ‘tradesman’ button which allows postal workers to gain access between (say) 8am and 10.30am.  An experienced interviewer might therefore choose to call at blocks of flats early in the morning in order to avoid having to use the intercom.  If people who live in blocks of flats are different from people living in homes with their own front door (as seems likely), then some of what looks like an effect of calling early would really be an effect of who lives in flats.
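
To make the worry concrete, here is a toy simulation.  It is my own sketch, not Durrant and D’Arrigo’s model, and every number in it is made up for illustration.  It assumes that calling time has no effect at all on cooperation, that flat-dwellers are simply less likely to cooperate, and that interviewers prefer to call at flats in the morning.  Under those assumptions a naive comparison of morning and later calls should still show a gap, which should shrink towards zero once flats and houses are compared separately.

```python
import random


def simulate(n_households=100_000, seed=1):
    """Simulate first contacts where calling time depends on dwelling type.

    All probabilities below are invented for illustration only.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n_households):
        is_flat = rng.random() < 0.4           # assumed share of flats
        # Interviewers favour morning calls at flats (the 'tradesman' button).
        morning = rng.random() < (0.8 if is_flat else 0.3)
        # Cooperation depends only on dwelling type, NOT on calling time.
        cooperated = rng.random() < (0.3 if is_flat else 0.6)
        rows.append((is_flat, morning, cooperated))
    return rows


def coop_rate(rows, morning, is_flat=None):
    """Cooperation rate for morning (or later) calls, optionally by dwelling type."""
    selected = [c for f, m, c in rows
                if m == morning and (is_flat is None or f == is_flat)]
    return sum(selected) / len(selected)


rows = simulate()

# Naive comparison: morning calls versus later calls, ignoring dwelling type.
naive_gap = coop_rate(rows, True) - coop_rate(rows, False)

# Stratified comparison: the same contrast within flats and within houses.
flat_gap = coop_rate(rows, True, True) - coop_rate(rows, False, True)
house_gap = coop_rate(rows, True, False) - coop_rate(rows, False, False)

print(f"naive morning-vs-later gap: {naive_gap:+.3f}")
print(f"gap within flats:           {flat_gap:+.3f}")
print(f"gap within houses:          {house_gap:+.3f}")
```

Stratifying only rescues things here because dwelling type is observed.  If interviewers are reacting to something that never makes it into the data, there is nothing to stratify on.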

Secondly, their model accounts for two interesting structures in the data.  Respondents are nested ‘within’ interviewers and contacts are nested ‘within’ households (so some households receive 1 contact, some 4, some 7, etc.).  Not sure what I want to say about that, but it’s interesting to know that one model can account for both multilevel and longitudinal data structures.
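
For what it’s worth, this is how I picture the two kinds of nesting, as a minimal sketch with invented records.  The field names (`interviewer`, `household`, `call`, `cooperated`) are my own, not anything from the seminar.  Each row is one contact attempt, so a household appears once per call and an interviewer appears once per call they make.

```python
from collections import defaultdict

# One row per contact attempt (hypothetical data, 'long' format).
contacts = [
    {"interviewer": "A", "household": "h1", "call": 1, "cooperated": True},
    {"interviewer": "A", "household": "h2", "call": 1, "cooperated": False},
    {"interviewer": "A", "household": "h2", "call": 2, "cooperated": False},
    {"interviewer": "A", "household": "h2", "call": 3, "cooperated": True},
    {"interviewer": "B", "household": "h3", "call": 1, "cooperated": False},
    {"interviewer": "B", "household": "h3", "call": 2, "cooperated": True},
]

calls_per_household = defaultdict(int)         # the longitudinal structure
households_per_interviewer = defaultdict(set)  # the multilevel structure

for row in contacts:
    calls_per_household[row["household"]] += 1
    households_per_interviewer[row["interviewer"]].add(row["household"])

print(dict(calls_per_household))
print({k: sorted(v) for k, v in households_per_interviewer.items()})
```

The unequal number of rows per household is the longitudinal part, and the grouping of households under interviewers is the multilevel part.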
