I have been having trouble formulating an important argument in one of my PhD papers, so I’m going to rehearse it here. If anyone can help me to refine it, I’d be eternally grateful…
Longitudinal data is good stuff if you want to try to understand a cause/effect relationship. Causes generally precede effects, and having longitudinal data allows you to check whether your hypothesised relationship runs in the right direction through time. I am interested in whether volunteering ‘causes’ political activity. I have a dataset (the NCDS) which allows me to check whether volunteering precedes political activity (like voting, signing petitions and attending public meetings) in time. It does. So far, so good.
There are still problems, though. In survey terms, volunteering is bound to precede political activity. The cohort is asked about volunteering for the first time at age 16 and political activity for the first time at age 23. Therefore, if volunteering and political activity are related at all, it will look as if volunteering is the cause and political activity the effect, even if that’s not really the case. For example, volunteering and political activity could be jointly caused. This is incredibly plausible. Volunteers are generally well-educated and middle-class: political acts are also linked to education and class. Furthermore, one could argue that there is a ‘civic type’: people with a particular personality or socialisation which predisposes them to volunteer and vote. Causal inference, therefore, continues to elude me.
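To make the common-cause worry concrete, here is a minimal simulation sketch. Everything in it is an assumption for illustration: the latent ‘civic type’ flag, the probabilities and the function names are made up, and none of the numbers come from the NCDS. Volunteering is given no causal effect on political activity at all, yet the raw comparison still shows volunteers as more politically active.

```python
import random


def simulate_common_cause(n=20000, seed=0):
    """Simulate a latent 'civic type' that raises both behaviours.

    Volunteering has NO causal effect on political activity here.
    All probabilities are illustrative, not NCDS estimates.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        civic = rng.random() < 0.3                        # hypothetical latent trait
        volunteer_16 = rng.random() < (0.5 if civic else 0.1)
        political_23 = rng.random() < (0.6 if civic else 0.2)
        rows.append((civic, volunteer_16, political_23))
    return rows


def rate(rows, volunteered):
    """Share politically active at 23 among those with the given volunteering status."""
    group = [p for (_, v, p) in rows if v == volunteered]
    return sum(group) / len(group)


def risk_difference(rows):
    """Political activity rate of volunteers minus that of non-volunteers."""
    return rate(rows, True) - rate(rows, False)


rows = simulate_common_cause()
raw_gap = risk_difference(rows)
gap_civic = risk_difference([r for r in rows if r[0]])
gap_other = risk_difference([r for r in rows if not r[0]])

print(f"raw gap: {raw_gap:.3f}")
print(f"gap within civic types: {gap_civic:.3f}")
print(f"gap within everyone else: {gap_other:.3f}")
```

The raw gap is sizeable, but it shrinks towards zero once the comparison is made within each level of the latent trait. In real data the trait is unobserved, which is exactly the problem.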
There is hope, however. Longitudinal data like this allows the user to account, to a certain extent, for the effects of character, intelligence and socialisation. If a person’s volunteering is driven by their character, their innate intelligence, or their familial socialisation, those effects are likely to be evident at any given age. That is, if there is a ‘volunteering type’, we would expect to see those people over-represented among the active volunteers in the NCDS at age 16 and age 23 (and indeed at 50, when these questions are asked again). Instead, what we see is that volunteers at age 16 are less likely to go on to get an education, less likely to go on to vote and more likely to be from families which are supported by an unskilled worker. The reasons for THAT are a whole other blog post, by the way.
This pattern in the data makes me all the more confident about my findings. Volunteers at age 16 are 40 per cent more likely to volunteer at age 50 than non-volunteers (even controlling for an array of potential confounders). This is true even though the 16-year-old volunteers have quite different demographic and socio-economic characteristics to the 50-year-old volunteers. When I look at the data and see that the relationship persists even though adolescent volunteering in the NCDS looks so unusual, I feel more confident that I may be looking at a causal relationship. Furthermore, when I perform logistic regression using volunteering at age 16 to predict volunteering at age 50, I am controlling not only for volunteering at age 16, but implicitly for all the things which ‘caused’ that adolescent volunteering. Therefore, I have controlled for being a ‘volunteering type’, for socialisation by the family, for innate intelligence, and for a whole host of other things which the literature has linked to volunteering. I remain within the shadow of doubt for sure, but I feel a lot more confident than I would have done without the longitudinal data.
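A small worked sketch of how a figure like ‘40 per cent more likely’ can be read. The counts below are hypothetical, chosen only to give a risk ratio of 1.4, and are not taken from the NCDS. The point is that a logistic regression with a single binary predictor and no controls reports the odds ratio from the 2x2 table (as the exponentiated coefficient), which is not the same number as the risk ratio.

```python
def risk_ratio(a_yes, a_total, b_yes, b_total):
    """Ratio of the two groups' probabilities of the outcome."""
    return (a_yes / a_total) / (b_yes / b_total)


def odds_ratio(a_yes, a_total, b_yes, b_total):
    """Ratio of the two groups' odds of the outcome.

    This equals exp(coefficient) from an unadjusted logistic regression
    with one binary predictor.
    """
    odds_a = a_yes / (a_total - a_yes)
    odds_b = b_yes / (b_total - b_yes)
    return odds_a / odds_b


# Hypothetical counts: volunteering at 50, split by volunteering at 16.
vol16_yes, vol16_total = 280, 1000      # volunteered at 16
non16_yes, non16_total = 200, 1000      # did not volunteer at 16

rr = risk_ratio(vol16_yes, vol16_total, non16_yes, non16_total)
or_ = odds_ratio(vol16_yes, vol16_total, non16_yes, non16_total)

print(f"risk ratio: {rr:.2f}")   # 1.40
print(f"odds ratio: {or_:.2f}")  # 1.56
```

With these made-up counts, ‘40 per cent more likely’ in terms of risk corresponds to odds that are about 56 per cent higher, so it is worth stating which of the two a reported figure refers to.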
And that’s it. Criticism welcome. I am not attempting formal, statistical causal inference, by the way. This is more ‘weight of evidence’. I hope.