Hannah Rigdon


Replication and Reproduction of Spatial Analysis using Twitter Data

Wang et al.'s analysis relies on three analytical techniques: kernel density estimation, text mining, and social network analysis. The kernel density estimation compares the coordinates of wildfire tweets with a population map to produce a heat map of the intensity of wildfire-related Twitter activity (Figure 4). Text mining was used to collect the Twitter data, clean it, and pull out important topics. These topics are presented in Table 3 as cluster identifiers alongside the words that tended to be grouped together. The social network analysis used retweets to map networks of connectivity between Twitter users. Its results appear in the retweet network (Figure 10) and in Figures 8 and 9, which show each user's indegree and outdegree, that is, how often a user has been retweeted and how often that user has retweeted others.
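To make the indegree and outdegree measures concrete, here is a minimal Python sketch. It is not the authors' code (they used an R package), and the user names and retweet records are invented for illustration. It assumes each retweet is stored as a pair of retweeting user and original author.

```python
from collections import Counter

# Hypothetical retweet records: (retweeting_user, original_author).
# These names are made up and do not come from Wang et al.'s data.
retweets = [
    ("alice", "fire_agency"),
    ("bob", "fire_agency"),
    ("carol", "local_news"),
    ("alice", "local_news"),
    ("fire_agency", "local_news"),
]

def degree_counts(edges):
    """Return (indegree, outdegree) Counters for a directed retweet edge list."""
    indegree = Counter()
    outdegree = Counter()
    for retweeter, author in edges:
        outdegree[retweeter] += 1  # the retweeter has retweeted someone
        indegree[author] += 1      # the author has been retweeted
    return indegree, outdegree

indegree, outdegree = degree_counts(retweets)
print("Most retweeted:", indegree.most_common(2))
print("Most active retweeters:", outdegree.most_common(2))
```

In this toy network, a high indegree marks an account whose tweets are widely shared, while a high outdegree marks an account that mostly passes along other people's tweets.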

The National Academies of Sciences, Engineering, and Medicine draw a clear distinction between reproduction and replication, which is important for understanding how other researchers might build on the work of this paper. Reproducibility, according to the National Academies, depends on the availability of code and transparency about computational methods. To reproduce a study, a researcher ideally does not have to write any new code, and the computational methods are “transparently and accurately reported” (2019, 45). Replication, on the other hand, is when a similar analysis is conducted to answer the same scientific question but using a new set of data (2019, 46). Wang et al. were fairly thorough in their methods section, but they left out enough detail about their computational methods that the study would be difficult to reproduce. For example, the authors did not specify the radius used in their kernel density estimations, so a reader cannot run the same kernel density estimation. They were also relatively vague about their network analysis methods, noting simply that an R package “was implemented” without giving enough detail for a reader to repeat the computation (Wang et al. 2016, 527). However, the different methods of analysis are described specifically enough that one could reasonably attempt to answer the same question with newly collected data and relatively similar computational methods, making this a replicable, but not reproducible, study.
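The missing radius matters because a kernel density surface can change substantially with that one parameter. The sketch below is only an illustration under stated assumptions: it uses a Gaussian kernel (the paper does not say which kernel was used), invented tweet coordinates in arbitrary units, and two arbitrary bandwidths standing in for the unreported radius.

```python
import math

def gaussian_kde_at(point, events, bandwidth):
    """Gaussian kernel density estimate at `point` for a list of 2D events.

    `bandwidth` plays the role of the search radius; the Gaussian kernel
    is an assumption, not the kernel reported by Wang et al.
    """
    px, py = point
    total = 0.0
    for ex, ey in events:
        squared_distance = (px - ex) ** 2 + (py - ey) ** 2
        total += math.exp(-squared_distance / (2 * bandwidth ** 2))
    # Normalize so the density surface integrates to 1 over the plane.
    return total / (len(events) * 2 * math.pi * bandwidth ** 2)

# Hypothetical tweet locations: a small cluster plus one distant tweet.
tweets = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0)]
location = (2.5, 2.5)  # a spot between the cluster and the outlier

narrow = gaussian_kde_at(location, tweets, bandwidth=0.5)
wide = gaussian_kde_at(location, tweets, bandwidth=3.0)
print(f"bandwidth 0.5: {narrow:.3e}")
print(f"bandwidth 3.0: {wide:.3e}")
```

With the narrow bandwidth the location between the tweets registers almost no activity, while the wide bandwidth spreads the same tweets into a visible density there. Two researchers working from the same tweets could therefore produce noticeably different heat maps if they have to guess the radius.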

References:

National Academies of Sciences, Engineering, and Medicine. 2019. Reproducibility and Replicability in Science. Washington, DC: The National Academies Press. https://doi.org/10.17226/25303.

Wang, Z., X. Ye, and M. H. Tsou. 2016. Spatial, temporal, and content analysis of Twitter for wildfire hazards. Natural Hazards 83 (1):523–540.