Uncovering Alternative Metrics: Data Mining Wikipedia for Evidence of Public Engagement and Impact - Code
This repository contains the code and data accompanying a paper and presentation for the VALA2024 Conference.
The approach was to use Wikipedia's category system to narrow down the number of articles, searching only one category of interest at a time. We wrote a script that recursively collects every article in the Biotechnology category and its subcategories. That list then serves as input to a second script, which downloads the parsed HTML of each article and searches the text for DOIs. Because it scans the full article text, this approach picks up every DOI, whether it appears in a footnote or in a literature list at the end of the article.
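The two steps above can be sketched as follows. This is a minimal illustration, not the repository's actual script: it assumes the English Wikipedia Action API endpoint, the `list=categorymembers` and `action=parse` modules, and a DOI pattern based on the one Crossref has suggested for modern DOIs. The function names, the `max_depth` cut-off (the category graph is large and contains cycles) and the User-Agent string are hypothetical placeholders. Network calls go through a `fetch` parameter so the logic can be exercised without hitting the API.

```python
import html
import json
import re
import urllib.parse
import urllib.request
from typing import Callable, Iterable

API = "https://en.wikipedia.org/w/api.php"  # assumption: English Wikipedia
# Adapted from Crossref's suggested DOI regex, unanchored so it finds DOIs inside text.
DOI_RE = re.compile(r"10\.\d{4,9}/[-._;()/:A-Za-z0-9]+")


def api_get(params: dict) -> dict:
    """Call the MediaWiki Action API and decode the JSON response."""
    query = urllib.parse.urlencode({**params, "format": "json", "formatversion": "2"})
    # Placeholder User-Agent: Wikimedia asks clients to identify themselves with a contact.
    req = urllib.request.Request(f"{API}?{query}",
                                 headers={"User-Agent": "doi-miner-sketch/0.1 (you@example.org)"})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def category_members(category: str, fetch: Callable[[dict], dict] = api_get) -> Iterable[dict]:
    """Yield every article and subcategory in one category, following 'continue' tokens."""
    params = {"action": "query", "list": "categorymembers", "cmtitle": category,
              "cmtype": "page|subcat", "cmlimit": "max"}
    while True:
        data = fetch(params)
        yield from data["query"]["categorymembers"]
        if "continue" not in data:
            break
        params = {**params, **data["continue"]}  # adds cmcontinue for the next page


def articles_in_category(root: str, max_depth: int = 3,
                         fetch: Callable[[dict], dict] = api_get) -> set[str]:
    """Collect article titles (namespace 0) from root and its subcategories (namespace 14)."""
    articles, seen, frontier = set(), {root}, [(root, 0)]
    while frontier:
        category, depth = frontier.pop()
        for member in category_members(category, fetch):
            if member["ns"] == 0:
                articles.add(member["title"])
            elif member["ns"] == 14 and depth < max_depth and member["title"] not in seen:
                seen.add(member["title"])  # categories can loop back on themselves
                frontier.append((member["title"], depth + 1))
    return articles


def parsed_html(title: str, fetch: Callable[[dict], dict] = api_get) -> str:
    """Download the rendered HTML of one article via action=parse."""
    return fetch({"action": "parse", "page": title, "prop": "text"})["parse"]["text"]


def extract_dois(page_html: str) -> set[str]:
    """Find DOIs anywhere in the HTML: footnotes, reference lists and doi.org links."""
    # doi.org hrefs percent-encode the '/', so decode entities and escapes first.
    text = urllib.parse.unquote(html.unescape(page_html))
    # DOIs are case-insensitive; lowercase them so duplicates collapse.
    return {m.rstrip(".,;").lower() for m in DOI_RE.findall(text)}
```

For example, `articles_in_category("Category:Biotechnology")` returns the set of article titles, and `extract_dois(parsed_html(title))` returns the DOIs found in one article. Scanning the whole rendered page, rather than parsing citation templates, is what lets footnotes and end-of-article literature lists be treated the same way. The trailing-punctuation stripping is deliberately simple, so a DOI directly followed by a closing parenthesis in running text may still need cleaning. On a large category, consider pausing between requests in line with Wikimedia's API etiquette.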
The captured DOIs could then be used to query the Crossref and DataCite APIs for further metadata about each DOI.
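A minimal sketch of that lookup, assuming the public Crossref REST API (`https://api.crossref.org/works/{doi}`) and DataCite REST API (`https://api.datacite.org/dois/{doi}`) endpoints. It tries Crossref first and falls back to DataCite when Crossref answers 404; that ordering, the function names, the chosen fields and the contact string in the User-Agent are illustrative assumptions, not necessarily what the paper's scripts do.

```python
import json
import urllib.error
import urllib.parse
import urllib.request
from typing import Callable, Optional

# Placeholder contact: Crossref asks clients to include a mailto for its "polite" pool.
HEADERS = {"User-Agent": "doi-miner-sketch/0.1 (mailto:you@example.org)"}


def get_json(url: str) -> Optional[dict]:
    """GET a JSON document; return None when the registry answers 404 (DOI unknown to it)."""
    try:
        req = urllib.request.Request(url, headers=HEADERS)
        with urllib.request.urlopen(req, timeout=30) as resp:
            return json.load(resp)
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return None
        raise


def from_crossref(msg: dict) -> dict:
    """Flatten a Crossref 'message' object into a few common fields."""
    parts = msg.get("issued", {}).get("date-parts") or [[]]
    return {
        "source": "Crossref",
        "title": (msg.get("title") or [None])[0],
        "publisher": msg.get("publisher"),
        "type": msg.get("type"),
        "year": parts[0][0] if parts[0] else None,
    }


def from_datacite(attrs: dict) -> dict:
    """Flatten DataCite 'attributes' into the same fields."""
    return {
        "source": "DataCite",
        "title": (attrs.get("titles") or [{}])[0].get("title"),
        "publisher": attrs.get("publisher"),
        "type": (attrs.get("types") or {}).get("resourceTypeGeneral"),
        "year": attrs.get("publicationYear"),
    }


def doi_metadata(doi: str, fetch: Callable[[str], Optional[dict]] = get_json) -> Optional[dict]:
    """Try Crossref, then DataCite; return None if neither registry knows the DOI."""
    path = urllib.parse.quote(doi, safe="/")  # escape characters such as ';' or '(' in the path
    crossref = fetch(f"https://api.crossref.org/works/{path}")
    if crossref:
        return from_crossref(crossref["message"])
    datacite = fetch(f"https://api.datacite.org/dois/{path}")
    if datacite:
        return from_datacite(datacite["data"]["attributes"])
    return None
```

Mapping both registries onto the same small set of fields makes it straightforward to tabulate journal articles (typically registered with Crossref) alongside datasets and other outputs (often registered with DataCite) in one table.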