This is a webpage representing a project for INFO-664 Programming Cultural Heritage at Pratt Info School in the Fall of 2025 by Alana Maisel.
Data was web-scraped from the New York Post's Environment and Wildlife sub-sections, using Selenium for scraping and NLTK for language processing. The result is a cleaned dataset of headlines, article excerpts, and their publish dates. The captured article data spans from the website's re-launch in 2013 to 2025, with over 2,400 articles represented.
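The cleaning step described above can be illustrated with a minimal sketch. It is not the project's actual pipeline: it uses only the Python standard library in place of Selenium and NLTK, and the field names (`headline`, `excerpt`, `date`), the ISO date format, the `Article` class, and the sample rows are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Article:
    """One cleaned record (hypothetical schema, not the project's)."""
    headline: str
    excerpt: str
    published: date


def normalize_text(text: str) -> str:
    """Collapse runs of whitespace into single spaces and trim the ends."""
    return re.sub(r"\s+", " ", text).strip()


def clean_records(raw_records):
    """Normalize text, parse dates, drop unusable rows and duplicate headlines.

    Assumes each raw record is a dict of strings with 'headline',
    'excerpt', and 'date' keys, and that dates are ISO formatted
    (YYYY-MM-DD). Both are assumptions of this sketch.
    """
    seen_headlines = set()
    cleaned = []
    for raw in raw_records:
        headline = normalize_text(raw.get("headline", ""))
        excerpt = normalize_text(raw.get("excerpt", ""))
        if not headline:
            continue  # skip rows with no headline
        try:
            published = date.fromisoformat(normalize_text(raw.get("date", "")))
        except ValueError:
            continue  # skip rows whose date cannot be parsed
        key = headline.lower()
        if key in seen_headlines:
            continue  # keep only the first copy of a repeated headline
        seen_headlines.add(key)
        cleaned.append(Article(headline, excerpt, published))
    # Oldest first, so the dataset reads chronologically.
    cleaned.sort(key=lambda article: article.published)
    return cleaned


# Invented placeholder rows, not real scraped data.
sample_rows = [
    {"headline": "  Example   wildlife\n headline ", "excerpt": " An excerpt. ", "date": "2019-03-04"},
    {"headline": "example wildlife headline", "excerpt": "A repeat.", "date": "2019-03-05"},
]
for article in clean_records(sample_rows):
    print(article.published.isoformat(), article.headline)
```

Deduplicating on the lowercased headline is one simple choice; a real pipeline might instead key on the article URL if that was captured.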
The New York Post, owned by Rupert Murdoch, is one of many online-focused media outlets whose science coverage takes a sensationalist tone and intersects with politically right-leaning news coverage. It is one of the few online news sources without a paywall, which allows for wide readership and headline-focused dissemination on social media. According to the trade journal Press Gazette, the New York Post ranked 3rd in print circulation in the United States in 2023, and its website consistently ranks in the top ten nationally by visits.
Coverage of the environment and wildlife creates its own distinct reader ecosystem, shaping how readers encounter the explicitly political and social content placed alongside it. The basic article data of the "flora and fauna" collected here, separated from the full story content, reflects how many digital readers encounter this information. The content is by turns humorously absurd and apocalyptic, and it targets the interests and anxieties of a wide readership. Observationally, the Post tends to run social interest stories as continued coverage, mirroring algorithmic content consumption: multiple specific stories engage with topics of editorial interest such as immigration, gender politics, and street safety. The environment and wildlife sections follow the same editorial model, with concurrent articles about microplastics, animal disease, and environmental toxicity.
Training an AI model, as done in this project, is one way of experimenting with creative production in collaboration with the article data, made possible by web scraping. Generating fictional headlines becomes a way to tap into and parody the paper's distinctive editorial voice and the world-building it performs. The author of this project hopes that, by discovering these headlines dissociated from their source, you can find ways to engage creatively with these narratives, or discover new insights about language use, patterns over time, or even the assessment of truth, though representing facts is not the goal of this project.
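The text does not say which kind of model the project trained. As a deliberately simple stand-in, the sketch below builds a word-level bigram (Markov chain) generator, which is far cruder than a trained neural model but shows the basic idea of learning word sequences from headlines and sampling new ones. The function names, the `START`/`END` markers, and the sample headlines are invented for illustration.

```python
import random
from collections import defaultdict

# Sentinel tokens marking the beginning and end of a headline (assumed names).
START, END = "<s>", "</s>"


def build_bigram_model(headlines):
    """Map each word to the list of words observed directly after it."""
    model = defaultdict(list)
    for headline in headlines:
        tokens = [START] + headline.split() + [END]
        for current, following in zip(tokens, tokens[1:]):
            model[current].append(following)
    return dict(model)


def generate_headline(model, rng, max_words=12):
    """Sample a fictional headline by walking the bigram table.

    Repeated followers appear multiple times in each list, so
    rng.choice picks them in proportion to how often they occurred.
    """
    if START not in model:
        raise ValueError("model was built from no headlines")
    words = []
    current = START
    while len(words) < max_words:
        current = rng.choice(model[current])
        if current == END:
            break
        words.append(current)
    return " ".join(words)


# Invented placeholder headlines, not items from the scraped dataset.
training_headlines = [
    "Giant rat spotted near park",
    "Giant squid spotted offshore",
    "Mystery bird baffles park visitors",
]
model = build_bigram_model(training_headlines)
print(generate_headline(model, random.Random(2025)))
```

Passing in a seeded `random.Random` keeps a given run reproducible; with a corpus this small the output mostly recombines fragments of the inputs, which is part of why a larger dataset matters for parody that feels new.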