this post was submitted on 19 Jul 2023

Comradeship // Freechat

The whole article is quite funny, especially the lists of the most-used tankie words and the branding of foreignpolicy.com as a left-wing news source.

[email protected] 15 points 1 year ago

Since you are reading it, what is the ranking in the screenshot based on?

So far, no data or source code has been provided alongside the paper on its arXiv page. I'd expect Jupyter notebooks, or something similar at least, for reproducibility. Perhaps they will be added later, or they are linked from a part of the paper that I have not yet read.

In any case, the screenshot is of Table 11, and it is found in Appendix D, Domain Analysis:

We examine the differences in the popularity of domains between tankies and their similar ideologies in this section. This analysis will also help us to understand if there are any platforms specifically used by tankies. We first look for the popular domains shared by tankies to have a better understanding of the further results in this section.

Popular Domains Shared By Tankies. We detect 146,078 URLs from 7,049 different domains (including suffixes) tankies share. Table 11 shows top 20 domains shared by tankies after removing top 1,000 globally most visited domains by Majestic [74]. From the table, we see online Marxist/Marxist-Leninist hubs (marxist.org and workers.today), American left-wing alternative news sources (thegrayzone.com, peoplesworld.org, and foreignpolicy.com), and the web page of the Communist Party USA (cpusa.org), a Venezuelan alternative news source (telesurenglish.net), a British far-left alternative news site (newworker.org), a Reddit-like Marxist-Leninist platform (lemmygrad.ml), Chinese news outlets (cgtn.com and globaltimes.cn), and Chinese far-left platforms (redsails.org and qiaocollective.com).

Describing foreignpolicy.com as left-wing is a miscategorization by the authors, as is calling redsails.org a "Chinese far-left platform." Neither label is accurate, and both undercut trust that the authors are correctly and thoroughly labeling and interpreting their data. Table 12 has other glaring oversights: it purports that domains like "redditsave.com," "ko-fi.com," "twimg.com," and "archive.is" are "representative domains of tankies" specifically, and supposedly not heavily found in other similar far-left communities (as per the authors' description of the TF-IDF algorithm and their motivation for its use). Taken together, there is a compelling case that the authors (1) do not themselves understand left-wing ideology, much less Marxist-Leninist ideology, well enough to label it accurately, and (2) may have been sloppy with their data analysis (though this can't be definitively known without access to the underlying datasets and analysis source code).
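To illustrate why those Table 12 entries look suspicious, here is a minimal sketch of textbook TF-IDF with each community treated as one "document" and each shared domain as a "term." This is an assumption about the general technique, not the authors' implementation, since their code is not available and their exact weighting may differ. The function name, community names, and sample data are all hypothetical.

```python
import math
from collections import Counter

def tfidf_by_community(shares):
    """Score domains per community with plain TF-IDF.

    shares: dict mapping a community name to the list of domains
    shared in it (one entry per shared URL). Hypothetical input shape.
    """
    n_communities = len(shares)
    counts = {name: Counter(domains) for name, domains in shares.items()}

    # Document frequency: in how many communities each domain appears.
    doc_freq = Counter()
    for counter in counts.values():
        doc_freq.update(counter.keys())

    scores = {}
    for name, counter in counts.items():
        total = sum(counter.values())
        scores[name] = {
            # tf = share of this community's links; idf = log(N / df).
            domain: (count / total) * math.log(n_communities / doc_freq[domain])
            for domain, count in counter.items()
        }
    return scores

# Made-up data: twimg.com is shared everywhere, redsails.org in one place.
sample = {
    "community_a": ["twimg.com", "twimg.com", "redsails.org"],
    "community_b": ["twimg.com", "example.org"],
    "community_c": ["twimg.com"],
}
scores = tfidf_by_community(sample)
```

Under this variant, a domain that shows up in every compared community gets an inverse document frequency of log(1) = 0, so it cannot rank as "representative" of any one of them. If generic hosts like twimg.com or archive.is score highly, either the comparison communities rarely link to them in the collected data or the weighting differs from this sketch; without the dataset there is no way to tell which.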

Majestic is described at the cited URL as: "The million domains we find with the most referring subnets." Basically, of the 7,049 different domains contained in the 146,078 URLs the authors found in their crawl, any that appear in Majestic's top 1,000 are removed before ranking. That drops domains like google.com, facebook.com, and reddit.com. (Whether the authors recognize the potential problem with excluding reddit.com in particular from the table is unknown at this point; I have not finished reviewing the paper.)
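The filtering step described above can be sketched roughly as follows. This is only my reading of the paper's description, not the authors' code: the function names are hypothetical, the exclusion set stands in for Majestic's top 1,000, and the hostname (minus a leading "www.") is treated as the domain, which may not match what the authors mean by "including suffixes."

```python
from collections import Counter
from urllib.parse import urlparse

def extract_domain(url):
    """Return the lowercased host of a URL without a leading 'www.'."""
    host = urlparse(url).hostname or ""
    return host[4:] if host.startswith("www.") else host

def top_shared_domains(urls, excluded, k=20):
    """Count domains across urls, drop excluded ones, return the top k."""
    counts = Counter(extract_domain(u) for u in urls)
    counts.pop("", None)  # discard URLs whose host could not be parsed
    kept = Counter({d: c for d, c in counts.items() if d not in excluded})
    return kept.most_common(k)

# Made-up sample; `excluded` is a stand-in for the Majestic top 1,000.
urls = [
    "https://www.reddit.com/r/a",
    "https://lemmygrad.ml/post/1",
    "https://lemmygrad.ml/post/2",
    "https://redsails.org/x",
    "https://google.com/",
]
excluded = {"reddit.com", "google.com"}
ranking = top_shared_domains(urls, excluded)
```

Note that the exclusion happens before the ranking, so a heavily shared domain like reddit.com never appears in the table no matter how often it was linked.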