this post was submitted on 05 Nov 2024
123 points (96.9% liked)

Linux


The developers of Manjaro Linux, a distribution based on Arch Linux and aimed at beginners, have announced the start of testing for a new service, MDD (Manjaro Data Donor), which collects statistics about the system and sends them to an external server run by the project. The author of MDD intended to enable telemetry by default (opt-out), but that decision has not yet been approved. Judging by the objections of some developers and users, telemetry will likely be offered as an option requiring the user's prior consent (the proposal is to add a prompt to enable telemetry to the welcome screen shown after the first boot).

The report includes data such as the host name, kernel version, desktop component versions, detailed information about the hardware and drivers in use, screen size and resolution, network device MAC addresses, disk serial numbers, disk partition data, the number of running processes and installed packages, and the versions of basic packages such as systemd, gcc, bash and PipeWire.

The submitted data is stored on the project's server in a ClickHouse database and visualized with Grafana. Users' IP addresses are not stored, and a hash of the /etc/machine-id file is used as the system identifier.

According to the code (https://github.com/manjaro/mdd/blob/master/mdd.py#L40), it sends everything.

[–] Buffalox 13 points 1 month ago* (last edited 1 month ago) (2 children)

This may be illegal in the EU if they don't use opt-in. ~~Even then it may be illegal to collect MAC addresses and disk serial numbers from under-18-year-olds, as those can potentially be used for identification.~~

The data is anonymized, and the IP is NOT stored. So I'm not sure this violates GDPR?

From the code we can see that the machine ID is anonymized: only a SHA-256 checksum is sent.

import hashlib
import uuid

def get_hashed_device_id():
    # Read the machine ID
    with open("/etc/machine-id", "r") as f:
        machine_id = f.read().strip()

    # Hash the machine ID using SHA-256 to anonymize it
    hashed_id = hashlib.sha256(machine_id.encode()).digest()

    # Convert the first 16 bytes of the hash to a UUID (version 5 UUID format)
    return str(uuid.UUID(bytes=hashed_id[:16], version=5))

This makes it somewhat a nothingburger IMO.

[–] [email protected] 10 points 1 month ago* (last edited 1 month ago)

That's not anonymous, that's pseudonymous.

What is the point of this? The machine-id already looks to be a unique random number, so you're just calculating another unique random-looking number from it; you might as well use the original.

You can't glean any useful information from a unique random-looking number that would help with developing Manjaro. You can't calculate any statistics from that. The only use is tracking.
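To illustrate the pseudonymity point, here is a minimal sketch that applies the same transformation as the quoted `get_hashed_device_id` to two made-up machine IDs. The helper name `hashed_device_id` and both ID strings are hypothetical and exist only for this example; it is not MDD's code.

```python
import hashlib
import uuid

def hashed_device_id(machine_id: str) -> str:
    """Same steps as the quoted function, but taking the ID as an argument."""
    digest = hashlib.sha256(machine_id.strip().encode()).digest()
    return str(uuid.UUID(bytes=digest[:16], version=5))

# Hypothetical machine IDs (32 hex characters, invented for this example).
machine_a = "0123456789abcdef0123456789abcdef"
machine_b = "fedcba9876543210fedcba9876543210"

# Two reports from the same machine carry the same identifier...
report_1 = hashed_device_id(machine_a)
report_2 = hashed_device_id(machine_a)
# ...while a different machine gets a different one.
other = hashed_device_id(machine_b)

print(report_1 == report_2)  # True: reports from one machine can be linked
print(report_1 == other)     # False: machines stay distinguishable
```

The hash hides the original value, but because it is stable per machine, every report from that machine can still be tied together.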

Edit: And as mentioned in my other comment, reversing the MAC SHA by brute force is trivial, so that one at least (and possibly the other hardware serial numbers they collect) shouldn't even be considered pseudonymous.
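A minimal sketch of what such a brute force could look like, assuming the MAC is hashed with plain unsalted SHA-256 over its lowercase, colon-separated text form (the exact input format MDD uses is not shown in this thread, so that is an assumption). The vendor prefix `0x001122` and the function names are arbitrary choices for illustration. With a known vendor prefix, only the remaining 24 bits need to be searched.

```python
import hashlib
from typing import Optional

def mac_digest(mac: str) -> str:
    # Assumption: the MAC is hashed as a lowercase, colon-separated string.
    return hashlib.sha256(mac.encode()).hexdigest()

def format_mac(oui: int, suffix: int) -> str:
    # Combine the 24-bit vendor prefix (OUI) and the 24-bit device suffix.
    raw = ((oui << 24) | suffix).to_bytes(6, "big")
    return ":".join(f"{b:02x}" for b in raw)

def brute_force_suffix(target: str, oui: int, limit: int = 1 << 24) -> Optional[str]:
    # Try every device suffix below `limit` for the given vendor prefix.
    for suffix in range(limit):
        candidate = format_mac(oui, suffix)
        if mac_digest(candidate) == target:
            return candidate
    return None

# Hypothetical target: an arbitrary prefix and a low suffix so the demo is quick.
secret_mac = format_mac(0x001122, 0x00BEEF)
target = mac_digest(secret_mac)

recovered = brute_force_suffix(target, 0x001122)
print(recovered == secret_mac)  # True: the hash was reversed by enumeration
```

Pure Python is slow at this; the point is only that the search space per vendor prefix is small enough to enumerate completely.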

[–] ouch 4 points 1 month ago (1 children)

Nah, it's still considered Personal Data under GDPR, because it's possible to connect to natural persons. So GDPR applies. And this is illegal, there is no legal basis for processing this data.

[–] Buffalox 1 points 1 month ago* (last edited 1 month ago) (1 children)

because it’s possible to connect to natural persons.

That's debatable, and it's based only on the claim that it's just a 24-bit decoding that can be brute-forced. I don't know for a fact that it can be boiled down to 24 bits.
I checked my own /etc/machine-id, and the folder doesn't even exist, so I don't know what exactly is supposed to be in it. And yes, I use Manjaro.

[–] [email protected] 4 points 1 month ago (1 children)

I edited my comment on your other reply and by my estimation, calculating every SHA256 of all MACs ever potentially issued takes less than 89 seconds on an RTX 3090.
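The shape of that estimate can be sketched as follows. The hash rate and the number of vendor prefixes below are assumed round figures chosen only to show the scale; they are not the inputs behind the 89-second number, and `seconds_to_enumerate` is a hypothetical helper.

```python
# Each 24-bit vendor prefix (OUI) leaves 24 device-specific bits to enumerate.
SUFFIXES_PER_OUI = 1 << 24

def seconds_to_enumerate(num_ouis: int, hashes_per_second: float) -> float:
    """Time to hash every MAC under `num_ouis` vendor prefixes."""
    return num_ouis * SUFFIXES_PER_OUI / hashes_per_second

# Hypothetical inputs (assumptions, not measured values).
assumed_rate = 10e9    # SHA-256 hashes per second
assumed_ouis = 50_000  # vendor prefixes to cover

# Only the assumed prefixes: roughly 84 seconds with these inputs.
print(seconds_to_enumerate(assumed_ouis, assumed_rate))

# The entire 48-bit space (2**24 prefixes): under 8 hours with these inputs.
print(seconds_to_enumerate(1 << 24, assumed_rate))
```

Either way, the search is bounded by the size of the MAC address space, which is far too small to resist enumeration of an unsalted hash.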

I also think MACs are (or should be considered) personally identifiable information, since there is potentially a paper trail back to the person who bought the device. Plus, MACs are not secret: they are broadcast on the LAN and, for wireless modules, over the air in the immediate vicinity (though some systems randomize wireless MACs for privacy reasons). Privacy-unfriendly software has been known to collect MACs (even from other devices on the network and in the vicinity), so there are already databases connecting MAC addresses with other data.

[–] Buffalox 1 points 1 month ago

calculating every SHA256 of all MACs

Yes, but because I don't have the folder it reads from myself, I can't see what is actually encoded. Are you sure /etc/machine-id is ONLY the MAC address?