this post was submitted on 13 Jun 2024
620 points (98.4% liked)
One solution would be for an extension to download the entire video twice and delete the difference. To watch in 4K you'd need a pretty fast connection (although still within the range of what many people already have). However, if they find a way to throttle each client's max speed on the server side based on the quality being watched, that would kill this possibility. You could block their cookies, and throttling by IP would not be a possibility for them on IPv4, but once everyone is on IPv6, idk.
Processing the video on the fly to delete the difference in real time would also be heavy. I think it is at least possible to access the GPU from a browser extension via WebGL, but I am not sure that would be realistic for most people at HD and 4K.
Usually ads are significantly louder than the content they surround (which, by the way, is the thing that annoys me the most), so you would only need to check the audio for that, which is a lot less load than processing the video.
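To make the volume idea concrete, here is a minimal sketch, assuming the audio has already been decoded into a list of float samples. The function names, the window size, and the 2x ratio are all made up for illustration, not anything YouTube or an existing extension uses.

```python
import math
from statistics import median

def window_rms(samples, window):
    """RMS loudness of each consecutive window of `window` samples."""
    levels = []
    for start in range(0, len(samples), window):
        chunk = samples[start:start + window]
        levels.append(math.sqrt(sum(s * s for s in chunk) / len(chunk)))
    return levels

def loud_windows(samples, window, ratio=2.0):
    """Indices of windows whose RMS exceeds `ratio` times the median RMS.

    `ratio` is an arbitrary, hypothetical threshold.
    """
    levels = window_rms(samples, window)
    baseline = median(levels)
    return [i for i, level in enumerate(levels) if level > ratio * baseline]

# Three quiet windows and one loud one (toy data, not real audio).
quiet = [0.1, -0.1, 0.1, -0.1]
loud = [0.8, -0.8, 0.8, -0.8]
print(loud_windows(quiet + quiet + loud + quiet, window=4))
```

This only flags "louder than the rest of the video", so a loud moment in the content itself would be flagged in exactly the same way.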
Guessing you'd get a lot of false positives that way, but I like the ingenuity.
My kiddo watches stuff on YouTube where the person on screen suddenly gets loud, which could really mess with detecting ads by changes in volume. Apparently that is a widespread thing too.
A less expensive method could be to retrieve the subtitles twice, or retrieve them from a premium account, and check where the time offsets differ.
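A rough sketch of the subtitle comparison, assuming you already have both subtitle tracks parsed into `(start_seconds, text)` pairs and that the cues shift when ads are stitched into the stream (nothing in this thread confirms they do). The function name and the tolerance are hypothetical.

```python
def find_inserted_gaps(clean_cues, shifted_cues, tolerance=0.5):
    """Return (clean_start, gap_seconds) for each point where the offset jumps.

    clean_cues:   cues from the ad-free track, as (start_seconds, text)
    shifted_cues: cues from the track with ads, same order and same text
    """
    gaps = []
    previous_offset = 0.0
    for (clean_start, clean_text), (shifted_start, shifted_text) in zip(clean_cues, shifted_cues):
        if clean_text != shifted_text:
            raise ValueError("cue lists do not line up")
        offset = shifted_start - clean_start
        if offset - previous_offset > tolerance:
            # Something of this length was inserted before this cue.
            gaps.append((clean_start, round(offset - previous_offset, 3)))
            previous_offset = offset
    return gaps

clean = [(0.0, "hi"), (5.0, "welcome"), (12.0, "today"), (20.0, "bye")]
with_ads = [(0.0, "hi"), (5.0, "welcome"), (42.0, "today"), (50.0, "bye")]
print(find_inserted_gaps(clean, with_ads))  # [(12.0, 30.0)]
```

In the toy data the offset jumps by 30 seconds before the cue that starts at 12.0 in the clean track, so that is where the inserted block would sit.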
I don't think that would work. It would be trivial for YT to put different ads in different time slots, which would leave a differencing engine with no way to tell what was content and what was ad. However, that thought gave me another one: the core problem is the ability to differentiate between content and ad. That problem is probably solvable by leveraging the NPU that everyone is so desperate to cram into computers today.
Nearly all of the YT content I watch, and it's a lot, has predictable elements. For example, the host(s) are commonly in frame, and when they're not, their voices are; their equipment is usually in frame somewhere and often features distinctive markings. Even in the cases where those things aren't true, an ad often stands out because it's so different in light, motion, and audio volume.
With those things in mind, it should be possible to train software, similar to an LLM, to recognize the difference between content and ad. An extension could D/L the video, in part or in whole, and then start chewing through it. If you were willing to wait for a whole D/L of the video, it could show you an ad-free version. If you wanted to stream and ran out of ad-removed buffer, it could simply stop the stream (or show you something else) until it has more ad-free content to show you.
A great way to improve this would be to share the results of the local NPU ad detection. Once an ad is detected and its hash shared, everyone else using the extension can reliably predict what that ad looks like and remove it from the content stream, which would minimize the load on the local NPU. It should also be possible for YT Premium users to contribute, so that the hash of an ad-free stream, perhaps in small time-based chunks, could be used to speed up ad identification for everyone else.
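The hash-sharing part could look roughly like this. It is only a sketch: `known_ad_hashes` stands in for whatever shared database the extension would use, the function names are made up, and the chunks are assumed to be byte-identical between viewers, which a re-encoded ad would break.

```python
import hashlib

def chunk_hash(chunk: bytes) -> str:
    """Exact-match fingerprint of one chunk of the stream."""
    return hashlib.sha256(chunk).hexdigest()

def report_ad(chunk: bytes, known_ad_hashes: set) -> None:
    """Record a locally detected ad chunk in the (hypothetical) shared set."""
    known_ad_hashes.add(chunk_hash(chunk))

def strip_known_ads(chunks, known_ad_hashes: set):
    """Drop every chunk whose hash someone has already reported as an ad."""
    return [c for c in chunks if chunk_hash(c) not in known_ad_hashes]

# Toy stream: one user reports "ad-1", everyone else can then skip it.
known_ad_hashes = set()
report_ad(b"ad-1", known_ad_hashes)
stream = [b"intro", b"ad-1", b"main", b"ad-2"]
print(strip_known_ads(stream, known_ad_hashes))
```

Note that `ad-2` survives because nobody has reported it yet; that chunk would still have to go through the local detection first.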
It wouldn't be trivial but it's not really new territory either. It's just assembling existing tech in a new way.
I guess saying "the difference" wasn't specific enough. It works by deleting everything that is not the same between the two versions of the video: all the parts that are the same in both videos are kept, and everything else must be an ad. It breaks down if the same ad appears at the same time in both videos.
This assumes the exact same ads will be injected at the same time markers for every viewer, every time. I doubt any of that is true.
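As a sketch of that "keep only what both copies share" step, assume each download has already been cut into segments and each segment reduced to a comparable fingerprint (the strings below are placeholders; real encoded video would likely need some kind of perceptual fingerprint rather than exact equality). Python's standard `difflib.SequenceMatcher` can then pick out the runs that appear in both.

```python
from difflib import SequenceMatcher

def common_segments(first, second):
    """Return the segments of `first` that also appear, in order, in `second`."""
    matcher = SequenceMatcher(a=first, b=second, autojunk=False)
    kept = []
    for block in matcher.get_matching_blocks():
        kept.extend(first[block.a:block.a + block.size])
    return kept

# Two downloads of the same video with different ads in the middle.
download_a = ["c1", "c2", "adX1", "adX2", "c3", "c4"]
download_b = ["c1", "c2", "adY1", "c3", "c4"]
print(common_segments(download_a, download_b))  # ['c1', 'c2', 'c3', 'c4']

# The failure case: the same ad in both downloads is kept as "content".
print(common_segments(["c1", "adX1", "c2"], ["c1", "adX1", "c2"]))
```

Because the matcher aligns runs rather than positions, the ads do not need to be the same length in both downloads, but an ad that shows up in both copies still slips through.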
Edit: I got this backwards...