Microblog Memes

A place to share screenshots of Microblog posts, whether from Mastodon, tumblr, ~~Twitter~~ X, KBin, Threads or elsewhere.

Created as an evolution of White People Twitter and other tweet-capture subreddits.

Rules:

Please put at least one word relevant to the post in the post title.
Be nice.
No advertising, brand promotion or guerilla marketing.
Posters are encouraged to link to the toot or tweet etc in the description of posts.

founded 2 years ago

MODERATORS

ReadyUser31

aeronmelon

Or they go to adtech (lemmy.world)

submitted 10 months ago by nifty to c/microblogmemes

204 comments

[email protected] 8 points 10 months ago* (last edited 10 months ago)

Huh? Going from an image AI to semantic formatting, and then consuming that output, is trivial now.
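A minimal sketch of the "consumption" half, assuming the image model has already been prompted to answer in JSON with a hypothetical schema like {"objects": [{"label": ..., "confidence": ...}]}. The model call itself is not shown; `raw_reply` is a made-up stand-in response, and every name here is illustrative:

```python
import json
from dataclasses import dataclass


@dataclass
class DetectedObject:
    label: str
    confidence: float


def parse_model_output(raw: str) -> list:
    """Turn the model's JSON text (hypothetical schema) into typed objects, skipping malformed entries."""
    data = json.loads(raw)
    objects = []
    for item in data.get("objects", []):
        if not isinstance(item, dict):
            continue
        label = item.get("label")
        conf = item.get("confidence")
        if isinstance(label, str) and isinstance(conf, (int, float)):
            objects.append(DetectedObject(label, float(conf)))
    return objects


def labels_above(objects: list, threshold: float) -> list:
    """Downstream consumption: keep only the labels of confident detections."""
    return [o.label for o in objects if o.confidence >= threshold]


# Made-up model reply used only to exercise the parser.
raw_reply = '{"objects": [{"label": "ship", "confidence": 0.92}, {"label": "buoy", "confidence": 0.41}]}'
print(labels_above(parse_model_output(raw_reply), 0.5))  # ['ship']
```

Once the model's output is structured text, the rest is ordinary parsing and filtering; the hard part is getting the model to follow the schema reliably.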

[email protected] -3 points 10 months ago* (last edited 10 months ago)

Could you give me an example that uses live video feeds, or that feeds its output into another system? As far as I'm aware (I could be very wrong! Not an expert), the only things that come close to that are OCR and similar character-recognition systems. Describing in machine-readable actionable terms what's happening in an image isn't a thing, as far as I know.

[email protected] 8 points 10 months ago* (last edited 10 months ago)

No, not live video; that didn't seem to be the topic.

But if you had the horsepower, I don't think it's impossible, based on what I've worked with. From a bottleneck standpoint, it's just a matter of snipping frames out of the video and distributing those images for processing.
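A rough sketch of the "snip and distribute" idea, assuming the video has already been decoded into a list of frames (here simple dicts) and that `describe_frame` is a hypothetical stand-in for a call to an image model. It samples every n-th frame and fans the samples out to a thread pool:

```python
from concurrent.futures import ThreadPoolExecutor


def sample_frames(frames: list, every_n: int) -> list:
    """Keep every n-th frame so the model only sees a manageable subset of the stream."""
    if every_n < 1:
        raise ValueError("every_n must be >= 1")
    return frames[::every_n]


def describe_frame(frame: dict) -> str:
    # Hypothetical placeholder for an image-model call; it just echoes the
    # frame's id and content so the shape of the pipeline is visible.
    return f"frame {frame['id']}: {frame['content']}"


def process_stream(frames: list, every_n: int = 5, workers: int = 4) -> list:
    """Distribute the sampled frames across worker threads; map() keeps results in frame order."""
    selected = sample_frames(frames, every_n)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(describe_frame, selected))


# Toy "video": 20 frames, the first 10 showing water and the rest a ship.
video = [{"id": i, "content": "water" if i < 10 else "ship"} for i in range(20)]
for line in process_stream(video, every_n=5):
    print(line)
```

Threads suit the case where each frame is sent to a remote model service; for local CPU-bound inference, a process pool or batching frames on a GPU would be the more likely choice.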

[email protected] -2 points 10 months ago* (last edited 10 months ago)

"No live videos"

Well, that would be a prerequisite for a transformer model making decisions for a ship-scuttling robot, which is why I brought it up.

FooBarrington 3 points 10 months ago

"Describing in machine-readable actionable terms what's happening in an image isn't a thing, as far as I know."

It is. That's actually the basis of multimodal transformers: they have a shared embedding space for multiple modalities of data (e.g. text and images). If you encode an input and take its embedding, you suddenly have a vector describing the contents of that input, which another system can consume, for example by comparing it with the embeddings of candidate text descriptions.
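A toy illustration of the shared-embedding idea, in the style of CLIP zero-shot classification. The 3-dimensional vectors below are made up by hand; in a real model they would come from trained image and text encoders that place matching pairs close together. The image is "described" by whichever caption embedding is most similar to it under cosine similarity:

```python
import math


def cosine(a: list, b: list) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        raise ValueError("zero-length vector")
    return dot / norm


# Hand-picked toy embeddings standing in for encoder outputs (not real model data).
text_embeddings = {
    "a ship at sea": [0.9, 0.1, 0.0],
    "a city street": [0.0, 0.2, 0.9],
    "a forest": [0.1, 0.9, 0.1],
}
image_embedding = [0.8, 0.2, 0.1]


def best_caption(image_vec: list, captions: dict) -> str:
    """Pick the caption whose embedding points in the most similar direction to the image's."""
    return max(captions, key=lambda c: cosine(image_vec, captions[c]))


print(best_caption(image_embedding, text_embeddings))  # a ship at sea
```

The output is just a string or a vector, so it can be passed to any other system, which is the "machine-readable" part; how actionable it is depends on how good the encoders and the candidate descriptions are.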