this post was submitted on 15 Jun 2023
5 points (69.2% liked)

FREEMEDIAHECKYEAH


How do you scrape content that sits behind a login? I tried wget but it didn't work. A login plus a button press is required to see anything. I can do that part manually if required. There are many different slides and videos, though.

top 7 comments
[email protected] 10 points 1 year ago (1 children)

If you log in using a browser, the site will most likely give you a "session cookie" that you should be able to see in the developer tools. (In Firefox's developer tools, it's under the "Storage" section.) The cookie's name will generally contain the word "session". After you log in, that cookie identifies you to the server, letting the server know that "this particular request is from CucumberSalad" (or whatever your user is named on that service). Wget probably hasn't been working because its requests don't include that cookie the way the requests from your browser do.

(Just looking at my developer tools while using Lemmy, it seems the Lemmy web UI doesn't use session cookies but rather a JSON web token in a cookie named "jwt", but I think that cookie would suffice if I were trying to scrape the Lemmy web UI.)

Once you have the proper cookie name and value, you can have wget send the cookie to the server with each request by adding the flag --header 'Cookie: <cookie name>=<cookie value>', replacing the parts in angle brackets with the real name and value. Example: --header 'Cookie: JSESSIONID=ksdjflasjdfaskdjfaosidhwe'.
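If you'd rather script it than use wget, here is a rough Python sketch of the same idea: attach the cookie you copied from the browser to every request. The helper names (build_request, fetch), the example URL, and the cookie value are all made up for illustration, and I'm assuming the site only needs that one cookie, which may not be true for yours.

```python
import urllib.request


def build_request(url: str, cookie_name: str, cookie_value: str) -> urllib.request.Request:
    """Build a GET request carrying the cookie, like wget's --header 'Cookie: name=value'."""
    request = urllib.request.Request(url)
    # The Cookie header is what tells the server which logged-in session this is.
    request.add_header("Cookie", f"{cookie_name}={cookie_value}")
    return request


def fetch(url: str, cookie_name: str, cookie_value: str, timeout: float = 30.0) -> bytes:
    """Download one page as raw bytes, sending the cookie with the request."""
    request = build_request(url, cookie_name, cookie_value)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()


# Hypothetical usage (placeholder URL and cookie value, swap in your own):
# page = fetch("https://example.com/course/slide/1", "JSESSIONID", "ksdjflasjdfaskdjfaosidhwe")
# with open("slide-1.html", "wb") as out:
#     out.write(page)
```

If the page comes back as the login screen anyway, the site probably wants more than that one cookie (other cookies, a particular User-Agent, and so on), so compare against the request headers your browser sends.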

Also, if you can provide more info about what you're trying to scrape, folks can probably help more. Hopefully what I've given you is enough to get you going, but there might be more hurdles to jump before it works.

[email protected] 5 points 1 year ago

Lovely comment, thanks for the help.

[email protected] 4 points 1 year ago (1 children)

What exactly are you trying to get?

[email protected] 1 points 1 year ago

A course-like thing with slides. There are videos too, but I don't care much about those. The URL increments with each slide, but there are mini questions that stop the user from skipping ahead until they are answered.

[email protected] 2 points 1 year ago (1 children)

Is a link really necessary?

[email protected] 1 points 1 year ago* (last edited 1 year ago) (1 children)

No, it's optional, like including an image.
