this post was submitted on 02 Sep 2024
5 points (100.0% liked)

Rust

5999 readers
23 users here now

Welcome to the Rust community! This is a place to discuss about the Rust programming language.

Wormhole

[email protected]

Credits

  • The icon is a modified version of the official rust logo (changing the colors to a gradient and black background)

founded 1 year ago
MODERATORS
 

So I'm trying to parse school's website for some info. I'm trying to get some values using xpath. So I found a html 5 parser and it can't properly parse the first line. Then I figure you it's actually XHTML and not HTML. After quick Google search I found out XHTML can be properly parsed using any XML parser and so I found one and... It can't parse the first line. So I ask LLama3.1 (like a real programmer) why I can't parse the first line with any parser. It explained so nicely that I did not destroy my keyboard when I was told that this document is "XHTML 1.0 Transitional" and it's a mix of HTML 4 and XHTML and can't be parsed with HTML nor XML parser. I hate the guy that invented that so much...

So I can't find a crate to parse XHTML 1.0 transitional? Or a crate to convert xhtml to something else? Any advice?

you are viewing a single comment's thread
view the rest of the comments
[–] [email protected] 2 points 2 months ago

Have you tried some tag soup parser? That should work as a last resort even if the ones building a tree structure don't.