this post was submitted on 12 Feb 2025
12 points (100.0% liked)

Linux


I was trying to do that, but I noticed that ls | grep searchterm only searches the book TITLES (the file names) for searchterm. Is it possible to search the text of the ebooks themselves?

top 13 comments
[–] [email protected] 3 points 6 hours ago

Sounds like a good time to mention that "Little Brother" by Cory Doctorow is available in GNU Info format (the format normally used for GNU software manuals, as opposed to man pages).

[–] sylver_dragon 3 points 8 hours ago

It's going to be different for different file formats. For example, epub is hard to grep directly because the format is really just a zip file with a specific internal file structure. So it's not the .epub file itself you want to grep, but the files inside that zip archive. Ebooks stored as PDFs could be a bit easier, as PDF is a monolithic file format with text often (though not always) stored just as plain text. However, the text streams can be encrypted and/or compressed (FlateDecode), so there is no guarantee of seeing plain text.

I'm sure there are more formats, but I think you get the idea: how you would do a string search comes down to the actual file format, and some formats are not going to be easily greppable. It's not impossible, just not straightforward.
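To illustrate the point about containers, here is a rough Python sketch (standard library only) that builds a tiny epub-like zip in memory and compares a search over the raw archive bytes, which is roughly what plain grep sees, with a search over the decompressed member files. The function names and the sample member path are made up for the demo, and a real epub has more structure (a mimetype entry, a manifest and so on) that is skipped here. The FlateDecode streams in PDFs use the same deflate compression, so the same effect applies there.

```python
import io
import zipfile


def make_sample_epub(text: str) -> bytes:
    """Build a minimal epub-like zip in memory (hypothetical layout)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        # "OEBPS/chapter1.xhtml" is just an illustrative member name.
        zf.writestr("OEBPS/chapter1.xhtml", text)
    return buf.getvalue()


def grep_raw(archive: bytes, term: str) -> bool:
    """Search the archive bytes as-is, without unpacking anything."""
    return term.encode("utf-8") in archive


def grep_members(archive: bytes, term: str) -> bool:
    """Decompress each member and search its contents."""
    needle = term.encode("utf-8")
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        return any(needle in zf.read(name) for name in zf.namelist())


if __name__ == "__main__":
    # Repetitive text so that deflate actually compresses the member.
    sample = make_sample_epub("<p>" + "Call me Ishmael. " * 50 + "</p>")
    print("raw bytes:", grep_raw(sample, "Ishmael"))
    print("members:  ", grep_members(sample, "Ishmael"))
```

The raw search misses the word because the member is stored deflate-compressed, while the per-member search finds it once the data is unpacked.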

[–] [email protected] 6 points 14 hours ago (1 children)

ls lists files; if you pipe it to grep, grep prints the lines that match, and those lines are file names. There's no universal way to grep through ebook content, but you can do it with epub, and probably other zipped text formats, using zipgrep, or just unzip them and grep the extracted files.
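For anyone who would rather script it, the "unzip and grep" idea can be sketched in Python with just the standard library. This is only a sketch under a few assumptions: it handles epub only (not mobi or pdf), it assumes the book text lives in .xhtml/.html/.htm/.xml members encoded as UTF-8, it will not work on DRM-encrypted books, and the function names are invented for the example.

```python
import re
import zipfile
from pathlib import Path

TAG = re.compile(r"<[^>]+>")
# Assumption: these suffixes cover the text-bearing members of the archive.
TEXT_SUFFIXES = (".xhtml", ".html", ".htm", ".xml")


def epub_contains(path, term: str) -> bool:
    """Return True if any text member of the epub contains term (case-insensitive)."""
    needle = term.lower()
    with zipfile.ZipFile(path) as zf:
        for name in zf.namelist():
            if not name.lower().endswith(TEXT_SUFFIXES):
                continue
            text = zf.read(name).decode("utf-8", errors="replace")
            # Crude tag stripping; a phrase split by inline markup can be missed.
            if needle in TAG.sub(" ", text).lower():
                return True
    return False


def search_library(folder, term: str) -> list:
    """List the .epub files under folder whose contents mention term."""
    hits = []
    for path in sorted(Path(folder).rglob("*.epub")):
        try:
            if epub_contains(path, term):
                hits.append(path)
        except zipfile.BadZipFile:
            continue  # not a valid zip archive, skip it
    return hits


if __name__ == "__main__":
    import sys

    # Usage: python search_epubs.py FOLDER SEARCHTERM
    for hit in search_library(sys.argv[1], sys.argv[2]):
        print(hit)
```

Unlike ls | grep, this prints the names of books whose text matches, whether or not the file name does.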

[–] [email protected] 2 points 11 hours ago
[–] [email protected] 6 points 14 hours ago (1 children)
[–] [email protected] 3 points 11 hours ago (1 children)

This looks pretty cool, thanks!

[–] [email protected] 2 points 6 hours ago

Glad to help!

[–] JASN_DE 4 points 14 hours ago (1 children)

Yeah, that's to be expected with ls as it only lists the folder contents. Which format do you have?

[–] [email protected] 2 points 11 hours ago

epub, mobi and pdf

[–] [email protected] 3 points 14 hours ago* (last edited 14 hours ago) (1 children)
[–] [email protected] 2 points 14 hours ago (1 children)

You can't grep zip archives directly.

[–] [email protected] 3 points 13 hours ago (1 children)

Ripgrep-all has that capability.

[–] [email protected] 3 points 11 hours ago

Good to know.