this post was submitted on 13 Jan 2024

Programming


Long story short, I want to build a system that reorders some components in a document file (docx or odt; I don't have a hard constraint on the format at the moment).
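For the docx case, a .docx file is a ZIP archive whose main text conventionally lives in `word/document.xml`. The top-level `w:p` (paragraph) and `w:tbl` (table) elements under `w:body` are natural candidates for the "components" to reorder. The following is a minimal, stdlib-only sketch for listing them in document order. It handles docx only (an .odt keeps its body in `content.xml` under different namespaces), it skips wrappers such as content controls (`w:sdt`), and the function name `list_body_components` is hypothetical.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace, in ElementTree's "{uri}tag" form.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def list_body_components(docx):
    """Return (kind, text) for each top-level paragraph or table in a .docx body.

    `docx` may be a file path or a binary file-like object. Order is document order.
    """
    with zipfile.ZipFile(docx) as zf:
        # By convention the main document part is word/document.xml.
        root = ET.fromstring(zf.read("word/document.xml"))
    body = root.find(W + "body")
    if body is None:
        return []
    components = []
    for child in body:
        if child.tag == W + "p":
            kind = "paragraph"
        elif child.tag == W + "tbl":
            kind = "table"
        else:
            # Skips w:sectPr (section properties), w:sdt (content controls), etc.
            continue
        # Concatenate every text node (w:t) inside the block, including table cells.
        text = "".join(t.text or "" for t in child.iter(W + "t"))
        components.append((kind, text))
    return components
```

Text length is at best a crude proxy for rendered height, since real heights depend on fonts, styles, images, and the renderer, so treat this only as a way to find the units you would move around.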

So my input would be a document file. I need to be able to approximate the number of pages the document takes up, and I also need the height of individual components (like a single paragraph or a table), so that I have the data to rearrange them and make the document take fewer pages.
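If each component's height is known (however it is measured) and the components can be freely reordered, the rearranging part is essentially bin packing. The following is a minimal sketch under these assumptions: no component splits across pages, heights and the usable page height use the same unit, and rules like keep-with-next or widow/orphan control are ignored. `pages_in_order` models the current sequential layout, and `first_fit_decreasing` is the classic first-fit-decreasing heuristic, which is not guaranteed to be optimal (exact bin packing is NP-hard). Both function names are hypothetical.

```python
def pages_in_order(heights, page_height):
    """Count pages when blocks are laid out in the given order.

    A block that doesn't fit in the space left on the current page starts a new page.
    """
    pages = 0
    remaining = 0
    for h in heights:
        if h > page_height:
            raise ValueError(f"block taller than a page: {h}")
        if h > remaining:
            pages += 1
            remaining = page_height
        remaining -= h
    return pages


def first_fit_decreasing(heights, page_height):
    """Group block indices into pages using first-fit decreasing.

    Returns a list of pages, each a list of indices into `heights`.
    """
    # Place the tallest blocks first; sorted() is stable, so ties keep document order.
    order = sorted(range(len(heights)), key=lambda i: heights[i], reverse=True)
    pages = []  # block indices on each page
    free = []   # remaining height on each page
    for i in order:
        h = heights[i]
        if h > page_height:
            raise ValueError(f"block taller than a page: {h}")
        for p, space in enumerate(free):
            if h <= space:  # first page with enough room wins
                pages[p].append(i)
                free[p] -= h
                break
        else:
            pages.append([i])
            free.append(page_height - h)
    return pages
```

Flattening the result, e.g. `[i for page in pages for i in page]`, gives a new component order. Under the same model, `pages_in_order` lays that order out into exactly `len(pages)` pages. With heights `[300, 300, 200, 200]` and a page height of 500, the original order needs 3 pages and the packed order needs 2.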

I don't have a hard constraint on the tool's programming language either (Python preferred), but I'd prefer not to embed LibreOffice in my system.

I'm also open to other solutions (maybe a document file isn't the best input for this problem).

Thanks in advance!

Reply by [email protected], 7 months ago (last edited 7 months ago):

Ultimately, no, not really. These formats are built to be "render-agnostic", and there's no way to pre-calculate aspects of the rendered output without actually running the document through a rendering engine. That is doable in theory without sending the render output to an actual screen or printer, but the follow-up problem is that not all renderers are created equal. E.g. a docx rendering engine that you grab from NuGet or somewhere else is not guaranteed to produce exactly the same output as Microsoft Word.

If you need accuracy in predicting the rendered size of various things, you really need to run the documents through the same renderer that will be used to actually print/draw them for the user. If that's Microsoft Office, you can look into Office Interop (COM automation), which lets your program make calls into the actual Office applications installed on the system. There ought to be a way to kick off rendering from there.
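From Python, one way to drive Word's own layout engine is COM automation through the pywin32 package. This is an assumption about the setup: it requires Windows and an installed copy of Word. The following is a minimal, untested sketch that uses the Word object-model members `Document.Repaginate`, `Document.ComputeStatistics`, and `Range.Information`. The numeric constants are the values of `wdStatisticPages`, `wdActiveEndPageNumber`, and `wdVerticalPositionRelativeToPage`, and are worth double-checking against Microsoft's documentation. Some Word setups may report position values of -1 when the range isn't in the visible screen area, so you may need Print Layout view or a visible window. Note that `doc.Paragraphs` also includes paragraphs inside table cells, so tables would need separate handling via `doc.Tables`. The function names are hypothetical.

```python
# Numeric values of Word enumeration members (WdStatistic / WdInformation).
WD_STATISTIC_PAGES = 2
WD_ACTIVE_END_PAGE_NUMBER = 3
WD_VERTICAL_POSITION_RELATIVE_TO_PAGE = 6


def measure_paragraph_starts(path):
    """Return (page_count, starts), where starts[i] = (page, top_pt) of paragraph i.

    Assumes Windows, Microsoft Word, and pywin32; `path` should be absolute.
    """
    import win32com.client  # pywin32; imported here so the helper below works anywhere

    word = win32com.client.DispatchEx("Word.Application")  # separate Word process
    word.Visible = False
    doc = word.Documents.Open(path, False, True)  # FileName, ConfirmConversions, ReadOnly
    try:
        doc.Repaginate()  # make sure the layout is current before querying it
        page_count = doc.ComputeStatistics(WD_STATISTIC_PAGES)
        starts = []
        for para in doc.Paragraphs:
            # Collapse to the paragraph's first character so both values describe its start.
            first = doc.Range(para.Range.Start, para.Range.Start)
            starts.append((
                first.Information(WD_ACTIVE_END_PAGE_NUMBER),
                first.Information(WD_VERTICAL_POSITION_RELATIVE_TO_PAGE),
            ))
        return page_count, starts
    finally:
        doc.Close(False)  # close without saving
        word.Quit()


def same_page_heights(starts):
    """Approximate paragraph heights (points) from the start positions Word reports.

    Paragraph i's height is the gap to paragraph i+1 when both start on the same page,
    so it includes the spacing between them. Paragraphs crossing a page break, the last
    paragraph, and negative (unknown) positions give None.
    """
    heights = []
    for (page, top), (next_page, next_top) in zip(starts, starts[1:]):
        if page == next_page and top >= 0 and next_top >= 0:
            heights.append(next_top - top)
        else:
            heights.append(None)
    if starts:
        heights.append(None)  # nothing after the last paragraph to measure against
    return heights
```

Calling COM once per paragraph is slow for long documents, but it measures what Word itself will lay out, which is the point the reply makes about using the same renderer.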