Docker

966 readers

1 users here now

founded 2 years ago

MODERATORS

[email protected]

[Question] Docker and Databases: Why choose one over another? Does it matter? (lemmy.ml)

submitted 4 months ago by [email protected] to c/[email protected]

5 comments fedilink hide all child comments

Hi everyone !

Intro

Was a long ride since 3 years ago I started my first docker container. Learned a lot from how to build my custom image with a Dockerfile, loading my own configurations files into the container, getting along with docker-compose, traefik and YAML syntax... and and and !

However while tinkering with vaultwarden's config and changing to postgresSQL there's something that's really bugging me...

Questions

How do you/devs choose which database to use for your/their application? Are there any specific things to take into account before choosing one over another?

Does consistency in database containers makes sense? I mean, changing all my containers to ONLY postgres (or mariaDB whatever)?

Does it make sense to update the database image regularly? Or is the application bound to a specific version and will break after any update?

Can I switch between one over another even if you/devs choose to use e.g. MariaDB ? Or is it baked/hardcoded into the application image and switching to another database requires extra programming skills?

Maybe not directly related to databases but that one is also bugging me for some time now:

What's redis role into all of this? I can't the hell of me understand what is does and how it's linked between the application and database. I know it's supposed to give faster access to resources, but If I remember correctly, while playing around with Nextcloud, the redis container logs were dead silent, It seemed very "useless" or not active from my perspective. I'm always wondering "Humm redis... what are you doing here?".

Thanks :)

top 5 comments

sorted by: hot top controversial new old

[–] [email protected] 6 points 4 months ago* (last edited 4 months ago) (1 children)

You can't decide what database to use for a container, its developers choose an engine (and sometimes a version) and code for that specific combination.

This is why it's best to keep database containers separate even if multiple apps use Postgres, for example, because it might not be the same version of Postgres, or may be configured differently.

There isn't that much overhead by doing so, and in return you usually benefit from the developer helping you install their database more easily.

Redis is a key-value database, it's optimized for looking up pairs of data very fast. Postgres, Maria, SQLite are relational databases, they're good for arranging data in tables where you can sort and filter it and set up relations between tables.

There are also many other types of databases which are optimized for various data arrangements and use cases. To decide which to use when you're developing your own app you have to think about your data very carefully to figure out your needs.

But the reason Postgres is used so often is that it's an excellent engine and can probably do almost anything you want so if you're ever in doubt, pick Postgres.

[–] TechLich 1 points 4 months ago

I'd add to this to say that redis as a key-value store often sits alongside a relational database like postgres etc. to act as a cache for it.

Basically, requests to be sent to the relational db (like postgres) get turned into a key and the results stored as a value in redis. Then when the same request comes through again, it can pull the results quickly out of the key-value store without having to search postgres by running a long SQL query again. There's a few different caching strategies to keep things up to date or have the cached data expire regularly, etc. but that's the gist of it.

Important to note that not all applications need something like that and not all queries would even benefit from it (postgres is pretty fast and can even do that kind of thing itself) but if there's a lot of users running the same slow query over and over, caching the results can help immensely.

[–] [email protected] 3 points 4 months ago

Hi! Boy, I can't wait to see the other comments.

My database choice starts with any dependencies. Of any core complement has a preference (or requirement) for a particular DB, that's the one I use. DB admin is probably my least favorite thing in the world, and having to manage multiple can put me entirely off a project. For this same reason, if I have a completely free choice, I'll choose a NoSQL DB, as they're almost always easier to admin (with the almost singular exception of embedded DBs like Sqlite). But I'm also a little biased against SQL because (a) it's a domain specific language with some pitfalls not hard to hit with sufficient complexity; (b) each DB implementation has dialect specifics which are irritating when you have to bounce between different DBs; and (c) I'm simply neither confident in my knowledge, not have interest in maintaining expertise. I acknowledge the strengths of a SQL DB; it's simply not where I want to spend my time.
Yes, DB containers are significant. If you only use PostgreSQL, and you're for some reason running multiple instances, those containers will share the same images. This will only save you disk space, but with people basing their images on, e.g. Ubuntu, container image sizes can be significant - GBs of space. However, if space isn't a concern, this may not matter.
It's probably likely that your DB upgrades will be driven by your proglang DB interface library requirements. Personally, I wouldn't upgrade the DB unless there's a CVE or some other external demand, such as when the library requires it. And - yes - it can easily break your application, like any dependency. Rails is particularly horrible about tight DB version coupling, at least on PGSQL.
If you're careful about the SQL you craft; or the ORM or abstraction library supports it; and all your tooling (upgrade/rollback/DB initialization) supports it, then maybe you can swap out DBs with little trouble. The data migration might be the hardest part, and the older your application gets, the more fraught swapping DB backbends will become. I'd suggest that by doing this, you're exposing yourself to a number of possible essentially runtime bugs from incompatibility or simply differences in how DBs function. You may be confident in data integrity in case of programming bugs with one DB, only to discover difficult to track down bugs because the DB handles errors differently. You might have to change libraries to use a different DB, introducing more variables and potential runtime bugs.

You can add another abstraction layer to your DB interface, and write a bunch of tests for it; if I really thought I'd ever need to change DB backbends, that's what I'd do.

Redis is a very simply DB for very simple data. It's a persistent key/value cache. It's not meant to fulfill the role of a relational or structured data database. It's meant to be for fast lookups of simple data. It has a trivial query language. It's a hash map.

[–] [email protected] 1 points 4 months ago

I say go with Maria or with postgres for all use cases in the beginning. You can even start with sqlite. For most of the use cases you don't need the scaling and speed of redis, mongodb etc.

If you hit a performance wall with Maria or postgres, you should scale up and optimize before switching to more specialized databases.

If you use an ORM (object relational mapper like Prisma), it should be okay to switch database engines but you don't want to do that frequently.

My rule of thumb is getting it done first.

[–] [email protected] 1 points 4 months ago* (last edited 4 months ago)

These days, I just use postgres for my projects.
It's rare that it doesn't do what I need, or extensions don't provide the functionality. Postgres just feels like cheating, to be honest.

As for flavour, it's up to you.
You can start with an official image. If it is missing features, you can always just patch on top of their docker image or dockerfile.
There are projects that build additional features in, or automatic backups, or streaming replication with automatic failover, or connection pooling, or built in web management, etc

Most times, the database is hard coded.
Some projects will use an ORM that supports multiple databases (database agnostic).
Some projects will only use basic SQL features so can theoretically work with any SQL database, some projects will use extended database features of their selected database so are more closely tied to that database.

With version, again, some features get depreciated. Established databases try to stay stable, and project try and use databases sensibly. Why use hacky behaviour when dealing with the raw data?!
Most databases will have an LTS version, so stick to that and update regularly.

As for redis, it's a cache.
If "top 10 files" is a regular query, instead of hitting the database for that, the application can cache the result, and the application can query redis for the value. When a new file is added, the cache entry for "top 10 files" can be invalidated/deleted. The next time "top 10 files" is requested by a user, the application will "miss" the cache (because the entry has been invalidated), query the database, then cache the result.
Redis has many more features and many more uses, but is commonly used for caching. It's is a NoSQL database, supports pub/sub, can be distributed, all sorts of cool stuff. At the point you need redis, you will understand why you need redis (or nosql, or pub/sub).

For my projects, I just use a database per project or even per service (depending on interconnectedness).
If it's for personal use, it's nice to not worry about destroying other personal stuff by messing up database stuff.
If it's for others, it's data isolation without much thought.

But I've never done anything at extremely large scales.
Last big project was 5k concurrent, and I ended up using Firebase for it due to a bunch of specific requirements