this post was submitted on 29 Mar 2024
14 points (88.9% liked)

SQL

473 readers
1 users here now

Related Fediverse communities:

Icon base by Delapouite under CC BY 3.0 with modifications to add a gradient

founded 2 years ago
MODERATORS
 

Is there a programming language specifically designed for interacting with SQL databases that avoids the need for Object-Relational Mappers (ORMs) to solve impedance mismatch from the start?

If such a language exists, would it be a viable alternative to PHP or Go for a web backend project?

top 43 comments
sorted by: hot top controversial new old
[–] [email protected] 40 points 9 months ago (1 children)
[–] [email protected] 3 points 9 months ago

Came here to say this

[–] [email protected] 14 points 9 months ago (4 children)

ORMs are one of the worst things ever created IMO. Sure they’re great to turn structured data storage into objects and methods that developers are used to, but every single one of them scales for crap and I can’t tell you the number of projects we have to go back and fix to be straight up raw SQL once it starts growing and becomes a bottleneck.

(I’ll get off my soapbox now)

[–] pixxelkick 7 points 9 months ago (2 children)

I'm curious under what condition an ORM impacts scaling at all.

ORMs merely generate a SQL query to run on the DB, then serialize the data it gets back, how your application svales should be unimpacted by using an ORM.

I'd assume if somehow it's impeding scaling then the query itself was structured poorly, or that particular ORM was very poor.

A good ORM should be able to generate the sql query in microseconds, the actual DB query time should be typically like 99% of the response time for your API, which the ORM has no bearing on.

[–] [email protected] 4 points 9 months ago (1 children)

The SQL queries I’ve seen almost every ORM create are highly inefficient compared to a good query designed by hand. That’s where I’ve always seen the delay get introduced. You end up spending lots of cycles on the query so the delay looks like it’s within the DB but when replaced by a custom query the bottleneck goes away. Quite often a quick fix is replacing the query with a store procedure or view and letting the devs work off of that.

[–] pixxelkick 6 points 9 months ago (2 children)

The SQL queries I’ve seen almost every ORM create are highly inefficient compared to a good query designed by hand.

I have to just respectfully disagree. Perhaps you had an actual database engineer on your team who was in charge of the sql (in which case 100% yes absolutely use sql!)

But typically backend devs aren't database engineers and they have no idea how to compose a good sql query, let alone how to optimize it or test its execution plan.

I've seen way way more absolute clusterfuck garbage queries that take way too long to run that were hacked together by BE devs.

Quite often a quick fix is replacing the query with a store procedure or view and letting the devs work off of that.

Views are fucking awful imo. It's yet another entire layer of abstraction that deeply obfuscates what is actually running and/or happening.

The entire principle of shit outside my codebase that I can inspect with my LSP causing side effects to my logic is just a nightmare to maintain.

The moment you can have the exact same application behave differently purely because different stuff was or was not put on the db it was pointed at, it's an absolute cluster fuck to maintain.

Stored Procedures only should be reached for under one circumstance imo, and that's when you need to use recursion on your DB.

IE if you have perhaps a parent/child self FKd table to create a hierarchy tree with unknown depth... You'd want to traverse it recursively which likely would want a stored proc.

But aside from that, I just can't get behind breaking up the backend into effectively having 2 distinct layers of truth to its behavior.

I want all my codebase in one place, and in one language, under one language, through a single LSP.

[–] [email protected] -1 points 9 months ago (1 children)

I have had Database Engineers and done it myself. If you run any ORM created SQL queries through a profiler and look at the execution plan you’ll see it’s an absolute mess. That’s why I said it doesn’t scale. Sure it’s good for small things but I’m working on projects that have millions or rows and multiple joins. At that scale it just starts to fall apart. Having good raw queries will beat out an ORM every time at scale and that’s why I hate them. You want to use it for a small quick project, go for it. You’re trying to work at enterprise scale, get a DBA to make you actual queries.

[–] pixxelkick 7 points 9 months ago

If you run any ORM created SQL queries through a profiler and look at the execution plan you’ll see it’s an absolute mess.

I've never had this experience with Entity Framework, full stop.

It sounds like either the devs were abusing the ORM and using it wrong to make it generate garbage, or, you were just using some very poorly written ORM.

[–] [email protected] -2 points 9 months ago

But typically backend devs aren’t database engineers and they have no idea how to compose a good sql query, let alone how to optimize it or test its execution plan.

Should the BE Dev be touching the data if they don't, you know, know how to work with the data? No: SQL Developers or Data Engineers should create data they can access and use. The design of the database is entirely separate from the design of the application. They do not need to be related, and usually should not.

I’ve seen way way more absolute clusterfuck garbage queries that take way too long to run that were hacked together by BE devs.

Again, should the BE Devs be touching the data, then? No. If they don't know what they're doing (which is writing queries to handle data), then they shouldn't be messing with the data. The Data team should be messing with the data, and serving it to the BE Devs.

Views are fucking awful imo. It’s yet another entire layer of abstraction that deeply obfuscates what is actually running and/or happening.

You must not be a SQL Developer or Data Engineer.

Stored Procedures only should be reached for under one circumstance imo, and that’s when you need to use recursion on your DB.

You're definitely not a SQL Developer or Data Engineer.

IE if you have perhaps a parent/child self FKd table to create a hierarchy tree with unknown depth… You’d want to traverse it recursively which likely would want a stored proc.

I had to read this several times to make sure I understood, because I would absolutely and immediately fire any Data Engineer working for me who developed something like this, and I would shun and or quit the company or project team that insisted on developing an application or process that required it, especially if they did so without consulting their Data team.

I want all my codebase in one place, and in one language, under one language, through a single LSP.

This is completely unrealistic. It speaks to a lack of industry experience and is down-right selfish. No hiring manager will ever take you seriously if you say you only use one language. You need to know several to get by. And that's because each one has a different purpose. HTML renders web content, CSS makes it pretty, JavaScript makes it think, PHP makes it explode, Go makes it go, JSON helps it share data, SQL manipulates the data, Shell/Bash updates systems, etc. Some of these are markup languages, some are stylers, some compute and run functions, some explain data, some manipulate data, some give system instructions. They all have a different purpose, are used in a different layer of the stack, and require a different expert (with the exception of some Full-Stack Wizards) to implement.

Saying you want one language is the most bonkers thing I've read on the internet this month. Don't travel the world; you're gonna have a bad time.

[–] [email protected] 1 points 9 months ago (1 children)

The classic example is the N+1 query pattern, where the number of generated queries is linear in the number of rows returned by the first query.

[–] pixxelkick 6 points 9 months ago

This is not a problem for a modern ORM, JOINs are supported by most ORMs I've worked with for many years.

Var cars = await db.Cars
    .Include(c => c.Wheels)
    ToListAsync();

foreach (var car in cars)
{
    Console.WriteLine(car.Model);
    foreach (var wheel in car.Wheels)
    {
        Console.WriteLine(wheel.Id);
    }
}

This will get all the cars and their associated wheels in 1 single query by performing a JOIN operation using Entity Framework Core, assuming there's a FK for Wheel to Car.

[–] [email protected] 7 points 9 months ago

It's just a performance to development time trade off. Clearly the product was successful enough to demand coming back to improve the performance. Lots of software fails before it even reaches that point.

[–] [email protected] 3 points 9 months ago (1 children)

ORMs should be made illegal. They make you learn sql then how to translate that to the ORM.

[–] [email protected] 2 points 9 months ago

The absolutely worst thing about them is they claim to be database engine agnostic. Surey the most basic stuff maybe, but make your queries a tad bit advanced and it stops working on different database engine. Also, how often do you switch to a database engine that is so much different from your existing one? I hate using Doctrine, stupid query builders make me develop twice as slow. On the other hand, Eloquent is kind of nice and has an easy way to write raw SQL queries.

[–] [email protected] 2 points 9 months ago

Argh, your comment is such a pet peeve of mine, especially since my SQL knowledge is kinda arse (it's #1 of my todo list to learn): "just use raw SQL" is a terrible answer, because newbies now still don't know how to not use an ORM (and building classes is what they tend to know, so using a wrapper like Django or SQL Alchemy is ez pz).

How do I learn to use raw SQL, as a way to not use ORMs. Yes, learning SQL is step 1, but what is step 2? How am I going to do migrations, without having to manually run stuff, because manual work is faulty work. How am I going to track changes in my model, over time?

Sorry if I sound frustrated, because I am.

[–] pixxelkick 10 points 9 months ago* (last edited 9 months ago) (2 children)

No matter how you tackle this, your front end likely communicates in json and your database in sql.

At the crux of it, your backend has the job of translating between the two without openly exposing the database to the front end (unless security truly doesn't matter for your app)

There's no easy way to get around the fact you simply just have to write logic to mediate between those two languages.

The best way I use to avoid mismatch problems between BE<->DB is I use Code First Entity Framework Core as my ORM, letting my EF Core spec act as the source if truth fir my db schema via automated EF migrations.

This means the only way you get a mismatch is due to merge conflicts not being resolved properly if 2 devs both mutate the db schema at the same time.

C#'s Linq is also the closest first class API I have seen that very closely mimics sql.

I genuinely find Linq queries on Entity Framework easier to write and read than sql.

[–] [email protected] 2 points 9 months ago

Linq

LINQ is remarkable.

[–] [email protected] 2 points 9 months ago (1 children)

Yeah Linq is truly unique in programming languages. The fact that I can write a where clause how I would in normal code and it just translates it into SQL is so much nicer than some DSL for filtering

[–] pixxelkick 1 points 9 months ago

100%, the extremely 1:1 way that c# translates into sql is prolly the closest to what OP wants, I've tried a variety of ORMs and EF Core us hands down the best I've used.

The fact it doubles as a DB Schema manager and migration engine is just icing on the cake. All my database related needs in one spot.

[–] [email protected] 6 points 9 months ago (2 children)

Have you considered raw queries (properly parametrized and escaped and all that)? You can’t beat the speed of getting exactly what you need with no overhead.

What you’re describing is basically a fancy ORM.

[–] [email protected] 1 points 9 months ago (1 children)

properly parametrized and escaped and all that

I'm not sure if what I use is proper enough in your sense. So, can you elaborate more?

[–] [email protected] 4 points 9 months ago (1 children)
[–] [email protected] 1 points 9 months ago* (last edited 9 months ago)

Yes, I've used this.

[–] pixxelkick 0 points 9 months ago (3 children)

My #1 issue with raw sql is its just absolutely a nightmare to maintain.

I simply just can't easily, at a glance, do something as simple as "give me the list if every single chunk of code that touches this column on this table", which is like, 80% of my start points for debugging an error showing up on our backend.

"We sometimes get NULL being set on this column that should no longer be NULL if (other column) is getting set, can you investigate how that us happening?"

If you have an application that uses raw sql, simply just step 1 of "find all backend code that touches that column" is already 100x more effort than it should be, and that's even on a well maintained project.

If the sql is even slightly poorly maintained (and since you are tasking BE (some language other than sql) devs with maintaining SQL, it very often is very poorly maintained, often just shoved as raw magic strings in the middle of their code, so.etimes even generated dynamically.

At which point its just a fucking nightmare to figure out what the fuck is writing to that column.

With an ORM, the issue suddenly becomes as easy as clicking the "find references" button on the field for that column and, boom, all bits of code that touch that field in any way are now listed put for you, ez.

[–] [email protected] 1 points 9 months ago

If you’re at the point where you need the performance boost from raw SQL over an ORM, you have solutions for these problems such as a well-maintained, centralized interface or store for SQL.

OP wasn’t asking for best principles, OP was asking for a language that replaces ORMs. That’s SQL directly in your code.

[–] [email protected] 1 points 9 months ago (1 children)

You can still do that.

For example, you'd still write classes for your tables:

    public class Users
    {
        public int Id { get; set; }
    }

and then you'd just do

   var query = $"select * from {(nameof(Users))} where {(nameof(Users.Id))} = 10;";

That let's you write raw sql about as close as it gets, while still having some degree of type-safety. You could drop a query like that into Dapper, and you're pretty close to just using raw sql.

[–] pixxelkick 1 points 9 months ago (1 children)

I don't see why I'd do that over

db.Users
    .Where(u => u.UserId == 10)
    .ToListAsync();

Which will produce pretty much the exact same sql under the hood but be 100x easier to read, maintain, and debug.

[–] [email protected] 1 points 9 months ago (1 children)

I don’t see why I’d do that

Because just Dapper will perform a lot better executing raw sql queries than EF having to go through an entire expression tree builder.

Anyway, I wasn't saying that that example is a better way than doing it with EF, I was just going over your points where you mentioned that with raw SQL it's just all unreferenced magic strings with no references to tables or columns. And that you can't find where anything is used.

So that's just to explain - if you write your sql inside code in the poorest possible way - yea, you're gonna have a poor experience. But if you want to write raw sql instead of using an ORM, it's pretty easy to negate all those downsides about not having references

[–] pixxelkick 1 points 9 months ago

Because just Dapper will perform a lot better executing raw sql queries than EF having to go through an entire expression tree builder.

I'd like to see some benchmarks on truly how much this difference matters when running on the cloud.

I expect latency alone between the App<->Db will dwarf whatever microseconds your raw sql would save that it's hard to distinguish from the chaos of latency variance.

[–] [email protected] 1 points 9 months ago (1 children)

There are lots of ways to find out what code touches a column. For example, if the code is deployed as stored procedures, you can easily query the text of all stored procedures for references to that column. If it's not deployed that way (maybe in a Git repo somewhere), it's still possible to search that text for the references.

But the problem you describe wouldn't be present if you had good documentation. If developers (front end, back end, and data alike) were able to create documentation that detailed what their code does, and you maintained a knowledge base or data governance platform (like Collibra, though even a wiki would do), you could simple click on the field name and immediately see every article or code reference that uses it as one of their attributes.

Good documentation is all I'm saying. It just usually doesn't exist because the bean counters don't prioritize time to create it, and the developers commonly don't want to (though they'll complain about the lack of it later) or aren't trained to do it effectively.

[–] pixxelkick 2 points 9 months ago (1 children)

you can easily query the text of all stored procedure

Yeah considering that:

  1. Means I have to switch to an entire different context
  2. Lookup the exact syntax to do it right
  3. This also runs the possibility of false positive hitting on anything else that has the same word in it (I've seen it happen, it's not uncommon to have the same column name on multiple tables as an example)
  4. And then I need to parse that output
  5. And that still doesn't get me what I need because that just gets me the stored procs, now I need to find the backend code that calls those stored procs so that's still the same issue lol

if you had good documentation

You are asking backend devs that are specialized in (BE language that isn't SQL) to maintain documentation on SQL code.

Also, wikis or etc are even worse to try and search on, I'd argue this solution will be even worse than just greping the codebase.

because the bean counters don’t prioritize time to create it

Doesn't matter, because you know how I do it with my ORM?

I hit F12 and that's it, I get a brought up list of all code that touches that column, and no false positives, in the same IDE + LSP I use to do my backend code

  1. No need to swap to (some other program/context)
  2. No false positives
  3. Literally 1 hotkey
  4. Leverages the already running LSP that has this data loaded so the results are instant
  5. And no layer of abstraction for "okay I found B calls A but I still need to find the C that calls B"

ORMs take moments to find the exact BE code that matters.

SQL takes minutes to actually find wtf calls what calls what.

If you have BE code that calls a stored proc that calls a stored proc that calls a stored proc that runs against a view of a view of a table (I've seen this sort of shit on very old long maintained large codebases a few times) it can take you hours just to work out the exact chain of what calls what to figure out how a table got to be a specific way at some point.

There is no way to maintain that sort of pile of code easily, you have to spend a tonne of extra time writing documentation just to even approach "not a total nightmare at least..."

No. Thanks.

[–] [email protected] 1 points 9 months ago (1 children)

Yeah considering that: (list of complaints)

It's really not that hard. I can do it in under 30 seconds from memory because I am a data dude. If you're not, that's no big deal; just find one who can help you with it. Projects don't have to be a single person; they can be composed of multiple individuals, each with different specialties. If you want to work alone, learn all the specialties so you can do it, too, instead of whining about how hard it is.

You are asking backend devs that are specialized in (BE language that isn’t SQL) to maintain documentation on SQL code. (plus more complaints)

I'm not asking anyone to do anything. I'm saying that if there was good documentation, we'd all have an easier time. You can't deny that. Is it gonna be easy to create and maintain all that documentation? Not necessarily. Will it make our lives easier down the road if we do, though? Yeah, probably.

And by the way, as a SQL Dev, I create documentation on my SQL deployments. I don't expect the BEs to do it because, as you stated, they don't know it as well as I do. I created it for them so they can have an understanding of the working parts they don't know like the back of their hand, and I create it for me so I can explain it later because I don't have every line of code I've ever written memorized and I might not be the next one who needs to work on it.

Doesn’t matter, because you know how I do it with my ORM?

I'll admit I haven't used a lot of ORMs because I'm old-school. If it has all the functionality you describe, that's great! My point was that you were quick to disparage the SQL development process and it's nuances. Not to mention the fact that the data design is, itself, often nuanced and detailed beyond just want a look-up tool can tell you. Data experts like myself exist so we can explain what's going on, not just pain a picture of it. You can see what picture the puzzle makes, but you need someone who knows how all the pieces go together, don't you?

If you have BE code that calls a stored proc that calls a stored proc that calls a stored proc that runs against a view of a view of a table (I’ve seen this sort of shit on very old long maintained large codebases a few times) it can take you hours just to work out the exact chain of what calls what to figure out how a table got to be a specific way at some point.

You're absolutely right. This is bad design. It should not be done this way. However, if you have an intelligent and creative Data Expert, you'll get an easily designed solution that you can use over and over and requires minimal effort to maintain and update as the needs change. It comes down to who built it and what skills they had. Categorically believing that all data delivery solutions are terrible because of this one kind of experience is a logical fallacy.

No. Thanks.

Look, that's fine. If you don't wanna touch it, don't touch it. Let data experts touch it. I've designed database systems for IBM, Nike, and Amazon AWS, and I was singularly capable of doing so because of my background. None of the other hundreds of developers on the dozens of Agile teams were able to do that work because they all had other specialties and other jobs and just needed data to be delivered to them quickly, efficiently, and in a manner that was easy to digest and utilize. That was my job as the Data Expert. If you don't have one but you need one, you need to become one. And if you were one, you wouldn't be disparaging the trade so much.

[–] pixxelkick 1 points 9 months ago* (last edited 9 months ago)

you’ll get an easily designed solution that you can use over and over and requires minimal effort to maintain and update as the needs change

The moment it involves me having to leave the context of my workspace, go to some other workspace, abd then try and connect the dots between the two without there being any existing solutions to aid that, it's never going to be easy.

That's just a cold hard fact. There's no fluid LSP mechanism to context switch between the SQL and my backend code. Full stop.

If I have BE code that invokes a stored proc, I can't just see the definition of "wtf does that stored proc do?" From within my workspace.

I definitely can't just hit a button at this time to just switch to its definition.

Also, the fact what my DB thinks the stored proc is vs what the code says it does is always a big wrench in the gears. I despise the entire concept of "well the code defines it as this but my local db is out of date"

The concept of literally switching git branches is all it takes for my codebase to say one thing and my db to say another.

You can't reconcile that without tonnes of extra work.

And add in the fact that this problem can layer up with stored prics calling stored procs, DB schema changing...

I just am not interested in a stack where I have to maintain 2 entirely separate chains of distinct sources of truth, especially when the latter doesn't even maintain with Version Control.

just get a data expert

Like I said if you actually have a DB Engineer, use them.

But this may surprise you but, most projects just don't have one, nor can they afford one.

On smaller companies looking to cut costs, having a BE dev is necessary, but if the need for a DB Engineer can be replaced by a BE Dev just using an ORM for now, then I think it's pretty straightforward which of the two roles will be hired and the other passed on.

Until you get to a fairly large scale, actually having dedicated DB engineers simply just isn't the part of most companies strategies.

Do I wish I had another entire person who could just handle it for me?

Sure.

But it's not gonna happen if I'm not working on FAANG scale projects, I also wish I had a million dollars right now, but that's just not reality.

[–] [email protected] 3 points 9 months ago (1 children)

Everyone else has more experience than I, and I am not sure these are exactly the kinds of answers you are looking for...but the two things I have thought is using something like PL/SQL and stored procedures, so much of your backend logic is removed from the server and set into the database itself. Not exactly what you are looking for I think, and it has problems of its own.

Second, Prolog is a great query language (from what I am told) and capable of running a server. TerminusDB runs their server in prolog, and also postgres has a prolog implementation. It would be interesting to play with these things, but they may not exactly be what you are looking for.

[–] [email protected] 1 points 9 months ago (1 children)

Stored procedure and Datalog are not what I'm looking for. However, I should consider them seriously, since they may be more practical.

[–] [email protected] 1 points 9 months ago (1 children)

I am not as familiar with RDBMs internals, but you could also build your server in the database. Right now, I am building a server client of sorts with Oxigraph. I have a store object that I am manipulating directly with rust code. It is an option. However its not going to be very flexible, and it does complicate the sanitization issues.

Also, prolog is a complete language, very capable of running the server. I don't know what kind of architecture you are thinking of and having the distinction between datalog on the database and prolog in the server might be problematic. Also, I may be projecting a little. I wish I could be using prolog. But alas.

[–] [email protected] 2 points 9 months ago

I don’t know what kind of architecture you are thinking of and having the distinction between datalog on the database and prolog in the server might be problematic.

I thought about Datomic and Clojure.

[–] piotrm 3 points 9 months ago (1 children)

There is also links (https://links-lang.org/). Designed for web but has nice SQL abstractions. It is mostly an academic language though.

[–] [email protected] 1 points 9 months ago* (last edited 9 months ago)

b but has nice

Thank you. This is exactly what I'm looking for.

link

[–] [email protected] 3 points 9 months ago (1 children)

Don’t, you’ll invoke all the DBAs from the 90s who all seem to hang out here.

[–] [email protected] 1 points 9 months ago

I did. I also coded in PL/pgSQL.

[–] [email protected] 2 points 9 months ago

As pixxelkick already said, you most likely need some kind of mapping anyway between backend sql and frontend json. If you would have a language designed for interacting with sql databases it would probably suck for everything else.

In java with jooq and its generator it works pretty well, actually. You get the database types as java types and have rather safe queries. Type support breaks down for large and complicated queries, though.

[–] [email protected] 1 points 9 months ago

It really depends on where you set the limit on what ORM is, JOOQ is kind of a thing you're looking for.