armchair_progamer

joined 1 year ago

Abstract:

The expression problem describes how most types can easily be extended with new ways to produce the type or new ways to consume the type, but not both. When abstract syntax trees are defined as an algebraic data type, for example, they can easily be extended with new consumers, such as print or eval, but adding a new constructor requires the modification of all existing pattern matches. The expression problem is one way to elucidate the difference between functional or data-oriented programs (easily extendable by new consumers) and object-oriented programs (easily extendable by new producers).
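
For example (a minimal Haskell sketch of the problem, not code from the paper): with the AST as an ADT, a new consumer is just another function, but a new constructor forces every existing pattern match to change.

-- A small AST as an algebraic data type.
data Expr = Lit Int | Add Expr Expr

-- Adding a new consumer is easy: just write another function over Expr.
eval :: Expr -> Int
eval (Lit n)   = n
eval (Add a b) = eval a + eval b

pretty :: Expr -> String
pretty (Lit n)   = show n
pretty (Add a b) = "(" ++ pretty a ++ " + " ++ pretty b ++ ")"

-- Adding a new producer (say, a Mul constructor) is not: every existing
-- pattern match above must be edited to handle the new case.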

This difference between programs which are extensible by new producers or new consumers also exists for dependently typed programming, but with one core difference: Dependently-typed programming almost exclusively follows the functional programming model and not the object-oriented model, which leaves an interesting space in the programming language landscape unexplored.

In this paper, we explore the field of dependently-typed object-oriented programming by deriving it from first principles using the principle of duality. That is, we do not extend an existing object-oriented formalism with dependent types in an ad-hoc fashion, but instead start from a familiar data-oriented language and derive its dual fragment by the systematic use of defunctionalization and refunctionalization.

Our central contribution is a dependently typed calculus which contains two dual language fragments. We provide type- and semantics-preserving transformations between these two language fragments: defunctionalization and refunctionalization. We have implemented this language and these transformations and use this implementation to explain the various ways in which constructions in dependently typed programming can be explained as special instances of the general phenomenon of duality.
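
To give the flavor of these transformations (a standard Haskell illustration of defunctionalization, not code from the paper): a higher-order function argument is replaced by a first-order data type plus an apply function, and refunctionalization is the inverse step.

-- Higher-order original: callers pass an arbitrary function (a runtime closure).
applyAll :: (Int -> Int) -> [Int] -> [Int]
applyAll f = map f

-- Defunctionalized: every function actually passed in the program becomes a
-- constructor of a first-order data type, and apply dispatches on it.
data Fn = AddConst Int | Negate

apply :: Fn -> Int -> Int
apply (AddConst k) x = x + k
apply Negate       x = negate x

applyAllD :: Fn -> [Int] -> [Int]
applyAllD f = map (apply f)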

Background:

Expression problem (wikipedia)

Defunctionalization (wikipedia)

Codata in action, or how to connect Functional Programming and Object Oriented Programming (Javier Casas)

 

Several months ago I gave a talk at work about Hindley-Milner type inference. When I agreed to give the talk I didn't know much about the subject, so I learned about it. And now I'm writing about it, based on the contents of my talk but more fleshed out and hopefully better explained.

I call this a reckless introduction, because my main source is wikipedia. A bunch of people on the internet have collectively attempted to synthesise a technical subject. I've read their synthesis, and now I'm trying to re-synthesise it, without particularly putting in the effort to check my own understanding. I'm not going to argue that this is a good idea. Let's just roll with it.

 

In this post, I will talk about the first version of the Intermediate Representation I designed for Kunai Static Analyzer, a Dalvik analysis library that I wrote as a project for my PhD, and as a way of learning about the Dalvik file format and improving my programming skills.

Kunai source (Github)

Kunai paper

Shuriken (Kunai successor)

Dalvik is a discontinued VM for Android

 

Pre-Scheme is an interesting Scheme dialect which is being ported to modern systems. The language, why it's being ported, and the porting effort are described in the linked post.

As described in the Pre-Scheme paper, the language offers a unique combination of features:

  • Scheme syntax, with full support for macros, and a compatibility library to run Pre-Scheme code in a Scheme interpreter. The compiler is also implemented in Scheme, enabling both interactive development and compile-time evaluation.
  • A static type system based on Hindley/Milner type reconstruction, as used in the ML family of languages (e.g. Standard ML, OCaml, Haskell). Pre-Scheme supports parametric polymorphism, and has nascent support for algebraic data types and pattern matching, which are recently gaining popularity in mainstream languages.
  • An optimizing compiler targeting C, allowing for efficient native code generation and portable low-level machine access. C remains the common interface language for operating system facilities, and compatibility at this level is essential for modern systems languages.

Due to the restrictions of static typing and the C runtime model, Pre-Scheme does not (currently) support many of Scheme's high-level features, such as garbage collection, universal tail-call optimization, heap-allocated runtime closures, first-class continuations, runtime type checks, heterogeneous lists, and the full numeric tower. Even with these limitations, Pre-Scheme enables a programming style that is familiar to Scheme programmers and more expressive than writing directly in C.

Ironically, Pre-Scheme's original purpose was to implement the interpreter for another Scheme dialect, Scheme 48 (Wikipedia). But the lower-level Pre-Scheme is now getting its own attention, in part due to the popularity of Rust and Zig. Pre-Scheme has the portability and speed of C (or at least close to it), but, like Rust and Haskell, it also has a static type system with ADTs and parametric polymorphism; and, being a Lisp dialect, it has the powerful meta-programming and REPL found in most other dialects.

 

GitHub

From README:

Lady Deirdre is a framework for incremental programming language compilers, interpreters, and source code analyzers.

This framework helps you develop a hybrid program that acts as a language compiler or interpreter and as a language server for a code editor's language extension. It offers the necessary components to create an in-memory representation of language files, including the source code, their lexis and syntax, and the semantic model of the entire codebase. These components are specifically designed to keep the in-memory representation in sync with file changes as the codebase continuously evolves in real time.

Key Features

  • Parser Generator Using Macros.
  • Hand-Written Parsers.
  • Error Resilience.
  • Semantics Analysis Framework.
  • Incremental Compilation.
  • Parallel Computations.
  • Web-Assembly Compatibility.
  • Source Code Formatters.
  • Annotated Snippets.
  • Self-Sufficient API.
 

Abstract from the ICFP 2024 paper:

Many abstraction tools in functional programming rely heavily on general-purpose compiler optimization to achieve adequate performance. For example, monadic binding is a higher-order function which yields runtime closures in the absence of sufficient compile-time inlining and beta-reductions, thereby significantly degrading performance. In current systems such as the Glasgow Haskell Compiler, there is no strong guarantee that general-purpose optimization can eliminate abstraction overheads, and users only have indirect and fragile control over code generation through inlining directives and compiler options. We propose a two-stage language to simultaneously get strong guarantees about code generation and strong abstraction features. The object language is a simply-typed first-order language which can be compiled without runtime closures. The compile-time language is a dependent type theory. The two are integrated in a two-level type theory.

We demonstrate two applications of the system. First, we develop monads and monad transformers. Here, abstraction overheads are eliminated by staging and we can reuse almost all definitions from the existing Haskell ecosystem. Second, we develop pull-based stream fusion. Here we make essential use of dependent types to give a concise definition of a concatMap operation with guaranteed fusion. We provide an Agda implementation and a typed Template Haskell implementation of these developments.

The repo also contains demo implementations in Agda and Haskell, and older papers/implementations/videos.
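
To make the closure overhead concrete (a hypothetical Haskell illustration, not code from the paper or the repo):

-- Generic monadic code: each (>>=) receives a continuation. Unless the
-- compiler inlines the Monad dictionary and beta-reduces these lambdas,
-- they are passed around as runtime closures.
sumPairM :: Monad m => m Int -> m Int -> m Int
sumPairM ma mb =
  ma >>= \a ->
  mb >>= \b ->
  return (a + b)

-- The first-order code one would like guaranteed after staging or
-- specialization: no higher-order calls, no closures.
sumPair :: Int -> Int -> Int
sumPair a b = a + b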

 

F is written in K, another small language. In fact, the entire implementation is in one file: http://www.nsl.com/k/f/f.k.

Example program (defines factorial and computes fac(6), from http://www.nsl.com/k/f/f/fac.f):

[[1=][][dup!pred!fac!*]cond!]fac

The main site, "no stinking loops", is full of links including to other languages (scroll down) and resources on K. Although many are broken 🙁.

 

A blog post on the evolution of method syntax in Ruby.

The author (Victor Shepelev AKA zverok) lives in Ukraine and is part of the Armed Forces. See his other posts (about Ruby, the war, and more) here.

 

NumPy-style broadcasting = being able to write code like

[[1, 2, 3],
 [4, 5, 6],  + [10, 11, 12]
 [7, 8, 9]]

and have it evaluate the same as

              [[1, 2, 3],
map (map (+))  [4, 5, 6],  (rep 3 [10, 11, 12])
               [7, 8, 9]]

(which evaluates to [[11, 13, 15], [14, 16, 18], [17, 19, 21]]).

numpy docs. Numpy and most other languages do this at runtime, which is typically easier. But Futhark is statically-typed, so the type of the result must be known at compile time, and this means the broadcasting must also be done at compile time.
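
Written out explicitly (a Haskell sketch, with zipWith playing the role of the two-argument map above):

broadcastAdd :: [[Int]] -> [Int] -> [[Int]]
broadcastAdd matrix vec =
  zipWith (zipWith (+)) matrix (replicate (length matrix) vec)

-- broadcastAdd [[1,2,3],[4,5,6],[7,8,9]] [10,11,12]
--   == [[11,13,15],[14,16,18],[17,19,21]]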

 

Adding Rust FFI into Vale, in particular the part where the Vale compiler determines the signature and proper overload of a Rust function.

Our ultimate goal is to write this Vale code:

import rust.std.vec.Vec;

exported func main() {
  vec = Vec<int>.with_capacity(42);
  println("Length: " + vec.capacity());
}
Length: 42

...which would successfully make a Rust Vec and call capacity on it.


[–] [email protected] 3 points 1 month ago

Author's comment on lobste.rs:

Yes it’s embeddable. There’s a C ABI compatible API similar to what lua provides.

[–] [email protected] 28 points 1 month ago* (last edited 1 month ago)

C++’s mascot is an obese sick rat with a missing foot*, because it has 1000+ line compiler errors (the stress makes you overeat and damages your immune system) and footguns.

EDIT: Source (I didn't make up the C++ part)

[–] [email protected] 7 points 1 month ago* (last edited 1 month ago) (2 children)

I could understand method = associated function whose first parameter is named self, so it can be called like self.foo(…). This would mean functions like Vec::new aren’t methods. But the author’s requirement also excludes functions that take generic arguments like Extend::extend.

However, even the above definition gives old terminology new meaning. In traditional OOP languages, all functions in a class are considered methods: those only callable from an instance are “instance methods”, while the others are “static methods”. So, translating OOP terminology into Rust, all associated functions are still considered methods, and those with/without method-call syntax are instance/static methods.

Unfortunately I think that some people misuse “method” to only refer to “instance method”, even in the OOP languages, so to be 100% unambiguous the terms have to be:

  • Associated function: function in an impl block.
  • Static method: associated function whose first argument isn’t self (even if it takes Self under a different name, like Box::leak).
  • Instance method: associated function whose first argument is self, so it can be called like self.foo(…).
  • Object-safe method: a method callable from a trait object.
[–] [email protected] 9 points 2 months ago* (last edited 2 months ago)

Java the language, in human form.

[–] [email protected] 3 points 2 months ago* (last edited 2 months ago) (1 children)

I find writing the parser by hand (recursive descent) to be easiest. Sometimes I use a lexer generator, or if there isn’t a good one (idk for Scheme), write the lexer by hand as well. Define a few helper functions and macros to remove most of the boilerplate (you really benefit from Scheme here), and you almost end up writing the rules out directly.

Yes, you need to manually implement choice and figure out what/when to lookahead. Yes, the final parser won’t be as readable as a BNF specification. But I find that writing a hand-rolled parser for most grammars, even with the manual choice and lookahead, is surprisingly easy and fast.

The problem with parser generators is that when they work, they work well, but when they don’t work (fail to generate, the generated parser parses the wrong node, the generated parser is very inefficient) it can be really hard to figure out why. A hand-rolled parser is much easier to debug, so when your grammar inevitably has problems, it often takes less time in total to go from spec to a working hand-rolled parser than from spec to a working generated parser.

The hand-rolled rules may look something like (with user-defined macros and functions define-parse, parse, peek, next, and some simple rules like con-id and name-id as individual tokens):

; pattern ::= [ con-id ] factor "begin" expr-list "end"
(define-parse pattern
  (mk-pattern
    (if (con-id? (peek)) (next))
    (parse factor)
    (do (parse "begin") (parse expr-list) (parse "end"))))

; factor ::= name-id
;          | symbol-literal
;          | long-name-id
;          | numeric-literal
;          | text-literal
;          | list-literal
;          | function-lambda
;          | tacit-arg
;          | '(' expr ')'
(define-parse factor
  (case (peek)
    [name-id? (if (= "." (peek2)) (mk-long-name-id …) (next))]
    [literal? (next)]
    …))

Since you’re using Scheme, you can almost certainly optimize the above to reduce even more boilerplate.

Regarding LLMs: if you start to write the parser with the EBNF comments above each rule like above, you can paste the EBNF in and Copilot will generate rules for you. Alternatively, you can feed a couple EBNF/code examples to ChatGPT and ask it to generate more.

In both cases the AI will probably make mistakes on tricky cases, but that’s practically inevitable. An LLM embedded in an error-proof code synthesis engine would be a breakthrough; and while there are techniques like fine-tuning, I suspect they wouldn’t improve the accuracy much, and would certainly amount to more effort in total (in fact, most LLM “applications” are just a custom prompt on top of plain ChatGPT or another baseline model).

[–] [email protected] 4 points 2 months ago* (last edited 2 months ago)

My general take:

A codebase with scalable architecture is one that stays malleable even when it grows large and the people working on it change. At least relative to a codebase without scalable architecture, which devolves into "spaghetti code", where nobody knows what the code does or where the behaviors are implemented, and small changes break seemingly-unrelated things.

The programming language isn't the sole determinant of a codebase's scalability, especially if the language has all the general-purpose features one would expect nowadays (e.g. Java, C++, Haskell, Rust, TypeScript). But it's a major contributor. A "non-scalable" language makes spaghetti design decisions too easy and scalable design decisions overly convoluted and verbose. A scalable language does the opposite, nudging developers towards building scalable software almost automatically, at least relative to a "non-scalable" language and provided the software already has a scalable foundation.

[–] [email protected] 47 points 2 months ago* (last edited 2 months ago) (6 children)
public class AbstractBeanVisitorStrategyFactoryBuilderIteratorAdapterProviderObserverGeneratorDecorator {
    // boilerplate goes here
}
[–] [email protected] 2 points 3 months ago

“You're going into Orbit, you stupid mutt.”

[–] [email protected] 8 points 3 months ago* (last edited 3 months ago) (1 children)

I believe the answer is yes, except that we’re talking about languages with currying, and those can’t represent a zero argument function without the “computation” kind (remember: all functions are Arg -> Ret, and a multi-argument function is just Arg1 -> (Arg2 -> Ret)). In the linked article, all functions are in fact “computations” (the two variants of CompType are Thunk ValType and Fun ValType CompType). The author also describes computations as “a way to add side-effects to values”, and the equivalent in an imperative language to “a value which produces side-effects when read” is either a zero-argument function (getXYZ()), or a “getter” which is just syntax sugar for a zero-argument function.

The other reason may be that it’s easier in an IR to represent computations as intrinsic types vs. zero-argument closures. Except if all functions are computations, then your “computation” type is already your closure type. So the difference is again only if you’re writing an IR for a language with currying: without CBPV you could just represent closures as things that take one argument, but CBPV permits zero-argument closures.
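
Roughly, the types being described look like this (a Haskell sketch based only on the comment above; the value types are placeholders, not the linked article’s actual definitions):

data ValType  = IntTy | BoolTy        -- placeholder value types
data CompType
  = Thunk ValType                     -- a computation that just yields a value
  | Fun ValType CompType              -- a computation awaiting one more argument

-- Under currying, a two-argument function is Fun a (Fun b (Thunk c)), and a
-- zero-argument "getter" is simply Thunk c, so no separate arrow type is needed.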

[–] [email protected] 6 points 3 months ago

Go as a backend language isn’t super unusual; there’s at least one other project (https://borgo-lang.github.io) which chose it. And there are many languages which compile to JavaScript or C, but Go strikes a balance: it’s faster than JavaScript while still having memory management, unlike C.

I don’t think panics revealing the Go backend are much of an issue, because true “panics” that aren’t handled by the language itself are always bad. If you compile to LLVM, you must implement your own debug symbols to get nice-looking stack traces and line-by-line debugging like C and Rust; otherwise debugging is impossible and crashes show you raw assembly. Even in Java or JavaScript, core dumps are hard to debug, ugly, and leak internal details; the reason these languages have nice exceptions is that they implement exceptions and detect errors on their own before they become “panics”, so that when a program crashes in Java (e.g. dereferences null) it doesn’t crash the JVM. Golang’s backtrace will probably be much nicer than the default of C or LLVM, and you may be able to implement a system like Java’s which catches most errors and gives your own stacktrace beforehand.

Elm’s kernel controversy is also something completely different. The problem with Elm is that the language maintainers explicitly prevented people from writing FFI to/from JavaScript except in the maintainers’ own packages, after allowing this feature for a while, so many old packages broke and were unfixable. And there were more issues: the language itself was very limited (meaning JS FFI was essential) and the maintainers’ responses were concerning (see “Why I’m leaving Elm”). Even Rust has features that are only accessible to the standard library and compiler (“nightly”), but they have a mechanism to let you use them if you really want, and none of them are essential like Elm-to-JS FFI, so most people don’t care. Basically, as long as you don’t become very popular and make a massively inconvenient, backwards-incompatible change for purely design reasons, you won’t have this issue: it’s not even “you have to implement Go FFI”, not even “if you do implement Go FFI, don’t restrict it to your own code”, it’s “don’t implement Go FFI and allow it everywhere, become very popular, then suddenly restrict it to your own code with no decent alternatives”.
