Hey folks, glad to see a bunch of technical details!
At the same time, I'm very worried about creating a DSL, whether it uses an external parser or an in-house one. It's easy to get a DSL working, but all the attempts I've seen so far (even inside big companies) typically end up as an incomprehensible mess once you try to write a formal grammar/semantics for the language. In the best case, the problems end up being quirks of the language, like {}+{} being NaN in JavaScript or
auto [a, b] = std::minmax(2, 3);  // a and b refer to temporaries that die at the ';'
int x = a + b;                    // reads through dangling references
being UB in C++. In the more typical scenario, lots of hard-to-catch parsing and semantics bugs creep in immediately, like in
https://github.com/fusionlanguage/fut/issues/26 or
https://github.com/leaningtech/cheerp-meta/issues/89 or
https://github.com/llvm/llvm-project/issues/52790 or some that I cannot share due to NDA. Think "bugs in binary search" or "forgot to process spaces in the beginning/end", but 10x harder to notice unless you specifically look for them. This mostly happens because parsers and naive compilers tend to be written with the AST and syntax in mind rather than the semantics. Moreover, fixing those bugs may require breaking backward compatibility.
This gets worse once you start adding optimizations. E.g. optimizing 0*x into 0 is incorrect if x has side effects. I believe that does not happen yet, but if you allow mods to provide functions in these expressions... Or x-x is not always zero even without side effects (infinities, NaNs).
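To make the optimization point concrete, here is a small Python sketch. It only assumes IEEE 754 doubles, which is what Python floats are; your engine's number type may behave differently. The names check_rewrites and noisy are made up for illustration, with noisy standing in for a function that a mod might provide.

```python
import math

def check_rewrites(x: float) -> dict:
    """Report whether two 'obvious' algebraic rewrites hold for a given float."""
    return {
        "x - x == 0": (x - x) == 0.0,
        "0 * x == 0": (0.0 * x) == 0.0,
    }

calls = []

def noisy() -> float:
    """Hypothetical mod-provided function with a visible side effect."""
    calls.append("called")
    return 5.0

print(check_rewrites(1.5))       # both rewrites hold for an ordinary finite value
print(check_rewrites(math.inf))  # inf - inf and 0 * inf are NaN, so neither holds
print(check_rewrites(math.nan))  # NaN propagates, so neither holds

# Unoptimized evaluation of 0 * noisy() still calls noisy().
# An optimizer that folds 0 * x into 0 would silently drop that call.
unoptimized = 0.0 * noisy()
print(unoptimized, len(calls))
```

So even before side effects enter the picture, folding x-x or 0*x into a constant changes the result for some inputs.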
If you care about the language being consistent, I believe it may be a good idea to do some fuzz testing at the very least: generate a random expression (not an easy feat in itself), add/remove spaces and randomize variable names, evaluate it with Python/Lua and compare the result with your engine's. I've seen my own toy compiler work fine with random 50-character expressions but fail with 500-character expressions that could later be simplified into 10-character regression tests.
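Here is a rough sketch of the fuzzing loop described above, in Python. Everything in it is an assumption for illustration: engine_eval is a tiny stand-in for your real engine (you would replace it with a call into your own parser), the grammar is limited to integers, variables, + - *, unary minus and parentheses, and Python's eval plays the role of the reference implementation.

```python
import random
import re

TOKEN = re.compile(r"[A-Za-z_]\w*|\d+|[-+*()]")

def random_expr(rng, depth, names):
    """Build a random expression string; parentheses are optional so precedence gets tested."""
    if depth == 0 or rng.random() < 0.2:
        return rng.choice(names) if rng.random() < 0.5 else str(rng.randint(0, 9))
    if rng.random() < 0.15:
        return "-" + random_expr(rng, depth - 1, names)
    body = (random_expr(rng, depth - 1, names) + rng.choice("+-*")
            + random_expr(rng, depth - 1, names))
    return "(" + body + ")" if rng.random() < 0.5 else body

def add_spaces(rng, expr):
    """Re-join the tokens with 0-2 random spaces, including leading and trailing ones."""
    pad = lambda: " " * rng.randint(0, 2)
    return pad() + "".join(tok + pad() for tok in TOKEN.findall(expr))

def engine_eval(text, env):
    """Stand-in for the engine under test: a small recursive-descent evaluator."""
    tokens = TOKEN.findall(text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        pos += 1
        return tokens[pos - 1]

    def atom():
        tok = take()
        if tok == "-":
            return -atom()
        if tok == "(":
            value = expr()
            take()  # consume the closing ")"
            return value
        return int(tok) if tok.isdigit() else env[tok]

    def term():
        value = atom()
        while peek() == "*":
            take()
            value *= atom()
        return value

    def expr():
        value = term()
        while peek() in ("+", "-"):
            op = take()
            rhs = term()
            value = value + rhs if op == "+" else value - rhs
        return value

    return expr()

def fuzz(iterations=200, seed=0):
    """Return the first (text, expected, actual) mismatch, or None if none was found."""
    rng = random.Random(seed)
    for _ in range(iterations):
        # The "v_" prefix keeps random names from colliding with Python keywords.
        names = ["v_" + "".join(rng.choices("abcxyz", k=rng.randint(1, 4))) for _ in range(3)]
        env = {name: rng.randint(-5, 5) for name in names}
        text = add_spaces(rng, random_expr(rng, 6, names))
        expected = eval(text.strip(), {"__builtins__": {}}, dict(env))  # reference result
        actual = engine_eval(text, env)
        if actual != expected:
            return text, expected, actual
    return None

print(fuzz())
```

A fixed seed keeps failures reproducible. Once a mismatch shows up, the failing text can be shrunk by hand (or with a delta-debugging tool) into a short regression test. Division, floats and function calls are deliberately left out here, since those are exactly where Python's semantics and your engine's are most likely to differ on purpose.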
Would be happy to take a look at relevant pieces of code or, if you have time, have a chat (e.g. in Discord) about different algorithms, standard compiler gotchas, and how to avoid some corner cases.