Cthulhu  0.2.10
Cthulhu compiler collection
Contributing

Source tree structure

  • data - various data files
    • docs - documentation
    • meson - cross files for various platforms
    • scripts - automation for arduous setup tasks
  • src - all source code
  • subprojects - dependencies
    • mini-gmp - fallback gmp library if system gmp isn't installed
  • tests - tests
    • lang - language specific tests
    • unit - compiler code unit tests

Where to put new code

When adding a new module, consider how much of the compiler needs access to it. Treat the module sets like permission levels, with Common library as the highest (most widely usable) level. If a library will only ever be used by a user-facing tool, it belongs in the Support libraries for languages, plugins, and frontends set of modules. If a module needs to be available to drivers, it belongs in Compiler runtime, the core compiler set. If a module is going to be used extensively, possibly outside of the cthulhu project, place it in Common library.

Coding rules

  • Follow the single source of truth principle
    • This is the single most important rule of the codebase
    • We want as little duplicated logic as possible
    • Follow it to absurdity
    • It is acceptable to make massive changes to move common functionality into libraries
    • I cannot stress how important this rule is
  • Be very generous with assertions and panic handling
    • Every function boundary and usage point should be littered with these
    • Do not use asserts to handle user input; asserts are for internal invariants
    • For user facing error reporting look at Compiler message notification
  • All features that require platform specific code must have a reasonable fallback implementation.
    • See Stacktrace library for an example of this; doing nothing can be a reasonable fallback.
  • Everything must be implemented in standard C11
    • No compiler extensions in common code.
  • No IO or filesystem access that isn't marshalled through the IO stream interface or Filesystem abstraction
    • Makes porting to systems with non-traditional IO easier.
  • The build process must only rely on C and meson
    • Optional features may require python/C++
    • To aid porting to systems that may not have a big ecosystem
    • We should never rely on an older version of cthulhu to build cthulhu; the version - 1 problem is not a fun one.
  • All platform specific code must go in the Platform abstraction layer module
  • Forward declare all types where possible rather than including headers
    • This makes renaming types harder but reduces build times significantly
    • Also don't export dependencies unless absolutely needed
  • No internal versioning
    • Breaking source and ABI compatibility every commit is fine
    • Once plugins are implemented maybe I'll rethink this
  • If implementation details need to be leaked into headers, suffix them with _impl and namespace them
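A minimal sketch of the assertion rules, assuming nothing about the project's own assertion macros. It uses the standard `assert` from `<assert.h>` as a stand-in, and `digits_t`, `digits_parse`, and `digits_at` are hypothetical names. Malformed text counts as user input, so the function reports it by returning `false`. In the real codebase that report would go through Compiler message notification. An out-of-range index is an internal bug, so it is asserted on. The forward-declared `typedef` shows the form a header would use instead of including the full definition.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

// forward declaration: a header only needs this line, not the full definition
// (digits_t and its functions are hypothetical, for illustration only)
typedef struct digits_t digits_t;

struct digits_t
{
    int values[16];
    size_t count;
};

// user input: malformed text is an expected condition, so report it instead of asserting
bool digits_parse(digits_t *out, const char *text)
{
    assert(out != NULL);   // internal invariant: callers must pass valid pointers
    assert(text != NULL);

    out->count = 0;
    for (const char *p = text; *p != '\0'; p++)
    {
        if (*p < '0' || *p > '9')
            return false; // not a digit: hand back to the caller for error reporting
        if (out->count == sizeof(out->values) / sizeof(out->values[0]))
            return false; // too many digits for this fixed-size sketch
        out->values[out->count++] = *p - '0';
    }
    return true;
}

// internal access: an out of range index is a programming error, so assert on it
int digits_at(const digits_t *digits, size_t index)
{
    assert(digits != NULL);
    assert(index < digits->count);
    return digits->values[index];
}
```

Callers that receive `false` are expected to turn it into a diagnostic. Asserting at that point would crash on ordinary bad input.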

Memory management strategy

Cthulhu aims to be usable as a library in embedded systems (read as: places without a global malloc). As such, we aim to use only user-provided allocators and to avoid any global allocation mechanisms.
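One way to honour "only user-provided allocators" is to pass an allocator object explicitly and never call `malloc` from library code. The sketch below is hypothetical: `allocator_t`, `fixed_buffer_t`, `fixed_alloc`, and `copy_string` are invented names and are not Cthulhu's `arena_t` API. The fixed-buffer backend needs no heap at all, which covers the embedded case described above. A hosted user could just as well supply a backend that wraps `malloc`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

// hypothetical allocator interface: a function pointer plus caller-owned state
typedef struct allocator_t
{
    void *(*alloc)(size_t size, void *user);
    void *user;
} allocator_t;

// hypothetical fixed-buffer backend, usable on targets without malloc
typedef struct fixed_buffer_t
{
    unsigned char *memory; // assumed to be aligned to max_align_t by the caller
    size_t size;
    size_t used;
} fixed_buffer_t;

void *fixed_alloc(size_t size, void *user)
{
    fixed_buffer_t *buffer = user;
    // round the offset up so every allocation is suitably aligned for any object type
    size_t align = _Alignof(max_align_t);
    size_t start = (buffer->used + align - 1) / align * align;
    if (start > buffer->size || size > buffer->size - start)
        return NULL; // out of memory: let the caller decide what to do
    buffer->used = start + size;
    return buffer->memory + start;
}

// library code: allocates only through the allocator it is given, which comes last
char *copy_string(const char *text, allocator_t *alloc)
{
    assert(text != NULL);
    assert(alloc != NULL);
    size_t length = strlen(text);
    char *copy = alloc->alloc(length + 1, alloc->user);
    if (copy == NULL)
        return NULL;
    memcpy(copy, text, length + 1); // includes the terminating NUL
    return copy;
}
```

`copy_string` takes the allocator as its last argument, matching the styleguide's rule for `arena_t`. It returns `NULL` when the buffer is exhausted instead of aborting.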

Naming and style for allocating interfaces

In order of importance:

  1. When an object is heap allocated, its constructor must be named <type>_new and take an arena_t as the last parameter.
    • map_new, vector_new, set_new etc
  2. When an object is stack allocated, a <type>_init function should be provided when construction requires logic
#include <stddef.h>
#include <string.h>

// a range of text
typedef struct text_t
{
    const char *string;
    size_t length; // the number of characters in the text, must be equal to `strlen(string)`
} text_t;

void text_init(text_t *text, const char *string)
{
    size_t length = strlen(string);
    text->string = string;
    text->length = length;
}
  3. If an object only requires memory, do not provide a delete function unless it is truly necessary
    • Types that manage external resources must provide a delete function
  4. If a type could reasonably be allocated on either the stack or the heap, a <type>_init function should be provided
    • These should take a pointer to an uninitialized instance of the object and populate the required fields
    • Providing both _new and _make functions may also be good depending on how commonly the type is used
      • These functions should always be wrappers for the _init function's logic
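typedef struct context_t
{
    int first;
    int second;
    struct context_t *parent; // the context_t typedef is not visible yet inside its own body
} context_t;

// illustrative helper, assumed to be declared elsewhere
int context_get_field(const context_t *ctx, int index);

// static inline: a plain inline definition in a header also needs an external definition somewhere
static inline void context_init(context_t *ctx, context_t *parent)
{
    ctx->first = context_get_field(parent, 0);
    ctx->second = context_get_field(parent, 1);
    ctx->parent = parent;
}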
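The naming rules above can be combined for a single type. In this hypothetical sketch, `span_t`, `span_init`, `span_make`, `span_new`, and the bump allocator `arena_sketch_t` are invented for illustration. `arena_sketch_t` stands in for the real `arena_t`. All construction logic lives in `span_init`, and `span_make` and `span_new` only wrap it, so the logic has a single source of truth. The allocator is the last parameter of `span_new`. Because `span_t` only holds memory, it has no delete function.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

// hypothetical stand-in for arena_t: hands out memory from a caller-owned buffer
typedef struct arena_sketch_t
{
    unsigned char *memory; // assumed to be aligned to max_align_t by the caller
    size_t size;
    size_t used;
} arena_sketch_t;

static void *arena_sketch_alloc(size_t size, arena_sketch_t *arena)
{
    size_t align = _Alignof(max_align_t);
    size_t start = (arena->used + align - 1) / align * align;
    if (start > arena->size || size > arena->size - start)
        return NULL;
    arena->used = start + size;
    return arena->memory + start;
}

// hypothetical type: a view over part of a string, only needs memory so no delete function
typedef struct span_t
{
    const char *start;
    size_t length;
} span_t;

// the only place that contains construction logic
void span_init(span_t *span, const char *text, size_t offset, size_t length)
{
    assert(span != NULL);
    assert(text != NULL);
    assert(offset + length <= strlen(text));
    span->start = text + offset;
    span->length = length;
}

// stack construction: a thin wrapper around span_init
span_t span_make(const char *text, size_t offset, size_t length)
{
    span_t span;
    span_init(&span, text, offset, length);
    return span;
}

// heap construction: also wraps span_init, with the allocator as the last parameter
span_t *span_new(const char *text, size_t offset, size_t length, arena_sketch_t *arena)
{
    span_t *span = arena_sketch_alloc(sizeof(span_t), arena);
    if (span == NULL)
        return NULL;
    span_init(span, text, offset, length);
    return span;
}
```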

Portability considerations

Cthulhu also aims to be easy to port to more "exotic" systems, so avoid relying on anything that the C specification does not guarantee.

  • For example, prefer [u]int_fastNN_t or [u]int_leastNN_t over [u]intNN_t: the former two are always present, while the exact-width types are optional in C11
  • Aim to do all floating point math via gmp or mpfr, as they have consistent rounding rules
  • Prefer size_t and ptrdiff_t over fixed width types when managing sizes
    • The only exception to these rules is uintptr_t, which is required to build the compiler even though it is also optional in C11; if or when it becomes possible to remove this dependency, I will
  • Avoid using obscure syntax
    • This is mostly to aid readability, but some compilers don't support all syntax
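To illustrate the integer-type advice, here is the 32-bit FNV-1a hash written against `uint_least32_t` instead of `uint32_t`. The function name `hash_fnv1a32` is hypothetical. `uint_least32_t` may be wider than 32 bits on an unusual target, so the sketch masks after every step to keep the hash identical everywhere. It uses `size_t` for the length and index.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// standard FNV-1a 32-bit parameters
static const uint_least32_t fnv_offset_basis = 2166136261u;
static const uint_least32_t fnv_prime = 16777619u;
static const uint_least32_t mask32 = 0xFFFFFFFFu;

// hypothetical helper: hash `length` bytes of `data`
uint_least32_t hash_fnv1a32(const unsigned char *data, size_t length)
{
    assert(data != NULL || length == 0);
    uint_least32_t hash = fnv_offset_basis;
    for (size_t i = 0; i < length; i++)
    {
        // treat each byte as an octet, even if CHAR_BIT is larger than 8
        hash ^= (uint_least32_t)(data[i] & 0xFFu);
        // uint_least32_t may be wider than 32 bits, so truncate explicitly
        hash = (hash * fnv_prime) & mask32;
    }
    return hash;
}
```

On a target that has `uint32_t` the mask is a no-op. Where `uint_least32_t` is wider, the mask is what keeps the results identical.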

Styleguide

  • all macros in headers should be prefixed with CT
  • all macros defined in generated files should be prefixed with CTU_
  • all macros used in C++ should be prefixed with CTX_
  • arena_t should always be the last argument to a function
    • the exception to this rule is variadic functions
// wrong
char *action_that_allocates(arena_t *arena, const char *config);
// correct
char *action_that_allocates(const char *config, arena_t *arena);

Source file contents should be laid out in the following order:

// includes
// macros
// typedefs
// forward declarations
// implementations
  • use const whenever it's easy to do so
  • use #pragma once over include guards
  • use #define for unrelated constant values in headers
  • use static const for constant values in source files
  • use enum for defining related constant values
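Here is a small, hypothetical source file that follows that layout. `token_kind_t`, `letters`, `is_letter`, and `classify_char` are invented for illustration. Related constants go in an `enum`, and a file-local constant uses `static const`. A header for this file would begin with `#pragma once`. For portability, letters are matched against an explicit table, because the C standard only guarantees that the digits `'0'` to `'9'` are contiguous.

```c
// includes
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

// macros
#define ARRAY_LENGTH(arr) (sizeof(arr) / sizeof((arr)[0]))

// typedefs
// related constants are grouped in an enum (hypothetical token kinds)
typedef enum token_kind_t
{
    TOKEN_DIGIT,
    TOKEN_LETTER,
    TOKEN_OTHER
} token_kind_t;

// a constant value local to this source file uses static const
static const char letters[] = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";

// forward declarations
static bool is_letter(char c);
token_kind_t classify_char(char c);

// implementations
static bool is_letter(char c)
{
    // letters are not guaranteed to be contiguous in the execution character set,
    // so compare against an explicit table instead of a range check
    for (size_t i = 0; i < ARRAY_LENGTH(letters) - 1; i++) // - 1 skips the terminating NUL
    {
        if (letters[i] == c)
            return true;
    }
    return false;
}

token_kind_t classify_char(char c)
{
    if (c >= '0' && c <= '9') // digits are guaranteed to be contiguous
        return TOKEN_DIGIT;
    if (is_letter(c))
        return TOKEN_LETTER;
    return TOKEN_OTHER;
}
```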

Banned features

  • no VLAs or alloca: they are hard to debug and cause crashes very easily
  • no volatile, it doesn't do what you think it does
  • no compiler specific extensions without ifdef guards
  • no mutable global state, all code must be reentrant and thread safe
    • exceptions have been made for the global panic and os error handlers
  • no inline asm
  • no thread local values
  • no non-const static locals
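The usual replacement for a non-const static local or a mutable global is to move that state into an object the caller owns and passes in. In this hypothetical sketch, `id_gen_t`, `id_gen_init`, and `id_gen_next` are invented names, and the commented-out `next_id` shows the banned form. Each compilation context can own its own generator, so calls never interfere through hidden state. Sharing one generator across threads would still need external synchronization.

```c
#include <assert.h>
#include <stddef.h>

// banned: hidden state makes the function non-reentrant
// size_t next_id(void) { static size_t counter = 0; return counter++; }

// allowed: the state lives in an object the caller owns (hypothetical type)
typedef struct id_gen_t
{
    size_t next;
} id_gen_t;

void id_gen_init(id_gen_t *gen, size_t first)
{
    assert(gen != NULL);
    gen->next = first;
}

// returns the current id and advances the generator
size_t id_gen_next(id_gen_t *gen)
{
    assert(gen != NULL);
    return gen->next++;
}
```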

Flex/Bison styleguide

  • snake_case for rules
  • rules that match 1 or more of a rule should be named <rule>_seq
  • rules that match a rule with a separator should be named <rule>_list
    // for example
    expr_list: expr
    | expr_list COMMA expr
    ;
    stmt_seq: stmt
    | stmt_seq stmt
    ;