TypeDB Blog
A new dimension of database design

How TypeDB brings semantic validation, safer queries, and smarter optimisation through type theory
TL;DR: Traditional databases let bad queries through. TypeDB uses a rich type system to validate structure, enforce semantics, and optimise queries. It’s like having a compiler for your data, which is especially powerful in complex domains like finance, cybersecurity, and robotics.
In early June, I gave a talk in New York (video here) exploring a question that sparked a lot of conversation:
What if your database could validate and optimise queries the way a type-safe programming language does?
In this blog, I’ll dive into the details behind that talk. The goal is to break down the reasoning behind our approach at TypeDB, the theory that powers it, and why we believe structured types (not ad hoc constraints!) are the future of data systems.
The problem with SQL’s type system
SQL is generally considered strongly typed, though that is limited to the constructs it supports. It lacks the expressiveness to fully encode your intent in its type system. That means more responsibility falls on you, the developer, to avoid errors manually. SQL’s lower-level, simpler types become dangerous when they hide silent assumptions and brittle joins.
Here are just a few things SQL lets you do accidentally:
-- Incorrectly join on fields if the value types match
SELECT * FROM users JOIN complaints ON users.email = complaints.phone;
-- Forget to add foreign key constraints
CREATE TABLE users (id INT PRIMARY KEY);
CREATE TABLE orders (id INT PRIMARY KEY, user_id INT);
-- Compare against nulls with surprising results
SELECT * FROM orders WHERE completed_date > '2020-01-01';
These are more than simple bugs; they’re symptoms of a query language that doesn’t understand what your data means – it’s not expressive enough to do so!
In large systems, these subtle mismatches become expensive. Type mismatches, ambiguous relationships, and implicit constraints can undermine reliability and lead to dangerous edge cases, especially as we try to capture more complex domains.
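To make the three pitfalls above concrete, here is a minimal sketch that runs them against an in-memory SQLite database using Python's built-in sqlite3 module. The extra columns (email, phone, completed_date) and the sample rows are assumptions added purely for illustration, and other SQL engines may differ in detail.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT);
    CREATE TABLE complaints (id INTEGER PRIMARY KEY, phone TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, completed_date TEXT);
    INSERT INTO users VALUES (1, 'ann@example.com');
    INSERT INTO complaints VALUES (1, '555-0100');
    -- user 999 does not exist, but no foreign key was declared
    INSERT INTO orders VALUES (1, 999, '2021-03-01');
    INSERT INTO orders VALUES (2, 1, NULL);
""")

# 1. Joining an email to a phone number is accepted: both columns are TEXT.
mismatched_join = conn.execute(
    "SELECT * FROM users JOIN complaints ON users.email = complaints.phone"
).fetchall()

# 2. Without a foreign key, an order can point at a user that was never created.
orphans = conn.execute(
    "SELECT id FROM orders WHERE user_id NOT IN (SELECT id FROM users)"
).fetchall()

# 3. A NULL date satisfies neither the comparison nor its negation.
after = conn.execute(
    "SELECT id FROM orders WHERE completed_date > '2020-01-01'"
).fetchall()
not_after = conn.execute(
    "SELECT id FROM orders WHERE NOT (completed_date > '2020-01-01')"
).fetchall()

print(mismatched_join, orphans, after, not_after)
```

None of the statements raises an error. The mismatched join simply returns nothing, the orphaned order is stored without complaint, and order 2 shows up in neither the "after" nor the "not after" result.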
The evolution of database systems
(from https://www.analyticsvidhya.com/blog/2014/11/types-databases-evolution/)
It’s useful to look at history and see in which direction databases are headed. Database development has historically focused on abstraction of hardware and algorithms, and improving access to specialised use cases.
Edgar Codd wrote in his famous paper that independence of data from the underlying storage layer was a fundamental goal of relational algebra. The paper also established the enduring idea of a declarative view over data: the user describes what they want done, not how to do it.
However, since relational algebra, databases have been dealing with scale and enabling specific usage patterns:
- Key-value, column-oriented, and document databases arose around the internet era to handle massive volumes of data.
- Others, like graph databases or time-series databases, arose to address specialised data patterns more effectively.
These systems sacrifice some of the properties of relational databases, such as the declarative nature, lack of redundancy, and, critically, strictly enforced schemas.
Programming language land
If we look elsewhere for principled ways to address the issues in SQL, we quickly encounter programming languages’ rich breadth of type systems.
The world of programming languages offers a vast array of choices tailored to specific problems. For performance-sensitive, safety-critical applications, a weakly-typed system that permits preventable runtime errors would be an unsuitable choice. Instead, you might opt for something like Scala or Rust, which offer strong guarantees for safety without requiring extensive manual testing.
The key is choice. In the database world, we lack this choice. You don’t have the option to invest more upfront in a safety-critical system to gain stronger guarantees, greater expressivity for complex problems, and simplified maintenance.
Introducing an expressive type system for data
TypeDB introduces a more expressive type system to databases, grounded in dependent type theory. Instead of just declaring tables and columns, you model your domain using structured types and roles:
- Entity types for concepts like users, assets, or transactions
- Relation types for logical, enforceable connections (e.g., authorship, trade), along with related Role types that are the interfaces between other types
- Attribute types for concrete, typed values (e.g., date, currency, string)
Concretely, these represent first-class structure that the database understands and can leverage! At the same time, these structures directly resolve the issues we first noted in SQL:
- Explicit relationship types instead of join tables being “just another table”
- Role types, first-class interface types, instead of second-class foreign key constraints
- Guaranteed vs optional values declared using built-in cardinality constraints
Because these constraints are part of the schema rather than application logic, they’re enforced consistently and validated automatically, and the database can leverage them for checking and optimising queries.
The database schema as a type system
Entities, relations, attributes and roles comprise the building blocks of TypeDB’s type system. In programming language land, this is similar to having the built-in Object type that all classes subtype from implicitly – TypeDB just has 4 such constructs!
Ultimately, you will build your model of the data using these building blocks. For example, we could imagine building a very simple database representing employees and teams:
define
  relation team-membership,
    relates member,
    relates team;
  entity employee,
    owns name @card(1),
    plays team-membership:member;
  entity team,
    owns name @card(0..1),
    plays team-membership:team;
  attribute name, value string;
What you end up with is a simple, readable type system that simultaneously operates as the database schema! And not only is it human-readable, but because it is defined using entities, relations, attributes and roles, the database can understand it too. In short, we’ve now got a common language that the user and the database can speak to manage data.
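To illustrate what the @card annotations in the schema above express, here is a toy Python sketch of a cardinality check. It assumes @card(1) means exactly one value and @card(0..1) means at most one. The CARDINALITY table and the cardinality_violations function are hypothetical names for this illustration and are not part of TypeDB.

```python
# Hypothetical in-memory copy of the cardinality bounds from the example schema:
# (owner type, attribute type) -> (minimum, maximum) number of values.
CARDINALITY = {
    ("employee", "name"): (1, 1),  # owns name @card(1)
    ("team", "name"): (0, 1),      # owns name @card(0..1)
}


def cardinality_violations(entity_type: str, attributes: dict) -> list:
    """Return a message for every ownership whose value count is out of bounds."""
    problems = []
    for (owner, attribute), (low, high) in CARDINALITY.items():
        if owner != entity_type:
            continue
        count = len(attributes.get(attribute, []))
        if not low <= count <= high:
            problems.append(
                f"{owner} owns {attribute}: expected {low}..{high} values, found {count}"
            )
    return problems


# An employee without a name breaks @card(1); a team without a name is fine.
nameless_employee = cardinality_violations("employee", {})
nameless_team = cardinality_violations("team", {})
print(nameless_employee)
print(nameless_team)
```

The point of the sketch is only that "guaranteed" and "optional" become properties the system can check mechanically, rather than conventions that live in application code.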
Queries that understand structure
Let’s try to use the types in the schema to look up data:
match $employee isa employee; team-membership (team: $employee);
In TypeDB, this query will fail with an error!
How does TypeDB know that this query is semantically invalid?
TypeDB’s type inference engine compares the information in the query against the schema. If a variable is used in a way that violates role constraints or inheritance, the query fails before execution.
In this case, it can determine that $employee must be an employee, but instances of the employee type cannot play the role team-membership:team in the user-defined schema!
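The check described above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not TypeDB's actual implementation: the PLAYS table hand-copies the "plays" declarations from the example schema, and validate_query, check_role_player, and TypeCheckError are hypothetical names.

```python
# Which roles each entity type may play, copied by hand from the example schema.
PLAYS = {
    "employee": {"team-membership:member"},
    "team": {"team-membership:team"},
}


class TypeCheckError(Exception):
    """Raised when a query uses a variable in a way the schema forbids."""


def check_role_player(declared_type: str, relation: str, role: str) -> None:
    scoped_role = f"{relation}:{role}"
    if scoped_role not in PLAYS.get(declared_type, set()):
        raise TypeCheckError(
            f"type '{declared_type}' cannot play role '{scoped_role}'"
        )


def validate_query(isa: dict, links: list) -> None:
    """isa maps variables to declared types; links holds (relation, role, variable)."""
    for relation, role, variable in links:
        check_role_player(isa[variable], relation, role)


# The failing query from above: $employee isa employee, used in the 'team' role.
try:
    validate_query(
        {"$employee": "employee"},
        [("team-membership", "team", "$employee")],
    )
    outcome = "accepted"
except TypeCheckError as err:
    outcome = f"rejected: {err}"

print(outcome)
```

Swapping the role to member makes the same call succeed, since the rejection happens purely by comparing the query against the schema, before any data is touched.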
This idea is extremely powerful. The database can now:
- Catch silly mistakes or typos, much like a strongly typed programming language does
- Eliminate most classes of runtime errors (division by zero is still possible!)
- Trust that your queries reflect your data model
This is especially valuable in large domains like finance or logistics, where complex relationships and constraints must be trusted to behave as expected.
Type inference for smarter optimisation
Once you have a type system, you gain more than just safety: you gain structure for optimisation.
Because the engine knows exactly what roles are allowed, how relations compose, and which values are guaranteed or optional, it can:
- Prune irrelevant search paths during execution
- Avoid redundant traversals and joins
- Use semantic structure to guide statistical query planning
How does this work? Let’s take a simple example against the schema above:
match team-membership (member: $member);
It’s all about type inference again: the query compiler recognises that $member must be a type that plays the role team-membership:member, which can only be the employee entity type in our example!
Fascinatingly, the larger your query, and the more types are involved, the more powerful this can become. One seemingly unrelated part of the query can constrain the permitted types for each variable transitively through the entire query and unlock performance – without the user doing anything!
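As a rough illustration of this narrowing, the sketch below starts each variable with every entity type as a candidate and intersects that set with what each constraint allows. It is a simplified model under assumptions: the PLAYS and OWNS tables are hand-copied from the example schema, infer_types is a hypothetical name, and a real planner would also propagate constraints between variables.

```python
ENTITY_TYPES = {"employee", "team"}

# Hand-copied from the example schema.
PLAYS = {
    "employee": {"team-membership:member"},
    "team": {"team-membership:team"},
}
OWNS = {
    "employee": {"name"},
    "team": {"name"},
}


def infer_types(constraints: list) -> dict:
    """Each constraint is (kind, variable, label); kind is 'plays' or 'owns'."""
    candidates = {}
    for kind, variable, label in constraints:
        if kind == "plays":
            allowed = {t for t in ENTITY_TYPES if label in PLAYS[t]}
        elif kind == "owns":
            allowed = {t for t in ENTITY_TYPES if label in OWNS[t]}
        else:
            raise ValueError(f"unknown constraint kind: {kind}")
        # Every additional constraint can only shrink the candidate set.
        candidates[variable] = candidates.get(variable, set(ENTITY_TYPES)) & allowed
    return candidates


# match team-membership (member: $member);
single = infer_types([("plays", "$member", "team-membership:member")])

# Owning a name alone leaves both types open; adding the 'team' role settles it.
combined = infer_types([
    ("owns", "$x", "name"),
    ("plays", "$x", "team-membership:team"),
])

print(single, combined)
```

With only two entity types the saving is small, but the same intersection step applied across a large schema is what lets a planner skip whole categories of data that could never match.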
Your schema, as an engine
In TypeDB, the schema isn’t passive documentation. It’s a strong and expressive type system that serves as an engine of validation, inference, and optimisation, giving you:
- Built-in query validation that prevents misuse
- Safer, composable queries
- Elimination of NULLs and fragile joins
- Domain-aligned modelling that mirrors real-world concepts
- Higher-order relationships with defined semantics
- Familiarity due to proximity to ideas in programming languages
It’s the difference between a text file of suggestions and a compiler that protects your intent.
If you work with complex data models like those in finance, cybersecurity, or robotics, TypeDB helps you spend less time fighting the shape of your data and more time building with it.
Why this matters commercially
TypeDB offers a new direction for databases: it’s a new choice for users, especially when building complex models or where correctness is paramount. Structured modelling isn’t only an academic exercise. It brings real benefits in high-stakes systems.
TypeDB is already being adopted in robotics, financial systems, and knowledge-intensive AI use cases, domains where reasoning, clarity, and safety are critical.
For example, see how we’re helping advance robotic reasoning in our recent robotics blog. These are systems where schema-level logic makes the difference between brittle graphs and robust models.
What’s the catch?
More structure comes at a cost: a learning curve and disk space.
Firstly, there’s a slight learning curve when adopting a new modelling and schema language. That’s a challenge any new paradigm or system presents. However, we are confident that once you have learned TypeQL, the time it takes to design and implement database schemas, coupled with their long-term safety guarantees and refactorability, makes this a worthwhile effort.
And secondly, because TypeQL and TypeDB lend themselves so well to ad-hoc queries, the system must index most of the data that can be meaningfully queried, which is easier given the inherent structure in TypeDB’s schema! These automatically managed indices result in disk usage approximately 30% higher than in traditional databases.
Of course, given the value that TypeDB’s capabilities add, we believe this is a very strong trade-off: the most expensive item is developer time, followed by CPU and then memory costs, with disk space at the very end.
Try it yourself
TypeDB Studio is now live in your browser at studio.typedb.com. No installation needed. Just start modelling immediately!
Want a walkthrough? Read the Studio quickstart and build your first schema.
You can also watch Josh’s full talk on YouTube or join the discussion on Discord.
TypeDB: Smarter data systems start with stronger types.