Managing Lookup Tables with Entity Framework Code First
Oftentimes in our applications we will have such things as a "lookup table". I am defining a lookup table as a list of relatively fixed or static choices such as status codes, states or provinces, and so on. In this post I will share how I manage lookup tables using Entity Framework.
Background: The Problem
Let's work with a lookup table of status codes. Often, in an application, there is some logic that is related to the status of some entity, such that doing the same action will produce different results depending on the entity's status. For example, in a banking system, the loan process might be different between regular or VIP customers. In a retail store, the checkout process might be different between members and non-members. In each case, a status check is part of the workflow, and the workflow branches depending on the result of the status check.
Now, suppose that statuses are stored in a lookup table in a database. How can the status be queried reliably and accurately?
Background: The Common Solution
Often, what I see done is this: in the database, an additional column is created called "code" or similar. The sole purpose of this column is to serve as an identifier in the application: the known "codes" are stored in the application in some data structure such as a dictionary or in a class of constants. The codes doesn't appear on any screen, and the data structure holding the codes are only used during checking.
For example, the database values would look like this:
Id Code Text
1 NON Non-Member
2 GLD Gold Member
3 SVR Silver Member
And then there would be a class of constants that look like this:
public class StatusCode
{
public const string NonMember = "NON";
public const string GoldMember = "GLD";
public const string SilverMember = "SVR";
}
Which would be used like this:
switch (entity.Status.Code)
{
case StatusCode.NonMember:
// logic for non-members
case StatusCode.GoldMember:
// logic for gold members
// .. other cases ..
}
This works, but there are a few downsides / points for improvements that I see:
- The
Status
member has to be brought in to enable a status check. See how we are drilling intoentity.Status
in the switch statement? That means an additionalJOIN
operation on the database. Just to check the status of an entity. - It's a magic string that doesn't really identify a particular row. Sure, we see a code like "GLD" and kinda infer that it is linked with the gold status. But the link stops there - there is no way in the code to enforce the link.
- It's not directly a domain term. The code's sole purpose is to serve as an identifier in the code (and as we saw above, it's not even that strong of an identifier). It doesn't appear on any screen, and users of the application don't use these codes. Which may be all fine if the code is a necessity, but, as we will see below, there is a better way.
So what's the better way?
Use the Primary Key as the Identifier
Yes, you read that right: just use the primary key as the identifier. Before I list the "why", let me post some sample code that uses this approach.
The database values would now look like this:
Id Text
1 Non-Member
2 Gold Member
3 Silver Member
The class of constants would now look like this:
public class StatusId
{
public const int NonMember = 1;
public const int GoldMember = 2;
public const int SilverMember = 3;
}
Which would then be used like this:
switch (entity.StatusId)
{
case StatusId.NonMember:
// logic for non-members
case StatusId.GoldMember:
// logic for gold members
// .. other cases ..
}
Now I can list the advantages that I see, which are answers to the disadvantages I listed above:
- The status can be checked directly on the entity, without bringing in the related
Status
object. We are now directly using theStatusId
in the entity rather than drilling down to theStatus
object. That is one lessJOIN
that we have to make. - We have the strongest possible identifier - it's the primary key. We really can't get a stronger identifier than that.
- We have removed our "clutch" column. Since we are using the primary key as identifier, we don't need an extra column anymore. The database will be less cluttered.
I know what you're thinking now:
Isn't Having a Strong Coupling with the Primary Key Directly a Bad Idea?
Back in the day, lookup tables are populated with SQL insert commands. If lookup tables have an identity primary key (and they usually do), there was no reliable way to determine beforehand what the generated primary keys would be - you would have to look at those after the insert script ran. Therefore, using codes made sense, as that was something known before the actual insert and is something that could be synchronized with the code.
But when using Entity Framework, particularly the Code First workflow with migrations, insert scripts are no longer necessary. The implementation of migrations also lend itself well to the approach I described of using the primary keys as identifiers.
Managing Migrations
A very useful method that can be used in migrations is the AddOrUpdate
method. The AddOrUpdate
method takes a list of entities as a parameter. For each entity in the list, it adds it to the database if it's not already there, and updates it otherwise. The method also takes in an expression as a parameter, and uses this expression to determine if an entity already exists or not.
How does this fit in with the lookup tables we are discussing?
Well, remember the constants class above, where we had all the ids? Here is how we can use that in migrations:
context.Statuses.AddOrUpdate(s => s.Id,
new Status { Id = StatusId.NonMember, Text = "Non-Member" },
new Status { Id = StatusId.GoldMember, Text = "Gold Member" },
new Status { Id = StatusId.SilverMember, Text = "Silver Member" });
Here we are seeding statuses and using the Id as the identifier expression. Notice that we are using the StatusId
class when constructing the objects - the same class that we use when checking the statuses. This gives us a strong guarantee that our identifiers really accurately identify whichever row they're supposed to be able to identify.
This is an example of embracing the Code First workflow - now we not only using code first for the schema but for the data as well.
In my opinion, this trumps the approach with the "code" columns I described above.
Conclusion
In this post I described a solution to lookup table management using Entity Framework Code First and migrations. The solution involves using primary keys directly instead of introducing an arbitrary column (such as a "code" column). This approach ties in nicely with the Code First migrations workflow and provides a very strong means of identification.