Dustin Horne

Developing for fun...

Views and Incorrect Data in Entity Framework

Today I want to touch on a topic that I have seen pop up many times.  There are many stack overflow answers and blog posts that cover it, but I wanted to combine several tidbits into one.  The problem is that views as entity models in Entity Framework sometimes return incorrect data.  The views themselves will return correct data, but the models will not be populated correctly.

If you've paid attention to your Error List window and are not ignoring warnings (as you shouldn't), then you've likely already seen the answer to this problem, though you may not have realized it or known how to fix it.  The issue is that Views are not going to have a primary key defined as tables do.  Entity Framework will warn you that the entity contains no primary key and it will do its best to infer one for you but can't always do so.  In fact, it will generally just infer the combination of all columns in your view as an entity key, and if you legitimately have data that's duplicated, things can get messy.

You've likely seen warnings like this one:  
Warning Error 6002: The table/view 'Demo.dbo.VwPublications' does not have a primary key defined. The key has been inferred and the definition was created as a read-only table/view.

Quite possibly your views are working just fine, however it's a good idea to have primary keys defined when using with Entity Framework.  If you have a column that's always unique, this is easy to do.  Right click each of your existing columns and uncheck "Entity Key" in the EDMX designer.  Then, right click the unique column and check "Entity Key".  That column will be used to identify the unique records and bind your data.

Sometimes, however, you may not have a column with a unique value.  As an example, I've created a database with three tables:  Publishers, Books and Magazines.  The Publishers table has a PublisherId and PublisherName.  The Book table has BookId, PublisherId and Title.  The Magazine table has MagazineId, PublisherId and Title.  This allows me to track all of my books and magazines and associate them with their publisher.

Sometimes I may want to see the entire list of publications with their publishers.  I've created a view that joins publishers and books and unions a query that joins publishers and magazines.  It adds an additional column called "PublicationType" so I can see what type of publication it is.  This creates a conundrum for me, however.  In my data I have a magazine that was formerly published by Publisher A but is now published by Publisher B.  Furthermore, Publisher A has two different magazines with the same title.  Now there is no way to guarantee uniqueness.

The SQL for my View looks like this:

CREATE VIEW [dbo].[VwPublications] AS

SELECT Publishers.PublisherName, 
	Books.Title,
	'Book' as PublicationType
FROM Books
	INNER JOIN Publishers ON Books.PublisherId = Publishers.PublisherId
UNION ALL
SELECT Publishers.PublisherName, 
	Magazines.Title,
	'Magazine' as PublicationType
FROM Magazines
	INNER JOIN Publishers ON Magazines.PublisherId = Publishers.PublisherId

And my result set looks like this:

In this particular case, EF generates the View in the EDMX and sets all columns as an entity key.  Querying the entire list of publications does return the correct results but that may not always be the case.  To replicate what happens, we remove EntityKey from the Title and PublicationType columns and just leave it on PublisherName.  When we then query the result via EF, we well get the correct number of records, but we will see data missing and data duplicated.

To fix this problem, we need to have a unique value in our view.  To do this, we will modify our view to have a Row_Number() generated column.  There are two issues with this.  First of all, Row_Number() Over expects an "Order By" clause.  This may not be desirable as we may not want the overhead of ordering our results and we want to order them ourselves in code.  A quick and easy way around this is to use Order By (SELECT 1) instead of a column name.  This will prevent the view from being ordered.  You can use any constant value you'd like (even Null) in place of 1.

The second issue we have is that our query has a union, which means that adding Row_Number() Over would need to be done in both queries and it will start over for the second query, resulting in 1, 2, 3, 4 for Books and 1, 2, 3, 4, 5, 6, 7 for Magazines.  Now we no longer have a unique value.  To remedy this, we will treat our View as a subquery and select from its result, adding the Row_Number to the outer query.

Our updated SQL now looks like this:

CREATE VIEW [dbo].[VwPublicationsWithIndex] AS

SELECT Row_Number() OVER(ORDER BY (SELECT 1)) AS RowNumber, * FROM
(
	SELECT Publishers.PublisherName, 
		Books.Title,
		'Book' as PublicationType
	FROM Books
		INNER JOIN Publishers ON Books.PublisherId = Publishers.PublisherId
	UNION ALL
	SELECT Publishers.PublisherName, 
		Magazines.Title,
		'Magazine' as PublicationType
	FROM Magazines
		INNER JOIN Publishers ON Magazines.PublisherId = Publishers.PublisherId
) AS NumberedResult

Now in our EDMX designer, we remove the Entity Key property from all of the columns except for RowNumber.  This allows Entity Framework to have a guaranteed unique column to tie the data back to your entities and your results will be correct.  A comparison of the two views in the designer looks like this:

As you can see, the RowNumber column has now been set as the Primary Key via "Entity Key".  There are a couple of additional things to keep in mind.  Annoyingly, any time you update your EDMX, the warning will still be displayed, even though you've specified the key.  Also, any time you're updating these particular views, you may have to manual re-apply the entity keys.

And finally, this problem serves as an important lesson.  While Warnings seem harmless, you should never just ignore them as they may cause big headaches down the road.