Data Size – LEN() vs DATALENGTH()

How long in your data?
How long is your data?

How long is your data?

 

No. It isn’t just the way you use your data that counts, the data size counts too

Ok, let’s get talk a little about some SQL Server basics. Specifically a couple of useful T-SQL functions.

Let’s talk about determining the ‘size’ of your data in a field.

There are 2 functions in SQL Server to help with this.

LEN() & DATALENGTH()

You may think these are interchangeable, but be aware that they are not. The first, LEN(),  is used to measure the length of the data in a field.

DATALENGTH() is used to determine just how much space the data takes up in the database.

 

Len()

The easiest way to remember how LEN() works is with the following phrase:

‘When I, as a human, want to see the data, how many characters does this data take up?’

Now, there is a catch to eh LEN() function. It does count all the characters, but if you have empty space characters at the end of your string, they are not counted.

This can lead you to believe a field is storing just as you expect, but in actuality it could be padded out with extra spaces that LEN() isn’t catching.

This may be why you are not matching your query strings all the time.

[icon_box icon=”lightbulb-o” title=”What about using RTRIM() and LTRIM()”] LTRIM() and RTRIM() are 2 functions within SQL Server than can be used to trim leading (LTRIM()) and trailing (RTRIM()) empty space characters from a field. They only remove space characters though, so if there is some other strange character in the field that does not display, like a carriage return, you may still get incorrect matches. [/icon_box]

DATALENGTH()

DATALENGTH()’s phrase to remember it is a little different:

“How much space to I need to store this data in the database, even the parts I can’t see?”

The DATALENGTH() function looks at the data you have and returns the amount of space being used to store that piece of data.

There are some times with varchar, that this value will match up with the LEN() value returned, but that is not always the case.

DATALENGTH() does not ignore empty space, or padding, or special characters. It is a measure of truly how much space is being used.

It is a great way to see if there are hidden characters in a string you may not catch with LEN().

 

I have a very simple script here that you can use to play with these 2 functions and have fun checking out the results.

If you ran the above script (Check it yourself before you run any script you get online!) you would have seen some fun results. Things like how a NVarchar field takes up twice as much space and characters present. This is because storing Unicode values takes more space. You would also have seen that a numeric value stored in a a numeric datatype can also have the LEN() and DATALENGTH() functions applied to them.

You can play some more with the various data types and see how the result change.

You can get more detailed information about the data types that SQL Server supports here: https://msdn.microsoft.com/en-us/library/ms187752.aspx

 

[divider_top][icon_box icon=”users” title=”Who is Chris Bell?”]

Chris Bell, SQL Server MVP, MCSE & MCITP, is a 20 year veteran of using Microsoft products & SQL Server to create solutions for businesses, organizations and individuals. Chris speaks, blogs, writes articles and makes media of all kinds regarding SQL Server at WaterOxConsulting.com. Chris is also the founder of WaterOx Consulting, Inc. which features SQL Server consulting services along with the destination location week-long training series: SQL Summer Camp.

Click here to contact Chris or to follow him on twitter. [/icon_box]

Comments 8

  1. Pingback: Data Size – LEN() vs DATALENGTH() - SQL Server - SQL Server - Toad World

  2. Would add a statement to the article to the extent

    NEVER use len function in string calculations such as splitting up string into fields

  3. Post
    Author

    Never is a pretty strong word. What would be the justification / evidence to support NEVER using the LEN() function when splitting fields (not that I see a real reason to use it in the first place for that really)

  4. I read this article in the hope of finding some answers to my next challenge: dealing with UTF-8 characters, specifically, Chinese.

    select len(N’恭禧發財’),datalength(N’恭禧發財’)

    returns 4 and 8 respectively. Yet both only require nvarchar(4) to store them.

    Chinese characters are double-wide characters, but some unicode characters are triple-wide. So:-

    select len(N’

    1. … for some reason your web-form couldn’t handle the triple wide Unicode character, so it trimmed my comment. Anyway, the for a single triple-wide unicode character len() returns 2 and datalength() returns 4.

      I’d hoped you might have discussed this too.

      1. Post
        Author

        Andy – To me this seems more of a data type / collation topic than a string length function one. Regardless, it is an interesting point and worth discussing a little.

        The NVarchar data type in SQL Server is designed to store Unicode values, which include a majority of the characters used in all languages around the world, the problem is you sacrifice on space in your SQL Server when using it.
        The Char / Varchar data type allocates 1 byte per character when storing it in SQL Server. NChar and NVarchar characters require double the space of a regular Char/Varchar character or 2 bytes.
        Because of this, when using NVarchar, as you are to store the Unicode character, each character is allocated a double wide slot of space. In the case of the triple wide character, it is overflowing the single double wide slot and using 2 of them.
        This is why your single and double wide character look like they are using the same amount of space (they are), while the triple wide is using double.
        Hope that is the type of info you were looking for, if not contact me directly on my website and I’ll see if I can provide any more clarification or point you to a better resource.

        As a side not SQL Server really doesn’t support UTF-8, but does support UTF-16. I know that is odd, but I don’t know why that decision was made. Sorry.

  5. I’m dealing with a scenario where I’ve got values that appear as blank fields but have a datalength of 10. I’ve tried stripping and/or replacing all the usual suspects of hidden characters (tab, line break, eof etc) but they’re still there. I’m trying to find a solution without having to compensate for every possible ascii

    1. Post
      Author

      That’s tricky to deal with. I once had a string we couldn’t figure out and it was an HTML zero length field stored in as string that was causing all our issues.
      To figure out the values you could do something like parse out each character and check the ASCII value of it. Might give an idea as to what you are up against. Any that don’t fall within the normal range you think should be stored could be stripped out that way. Are you dealing with just spaces? ltrim(rtrim()) could help strip any blank leading or trailing spaces as well.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.