Tutorials
Practical T-SQL
Practical T-SQL
  • Practical T-SQL Pocket Guide For Beginners
  • Preface
    • Section I. A Note From The Author
    • Section II. Tutorial Overview
    • Section III. Running The Examples
    • Section IV. How To Report An Issue
    • Section V. Join The MSU Community On Discord
    • Section VI. Supplementary Material
  • Language Basics
    • Lesson 1. Obligatory Hello World
    • Lesson 2. Code Comments With T-SQL
    • Lesson 3. Basic Syntax
    • Lesson 4. Your First Query
    • Lesson 5. Filtering Data
    • Lesson 6. Sorting Data
    • Lesson 7. Complex Data Filtering
    • Lesson 8. Aliases
    • Lesson 9. String Functions
    • Lesson 10. Creating New Columns From Existing Data (Calculated Fields)
    • Lesson 11. Displaying Data Based On Conditions (Case Statement)
    • Lesson 12. Aggregate Functions
    • Lesson 13. Grouping And Summarizing Data
    • Lesson 14. Querying More Than One Table
    • Lesson 15. Combining Queries
    • Lesson 16. Subqueries
    • Lesson 17. Creating Data
    • Lesson 18. Updating Data
    • Lesson 19. Deleting Data
    • Lesson 20. Common Table Expressions (CTEs)
    • Lesson 21. Derived Tables
    • Lesson 22. Putting It All Together
  • Advanced Topics
    • Lesson 23. Selecting Unique Values
    • Lesson 24. Updating Data With A Join
    • Lesson 25. Data Types
    • Lesson 26. Casting Data Types
    • Lesson 27. Creating Tables
    • Lesson 28. Altering Tables
    • Lesson 29. Dropping Tables
    • Lesson 30. Variables
    • Lesson 31. Controlling Flow
    • Lesson 32. Looping
    • Lesson 33. Error Processing
    • Lesson 34. Temporary Tables
    • Lesson 35. Views
    • Lesson 36. Indexed Views
    • Lesson 37. User Defined Functions
    • Lesson 38. Stored Procedures
    • Lesson 39. BULK INSERT
    • Lesson 40. Loading Tables With MERGE
    • Lesson 41. Partitioning A Dataset
    • Lesson 42. Pivoting Data
    • Lesson 43. Dynamic SQL
    • Lesson 44. Cursors
  • Solutions To Real World Problems
    • Lesson 45. Listing All Tables In A SQL Server Database
    • Lesson 46. Listing All Columns In A SQL Server Database
    • Lesson 47. Pull Records From A Table At Random
    • Lesson 48. A Better Alternative To WITH (NOLOCK)
    • Lesson 49. Boost Performance When Calling A Stored Proc From SSIS
    • Lesson 50. Setting Up Queries For Ablation Testing
    • Lesson 51. Reduce Code And Save Time With Default Column Values
    • Lesson 52. Finding Duplicate Records In A Table
    • Lesson 53. Why You Cannot Have More Than One Clustered Index On A Table
    • Lesson 54. Converting Dates To YYYYMMDD
    • Lesson 55. Sending Notification Emails With T-SQL Without Using Hardcoded Email Addresses
    • Lesson 56. Troubleshooting Long Running Queries
    • Lesson 57. Loading Large CSVs Into Data Warehouse Staging Tables
    • Lesson 58. The Only Bloody Good Reason To Use Cursors
    • Lesson 59. Loading A Type II Slowly Changing Dimension With SQL Merge
    • Lesson 60. A Clearer Explanation Of The Parameters Of The Numeric Data Type
    • Lesson 61. Why You Cannot Join On Null Values
    • Lesson 62. A Deep Dive On How The Where Clause Functions
    • Lesson 63. Using HASHBYTES() To Compare Character Strings
    • Lesson 64. Using Pipe To Hash Multiple Columns For Matching
    • Lesson 65. Why People That Indent Code Drive Me Nuts
    • Lesson 66. How To Rapidly Stand Up A Data Warehouse From Scratch
    • Lesson 67. How To Pivot Data With T-SQL When Columns Are Not Predefined
    • Lesson 68. Prepopulating A Junk Dimension
Powered by GitBook
On this page
  1. Advanced Topics

Lesson 25. Data Types

SQL Server is strongly typed and by that, I mean you have A LOT of options when it comes to how you want to store data.

Data types define how information is stored on disk. This is a complex topic and I cannot do it justice in a tutorial. The most important part of understanding data types is that different data types take up different amounts of space in terms of 1s and 0s on disk for the same human readable data point. Put another way, the amount of space that the phrase “Eat my shorts man” takes up on disk will change depending on what data type you select for the column the phrase is stored in.

We are not going to talk about all the data types, just the ones that I think are the most important for you to understand. In general, your overall data type strategy should be defined by these two concepts:

  1. Store data for what it is.

  2. Minimize storage space.

Money

This is the first data type we’re going to talk about because it is the data type that defined my “store data for what it is” ethos. If you are dealing with monetary values, store those values as money.

Some people like to use one of the other decimal data types. Yeah. Don’t do that.

The money data type stores four decimal places which is in line with banking standards. You do not need more precision than that.

CHAR VS. VARCHAR

The difference between these two is CHAR stores a fixed character length. That means if the data you are storing is less than the defined character length, you will be wasting disk space. I use CHAR to store values like flags or codes which I know a priori do not deviate from length.

VARCHAR is more flexible. It flexes depending on the size of what is being stored. The final storage space will be the whatever bytes the value takes up plus two more bytes.

And on that topic, I have a secret to tell you. Both of these data types are defined by a parameter n. It is very easy to think that n represents the number of characters that can be stored. That is FALSE! The value n denotes BYTES! However, when I see a flag of some defined width say two characters, I’ll totally be like CHAR(2) and call it a day. And I’ve been sliding uphill doing that for twenty years.

VARCHAR VS. NVARCHAR

I will make this simple. Make your life easy and store string values in NVARCHAR. It will save you so many headaches. It has to do with ISO standards on encoding. At one point, I knew the deep technical reasons for the difference in the data types, but at the end of the day, 99% of my problems with string data go away when I use NVARCHAR.

When I need to store large amounts of text, I prefer to use NVARCHAR(MAX) over the text data type.

Date And Time

SQL Server has several data types for storing dates. I have a tendency to use datetime2 but will use others to save disk space when the scenario allows for it.

Numeric VS. Float

On the surface, these look like the same thing because they both store decimal values. They are not.

Numeric is the most precise way to store decimal information. Float is only an approximate number data type.

When I need to store decimal values that are not money, I do it with numeric. However, there are some edge cases where I have been forced to use float. These edge cases are really arcane. Almost every scenario has been unique and occur rarely. So rarely that I cannot even bring a specific scenario to mind.

Binary

This is the data type that I use when I want to store values that result from the HASHBYTES function. The output of HASHBYTES generates a character string of fixed width. The width depends on the hashing algo you use. HASHBYTES with binary is a very efficient way to store data.

Integer Data Types

SQL Server has numerous data types for storing integer values. It all depends on how big of an integer you need to store which means understanding the possible ranges of values.

At all times you should be trying to minimize storage space. There have been several instances where I have been comfortable using tinyint.

All of my data warehouse tables have BIGINT as the data type for primary keys. Warehouse tables can get really big and, one day, I actually blew up a primary key that was an int. The upper limit of int is 2,147,483,647. And I blew past that. Switched to BIGINT and have not had a problem since.

Unique Identifier

While I normally use auto incrementing BIGINTs as the primary keys in data warehouses, I will use uniqueidentifier for the primary key of staging tables.

Importing data into staging tables can be a messy process. Having a primary key that was just an auto incrementing value became problematic when the number of staging tables I was dealing with started to get large.

It was possible that several tables might be carrying the same values for primary keys at the same time which made troubleshooting loads difficult.

Switching to a data type that was a globally unique identifier made a lot of headaches go away. Now I could tell for certain which records originated from which tables. I made the switch to using a GUID a few years ago and never looked back.

PreviousLesson 24. Updating Data With A JoinNextLesson 26. Casting Data Types

Last updated 3 years ago