From the monthly archives:

November 2009

Using CHARINDEX and PATINDEX

How many times while you were writing a program, you wanted to find or search for a character or a string? The number should be quite high!!!

Talking with respect to SQL Server, how many times have we faced the situation of finding a character or a string within string or a character column?

In this article, let me take you through the basics of getting these things done with the help of Char Index and Pat Index functions.

 

CHARINDEX

Coming to char index, it can be best explained with its syntax. Its syntax looks like below, and it searches the expression2 for expression1 and returns the start position of the expression1 in expression2

The start location is an optional argument to the function which can be specified to as a hint to the function as from where to start the search from in the expression2

CHARINDEX ( expression1 ,expression2 [ , start_location ] )

For Example,

Lets take a string “abcde”, and lets find where “c” comes, we know the answer  - that it’s 3, we want to see how the CHARINDEX works(we are learning and not questioning Microsoft J )

And the sql statement for the same is

select charindex(‘c’,'abcde’)

And the result – 3

You can try it with a column in a table as well!!!

PATINDEX

Pat Index can be used as char index, and in a way it is bit more superior to char index as it would server more purposes. The PATINDEX function returns the starting position of a pattern of characters in a string or a text column of table. The main reason why it is more useful is that, PATINDEX supports wildcard characters in the search pattern string.

The syntax of this functions is

PATINDEX ( ‘%pattern%’ , expression )

Let’s take an example to see how it is used, let’s consider the below sql statement, we are searching for a star position of BC in the expression ABCDE, we need to get the answer as two, now lets see.

SELECT PATINDEX(‘%BC%’,'ABCDE’)

Result:

2

One more example, now, let’s reapeat that ABCDE twice and check,

SELECT PATINDEX(‘%BC%’,'ABCDEABCDE’)

Result:

2

The result is still two, so, PATINDEX gives the first position when we have more than one matching patterns.

And when we don’t have any it gives a result of zero, this can be tried!!!

So, in this article we have seen on the basic uses of CHARINDEX and PATINDEX, both are used for character and string searches but with second one we can use the wild card symbols which make it useful in a larger number of cases.

{ Comments on this entry are closed }

Incremental Loads using SQL Queries

Problem: Using SSIS we have very easy ways to load a SQL Server Data Mart Incrementally, but how to perform Incremental Data Load when we do not have SSIS with us, in other words using T-SQL Queries.

Solution: To begin with let us have a look at the architecture for loading the data mart, we can get our data from one or more data sources into a staging area. This step should purely depend upon the business and technical needs.
Whenever we want to load data incrementally, we should make sure that we are able to find a unique identifier to distinguish one record from the other. We need to take a look at the source timestamp, primary key, identity key, and unique key columns of the rows that we capture to ensure there is a continuous representation of each unique record in the dimension or a fact table.
 
Loading Fact Tables
In most cases we would be able to find a unique time stamp column or a unique key which is Bill No, Invoice No or a Incident No. Generally here we can go for Delete and Insert method between Stage and Mart Levels, for this we shall first identify the rate of Insertion and Updation of data in the source, and decide the number of days records that we need to get from the source to stage
 
Loading Dimension Tables
Loading dimension would be the trickiest part as we have to make sure all the changes are processed up to the data marts. Suppose, for example if we take a customer dimension, we can have a new customer being added, customers details being updated and in some rare cases customers information being deleted when there is no relevant fact data.
 
The above scenario can be tackled easily depending upon the amount of data by
Should import all the records or last “X” days records depending upon the need and
1.       Doing complete truncate and Insert. à If we want to have History information which is currently not available in transaction tables this method should not be used.
 
2.       Using Except Keyword:  If we have large number of duplicates and handle them on the fly we can go with the Except keyword, using it between the stage and mart, this would give us all the records that are not there in Mart and it would automatically remove duplicates.
 
Insert into Mart_Table
SELECT * FROM Stage_Table
 EXCEPT
SELECT * FROM Mart_Table;
 
 
3.       Doing a delete and Insert à we should delete the records that are available in the stage from mart and Insert all the records from stage to mart. Doing this we will also be having records that are not available in stage and this will also take care of any updating being data at the source level.
 
The above described were a few methods that I have used for loading data incrementally without using SSIS.
Conclusion: we can perform incremental loads using T-SQL when we do not have an access to SSIS.

{ Comments on this entry are closed }

There have always been many questions in the forums related to SQL Server, regarding the size of various objects in SQL Server and maintaining them, so, I have thought to put all the helpful information in the form of an article, and here it is.

 

How do I find the Size of a database?

 

You can always get it from the management studio, right click and then properties.

 

Suppose, if you want to do it through the query analyzer, to get the database size for all the databases then you can do it using

 

sp_databases

 

This system stored procedure would return you a list of databases and their sizes in KB, and the result for the same would be like the one given below,

 

DATABASE_NAME     DATABASE_SIZE           REMARKS

master                  4864              NULL

model                   1984              NULL

msdb                    19968             NULL

tempdb                  36360             NULL

 

If you want to get more details, like, the database id, compatibility level, created date and who is the owner in a single stroke you can make use of the system stored procedure

 

 

Sp_helpdb

 

 

the status column is not displayed as the result was too lengthy…

 

name  db_size     owner dbid  created status compatibility_level

master      4.75 MB     sa    1     Apr  8 2003       90

model       1.94 MB     sa    3     Apr  8 2003       100

msdb        19.50 MB    sa    4     Oct 14 2005       100

tempdb      47.82 MB    sa    2     May 27 2009       100

 

And there is one more system stored procedure which would be of great use for the system administrators who want to keep a watch on the used/unused, allocated/ unallocated and the size of indexes, it is

 

sp_spaceused

 

example reluts for the same are displayed below.

 

 

 

 

database_name     database_size     unallocated space

master            4.75 MB           1.04 MB

 

reserved    data  index_size  unused

3032 KB     1320 KB     1208 KB     504 KB

 

How do I find the size of Log file, and how much is used?

 

There is a standard DBCC Command particularly for this,

 

dbcc sqlperf(logspace)

 

Note: – The DBCC (Database Console Commands) Statements are provided in the transact SQL server programming language for checking physical and logical consistency and also for fixing most of the issues.

 

And the result of the same would be like, given below

 

Database Name     Log Size (MB)     Log Space Used (%)      Status

master            0.7421875         68.42105                0

tempdb            15.5              53.2006                 0

model             0.4921875         45.23809                0

msdb              0.7421875         63.15789                0

 

 

I want a list of all the files in a database with their sizes?

 

You can get the above information by running the below query in the desired database, this will return you a list of files, with their locations and size details.

 

Sp_helpfile

 

name  fileid      filename    filegroup   size  maxsize     growth      usage

master      1     C:\Program Files\DATA\master.mdf    PRIMARY     4096 KB     Unlimited   10%      data only

mastlog     2     C:\Program Files\DATA\mastlog.ldf   NULL  3136 KB     Unlimited   10%   log only

 

How do I find the Size of a table?

 

That’s bit simple; we need to just use the system stored procedure with a parameter of the table name for which we want to find the size sp_spaceused, Let me take an example here, suppose, there is a table in the msdb database with the name backupfile and I want to see the size statistics for that file, then, I would run the below code

 

use msdb

go

sp_spaceused backupfile

 

and the result for that would be

 

name        rows        reserved    data        index_size  unused

backupfile  314        328 KB      184 KB      16 KB       128 KB

 

 

How do I find free space in a drive on which I have my SQL Server running?

 

Most of the times, people cannot log into a server and they want to get the size details of the disk from query analyzer, for this there is an extended stored procedure provided directly, one thing that needs to be kept in mind is it gives the amount off free space left in the disk drive, and make sure that it never hits “ZERO”. And also it gives that numbers to us in MB…

 

Usage:-

 

 

Use Master

go

xp_fixeddrives

 

 

And the result for that would be some thing like below

 

drive MB free

C     8111

F     3154

 

 

Okay, that’s fine, till now we have see how to find the size of a database, size of a log file, size of a table and free space in the disk drives. Now let us take couple of issues that people face in general

 

Many people face these issues nearly on a daily/weekly basis…below are some answers for the same, there can be more methods off tackling the same issues

 

Temp DB is out of space, what should I do?

 

Depending on the server on which it is happening firstly, you have to tackle this by shrinking the log file, for this we have the DBCC SHRINKFILE Statement

 

For shrinking the log file, I can use a statement like this,

 

DBCC Shrinkfile(2,1) – where 2 is the file id and 1 is the amount of free space that is to be left.

 

If that doesn’t help you, then go and shrink the data files by using the statement

DBCC Shrinkfile(1,1)  — one by depending on the number of data files that you have.

 

Yes, even now if you are not able to fix it, then I think check the possibilities, and restart you SQL server services, that’s it issue resolved. But then, make sure that you place your Tempdb at a place where it has space to GROW!!!

 

 

My database Log file is full, what shall I do…..???

 

Same principle, first try and shrink the log file, most of the time it works out!!!

 

Worst case scenario; detach the database, by either go to all tasks or doing an

EXEC sp_detach_db ‘MyDB’

Then re attach the mdf file using

EXEC sp_attach_single_file_db @dbname = ‘MyDB’, @physname = N’C:\Program  Files\Microsoft SQL Server\ Data\MyDB_Data.mdf’;

 

 

If the database is not for a transactional system then we can keep the recovery mode of the database to Simple so that we don’t face these kind of issues as the log file is truncated ever time a COMMIT is done.

 

 

That’s it for now in this article, hope you have found some basic points, all at one place, and think it helps, please let me know what you have felt by voting this article.

{ Comments on this entry are closed }

BASICS OF WORKING WITH SQL SERVER CHECKSUM

This Article talks about what SQL Server CHECKSUM and  SQL Server BINARY CHECKSUM are and how they can be used effectively in SQL Server.

In simple terms, we can describe CHECKSUM as “a calculation performed on a sequence of characters or on a string to yield a single integer value, which will differ for every different sequence“. This would greatly help us in our day to day activities of comparing data in different systems or different set of tables, and also to retrieve data more quickly for a matched or unmatched set of records.

With respect to SQL Server, CHECKSUM would return integer value computed over a row in a table, or for a particular cell, or for an expression, or over a list of expressions. Or in other words CHECKSUM computes a hash value, called the checksum, over its list of arguments. Below is an example where we are taking a string as an argument for the checksum function.

For Example,

select CHECKSUM('abcdef') as ChecksumValue 

ChecksumValue

140941491

Example: 2

Create a temporary Table

Insert data for test

  CREATE TABLE #ChecksumTest( col1 INT, col2 VARCHAR(10))

INSERT INTO #ChecksumTest VALUES(1,'abc')

INSERT INTO #ChecksumTest VALUES (2,'bcd')

– Check the check sum value

 SELECT CHECKSUM(*) AS ChecksumValue FROM #ChecksumTest

ChecksumValue

34416

34481

We can make use of check sum for equality searches. We have to keep one thing in mind over here while using this; the check sum value that we get is purely dependent on the order of columns mentioned. The point which is also to be noted over here is, If we are comparing two columns or two expressions, the NULL values of the same data type are considered as same!!!

Now, taking an example for the NULL value comparison, let us take the above temp table and insert two records with NULL values and then see what would be the results, frankly, this makes life easier as we have a good standard method to compare NULL’s!!!
For Example

We can compare NULLS Also

insert into #ChecksumTest values(3,NULL)

select CHECKSUM(col2)as ChecksumValue from #ChecksumTest where col1 = 3

And the results for the above would be

Checksum Value

2147483647

 insert into #ChecksumTest values(4,NULL)

select CHECKSUM(col2)as ChecksumValue from #ChecksumTest where col1 = 4

And the results for the above would be

ChecksumValue

2147483647

Extreme Fast Indexes with CHECKSUM

One of the major uses of check sum is for building Hash Indexes. Actually, the checksum computes a hash value depending upon the parameters/arguments we pass (In the example 2, we passed “*” all columns). If we build columns in a table purely with these hash values and if we index these columns, they are called Hash Indexes.

Hash Indexes can be effectively used for equality searches, so for example, whenever we have strings with length of more than 50 and we have to equality search on those columns (generally for speeding up queries we build Indexes, but Indexes are not recommended for length of >50), then we can build these Hash columns and can perform search based on this column, as Indexes on Integer columns are more effective than Indexes on character/ string columns.

Now, let us try the above by trying it out with an example, continuing with the above temporary table created…

For Example

ALTER TABLE #ChecksumTest

ADD cs_col2 AS CHECKSUM(col2);

GO

CREATE INDEX NIX_Col2 ON #ChecksumTest (col2);

GO

BINARY CHECKSUM

It returns a binary checksum value computed over a row of a table or over a list of expressions. This is generally used to check if the data in a particular row of a table has changed.

For example, continuing with the above created temporary table, run the script below and observe the difference before updating the records and after updating the records

SELECT BINARY_CHECKSUM(*)as BinaryChecksum from #ChecksumTest

where col1=1;

GO

UPDATE #ChecksumTest set col2 = 'TEST'where col1 = '1';

GO

SELECT BINARY_CHECKSUM(*)as BinaryChecksum from #ChecksumTest

where col1=1;

GO

Binary Checksum

455504

Binary Checksum

5935073

CHECKSUM and BINARY_CHECKSUM are similar functions, but, return different values for the string and character data types. SQL Server settings can cause strings with different representation to compare equal.

Where we cannot Use Checksum

While using, one important point that has to be kept in mind would be that it is not compatible with text, ntext, image, cursor, xml. When used, the system may either ignore these columns or throws an error if these are the only columns given as arguments.

{ Comments on this entry are closed }