Thursday, April 17, 2008

How to delete records from a large table in small portions

At times, you need to delete records from a large table. Performing a delete operation on such a table can become quiet resource intensive. In order to increase performance, you can delete the rows in small portions. Let us see how using this query:


-- Create a temporary table CustomersTemp
SELECT * into Northwind.dbo.CustomersTemp from
Northwind.dbo.Customers

-- Procedure to delete in small groups
CREATE PROC spoc_deleteinsmallgroups
@NoOfRowsToDelete int
AS
DECLARE @CurrentRowCount int

-- Count the number of rows in the table
SELECT @CurrentRowCount = (SELECT COUNT(*) FROM Northwind.dbo.CustomersTemp)

-- Loop and delete records in small bits till the rowcount is 0
WHILE(@CurrentRowCount > 0)
BEGIN
DELETE TOP (@NoOfRowsToDelete) FROM Northwind.dbo.CustomersTemp
-- RECOUNT
SELECT @CurrentRowCount = (SELECT COUNT(*) FROM Northwind.dbo.CustomersTemp)
Print @CurrentRowCount
END

-- Execute the procedure and pass the number of rows
-- to delete at a time
EXEC spoc_deleteinsmallgroups 25

Find Hours, Minutes and Seconds in between two Datetime

In one of the previous articles, we explored how to find the difference between two dates.

In this short snippet, let us see how to find the Hours, Minutes and Seconds in between two Datetime.

-- Find Hours, Minutes and Seconds in between two datetime
DECLARE @First datetime
DECLARE @Second datetime
SET @First = '04/02/2008 05:23:22'
SET @Second = getdate()

SELECT DATEDIFF(day,@First,@Second)*24 as TotalHours,
DATEDIFF(day,@First,@Second)*24*60 as TotalMinutes,
DATEDIFF(day,@First,@Second)*24*60*60 as TotalSeconds

Extract Numbers from a String using Sql Server 2005

I have seen a common requirement where we have numbers and alphabets mixed up in a column (house no in the address field) and the number has to be extracted out of the string.

Here's a query to do so:

-- This query is a Modification of the Original query by 'usenetjb'

DECLARE @NumStr varchar(1000)
SET @NumStr = 'This string contains 5 words and now 9 words';

BEGIN
WHILE PATINDEX('%[^0-9]%',@NumStr)> 0
SET @NumStr = REPLACE(@NumStr,SUBSTRING(@NumStr,PATINDEX('%[^0-9]%',@NumStr),1),'')
END

PRINT @NumStr

Populate a table from a .CSV or .TXT file using SQL Server 2005

You can use Bulk Insert or SSIS to import text/csv files. There are some advantages and disadvantages using any of these methods. In this article, we will explore how to use the OPENROWSET to read a data file and populate a table.

Note: In SQL Server 2005, OPENROWSET can read from a data file without loading the data into a target table. This lets you use OPENROWSET with a simple SELECT statement.

Follow these steps:

Step 1: Create a database called 'Employees'. Create a table called '
EmployeeDetails' in it using the script given below:

USE [Employee]
GO
/****** Object: Table [dbo].[EmployeeDetails] Script Date: 04/11/2008 11:12:32 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
SET ANSI_PADDING ON
GO

CREATE TABLE [dbo].[EmployeeDetails](
[EmployeeID] [nvarchar](50) NOT NULL,
[EmployeeName] [varchar](50) NULL,
[EmployeeAddress] [varchar](50) NULL
) ON [PRIMARY]
GO
SET ANSI_PADDING OFF

Step 2: To run ad-hoc queries on the SQL server, you would first need to enable it using the following query:

sp_configure 'show advanced options',1
RECONFIGURE WITH override
GO
sp_configure 'Ad Hoc Distributed Queries',1
RECONFIGURE WITH override
GO

Step 3: Create a txt/csv file with the following format on your C:\. The file is called 'Employee.csv'

EmployeeID EmployeeName EmployeeAddress
1 Kat 23/Avenue. Park
2 Jim Jeoff Street
3 Tom Lifer Road

Step 4: The final step is to run the query and populate the EmployeeDetails table

USE Employee
GO
INSERT INTO EmployeeDetails(EmployeeID,EmployeeName,EmployeeAddress)
SELECT *
FROM
OPENROWSET('MSDASQL', 'Driver={Microsoft Text Driver (*.txt; *.csv)};
DEFAULTDIR=C:\;Extensions=CSV;','SELECT * FROM Employee.csv')

References : http://msdn2.microsoft.com/en-us/library/ms190312.aspx

Find First and Last Day Of Each Month

The first day of each month is always 1. However if you interested to find the First and Last Date of each month using a Sql Query, here it is:

-- Return First and Last Date of a Month
DECLARE @Month smallint
SET @Month = 1

SELECT DAY(DATEADD(Month, DATEDIFF(Month, -1, getdate()) - 2, -1) + 1) as FirstDayofMonth,
DAY(DATEADD(Month, DATEDIFF(Month,0,getdate())+@Month, -1)) as LastDayOfMonth

If you wish to find out the Lastday of the previous month, just set @Month to 0.

DATENAME() function in SQL Server 2005

The DATENAME() is quiet a handy function if you want to return a literal form (string) of the part of the date specified.

The syntax goes like:
DATENAME ( datepart ,date )

Where :
datepart - specifies the part of the date to return
date - datetime or smalldatetime expression

USAGE:

DECLARE @Dt datetime
SET @Dt = '2008-04-15 8:34:54.713'

SELECT DATENAME(year, @Dt) as 'Year'
-- Returns 2008

SELECT DATENAME(quarter, @Dt) as 'Quarter'
-- Returns 2

SELECT DATENAME(month, @Dt) as 'Month'
-- Returns April

SELECT DATENAME(dayofyear, @Dt) AS 'Day of Year';
-- Returns 106

SELECT DATENAME(day, @Dt) AS 'Day';
-- Returns 15

SELECT DATENAME(week, @Dt) AS 'Week';
-- Returns 16

SELECT DATENAME(weekday, @Dt) AS 'Weekday';
-- Returns Tuesday

SELECT DATENAME(hour, @Dt) AS 'Hour';
-- Returns 8

SELECT DATENAME(minute, @Dt) AS 'Minutes';
-- Returns 34

SELECT DATENAME(second, @Dt) AS 'Seconds';
-- Returns 54

SELECT DATENAME(millisecond, @Dt) AS 'Milliseconds';
-- Returns 713

Resolving CREATE DATABASE Permission denied in database 'master' error on Vista and SQL Express

Have you encountered the error 'CREATE DATABASE Permission denied in database 'master'' even though you are logged into Windows Vista with administrator privileges.

Reason for the error : Windows Vista users that are members of the Windows Administrators group are not automatically granted permission to connect to SQL Server, and they are not automatically granted administrative privileges.

Resolution: Grant rights to the administrator. Follow these steps:

Step 1: Go to Start > All Programs > Microsoft SQL Server 2005 > Configuration Tools > SQL Server Surface Area Configuration.

Step 2: Click on 'Add New Administrator'.

Step 3: In the 'SQL Server User Provisioning on Vista' dialog box, look out for the 'Member of the SqlServer SysAdmin role ' in the 'Available Privileges' box. Transfer it to the 'Privileges that will be granted to' box. Click Ok.

Note: You will be able to see 'Add New Administrator' in the Configuration tool only if you have logged in as an administrator

Check if a user has access to a database in Sql Server 2005

Method 1:HAS_DBACCESS returns information about whether the user has access to the specified database (BOL).

Example:

SELECT HAS_DBACCESS('Northwind');

returns

1 if the user has access to the database
0 if the user does not have access to the database
NULL if the database does not exist

Find all databases that the current user has access to

SELECT [Name] as DatabaseName from master.dbo.sysdatabases
WHERE ISNULL(HAS_DBACCESS ([Name]),0)=1
ORDER BY [Name]


Method 2:Other best alternative of the above command is you can use sp_helplogins without paramter for checking the existing user permissions and the database list


ex
1. sp_helplogins
2. sp_helplogins 'LoginName'

How to remove duplicate rows from a table in SQL Server


Microsoft SQL Server tables should never contain duplicate rows, nor non-unique primary keys. For brevity, we will sometimes refer to primary keys as "key" or "PK" in this article, but this will always denote "primary key." Duplicate PKs are a violation of entity integrity, and should be disallowed in a relational system. SQL Server has various mechanisms for enforcing entity integrity, including indexes, UNIQUE constraints, PRIMARY KEY constraints, and triggers.

Despite this, under unusual circumstances duplicate primary keys may occur, and if so they must be eliminated. One way they can occur is if duplicate PKs exist in non-relational data outside SQL Server, and the data is imported while PK uniqueness is not being enforced. Another way they can occur is through a database design error, such as not enforcing entity integrity on each table.

Often duplicate PKs are noticed when you attempt to create a unique index, which will abort if duplicate keys are found. This message is:
Msg 1505, Level 16, State 1 Create unique index aborted on duplicate key.
If you are using SQL Server 2000 or SQL Server 2005, you may receive the following error message:
Msg 1505, Level 16, State 1 CREATE UNIQUE INDEX terminated because a duplicate key was found for object name '%.*ls' and index name '%.*ls'. The duplicate key value is %ls.
This article discusses how to locate and remove duplicate primary keys from a table. However, you should closely examine the process which allowed the duplicates to happen in order to prevent a recurrence.

MORE INFORMATION

For this example, we will use the following table with duplicate PK values. In this table the primary key is the two columns (col1, col2). We cannot create a unique index or PRIMARY KEY constraint since two rows have duplicate PKs. This procedure illustrates how to identify and remove the duplicates.
create table t1(col1 int, col2 int, col3 char(50))
insert into t1 values (1, 1, 'data value one')
insert into t1 values (1, 1, 'data value one')
insert into t1 values (1, 2, 'data value two')
The first step is to identify which rows have duplicate primary key values:
SELECT col1, col2, count(*)
FROM t1
GROUP BY col1, col2
HAVING count(*) > 1
This will return one row for each set of duplicate PK values in the table. The last column in this result is the number of duplicates for the particular PK value.

col1col2
112


If there are only a few sets of duplicate PK values, the best procedure is to delete these manually on an individual basis. For example:
set rowcount 1
delete from t1
where col1=1 and col2=1
The rowcount value should be n-1 the number of duplicates for a given key value. In this example, there are 2 duplicates so rowcount is set to 1. The col1/col2 values are taken from the above GROUP BY query result. If the GROUP BY query returns multiple rows, the "set rowcount" query will have to be run once for each of these rows. Each time it is run, set rowcount to n-1 the number of duplicates of the particular PK value.

Before deleting the rows, you should verify that the entire row is duplicate. While unlikely, it is possible that the PK values are duplicate, yet the row as a whole is not. An example of this would be a table with Social Security Number as the primary key, and having two different people (or rows) with the same number, each having unique attributes. In such a case whatever malfunction caused the duplicate key may have also caused valid unique data to be placed in the row. This data should copied out and preserved for study and possible reconciliation prior to deleting the data.

If there are many distinct sets of duplicate PK values in the table, it may be too time-consuming to remove them individually. In this case the following procedure can be used:
1.First, run the above GROUP BY query to determine how many sets of duplicate PK values exist, and the count of duplicates for each set.
2.Select the duplicate key values into a holding table. For example:
SELECT col1, col2, col3=count(*)
INTO holdkey
FROM t1
GROUP BY col1, col2
HAVING count(*) > 1
3.Select the duplicate rows into a holding table, eliminating duplicates in the process. For example:
SELECT DISTINCT t1.*
INTO holddups
FROM t1, holdkey
WHERE t1.col1 = holdkey.col1
AND t1.col2 = holdkey.col2
4.At this point, the holddups table should have unique PKs, however, this will not be the case if t1 had duplicate PKs, yet unique rows (as in the SSN example above). Verify that each key in holddups is unique, and that you do not have duplicate keys, yet unique rows. If so, you must stop here and reconcile which of the rows you wish to keep for a given duplicate key value. For example, the query:
SELECT col1, col2, count(*)
FROM holddups
GROUP BY col1, col2
should return a count of 1 for each row. If yes, proceed to step 5 below. If no, you have duplicate keys, yet unique rows, and need to decide which rows to save. This will usually entail either discarding a row, or creating a new unique key value for this row. Take one of these two steps for each such duplicate PK in the holddups table.
5.Delete the duplicate rows from the original table. For example:
DELETE t1
FROM t1, holdkey
WHERE t1.col1 = holdkey.col1
AND t1.col2 = holdkey.col2
6.Put the unique rows back in the original table. For example:
INSERT t1 SELECT * FROM holddups
Top 10 Largest Databases in the World

1. World Data Centre for Climate

220 terabytes of web data 6 petabytes of additional data

2. National Energy Research Scientific Computing Center

2.8 petabytes of data Operated by 2,000 computational scientists

3. AT&T

323 terabytes of information 1.9 trillion phone call records

4. Google

91 million searches per day accounts for 50% of all internet searches Virtual profiles of countless number of users

5. Sprint

2.85 trillion database rows. 365 million call detail records processed per day At peak, 70,000 call detail record insertions per second

6. ChoicePoint

250 terabytes of personal data Information on 250 million people

7. YouTube

100 million videos watched per day 65,000 videos added each day 60% of all videos watched online At least 45 terabytes of videos

8. Amazon

59 million active customers More than 42 terabytes of data

9. Central Intelligence Agency

100 FOIA items added each month Comprehensive statistics on more than 250 countries and entities Unknown number of classified information

10. Library of Congress

130 million items (books, photographs, maps, etc) 29 million books 10,000 new items added each day 530 miles of shelves 5 million digital documents 20 terabytes of text data