Here’s a comprehensive solution addressing data cleaning, import script optimization, and proper unique constraint handling:
1. Data Cleaning Strategy:
Before importing, implement a pre-validation step that identifies potential duplicates and data quality issues:
-- Pseudocode - Data cleaning validation:
1. Load external data into staging table QualityTestResultStaging
2. Run deduplication query on staging data using QualityTestResultId
3. Identify records with NULL or invalid foreign keys (ItemId, TestGroupId)
4. Flag records with conflicting timestamps for manual review
5. Generate data quality report before proceeding with import
-- See: Quality Management Data Import Guide
Create a staging table with audit columns:
CREATE TABLE QualityTestResultStaging (
    StagingId BIGINT IDENTITY PRIMARY KEY,
    QualityTestResultId NVARCHAR(50),
    TestDate DATETIME,
    TestResult NVARCHAR(20),
    ItemId NVARCHAR(50),
    TestGroupId NVARCHAR(50),
    ProcessingStatus NVARCHAR(20) DEFAULT 'Pending',
    ErrorMessage NVARCHAR(500),
    SourceModifiedDateTime DATETIME,
    ImportDateTime DATETIME DEFAULT GETDATE()
);
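As a sketch of the deduplication check and the data quality report (steps 2 and 5 of the pseudocode), run against the staging table just defined:
-- Step 2: business keys that appear more than once in this batch
SELECT QualityTestResultId, COUNT(*) AS Occurrences
FROM QualityTestResultStaging
GROUP BY QualityTestResultId
HAVING COUNT(*) > 1;

-- Step 5: a simple quality summary before the import proceeds
SELECT
    COUNT(*) AS TotalRows,
    SUM(CASE WHEN ItemId IS NULL THEN 1 ELSE 0 END) AS MissingItemId,
    SUM(CASE WHEN TestDate IS NULL THEN 1 ELSE 0 END) AS MissingTestDate
FROM QualityTestResultStaging;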
2. Import Script with Proper Duplicate Handling:
Use a MERGE statement that inserts new records, updates existing ones, and logs conflicts:
-- Pseudocode - Main import logic:
1. BEGIN TRANSACTION with SERIALIZABLE isolation level
2. MERGE QualityTestResult target using QualityTestResultStaging source
3. ON target.QualityTestResultId = source.QualityTestResultId
4. WHEN MATCHED and source newer: UPDATE target fields
5. WHEN NOT MATCHED: INSERT new record into target
6. Log all actions to QualityImportLog table
7. COMMIT TRANSACTION
-- Reference: SQL Server MERGE documentation
The actual MERGE implementation (the HOLDLOCK hint gives the statement the serializable semantics called for in the pseudocode, and restricting the source to 'Pending' rows keeps flagged duplicates and invalid records out of the merge):
MERGE QualityTestResult WITH (HOLDLOCK) AS target
USING (
    SELECT * FROM QualityTestResultStaging
    WHERE ProcessingStatus = 'Pending'
) AS source
ON target.QualityTestResultId = source.QualityTestResultId
WHEN MATCHED AND source.SourceModifiedDateTime > target.ModifiedDateTime
    THEN UPDATE SET
        TestResult = source.TestResult,
        TestDate = source.TestDate,
        ModifiedDateTime = GETDATE()
WHEN NOT MATCHED BY TARGET
    THEN INSERT (QualityTestResultId, TestResult, TestDate, ItemId, CreatedDateTime)
    VALUES (source.QualityTestResultId, source.TestResult, source.TestDate, source.ItemId, GETDATE());
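One way to satisfy step 6 of the pseudocode is to let MERGE emit per-row actions through its OUTPUT clause. A minimal sketch, assuming the QualityImportLog table defined in section 5 and an illustrative @Actions table variable:
DECLARE @Actions TABLE (MergeAction NVARCHAR(10));
DECLARE @Started DATETIME = GETDATE();

-- Run the MERGE above, appending this line before its final semicolon:
--     OUTPUT $action INTO @Actions

INSERT INTO QualityImportLog
    (ImportDateTime, RecordsProcessed, RecordsInserted, RecordsUpdated,
     RecordsSkipped, ErrorCount, ExecutionTimeSeconds)
SELECT GETDATE(),
    (SELECT COUNT(*) FROM QualityTestResultStaging),
    ISNULL(SUM(CASE WHEN MergeAction = 'INSERT' THEN 1 ELSE 0 END), 0),
    ISNULL(SUM(CASE WHEN MergeAction = 'UPDATE' THEN 1 ELSE 0 END), 0),
    (SELECT COUNT(*) FROM QualityTestResultStaging WHERE ProcessingStatus = 'Duplicate'),
    (SELECT COUNT(*) FROM QualityTestResultStaging WHERE ProcessingStatus = 'Invalid'),
    DATEDIFF(SECOND, @Started, GETDATE())
FROM @Actions;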
3. Unique Constraint Management:
Ensure your constraints are properly defined and add a unique index on the business key if it doesn’t exist:
IF NOT EXISTS (
    SELECT * FROM sys.indexes
    WHERE name = 'UX_QualityTestResult_BusinessKey'
      AND object_id = OBJECT_ID('QualityTestResult')
)
CREATE UNIQUE NONCLUSTERED INDEX UX_QualityTestResult_BusinessKey
    ON QualityTestResult(QualityTestResultId, ItemId);
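Note that creating the unique index will fail if the table already contains duplicate business keys, so it is worth checking first:
-- Existing rows that would block the unique index
SELECT QualityTestResultId, ItemId, COUNT(*) AS Occurrences
FROM QualityTestResult
GROUP BY QualityTestResultId, ItemId
HAVING COUNT(*) > 1;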
Implement a constraint check procedure:
CREATE PROCEDURE usp_ValidateQualityImportConstraints
AS
BEGIN
    SET NOCOUNT ON;

    -- Flag rows whose ItemId is missing or has no match in InventTable.
    -- NOT EXISTS is used instead of NOT IN so NULLs behave predictably.
    UPDATE s
    SET ProcessingStatus = 'Invalid',
        ErrorMessage = 'Foreign key violation: ItemId not found'
    FROM QualityTestResultStaging AS s
    WHERE s.ItemId IS NULL
       OR NOT EXISTS (SELECT 1 FROM InventTable AS i WHERE i.ItemId = s.ItemId);
    -- A similar NOT EXISTS check applies to TestGroupId against its parent table.

    -- Flag every business key that appears more than once in this batch
    UPDATE QualityTestResultStaging
    SET ProcessingStatus = 'Duplicate',
        ErrorMessage = 'Duplicate within import batch'
    WHERE QualityTestResultId IN (
        SELECT QualityTestResultId
        FROM QualityTestResultStaging
        GROUP BY QualityTestResultId
        HAVING COUNT(*) > 1
    );
END;
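Typical usage after the staging load, leaving only clean rows marked 'Pending':
EXEC usp_ValidateQualityImportConstraints;

-- Review everything that was flagged before running the MERGE
SELECT StagingId, QualityTestResultId, ProcessingStatus, ErrorMessage
FROM QualityTestResultStaging
WHERE ProcessingStatus <> 'Pending';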
4. Complete Import Workflow:
a) Truncate staging table
b) Bulk insert external data into staging
c) Run usp_ValidateQualityImportConstraints
d) Execute the MERGE for records with ProcessingStatus = 'Pending'
e) Update staging records with results
f) Generate import summary report
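A condensed sketch of steps a) through c); the file path and format file are hypothetical placeholders for whatever the external system exports:
TRUNCATE TABLE QualityTestResultStaging;

-- Hypothetical path and format file; a format file (or a view over just the
-- data columns) keeps the IDENTITY and DEFAULT columns out of the mapping
BULK INSERT QualityTestResultStaging
FROM 'C:\imports\quality_results.csv'
WITH (FORMATFILE = 'C:\imports\quality_results.fmt', FIRSTROW = 2);

EXEC usp_ValidateQualityImportConstraints;
-- Steps d) through f) are the MERGE and logging shown in sections 2 and 5.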
5. Error Handling and Logging:
Create a comprehensive logging table:
CREATE TABLE QualityImportLog (
    LogId BIGINT IDENTITY PRIMARY KEY,
    ImportDateTime DATETIME,
    RecordsProcessed INT,
    RecordsInserted INT,
    RecordsUpdated INT,
    RecordsSkipped INT,
    ErrorCount INT,
    ExecutionTimeSeconds INT
);
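Once the log is being populated (see the OUTPUT sketch in section 2), failure alerts reduce to a simple query, for example flagging recent imports with a high error rate:
-- Imports from the last 7 days where more than 5% of processed rows errored
SELECT ImportDateTime, RecordsProcessed, ErrorCount
FROM QualityImportLog
WHERE ImportDateTime >= DATEADD(DAY, -7, GETDATE())
  AND ErrorCount > 0.05 * NULLIF(RecordsProcessed, 0);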
6. Best Practices:
- Run imports during low-usage periods to minimize lock contention
- Use batch sizes of 1000 records maximum for MERGE operations
- Implement retry logic with exponential backoff for deadlock scenarios (see the sketch after this list)
- Archive staging data for 30 days for troubleshooting
- Set up alerts for import failures or high error rates
- Document your QualityTestResultId format to prevent ID conflicts
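For the deadlock retry recommendation, a minimal T-SQL sketch; usp_RunQualityImportMerge is a hypothetical wrapper around the MERGE from section 2:
-- Retry sketch for deadlock victims (error 1205); delay doubles each attempt
DECLARE @Attempt INT = 1, @MaxAttempts INT = 5, @Delay VARCHAR(12);

WHILE @Attempt <= @MaxAttempts
BEGIN
    BEGIN TRY
        EXEC usp_RunQualityImportMerge;  -- hypothetical wrapper around the MERGE
        BREAK;  -- success: leave the retry loop
    END TRY
    BEGIN CATCH
        IF ERROR_NUMBER() <> 1205 OR @Attempt = @MaxAttempts
            THROW;  -- not a deadlock, or out of retries: re-raise

        -- Exponential backoff: wait 2, 4, 8, 16 seconds between attempts
        SET @Delay = '00:00:' + RIGHT('0' + CAST(POWER(2, @Attempt) AS VARCHAR(3)), 2);
        WAITFOR DELAY @Delay;
        SET @Attempt += 1;
    END CATCH
END;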
This approach eliminates duplicate key errors by properly handling all scenarios before attempting database inserts. The staging table pattern gives you full visibility into what’s being imported and why certain records fail, making troubleshooting much easier.