Migrated lease contracts from legacy system with automated data quality validation

I want to share our experience migrating 8,500 lease contracts from a legacy system to Odoo 15. The project took three months and involved complex data transformations, quality validation, and reconciliation. Hopefully this helps others tackling similar migrations.

Our company manages commercial real estate leases and the legacy system was a custom-built application from the early 2000s with inconsistent data quality. We had duplicate contracts, missing payment schedules, incorrect tenant information, and inconsistent date formats. About 15% of contracts had data quality issues that would have caused failures in Odoo’s validation rules.

The key to success was building a comprehensive ETL pipeline with automated data quality checks at each stage. We used Python with pandas for extraction and transformation, implemented validation rules that caught issues before loading into Odoo, and created detailed reconciliation reports comparing legacy vs migrated data.

The most challenging aspect was handling payment schedules - the legacy system stored them as free-text notes in many cases, requiring manual cleanup and restructuring into Odoo’s invoice line format. We also had to map legacy contract statuses to Odoo’s state workflow, which didn’t always align perfectly.

How did you handle incremental migrations? Did you migrate everything in one go or in phases? We’re debating whether to migrate by property, by contract type, or by date range.

For the payment schedules, we used a combination of regex pattern matching for common formats (monthly rent amounts, escalation percentages) and manual review for complex cases. About 60% could be extracted automatically, 40% needed human review. We built a review interface that showed the original text alongside extracted values for validation.

Happy to share the technical approach and code examples. Here’s our complete migration framework:

Phase 1: Data Extraction and Profiling

We started with comprehensive data profiling to understand quality issues:

import pandas as pd
from datetime import datetime
import re

class LeaseDataProfiler:
    def __init__(self, legacy_data):
        self.df = pd.read_csv(legacy_data)

    def profile_data_quality(self):
        report = {
            'total_contracts': len(self.df),
            'missing_values': self.df.isnull().sum().to_dict(),
            'duplicate_contract_ids': self.df['contract_id'].duplicated().sum(),
            'invalid_dates': self.check_date_validity(),
            'orphaned_records': self.check_referential_integrity()
        }
        return report

This profiling revealed that 12% of contracts had invalid date ranges, 8% had missing tenant information, and 5% were duplicates.

Phase 2: Data Transformation Pipeline

We built a multi-stage transformation pipeline with validation at each step:

class LeaseTransformer:
    def __init__(self):
        self.validation_errors = []
        self.warnings = []

    def transform_contract(self, legacy_contract):
        # Stage 1: Basic field mapping
        odoo_contract = {
            'name': self.clean_contract_name(legacy_contract['contract_num']),
            'partner_id': self.map_tenant_id(legacy_contract['tenant_id']),
            'date_start': self.parse_date(legacy_contract['start_date']),
            'date_end': self.parse_date(legacy_contract['end_date']),
            'recurring_amount': self.extract_rent_amount(legacy_contract['rent_notes'])
        }

        # Stage 2: Validation
        self.validate_contract(odoo_contract, legacy_contract['contract_id'])

        # Stage 3: Payment schedule extraction
        if legacy_contract['payment_schedule_notes']:
            odoo_contract['invoice_lines'] = self.extract_payment_schedule(
                legacy_contract['payment_schedule_notes']
            )

        return odoo_contract

    def extract_payment_schedule(self, notes_text):
        # Pattern matching for common formats
        patterns = [
            r'\$([\d,]+\.\d{2})\s*per\s*month',
            r'monthly rent\s*:\s*\$([\d,]+\.\d{2})',
            r'([\d,]+\.\d{2})\s*monthly'
        ]

        for pattern in patterns:
            match = re.search(pattern, notes_text, re.IGNORECASE)
            if match:
                amount = float(match.group(1).replace(',', ''))
                return self.generate_invoice_lines(amount)

        # If no pattern matches, flag for manual review
        self.warnings.append({
            'type': 'payment_schedule_extraction',
            'message': f'Could not extract payment schedule from: {notes_text[:100]}'
        })
        return None

Phase 3: Multi-Level Validation

We implemented three validation levels to catch issues without being overly restrictive:

class ContractValidator:
    def validate(self, contract, contract_id):
        errors = []
        warnings = []

        # BLOCKING ERRORS - Must be fixed before migration
        if not contract.get('name'):
            errors.append(f"Contract {contract_id}: Missing contract name")

        if not contract.get('partner_id'):
            errors.append(f"Contract {contract_id}: Missing tenant reference")

        if contract.get('date_start') and contract.get('date_end'):
            if contract['date_end'] < contract['date_start']:
                errors.append(f"Contract {contract_id}: End date before start date")

        # WARNINGS - Flag for review but don't block
        if contract.get('recurring_amount', 0) < 100:
            warnings.append(f"Contract {contract_id}: Unusually low rent amount")

        if contract.get('date_start'):
            lease_duration = (contract['date_end'] - contract['date_start']).days
            if lease_duration < 30 or lease_duration > 3650:
                warnings.append(f"Contract {contract_id}: Unusual lease duration")

        return {'errors': errors, 'warnings': warnings}

Phase 4: Reconciliation and Quality Assurance

After migration, we ran comprehensive reconciliation comparing legacy vs Odoo data:

class MigrationReconciliation:
    def reconcile(self, legacy_df, odoo_df):
        reconciliation_report = []

        # Check record counts
        if len(legacy_df) != len(odoo_df):
            reconciliation_report.append({
                'check': 'record_count',
                'status': 'FAIL',
                'legacy_count': len(legacy_df),
                'odoo_count': len(odoo_df)
            })

        # Check financial totals
        legacy_total = legacy_df['monthly_rent'].sum()
        odoo_total = odoo_df['recurring_amount'].sum()
        variance = abs(legacy_total - odoo_total)

        reconciliation_report.append({
            'check': 'total_monthly_rent',
            'status': 'PASS' if variance < 100 else 'FAIL',
            'legacy_total': legacy_total,
            'odoo_total': odoo_total,
            'variance': variance
        })

        return pd.DataFrame(reconciliation_report)

Migration Strategy

We migrated in phases by property to minimize risk:

  • Week 1-2: Pilot migration (200 contracts, 1 property) - Full testing and validation
  • Week 3-6: Phase 1 (2,000 contracts, 5 properties) - Monitor issues and refine
  • Week 7-10: Phase 2 (6,300 contracts, remaining properties) - Full migration

Between phases, we ran parallel systems and reconciled daily to catch any discrepancies.

Key Success Factors:

  1. Comprehensive data profiling upfront revealed 85% of issues before writing code
  2. Three-level validation (errors/warnings/info) prevented blocking valid edge cases
  3. Automated reconciliation caught discrepancies immediately
  4. Phased approach allowed learning and refinement between batches
  5. Clear rollback procedures for each phase gave confidence to proceed

Final Results:

  • 8,500 contracts migrated with 99.7% accuracy
  • 127 contracts required manual intervention (1.5%)
  • Average reconciliation variance: $0.23 per contract
  • Zero production incidents post-migration
  • 6 months later: All contracts validated, no legacy system dependencies

The code framework is reusable for other entity migrations. We’ve since used modified versions for migrating equipment leases and service contracts with similar success rates.

What validation rules did you implement? We’re struggling with defining comprehensive checks that catch issues without being too restrictive and blocking valid edge cases.