Creating Landing Tables in the Data Warehouse
Destinations · Updated June 27, 2025
Introduction
Your landing tables are the initial point of entry for your valuable data. By following these best practices, you can ensure a smooth, efficient, and insightful data journey.
Consistent and Clear Naming Conventions
- Adopt a standardized naming convention for your landing tables. This makes it easier for everyone to understand the purpose and content of each table at a glance. Consider including prefixes or suffixes that indicate the source system or data domain.
- Use descriptive and consistent column names. Clarity here minimizes confusion during transformation and analysis.
Include Essential Metadata Columns
Beyond the core data from your source systems, incorporating specific metadata columns into your landing tables is a crucial best practice. These fields provide essential context, streamline deduplication processes, and empower effective landing table pruning.
Recommended Metadata Column Names and Reactor Mappings
Ingestion Timestamp
- Recommended column name:
ingestion_timestamp - Data type: timestamp
- Description: The timestamp when Reactor initially ingested the event that led to the creation of this record in the landing table. Useful for tracking the data pipeline timeline.
- Recommended Reactor mapping expression:
source.__reactor.event_timestamp
Is Deleted
- Recommended column name:
reactor_is_deleted - Data type: boolean
- Description: Used when processing data deletion and redaction requests, to indicate whether a record has been purged from Reactor and should be deleted from downstream destinations. For any record with
truein theis_deletedfield, this version and all prior versions of the record should be purged from downstream destinations. See The Importance of Landing Table Pruning for more information on purging deleted records. - Recommended Reactor mapping expression:
IF(source.__reactor.should_delete, TRUE, FALSE)
Is Redacted
- Recommended column name:
reactor_is_redacted - Data type: boolean
- Description: Used when processing data deletion and redaction requests, to indicate whether personally identifiable data in a record has been redacted by Reactor. For any record with
truein thehas_redactionfield, all PII field values in the record have been redacted, masked, or nulled, and those fields should not be used in downstream customer activations. - Recommended Reactor mapping expression:
IF(source.__reactor.has_redactions, TRUE, FALSE)
Loaded At
- Recommended column name:
loaded_at - Data type: timestamp
- Description: The precise timestamp when the row was mapped. Crucial for determining data recency.
- Recommended Reactor mapping expression:
CURRENT_DATETIME()
Message Version
- Recommended column name:
message_version - Data type: timestamp
- Description: An optional timestamp indicating when the record was last updated in the source system. This is a version field you can use to keep a history of an entity, akin to a Type 2 slowly-changing dimension (SCD). It is also used to determine the latest version during deduplication.
- Recommended Reactor mapping expression: Map to the source system's update timestamp
Node
- Recommended column name:
node - Data type: string
- Description: A unique identifier for the originating source system of the record. Useful for tracing data lineage and troubleshooting.
- Recommended Reactor mapping expression:
SPLIT(source.__reactor.input_event_scid,":")[2]
Scope
- Recommended column name:
scope - Data type: string
- Description: A parent-level entity identifier used to differentiate records with the same source_id that are distinct within a broader business context.
- Recommended Reactor mapping expression:
SPLIT(source.__reactor.input_event_scid,":")[3]
Source ID
- Recommended column name:
source_id - Data type: string
- Description: The unique identifier of the record as it exists in the source system. Essential for matching and deduplication.
- Recommended Reactor mapping expression: Map to the source system's unique identifier
Version
- Recommended column name:
version - Data type: integer
- Description: Indicates which version of mapping configurations was applied to the record. Essential for deduplication.
- Recommended Reactor mapping expression: Reactor auto-outputs this field as an epoch timestamp (in milliseconds) representing the latest update timestamp of any model configuration applied to the record; no mapping expression is necessary
Optimal Data Types
Choose the most appropriate data types for each column to optimize storage and query performance. For example, use integer types for numerical IDs and appropriate date and time types for date and time information.
Partitioning and Clustering (for Large Tables):
For very large landing tables, consider partitioning based on a relevant date or timestamp column (like the load timestamp or update timestamp). This can significantly improve query performance and make landing table pruning more efficient.
Within partitions, explore clustering on columns frequently used in your deduplication logic (like the unique identifier and update timestamp).
Design for Append-Only Inserts
Reinforce the understanding that Reactor appends new rows to the landing table. Avoid direct updates or deletes on the landing table itself, as this can complicate your deduplication and historical tracking strategies.
Establish Clear Deduplication Strategies
We highly recommend building a deduplication view on top of your landing table to ensure your analytical efforts are laser-focused on the most current information and avoid any data duplication headaches. Think of this view as a smart filter, ensuring that only the latest and greatest version of each record shines through, leading to more accurate and efficient data analysis.
- Define the primary key or unique identifier that will be used for deduplication for each landing table.
- Determine the column that indicates the order of updates (typically a timestamp).
- Document the specific deduplication logic implemented in your views.
See Keeping Your Data Warehouse Lean and Insightful for more details on deduplication and an example deduplication query.
Implement Robust Landing Table Pruning Policies
Over time, not all historical data retains its value. Holding onto this outdated information can lead to increased storage costs and potentially impact query performance. This is where landing table pruning comes in.
- Define clear retention policies for the data in your landing tables based on analytical, operational, or AI requirements.
- Establish a regular schedule for pruning stale data.
- Consider archiving older data in a separate storage location if it is needed for compliance or infrequent historical analysis.
- Document your pruning policies and procedures.
See Keeping Your Data Warehouse Lean and Insightful for more details on pruning and example pruning queries.
Monitoring and Alerting
Implement monitoring to track the size and growth of your landing tables. Set up alerts for unexpected data volumes or potential issues.
Documentation is Key
Maintain comprehensive documentation for each landing table, including its purpose, source system, column definitions, deduplication strategy, and pruning policy.