Building a New Data Pipeline from Scratch in Reactor's Mapping Sandbox
Mappings: Advanced Tutorials · Updated June 6, 2025
(WIP - only visible to Agents and Admins currently)
Introduction
The Reactor Data Mapping Sandbox provides a safe and isolated environment to build and test new data models before deploying them to production. This guide walks you through the process of creating a new data model from scratch.
High-Level Workflow
- Configure Sandbox and Production destination targets.
- Add a new Source Mapping model in the Sandbox.
- Map your data.
- (Optional) Add Model Mapping models to your pipeline and map your data.
- Add a Sandbox output for your new model.
- Run a mapper replay against the Sandbox pipeline.
- Validate in your destination and iterate as needed.
- Deploy models from the Sandbox to Production.
- Add a Production output for your new model.
- Run a mapper replay against the new Production pipeline.
Step-By-Step Guide
1. Configure Sandbox and Production Destination Targets
Before you begin mapping, set up your destination targets for both Sandbox testing and eventual production deployment.
- If you have an existing destination target configured for this data source (for BigQuery or Snowflake):
  - Make a copy of your existing production table in your data warehouse.
  - On the Destinations page in Reactor Data, add this copied table as a Sandbox Target.
  - Also, ensure the original production table is configured as a Production Target on the Destinations page.
  - [Insert Screenshot: Creating a copied table for Sandbox and setting up Sandbox and Production targets for BQ/SF]
- If you are using S3:
  - Simply add a Sandbox bucket target on the Destinations page in Reactor Data.
  - Add a Production bucket target as well on the Destinations page.
  - [Insert Screenshot: Adding S3 Sandbox and Production bucket targets]
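For the BigQuery or Snowflake case, the table copy happens in your warehouse, outside Reactor Data. The sketch below shows one way to generate the copy statement, using BigQuery's `CREATE TABLE ... COPY` and Snowflake's `CREATE TABLE ... CLONE` syntax. The helper name `copy_table_sql` and the table names are hypothetical; substitute your own and run the resulting statement with whatever SQL client you normally use.

```python
def copy_table_sql(warehouse: str, source_table: str, sandbox_table: str) -> str:
    """Return a SQL statement that copies source_table into sandbox_table.

    Hypothetical helper for illustration; it only builds the statement text.
    """
    warehouse = warehouse.lower()
    if warehouse == "bigquery":
        # BigQuery table-copy DDL; backticks quote the full table path.
        return f"CREATE TABLE `{sandbox_table}` COPY `{source_table}`;"
    if warehouse == "snowflake":
        # Snowflake clone of an existing table.
        return f"CREATE TABLE {sandbox_table} CLONE {source_table};"
    raise ValueError(f"unsupported warehouse: {warehouse}")


# Example table names are placeholders, not real Reactor Data objects.
print(copy_table_sql("bigquery", "analytics.ad_spend", "analytics.ad_spend_sandbox"))
print(copy_table_sql("snowflake", "analytics.public.ad_spend", "analytics.public.ad_spend_sandbox"))
```

Once the copy exists, register it as the Sandbox Target as described above.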
2. Navigate to Mappings > Sandbox and Add a New Source Mapping Model
Begin building your new model by creating a Source Mapping model in the Sandbox environment.
- In Reactor Data, navigate to Mappings > Sandbox.
- Add a new Source Mapping model that reads data from the specific data source you wish to map.
- [Insert Screenshot: Adding a new Source Mapping model in Sandbox]
3. Map the Data
Now, you'll define how the incoming source data maps to your new model's fields.
- Map the data fields from your source to the desired fields in your new model.
- Hint: Utilize the Electron feature within Reactor Data to assist with mapping. You can ask Electron to add fields from your Sandbox target to the model and map them automatically, streamlining the process.
- [Insert Screenshot: Mapping data fields, possibly showing Electron assistance]
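Conceptually, a field mapping pairs each field in your model with the source field that feeds it. The sketch below illustrates that idea in plain Python; it is not Reactor Data's mapping syntax, and the field names (`campaign_id`, `date_start`, and so on) are invented for the example.

```python
# Hypothetical mapping: model field name -> source field name.
FIELD_MAP = {
    "campaign_id": "id",
    "spend_date": "date_start",
    "spend_usd": "spend",
}


def map_record(source: dict, field_map: dict) -> dict:
    """Build a model record by reading each model field from its source field.

    Source fields that are not listed in field_map are dropped; listed
    fields that are absent from the source come through as None.
    """
    return {
        model_field: source.get(source_field)
        for model_field, source_field in field_map.items()
    }


# A made-up source record with one field the model does not use.
raw = {"id": "c-101", "date_start": "2025-06-01", "spend": "12.50", "extra": "ignored"}
print(map_record(raw, FIELD_MAP))
```

Writing the intended mapping down in this form before you open the Sandbox can make it easier to review what you (or Electron) mapped.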
4. Optional: Add a Downstream Model Mapping Model
Adding a Model Mapping model downstream is highly recommended for several reasons:
- Future Expansion: It allows you to easily incorporate other data sources into this model later. For example, if you're creating a model for digital marketing ad spend, and this is your first digital marketing channel, a Model Mapping model will simplify adding other channels in the future.
- Derived Fields: It enables you to create derived fields in your model that can be mapped using fields from the Source Mapping model.
- If desired, add a downstream Model Mapping model after your Source Mapping model.
- [Insert Screenshot: Adding a downstream Model Mapping model]
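To make the two benefits concrete, the sketch below imitates what a downstream model does in the ad spend example: it accepts rows from more than one channel and adds a derived field computed from already-mapped fields. This is an illustration in plain Python, not Reactor Data's API; the function `to_model`, the channel names, and the `cost_per_click` field are all assumptions made for the example.

```python
def to_model(source_rows: list, channel: str) -> list:
    """Tag rows from one source with its channel and add a derived field."""
    out = []
    for row in source_rows:
        clicks = row["clicks"]
        spend = row["spend_usd"]
        out.append({
            **row,
            "channel": channel,
            # Derived field: cost per click, or None when there were no clicks.
            "cost_per_click": round(spend / clicks, 4) if clicks else None,
        })
    return out


# Made-up rows standing in for the output of two Source Mapping models.
channel_a = [{"campaign_id": "a-1", "spend_usd": 10.0, "clicks": 4}]
channel_b = [{"campaign_id": "b-1", "spend_usd": 5.0, "clicks": 0}]

# Adding a second channel later is one more call feeding the same model shape.
ad_spend = to_model(channel_a, "channel_a") + to_model(channel_b, "channel_b")
for row in ad_spend:
    print(row)
```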
5. Add an Output for Your Sandbox Target
Configure your new model to output data to your designated Sandbox target.
- Add an output configuration within your model (or pipeline) that directs the processed data to your Sandbox target.
- [Insert Screenshot: Adding an output for the Sandbox target]
6. Run a Mapper Replay Against the Sandbox Pipeline
Execute a mapper replay using your Sandbox pipeline to process data through your newly configured model.
- [Insert Screenshot: Running a mapper replay against the Sandbox pipeline]
7. Validate in Your Destination and Iterate as Needed
After the mapper replay, meticulously validate the output in your Sandbox destination.
- Verify the accuracy, completeness, and structure of the data.
- If your mappings or model logic need adjusting, update them within the Sandbox environment and run another mapper replay. Repeat this validation cycle until the output matches your expectations.
- [Insert Screenshot: Validating data in the Sandbox destination]
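The checks in this step can be partly automated once you have pulled a sample of rows from your Sandbox destination. The sketch below is one minimal approach, assuming the rows are already loaded as Python dictionaries; the function `validate_rows` and the field names are hypothetical, and how you fetch the rows depends on your warehouse or bucket.

```python
def validate_rows(rows, required_fields, expected_count=None):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    # Completeness: does the row count match what the source should have produced?
    if expected_count is not None and len(rows) != expected_count:
        problems.append(f"row count {len(rows)} != expected {expected_count}")
    for i, row in enumerate(rows):
        # Structure: every required field must be present.
        missing = [f for f in required_fields if f not in row]
        if missing:
            problems.append(f"row {i}: missing fields {missing}")
        # Accuracy (basic): required fields that are present must not be null.
        nulls = [f for f in required_fields if f in row and row[f] is None]
        if nulls:
            problems.append(f"row {i}: null values in {nulls}")
    return problems


# Made-up sample standing in for rows read from the Sandbox target.
sample = [
    {"campaign_id": "a-1", "spend_usd": 10.0},
    {"campaign_id": None},
]
for problem in validate_rows(sample, ["campaign_id", "spend_usd"], expected_count=2):
    print(problem)
```

An empty result does not prove the mapping is correct, so spot-check a few values against the source as well.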
8. Deploy Models to Production
Once your new model functions correctly and produces the desired output in the Sandbox, you can promote it to your production environment.
- Within the Sandbox environment, deploy the models you created or edited during this process. This action pushes your validated new model to Production.
- [Insert Screenshot: Deploying models from Sandbox to Production]
9. Add Your Production Output
Now that your model is in Production, configure its output to your production destination.
- In the Production environment, add the appropriate production output configuration for your new model, directing data to your designated production target.
- [Insert Screenshot: Adding production output]
10. Run a Mapper Replay Against the New Production Pipeline
Finally, run a mapper replay against your new Production pipeline to ensure the model is live and functioning as expected in your production environment, processing real data.
Demonstration Video:
We will soon provide a video demonstration of this workflow. Check back here for a comprehensive walkthrough of building a new data pipeline from scratch using the Reactor Data Sandbox.