# Generic data processing approach

## A basic setup

* Python for orchestration; data dependencies; structured outputs; multistage pipelines
  with inspectable intermediate results
* Go for custom, fast tools, when needed
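
As a rough illustration of "multistage pipelines with inspectable intermediate results", here is a minimal sketch using only the Python standard library. The helper `run_stage`, the stage names, and the sample records are all hypothetical; the idea is only that each stage writes its output to a file that can be opened and checked, and that a rerun reuses existing files.

```python
import json
import tempfile
from pathlib import Path


def run_stage(name, func, inputs, workdir):
    """Run one stage and keep its output as a JSON file under workdir.

    If the file already exists, the stage is skipped and the stored
    result is returned, so a rerun resumes from the last finished stage.
    """
    target = Path(workdir) / f"{name}.json"
    if target.exists():
        return json.loads(target.read_text())
    result = func(inputs)
    target.write_text(json.dumps(result, indent=2))
    return result


def extract(_):
    # Hypothetical source data; a real stage would read files or an API.
    return [{"id": 1, "title": " a title "}, {"id": 2, "title": ""}]


def clean(records):
    # Strip whitespace and drop records whose title is empty.
    return [
        {**r, "title": r["title"].strip()}
        for r in records
        if r["title"].strip()
    ]


def run_pipeline(workdir):
    raw = run_stage("01-extract", extract, None, workdir)
    return run_stage("02-clean", clean, raw, workdir)
```

Calling `run_pipeline` on a directory leaves `01-extract.json` and `02-clean.json` behind, one file per stage. Deleting a file forces that stage to run again on the next call.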

## Quick Fusion

For data fusion, e.g. merging Open Library (OL) works and editions to get a joint dataset:

* select and tabularize the data
* write a one-off merge script, e.g. with pandas
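
The two steps above might look roughly like the following pandas sketch. The column names (`work_key`, `edition_key`, `publish_year`) and the sample rows are made up for illustration and do not reflect the actual schema of the source data. The function `merge_works_editions` is likewise hypothetical.

```python
import pandas as pd


def merge_works_editions(works, editions):
    """Left-join editions onto works via a shared key column.

    validate="one_to_many" raises if a work key is duplicated in `works`.
    indicator=True adds a `_merge` column showing which rows found a match.
    """
    return works.merge(
        editions,
        on="work_key",
        how="left",
        validate="one_to_many",
        indicator=True,
    )


# Made-up sample tables; in practice these would come from pd.read_csv or similar.
works = pd.DataFrame(
    {"work_key": ["W1", "W2", "W3"], "title": ["A", "B", "C"]}
)
editions = pd.DataFrame(
    {
        "edition_key": ["E1", "E2", "E3"],
        "work_key": ["W1", "W1", "W9"],
        "publish_year": [1999, 2005, 2010],
    }
)

merged = merge_works_editions(works, editions)
```

A left join keeps every work, including those without editions. Filtering on `merged["_merge"] == "left_only"` then gives a quick way to list unmatched works before deciding whether an inner join would be more appropriate.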

## Hadoop Scripting

* Pig Latin
* PySpark