# Generic data processing approach

## A basic setup

* Python for orchestration; data deps; structured outputs; multistage pipelines with inspectable intermediate results
* Go for custom, fast tools, when needed

## Quick Fusion

For data fusion, e.g. merging OL works and editions to get a joint dataset:

* select and tabularize data
* a one-off merge script using, e.g. Pandas

## Hadoop Scripting

* Pig Latin
* PySpark
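
As a rough illustration of the two Quick Fusion steps (select and tabularize, then a one-off merge), here is a minimal sketch using only the Python standard library. The record layout, the field names (`key`, `title`, `work`, `year`) and the sample values are hypothetical, chosen only to make the example run; they are not taken from any real dump. The dictionary-based join stands in for what a Pandas merge would do.

```python
import csv
import io
import json

# Hypothetical JSON-lines inputs; field names and values are illustrative assumptions.
WORKS_JSONL = """\
{"key": "/works/W1", "title": "Dune", "subjects": ["sf"]}
{"key": "/works/W2", "title": "Emma"}
"""
EDITIONS_JSONL = """\
{"key": "/books/E1", "work": "/works/W1", "year": 1965}
{"key": "/books/E2", "work": "/works/W1", "year": 1984}
{"key": "/books/E3", "work": "/works/W9", "year": 2001}
"""


def tabularize(jsonl, fields):
    """Stage 1: keep only the selected fields and emit TSV text."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(fields)
    for line in jsonl.splitlines():
        if not line.strip():
            continue
        doc = json.loads(line)
        # Missing fields become empty cells, so every row has the same width.
        writer.writerow([doc.get(f, "") for f in fields])
    return out.getvalue()


def read_tsv(text):
    """Parse TSV text (with a header row) into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def merge_editions_with_works(works_tsv, editions_tsv):
    """Stage 2: left join, attaching the work title to every edition."""
    works = {row["key"]: row for row in read_tsv(works_tsv)}
    merged = []
    for ed in read_tsv(editions_tsv):
        work = works.get(ed["work"], {})
        merged.append({
            "edition": ed["key"],
            "work": ed["work"],
            "year": ed["year"],
            "title": work.get("title", ""),  # empty if the work is unknown
        })
    return merged


# The TSV strings are the inspectable intermediate results; in a real
# pipeline they would typically be written to files between stages.
works_tsv = tabularize(WORKS_JSONL, ["key", "title"])
editions_tsv = tabularize(EDITIONS_JSONL, ["key", "work", "year"])
merged = merge_editions_with_works(works_tsv, editions_tsv)
for row in merged:
    print(row)
```

With Pandas available, the second stage would likely shrink to a single call along the lines of `pd.merge(editions, works, left_on="work", right_on="key", how="left")`. Keeping the tabular intermediate output around makes it easy to check each stage before running the merge.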