From 1aa8471c5ba39b8472ca02c4f18d7b81a118752b Mon Sep 17 00:00:00 2001
From: Martin Czygan
Date: Mon, 24 May 2021 16:35:37 +0200
Subject: overview notes

---
 python/notes/overview.md | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)
 create mode 100644 python/notes/overview.md

diff --git a/python/notes/overview.md b/python/notes/overview.md
new file mode 100644
index 0000000..27b9177
--- /dev/null
+++ b/python/notes/overview.md
@@ -0,0 +1,20 @@
+# Generic data processing approach
+
+## A basic setup
+
+* Python for orchestration; data deps; structured outputs; multistage pipelines
+  with inspectable intermediate results
+* Go for custom, fast tools, when needed
+
+## Quick Fusion
+
+For data fusion, e.g. merging OL works and editions to get a joint dataset:
+
+* select and tabularize data
+* a one-off merge script using, e.g. Pandas
+
+## Hadoop Scripting
+
+* Pig Latin
+* PySpark
+
-- 
cgit v1.2.3
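
Note on the "Quick Fusion" section of the patch: the two steps it lists (select and tabularize, then a one-off merge) might look roughly like the sketch below. It is not the author's script. It uses only the Python standard library in place of Pandas so that it runs without dependencies. It assumes "OL" means Open Library, and the record layout (`key`, `title`, `works`, `publish_date`) and sample values are hypothetical stand-ins for real dump lines.

```python
import csv
import io
import json

# Hypothetical sample records; real input would be JSON lines from a dump.
WORKS = [
    '{"key": "/works/W1", "title": "Moby Dick"}',
    '{"key": "/works/W2", "title": "Walden"}',
]
EDITIONS = [
    '{"key": "/books/E1", "works": [{"key": "/works/W1"}], "publish_date": "1851"}',
    '{"key": "/books/E2", "works": [{"key": "/works/W1"}], "publish_date": "1992"}',
    '{"key": "/books/E3", "works": [{"key": "/works/W3"}], "publish_date": "2001"}',
]


def tabularize_works(lines):
    """Step 1a: select a few fields per work; rows are (work_key, title)."""
    rows = []
    for line in lines:
        doc = json.loads(line)
        rows.append((doc["key"], doc.get("title", "")))
    return rows


def tabularize_editions(lines):
    """Step 1b: one row (edition_key, work_key, publish_date) per edition-work link."""
    rows = []
    for line in lines:
        doc = json.loads(line)
        for ref in doc.get("works", []):
            rows.append((doc["key"], ref["key"], doc.get("publish_date", "")))
    return rows


def merge(works, editions):
    """Step 2: inner join on work_key; editions whose work is unknown are dropped."""
    titles = dict(works)
    return [
        (work_key, titles[work_key], edition_key, date)
        for edition_key, work_key, date in editions
        if work_key in titles
    ]


def to_tsv(rows):
    """Serialize rows as tab-separated text, an inspectable intermediate result."""
    buf = io.StringIO()
    csv.writer(buf, delimiter="\t", lineterminator="\n").writerows(rows)
    return buf.getvalue()


joined = merge(tabularize_works(WORKS), tabularize_editions(EDITIONS))
print(to_tsv(joined), end="")
```

With Pandas, the `merge` function would presumably collapse to a single `DataFrame.merge` call on the work key; the tabularize step stays the same either way.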