Diffstat (limited to 'python')
-rw-r--r-- | python/notes/overview.md | 20 |
1 files changed, 20 insertions, 0 deletions
diff --git a/python/notes/overview.md b/python/notes/overview.md
new file mode 100644
index 0000000..27b9177
--- /dev/null
+++ b/python/notes/overview.md
@@ -0,0 +1,20 @@
+# Generic data processing approach
+
+## A basic setup
+
+* Python for orchestration; data deps; structured outputs; multistage pipelines
+  with inspectable intermediate results
+* Go for custom, fast tools, when needed
+
+## Quick Fusion
+
+For data fusion, e.g. merging OL works and editions to get a joint dataset:
+
+* select and tabularize data
+* a one-off merge script using, e.g. Pandas
+
+## Hadoop Scripting
+
+* Pig Latin
+* PySpark
+
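
The two "Quick Fusion" steps in the added notes (select and tabularize, then a one-off merge) could look roughly like the sketch below. It is not part of the commit. It assumes "OL" means Open Library, that works and editions arrive as JSON lines, and that an edition links to its work through a `works` list of `{"key": ...}` objects. The sample records and all function names are hypothetical. To stay dependency-free it uses only the standard library in place of the Pandas merge the notes suggest; the join in `merge` plays the role of an inner join on the work key.

```python
import csv
import io
import json

# Hypothetical sample records, loosely shaped like Open Library entries.
WORKS_JSONL = """\
{"key": "/works/OL1W", "title": "Example Work A"}
{"key": "/works/OL2W", "title": "Example Work B"}
"""

EDITIONS_JSONL = """\
{"key": "/books/OL10M", "works": [{"key": "/works/OL1W"}], "publish_date": "1999"}
{"key": "/books/OL11M", "works": [{"key": "/works/OL1W"}], "publish_date": "2004"}
{"key": "/books/OL12M", "works": [{"key": "/works/OL2W"}]}
"""


def tabularize_works(lines):
    """Select (work_key, title) tuples from JSON lines."""
    for line in lines:
        if not line.strip():
            continue
        doc = json.loads(line)
        yield doc["key"], doc.get("title", "")


def tabularize_editions(lines):
    """Select (edition_key, work_key, publish_date), one row per linked work."""
    for line in lines:
        if not line.strip():
            continue
        doc = json.loads(line)
        for work in doc.get("works", []):
            yield doc["key"], work["key"], doc.get("publish_date", "")


def merge(works, editions):
    """Inner join of edition rows onto work rows by work key."""
    titles = dict(works)
    for edition_key, work_key, publish_date in editions:
        if work_key in titles:
            yield work_key, titles[work_key], edition_key, publish_date


def run(works_text=WORKS_JSONL, editions_text=EDITIONS_JSONL):
    """Tabularize both inputs and return the merged rows as a list."""
    works = tabularize_works(works_text.splitlines())
    editions = tabularize_editions(editions_text.splitlines())
    return list(merge(works, editions))


if __name__ == "__main__":
    # Write the joint dataset as TSV so the intermediate result stays inspectable.
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["work_key", "title", "edition_key", "publish_date"])
    writer.writerows(run())
    print(out.getvalue(), end="")
```

Keeping the tabularized rows as plain tuples makes it easy to dump each stage to a file, which fits the "inspectable intermediate results" goal stated in the notes. For larger inputs the same join would more naturally be a Pandas `DataFrame.merge` on the work key.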