Spark reference table


A Spark RDD contains a collection in which each element represents a request.

A Scala function is applied to the RDD and, for each element, the function creates a modified request.

For each collection element (request), a lookup table needs to be referenced. The maximum size of the reference table is 200 rows.

For performance and scalability, how should the lookup table (which is used within the function) be modeled?

  1. As a Spark broadcast variable.
  2. As a separate Spark RDD.
  3. As a Scala immutable collection.

Perhaps there is an option I have not considered.

Thanks

It depends on the size of the RDDs, but given that the reference table has only 200 rows, I think the best option is to use a broadcast variable.

If you used a separate RDD instead, the join would make Spark repartition the request RDD, causing an unnecessary shuffle.
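A minimal sketch of the broadcast approach, assuming the table fits in an immutable `Map`. The names here (`Request`, `regionTable`, `enrich`) are illustrative, not from the question; the Spark calls themselves are shown in comments so the lookup logic stays self-contained:

```scala
// Illustrative sketch: a small (<= 200-row) reference table held as an
// immutable Map and consulted inside the per-request transformation.
// Request, regionTable, and enrich are assumed names for this example.

case class Request(id: Int, regionCode: String, region: String = "")

// The reference table: small enough to sit in memory on every executor.
val regionTable: Map[String, String] =
  Map("US" -> "North America", "DE" -> "Europe", "JP" -> "Asia")

// Pure function applied to each request; unknown codes fall back to "Unknown".
def enrich(table: Map[String, String])(req: Request): Request =
  req.copy(region = table.getOrElse(req.regionCode, "Unknown"))

// With Spark, the table would be broadcast once to each executor rather than
// serialized into every task closure:
//   val bcTable   = sc.broadcast(regionTable)
//   val enriched  = requestsRdd.map(r => enrich(bcTable.value)(r))

println(enrich(regionTable)(Request(1, "DE")).region)
```

Because the map is read-only and tiny, broadcasting it avoids both the shuffle a join would trigger and the per-task serialization cost of capturing the map in a closure.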

