Spark reference table
A Spark RDD contains a collection in which each element represents a request.
A Scala function is applied to the RDD, and for each RDD element the function creates a modified request.
For each collection element (request), a lookup table needs to be referenced. The maximum size of the reference table is 200 rows.
With performance and scalability in mind, how should the lookup table (which is used within the function) be modeled?
- As a Spark broadcast variable.
- As a separate Spark RDD.
- As a Scala immutable collection.
Perhaps there is an option I have not considered.
Thanks.
It depends on the size of the RDDs, but given that the reference table has only 200 rows, I think the best option is to use a broadcast variable.
If you used a separate RDD instead, joining it against the request RDD would force Spark to repartition the data, causing an unnecessary shuffle.
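Here is a minimal sketch of the broadcast approach. The `Request` case class, the field names, and the region-to-endpoint contents of the reference table are hypothetical, since the original post does not show its data types; only the general pattern (broadcast a small map, read it via `.value` inside the per-element function) is what the answer recommends.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical request type for illustration only.
case class Request(id: Int, region: String, endpoint: String = "")

object BroadcastLookupExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("broadcast-lookup").setMaster("local[*]"))

    // RDD where each element represents one request.
    val requests = sc.parallelize(
      Seq(Request(1, "us"), Request(2, "eu"), Request(3, "apac")))

    // Small reference table (well under 200 rows), held as an immutable Map.
    val referenceTable: Map[String, String] =
      Map("us" -> "us.api.example.com", "eu" -> "eu.api.example.com")

    // Broadcast it once: Spark ships a single read-only copy to each executor
    // instead of serializing the table with every task closure.
    val lookupBc = sc.broadcast(referenceTable)

    // The per-element function consults the table via lookupBc.value
    // and produces a modified request.
    val modifiedRequests = requests.map { req =>
      val endpoint = lookupBc.value.getOrElse(req.region, "default.api.example.com")
      req.copy(endpoint = endpoint)
    }

    modifiedRequests.collect().foreach(println)
    sc.stop()
  }
}
```

For a table this small, a plain immutable Scala Map captured in the closure would also work; the broadcast variable mainly avoids re-serializing the table with every task and keeps a single copy per executor, which matters more as the table or the number of tasks grows.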