Apache Kafka - Spark Streaming job behaves like a batch
I'm trying to run a Spark Streaming job that reads messages from Kafka. For testing, I insert a couple of gigabytes of messages into Kafka and then launch the streaming job (Python):
    from pyspark import SparkContext, SparkConf
    from pyspark.streaming import StreamingContext
    from pyspark.streaming.kafka import KafkaUtils

    conf = SparkConf()
    conf.setMaster("spark://master1:7077,master2:7077,master3:7077")
    sc = SparkContext(conf=conf)
    ssc = StreamingContext(sc, 1)  # 1-second batch interval

    directKafkaStream = KafkaUtils.createDirectStream(
        ssc, ["test_topic"],
        {"metadata.broker.list": "kafka1:9092,kafka2:9092,kafka3:9092",
         "auto.offset.reset": "smallest"})

    directKafkaStream.count().pprint()

    ssc.start()
    ssc.awaitTermination()
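For completeness, the topic is pre-filled with a couple of gigabytes of small messages before the job starts, roughly like this (just a sketch; the kafka-python producer and the payload shape/count are placeholders, not my exact script):

    # Rough sketch of how the test topic is pre-filled (placeholder payloads).
    from kafka import KafkaProducer

    producer = KafkaProducer(
        bootstrap_servers=["kafka1:9092", "kafka2:9092", "kafka3:9092"])

    for i in range(10000000):
        # Small dummy messages; the real payloads are irrelevant to the question.
        producer.send("test_topic", ("message %d" % i).encode("utf-8"))

    producer.flush()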
I expected the job to print, every second, the number of messages counted so far (streaming). Instead, the job counts all of the messages already in the queue (however long that takes), and only after all of them are counted do I see an update (batch).
Please tell me what I have missed here.
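In case it's relevant: I assume the whole pre-existing backlog lands in the first batch. Here is a sketch of how I would cap the per-partition rate to spread it over many one-second batches (the rate value is an arbitrary guess, and I don't know whether this is the right approach):

    # Sketch only (not verified): cap how many records each Kafka partition
    # contributes to a single batch, so the pre-existing backlog is spread
    # across many 1-second batches instead of all landing in the first one.
    from pyspark import SparkContext, SparkConf
    from pyspark.streaming import StreamingContext

    conf = SparkConf()
    conf.setMaster("spark://master1:7077,master2:7077,master3:7077")
    conf.set("spark.streaming.kafka.maxRatePerPartition", "10000")  # records/sec/partition, placeholder value

    sc = SparkContext(conf=conf)
    ssc = StreamingContext(sc, 1)
    # ... the rest of the job stays the same ...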
UPDATE: What I expect to see:
    -------------------------------------------
    Time: 2016-01-28 16:23:42
    -------------------------------------------
    1000000

    -------------------------------------------
    Time: 2016-01-28 16:23:43
    -------------------------------------------
    10000000
What I'm actually seeing:
    16/01/28 16:25:10 INFO JobScheduler: Added jobs for time 1453994710000 ms
    16/01/28 16:25:11 INFO JobScheduler: Added jobs for time 1453994711000 ms
    16/01/28 16:25:12 INFO JobScheduler: Added jobs for time 1453994712000 ms
    16/01/28 16:25:13 INFO JobScheduler: Added jobs for time 1453994713000 ms
    16/01/28 16:25:14 INFO JobScheduler: Added jobs for time 1453994714000 ms
    16/01/28 16:25:15 INFO JobScheduler: Added jobs for time 1453994715000 ms
    16/01/28 16:25:16 INFO JobScheduler: Added jobs for time 1453994716000 ms
    ...