The official Spark Streaming examples live in the Spark repository:
https://github.com/apache/spark/tree/master/examples

This article uses Kafka as the Spark Streaming input source.
At runtime, the job produced the following log output:

18/09/12 11:15:28 INFO JobScheduler: Added jobs for time 1536722117000 ms
18/09/12 11:15:28 INFO JobScheduler: Added jobs for time 1536722118000 ms
18/09/12 11:15:28 INFO JobScheduler: Added jobs for time 1536722119000 ms
18/09/12 11:15:28 INFO JobScheduler: Added jobs for time 1536722120000 ms
18/09/12 11:15:28 INFO JobScheduler: Added jobs for time 1536722121000 ms
18/09/12 11:15:28 INFO JobScheduler: Added jobs for time 1536722122000 ms
18/09/12 11:15:29 INFO JobScheduler: Added jobs for time 1536722123000 ms
18/09/12 11:15:29 INFO JobScheduler: Added jobs for time 1536722124000 ms
18/09/12 11:15:29 INFO JobScheduler: Added jobs for time 1536722125000 ms
18/09/12 11:15:29 INFO JobScheduler: Added jobs for time 1536722126000 ms
18/09/12 11:15:29 INFO JobScheduler: Added jobs for time 1536722127000 ms
18/09/12 11:15:29 INFO JobScheduler: Added jobs for time 1536722128000 ms
18/09/12 11:15:29 INFO JobScheduler: Added jobs for time 1536722129000 ms
18/09/12 11:15:30 INFO JobScheduler: Added jobs for time 1536722130000 ms
18/09/12 11:15:31 INFO JobScheduler: Added jobs for time 1536722131000 ms
18/09/12 11:15:32 INFO JobScheduler: Added jobs for time 1536722132000 ms
18/09/12 11:15:33 INFO JobScheduler: Added jobs for time 1536722133000 ms
18/09/12 11:15:34 INFO JobScheduler: Added jobs for time 1536722134000 ms
18/09/12 11:15:35 INFO JobScheduler: Added jobs for time 1536722135000 ms
18/09/12 11:15:36 INFO JobScheduler: Added jobs for time 1536722136000 ms
18/09/12 11:15:37 INFO JobScheduler: Added jobs for time 1536722137000 ms
18/09/12 11:15:38 INFO JobScheduler: Added jobs for time 1536722138000 ms
18/09/12 11:15:39 INFO JobScheduler: Added jobs for time 1536722139000 ms
18/09/12 11:15:40 INFO JobScheduler: Added jobs for time 1536722140000 ms
18/09/12 11:15:41 INFO JobScheduler: Added jobs for time 1536722141000 ms
18/09/12 11:15:42 INFO JobScheduler: Added jobs for time 1536722142000 ms
18/09/12 11:15:43 INFO JobScheduler: Added jobs for time 1536722143000 ms
18/09/12 11:15:44 INFO JobScheduler: Added jobs for time 1536722144000 ms
18/09/12 11:15:45 INFO JobScheduler: Added jobs for time 1536722145000 ms

This is clearly not normal output: batches keep being scheduled ("Added jobs for time ...") second after second, but no batch is ever reported as started or finished. After verifying that the Kafka side was producing and consuming normally, the problem had to be on the Spark side. The answer turned out to be a note in the official Spark Streaming programming guide (screenshot omitted):
In short: when running locally, do not set the master to "local" or "local[1]"; use "local[n]" with n greater than the number of receivers, because each receiver permanently occupies one thread.
Likewise, in cluster mode, the number of cores allocated to the Spark Streaming application must exceed the number of receivers. Otherwise the receivers consume every core, and Spark can only receive data, never process it.
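The fix is a one-line change to the master URL. Below is a minimal sketch, assuming the receiver-based Kafka API from spark-streaming-kafka-0-8 (available in PySpark up to Spark 2.x); the topic name, ZooKeeper address, and group id are placeholders:

```python
from pyspark import SparkConf, SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils

# local[2]: one thread for the Kafka receiver, one for processing batches.
# "local" or "local[1]" would reproduce the stuck log shown above.
conf = SparkConf().setMaster("local[2]").setAppName("KafkaWordCount")
sc = SparkContext(conf=conf)
ssc = StreamingContext(sc, batchDuration=1)

# One receiver -> the app needs at least 2 threads in total.
lines = KafkaUtils.createStream(
    ssc, "localhost:2181", "demo-group", {"my-topic": 1})

lines.map(lambda kv: kv[1]).pprint()

ssc.start()
ssc.awaitTermination()
```

With `local[1]`, the single thread goes to the receiver and the "Added jobs" log grows forever; with `local[2]`, the scheduled batches actually execute.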

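The thread-starvation mechanism can be reproduced without Spark at all. In this sketch a single-thread pool stands in for `local[1]`, a blocking function plays the receiver, and a queued function plays the batch job (all names here are illustrative, not Spark APIs):

```python
import time
from concurrent.futures import ThreadPoolExecutor

# One worker thread, as with master = local[1].
pool = ThreadPoolExecutor(max_workers=1)

stop = False

def receiver():
    # Occupies its thread for the life of the app, like a streaming receiver.
    while not stop:
        time.sleep(0.05)

def batch_job():
    return "processed"

pool.submit(receiver)            # takes the only thread
batch = pool.submit(batch_job)   # queued, but never scheduled

time.sleep(0.5)
starved = not batch.done()       # True: the batch never got a thread
print("batch ran within 0.5s:", not starved)

stop = True                      # let the "receiver" return
pool.shutdown(wait=True)         # only now does batch_job finally run
```

This is exactly the pattern in the log above: work is queued indefinitely because the receiver never yields its thread.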

——————————————————————————————————
Author: 桃花惜春风
Please credit the original when reposting: https://blog.csdn.net/xiaoyu_BD/article/details/82688001
