Today I followed the example on the Spark website to try out streaming. The code itself is simple, but I hit two compile errors that really frustrated me.

The API as given on the official site:

Pull in the jar via Maven:

groupId = org.apache.spark
artifactId = spark-streaming-kafka-0-10_2.11
version = 2.2.0
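In pom.xml form, those same coordinates look like this (version as given above):

```xml
<!-- Kafka 0.10 integration for Spark Streaming. The _2.11 suffix
     means this artifact is built for Scala 2.11. -->
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-streaming-kafka-0-10_2.11</artifactId>
    <version>2.2.0</version>
</dependency>
```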
The code:

import org.apache.kafka.clients.consumer.ConsumerRecord
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.streaming.kafka010._
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

val kafkaParams = Map[String, Object](
  "bootstrap.servers" -> "localhost:9092,anotherhost:9092",
  "key.deserializer" -> classOf[StringDeserializer],
  "value.deserializer" -> classOf[StringDeserializer],
  "group.id" -> "use_a_separate_group_id_for_each_stream",
  "auto.offset.reset" -> "latest",
  "enable.auto.commit" -> (false: java.lang.Boolean)
)

val topics = Array("topicA", "topicB")
val stream = KafkaUtils.createDirectStream[String, String](
  streamingContext,
  PreferConsistent,
  Subscribe[String, String](topics, kafkaParams)
)

stream.map(record => (record.key, record.value))

At first I kept getting this error:

Error:scalac: bad symbolic reference. A signature in KafkaUtils.class refers to term serializer
in package kafka which is not available.
It may be completely missing from the current classpath, or the version on
the classpath might be incompatible with the version used when compiling KafkaUtils.class.

At first I thought it was the problem described in this blogger's post, a missing jar, but I couldn't find that jar anywhere:

http://blog.csdn.net/wuguobanga/article/details/50780789#reply

After half an hour of Googling I finally found the cause: my Scala SDK version was incompatible with the spark-streaming-kafka-0-10_2.11 dependency. I had been using Scala 2.10.6 in IDEA, but this jar requires Scala 2.11 (that is what the _2.11 suffix in the artifactId means). Switching the Scala SDK in IDEA fixed it.
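A quick way to check which Scala version your SDK actually is (the major.minor part must match the artifactId suffix) is to print it from a scratch object:

```scala
// Prints the Scala version of the current runtime/SDK, e.g. "2.11.8".
// The "2.11" prefix must match the _2.11 suffix of the Spark artifact.
object ScalaVersionCheck {
  def main(args: Array[String]): Unit = {
    println(scala.util.Properties.versionNumberString)
  }
}
```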


2. After that, I hit a second error:

Error:(24, 18) Symbol 'type <none>.internal.Logging' is missing from the classpath.
This symbol is required by 'object org.apache.spark.streaming.kafka010.KafkaUtils'.
Make sure that type Logging is in your classpath and check for conflicting dependencies with `-Ylog-classpath`.
A full rebuild may help if 'KafkaUtils.class' was compiled against an incompatible version of <none>.internal.
    val stream = KafkaUtils.createDirectStream[String, String](


After a long search I found the cause: I had been using the old Spark 1.4 assembly jar, spark-assembly-1.4.1-hadoop2.6.0.jar, to create the SparkConf.

Another version incompatibility.
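Not something from the original troubleshooting, but a quick way to spot this kind of clash: ask Maven for the resolved dependency tree and look for mixed Spark versions or mismatched _2.10/_2.11 Scala suffixes (run from the project root):

```shell
# Show which Spark artifacts (and versions) the build actually resolves.
mvn dependency:tree -Dincludes=org.apache.spark
```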

Changing pom.xml to the following finally fixed it:

        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-streaming-kafka-0-10_2.11</artifactId>
            <version>2.2.1</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-streaming_2.11</artifactId>
            <version>2.2.1</version>
        </dependency>
My final code:

import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.{SparkConf, TaskContext}
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe
import org.apache.spark.streaming.kafka010.{HasOffsetRanges, KafkaUtils, OffsetRange}
object SparkStreaming {

  def main(args: Array[String]): Unit = {


    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "192.168.1.188:9092",
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "use_a_separate_group_id_for_each_stream",
      "auto.offset.reset" -> "latest",
      "enable.auto.commit" -> (false: java.lang.Boolean)
    )

    val conf = new SparkConf().setAppName("Spark Streaming Test").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(1))
    val topics = Array("new_get_ad")
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc,
      PreferConsistent,
      Subscribe[String, String](topics, kafkaParams)
    )

    stream.map(record => (record.key, record.value)).print()

    // Without these two calls the streaming job never actually runs.
    ssc.start()
    ssc.awaitTermination()
  }
}
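Not part of the original post, but since the imports above already pull in HasOffsetRanges and TaskContext, here is a sketch of the usual next step from the spark-streaming-kafka-0-10 documentation pattern: reading each batch's offset ranges and committing them manually, since enable.auto.commit is false. It assumes the `stream` defined above and adds CanCommitOffsets to the kafka010 import.

```scala
// Sketch only: per-batch Kafka offset handling with manual commits.
stream.foreachRDD { rdd =>
  val offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
  rdd.foreachPartition { _ =>
    // Offsets for the Kafka partition this task is processing.
    val o = offsetRanges(TaskContext.get.partitionId)
    println(s"${o.topic} ${o.partition} offsets ${o.fromOffset} -> ${o.untilOffset}")
  }
  // Commit back to Kafka once the batch's work is done.
  stream.asInstanceOf[CanCommitOffsets].commitAsync(offsetRanges)
}
```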



