1. Create the Kafka topic

kafka-topics.sh --create --zookeeper 192.168.116.60:2181 --topic user_friends_raw  --partitions 1 --replication-factor 1
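
To confirm the topic was created, you can list the topics registered on the same ZooKeeper (a quick sanity check, assuming the address used above):

kafka-topics.sh --zookeeper 192.168.116.60:2181 --list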

2. Create and edit the Flume configuration file

vi userFriend-flume-kafka.conf

The Flume agent configuration:

user_friend.sources=userFriendSource
user_friend.channels=userFriendChannel
user_friend.sinks=userFriendSink

user_friend.sources.userFriendSource.type=spooldir
user_friend.sources.userFriendSource.spoolDir=/opt/flume160/conf/jobkb09/dataSourceFile/userFriend
user_friend.sources.userFriendSource.deserializer=LINE
user_friend.sources.userFriendSource.deserializer.maxLineLength=320000
user_friend.sources.userFriendSource.includePattern=userFriend_[0-9]{4}-[0-9]{2}-[0-9]{2}.csv
# interceptor: drop the CSV header line (see the check after this config)
user_friend.sources.userFriendSource.interceptors=head_filter
user_friend.sources.userFriendSource.interceptors.head_filter.type=regex_filter
user_friend.sources.userFriendSource.interceptors.head_filter.regex=^user,friends*
user_friend.sources.userFriendSource.interceptors.head_filter.excludeEvents=true

user_friend.channels.userFriendChannel.type=file
user_friend.channels.userFriendChannel.checkpointDir=/opt/flume160/conf/jobkb09/checkPointFile/userFriend
user_friend.channels.userFriendChannel.dataDirs=/opt/flume160/conf/jobkb09/dataChannelFile/userFriend

user_friend.sinks.userFriendSink.type=org.apache.flume.sink.kafka.KafkaSink
user_friend.sinks.userFriendSink.batchSize=640
user_friend.sinks.userFriendSink.brokerList=192.168.116.60:9092
user_friend.sinks.userFriendSink.topic=user_friends_raw

user_friend.sources.userFriendSource.channels=userFriendChannel
user_friend.sinks.userFriendSink.channel=userFriendChannel
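
The head_filter interceptor above discards the CSV header so it does not reach Kafka as an event. Assuming user_friends.csv begins with the header line user,friends (which is what the regex ^user,friends* targets), you can verify this before copying the file:

head -1 user_friends.csv
# expected output (assumption): user,friends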

3. Run the Flume agent

[root@hadoop001 flume160]# ./bin/flume-ng agent --name user_friend --conf ./conf/ --conf-file ./conf/jobkb09/userFriend-flume-kafka.conf -Dflume.root.logger=INFO,console
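
If you need the agent to keep running after the terminal session closes, one common variant is to start it in the background with nohup (console logging then goes to nohup.out; same paths as above):

nohup ./bin/flume-ng agent --name user_friend --conf ./conf/ --conf-file ./conf/jobkb09/userFriend-flume-kafka.conf &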

4. Copy the data into the directory monitored by Flume

Copying the data with the install command also works. The difference between the two: if the target file already exists, cp truncates it and writes the new content into the same file, while install first deletes the existing file and then writes a new one, which matters if a process such as Flume is still reading the old file.

cp user_friends.csv /opt/flume160/conf/jobkb09/dataSourceFile/userFriend/userFriend_2020-12-08.csv
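
For reference, the equivalent copy with install would be (same source file and target path as the cp command above):

install user_friends.csv /opt/flume160/conf/jobkb09/dataSourceFile/userFriend/userFriend_2020-12-08.csv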

If you hit the error below, do not delete the files in the monitored directory; just run the Flume agent again and it will resume processing.

2021-01-12 19:27:20,256 (pool-4-thread-1) [ERROR - org.apache.flume.source.SpoolDirectorySource$SpoolDirectoryRunnable.run(SpoolDirectorySource.java:280)]
FATAL: Spool Directory source eventsSource: { spoolDir: /kb09file/events }: Uncaught exception in SpoolDirectorySource thread.
Restart or reconfigure Flume to continue processing.

5. View the topic's partition information

kafka-topics.sh --zookeeper 127.0.0.1:2181 --describe --topic user_friends_raw
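
For this single-partition, single-replica topic, the output should look roughly like the following (the leader/broker id will depend on your cluster):

Topic: user_friends_raw	PartitionCount: 1	ReplicationFactor: 1	Configs:
	Topic: user_friends_raw	Partition: 0	Leader: 0	Replicas: 0	Isr: 0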

5.1 Check the number of messages in the topic

kafka-run-class.sh kafka.tools.GetOffsetShell --broker-list 192.168.116.60:9092 --topic user_friends_raw --time -1 --offsets 1
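
--time -1 asks for the latest offset of each partition; since this topic was just created, the earliest offset is 0, so the latest offset equals the message count. To fetch the earliest offset explicitly, pass --time -2 instead:

kafka-run-class.sh kafka.tools.GetOffsetShell --broker-list 192.168.116.60:9092 --topic user_friends_raw --time -2 --offsets 1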

6. Consume the messages

kafka-console-consumer.sh --bootstrap-server 127.0.0.1:9092 --topic user_friends_raw --from-beginning
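
If the topic holds a lot of data, you can cap the consumer for a quick spot check, e.g. print only the first 10 messages and then exit:

kafka-console-consumer.sh --bootstrap-server 127.0.0.1:9092 --topic user_friends_raw --from-beginning --max-messages 10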

7. Delete the topic

kafka-topics.sh  --zookeeper 192.168.116.60:2181 --topic user_friends_raw --delete
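
Note that the delete only takes effect if topic deletion is enabled on the brokers; otherwise the topic is merely marked for deletion. On older Kafka versions this is controlled by the following setting in server.properties:

delete.topic.enable=true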