Ready, Steady, Connect: Help Your Organization to Appreciate Kafka

If you want to enable your organization to leverage the full value of event-driven architectures, it is not enough to just add Kafka to the existing enterprise technology mix and wait for people to join the party. Experience shows some preparation is in order.

by Matthias Rüedlinger

The Chicken and Egg Problem

The value of Kafka for other teams rises with the growing number of data streams on offer. To get more people to use Kafka, we must simplify data consumption and production. It is a chicken-and-egg problem: we will not get more people using Kafka if there is no data. The main objective is to convince data producers in enterprises to publish their data as real-time events in Kafka. One reason these events do not exist can be that the producing teams lack Kafka knowledge, or that the systems they run do not provide Kafka connectivity out of the box.

To reach the tipping point where Kafka is fully used in the enterprise, we need to convince a critical mass of people to use and learn Kafka. So you need to make it as simple as possible to produce and consume event streams to or from external systems like databases, document stores, S3, or whatever data source you might be using in your enterprise. We need some kind of training wheels for Kafka, where teams that are not yet fully Kafka-savvy can learn and gain experience with Kafka and real-time events from these external systems.

Apache Connect to the Rescue

Apache Kafka Connect is a framework for connecting Kafka with external systems. With Kafka Connect we have connectors that allow us to bring data into or out of Kafka in a standardized and reliable way from different data sources. A connector itself is just a JAR file that defines how to integrate with a particular external system. The connector can then be configured through a REST API provided by Kafka Connect. With these connectors, we standardize how data is produced to and consumed from these external systems. Connect can run in standalone or distributed mode. In distributed mode, Kafka Connect stores its metadata (connector configurations, offsets, etc.) in Kafka itself. Standalone mode is great for trying things out, but it is not meant for production. So when you consider running Kafka Connect, the way to go is distributed mode, which provides scalability and automatic fault tolerance out of the box.
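To make that REST configuration step concrete, here is a minimal sketch in Java that builds the JSON body you would POST to a Connect worker's `/connectors` endpoint. The connector name, the JDBC source connector class, and the database URL are illustrative assumptions, not part of the article:

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Builds the JSON body for registering a connector via the Kafka Connect
 * REST API (POST http://localhost:8083/connectors). The connector name,
 * class and settings below are hypothetical examples.
 */
public class ConnectorRequest {

    static String toJson(String name, Map<String, String> config) {
        StringBuilder sb = new StringBuilder();
        sb.append("{\"name\": \"").append(name).append("\", \"config\": {");
        String sep = "";
        for (Map.Entry<String, String> e : config.entrySet()) {
            sb.append(sep)
              .append("\"").append(e.getKey()).append("\": ")
              .append("\"").append(e.getValue()).append("\"");
            sep = ", ";
        }
        return sb.append("}}").toString();
    }

    public static void main(String[] args) {
        Map<String, String> config = new LinkedHashMap<>();
        // Which connector plugin (JAR on the worker) to instantiate.
        config.put("connector.class", "io.confluent.connect.jdbc.JdbcSourceConnector");
        // How many tasks Connect may spread across the cluster.
        config.put("tasks.max", "1");
        config.put("connection.url", "jdbc:postgresql://db.example.com:5432/orders");
        config.put("mode", "incrementing");
        config.put("incrementing.column.name", "id");
        config.put("topic.prefix", "orders-");

        // Send this body with any HTTP client, e.g.:
        // curl -X POST -H "Content-Type: application/json" \
        //      --data @request.json http://localhost:8083/connectors
        System.out.println(toJson("orders-source", config));
    }
}
```

The point is that adding a pipeline is just an HTTP call against the worker; no code is deployed for an existing connector.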

A connector itself can be a sink or a source connector. Sink connectors write data from Kafka to a specific system, and source connectors bring data from these systems into Kafka. Kafka Connect also supports different converters, which handle the serialization and deserialization of formats such as JSON Schema, Avro, and Protobuf.
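As an illustration, converters are typically set in the worker configuration; the property names below are the standard Kafka Connect ones, while the Schema Registry URL is a placeholder:

```properties
# Worker-level converter settings (e.g. in connect-distributed.properties).
# Every connector on this worker uses Avro for keys and values unless it
# overrides these settings in its own connector config.
key.converter=io.confluent.connect.avro.AvroConverter
key.converter.schema.registry.url=http://schema-registry.example.com:8081
value.converter=io.confluent.connect.avro.AvroConverter
value.converter.schema.registry.url=http://schema-registry.example.com:8081
```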

There is also support for applying some transformations before the data is written to Kafka or to the external systems. These transformations are called Single Message Transformations (SMTs) and, as the name suggests, a transformation can only be applied to a single message. They are very useful when the sink or source format cannot be modified and you want to add, remove, or rename some fields in a message. When you want to do complex transformations, like combining or splitting messages, Kafka Connect is not the right tool and you should have a look at Kafka Streams.
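For example, a field rename can be expressed with the `ReplaceField` transformation that ships with Kafka Connect, added to a connector's configuration; the field names here are made up for illustration:

```properties
# Rename the field "cust_no" to "customerId" in every record value
# before it is passed onward. ReplaceField$Value is a built-in SMT.
transforms=renameFields
transforms.renameFields.type=org.apache.kafka.connect.transforms.ReplaceField$Value
transforms.renameFields.renames=cust_no:customerId
```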

There are already a lot of connectors available under commercial or open-source licenses for many different systems. If you don't find a connector that suits your needs, you always have the possibility to write one yourself in Java. The nice thing is that this is not really that complicated for people who are used to developing software applications in Java.
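As a rough sketch of what writing one involves (this is a skeleton, not a complete implementation; it assumes the `org.apache.kafka:connect-api` dependency on the classpath, and the "legacy-system" naming is hypothetical), a custom source connector centers on a `SourceTask`:

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;

import org.apache.kafka.connect.data.Schema;
import org.apache.kafka.connect.source.SourceRecord;
import org.apache.kafka.connect.source.SourceTask;

/**
 * Skeleton of a custom SourceTask. A full connector also needs a
 * SourceConnector class that declares the config and creates these tasks.
 */
public class LegacySystemSourceTask extends SourceTask {

    @Override
    public String version() { return "0.1.0"; }

    @Override
    public void start(Map<String, String> props) {
        // Open the connection to the external system here.
    }

    @Override
    public List<SourceRecord> poll() throws InterruptedException {
        // Connect calls poll() in a loop and writes the returned
        // records to Kafka, tracking the source offsets for us.
        String payload = fetchNextEvent();
        if (payload == null) {
            return Collections.emptyList();
        }
        SourceRecord record = new SourceRecord(
                Collections.singletonMap("source", "legacy-system"), // source partition
                Collections.singletonMap("position", 0L),            // source offset
                "legacy-events",                                     // target topic
                Schema.STRING_SCHEMA,
                payload);
        return Collections.singletonList(record);
    }

    @Override
    public void stop() {
        // Close connections and release resources here.
    }

    private String fetchNextEvent() {
        return null; // placeholder for the real call to the external system
    }
}
```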

Self-Service Data Consumption and Production

In the current IT landscape, we have moved over the past years from monolithic architectures to distributed microservice architectures in which teams have full responsibility for their applications. This means "you build it, you run it", better known as DevOps. With Kafka Connect, however, we have a centralized component that you can see as infrastructure shared by multiple teams.

In our case, to enable teams, we came to the conclusion to treat Kafka Connect as a microservice that is run by the teams themselves for a specific purpose. For example, a data warehouse team would run their own Kafka Connect instance to load Kafka events into their staging area. One reason we think teams should run their own Kafka Connect is that you then have clear boundaries for who is responsible when you receive alerts, have failed deployments, or see errors.
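Sketching what "one Connect cluster per team" might look like in a distributed worker config (the group and topic names are hypothetical; the property names are the standard Kafka Connect ones): each team's cluster needs its own `group.id` and its own internal topics so the clusters stay isolated.

```properties
# Distributed worker config for the data warehouse team's Connect cluster.
bootstrap.servers=kafka-1.example.com:9092,kafka-2.example.com:9092

# Workers sharing the same group.id form one Connect cluster.
group.id=connect-dwh

# Internal topics where this cluster stores connector configs,
# source offsets and task status -- unique per team.
config.storage.topic=connect-dwh-configs
offset.storage.topic=connect-dwh-offsets
status.storage.topic=connect-dwh-status
config.storage.replication.factor=3
offset.storage.replication.factor=3
status.storage.replication.factor=3
```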

But with this approach, you need an infrastructure team that provides the tooling for monitoring and lifecycle management, so that the DevOps teams can easily set up and run their own Kafka Connect instance. The goal must be that a DevOps team can get a productive Kafka Connect running within hours, with a high level of automation for deploying connectors and upgrading to new Kafka Connect versions.

Conclusion

Kafka Connect is a great enabler for teams to integrate external systems with Kafka. It allowed us to solve recurring integration problems in a standardized way that is reliable and fault-tolerant. Once our team had some experience with a specific connector, further integrations with the same type of connector were done very quickly.

As a Java team, we also had a good experience writing connectors ourselves. The main reason was that the system we had to integrate was very specific, and there was no existing solution to our problem. The Connect Java API, which is part of Kafka, is straightforward, and it was quite easy to write our own connector and transformations. The tutorials you find online gave us a good start, but I would recommend having a look at the source code of some of these connectors on GitHub to get some inspiration on how other connectors or transformations were implemented.

It would be nice to hear what you think about the centralized vs. decentralized approach to running Apache Kafka Connect, or, in general, what your experience with Kafka Connect has been.

Reach out to us here in the comments, through www.agoora.com, or through Twitter.

Originally published at: https://medium.com/swlh/ready-steady-connect-help-your-organization-to-appreciate-kafka-8ed6cfcfd6d8
