Official website: http://kafka.apache.org/intro#intro_more
Event streaming

Event streaming is the digital equivalent of the human body’s central nervous system. It is the technological
foundation for the ‘always-on’ world where businesses are increasingly software-defined and automated,
and where the user of software is more software.

Technically speaking, event streaming is the practice of capturing data in real-time from event sources like
databases, sensors, mobile devices, cloud services, and software applications in the form of streams of
events; storing these event streams durably for later retrieval; manipulating, processing, and reacting to the
event streams in real-time as well as retrospectively; and routing the event streams to different destination
technologies as needed. Event streaming thus ensures a continuous flow and interpretation of data so
that the right information is at the right place, at the right time.

What can I use event streaming for?

Event streaming is applied to a wide variety of use cases across a plethora of industries and organizations. Its many examples include:
To process payments and financial transactions in real-time, such as in stock exchanges, banks, and insurances.
To track and monitor cars, trucks, fleets, and shipments in real-time, such as in logistics and the automotive industry.
To continuously capture and analyze sensor data from IoT devices or other equipment, such as in factories and wind parks.
To collect and immediately react to customer interactions and orders, such as in retail, the hotel and travel industry, and mobile applications.
To monitor patients in hospital care and predict changes in condition to ensure timely treatment in emergencies.
To connect, store, and make available data produced by different divisions of a company.
To serve as the foundation for data platforms, event-driven architectures, and microservices.

Apache Kafka® is an event streaming platform. What does that mean?

Kafka combines three key capabilities so you can implement your use cases for event streaming end-to-end with a single battle-tested solution:

To publish (write) and subscribe to (read) streams of events, including continuous import/export of your data from other systems.
To store streams of events durably and reliably for as long as you want.
To process streams of events as they occur or retrospectively.
And all this functionality is provided in a distributed, highly scalable, elastic, fault-tolerant, and secure manner. Kafka can be deployed on bare-metal hardware, virtual machines, and containers, and on-premises as well as in the cloud. You can choose between self-managing your Kafka environments and using fully managed services offered by a variety of vendors.

How does Kafka work in a nutshell?

Kafka is a distributed system consisting of servers and clients that communicate via a high-performance TCP network protocol. It can be deployed on bare-metal hardware, virtual machines, and containers in on-premises as well as cloud environments.

Servers: Kafka is run as a cluster of one or more servers that can span multiple datacenters or cloud regions. Some of these servers form the storage layer, called the brokers. Other servers run Kafka Connect to continuously import and export data as event streams to integrate Kafka with your existing systems such as relational databases as well as other Kafka clusters. To let you implement mission-critical use cases, a Kafka cluster is highly scalable and fault-tolerant: if any of its servers fails, the other servers will take over its work to ensure continuous operations without any data loss.

Clients: They allow you to write distributed applications and microservices that read, write, and process streams of events in parallel, at scale, and in a fault-tolerant manner even in the case of network problems or machine failures. Kafka ships with some such clients included, which are augmented by dozens of clients provided by the Kafka community: clients are available for Java and Scala including the higher-level Kafka Streams library, for Go, Python, C/C++, and many other programming languages as well as REST APIs.

Main Concepts and Terminology

An event records the fact that “something happened” in the world or in your business. It is also called record or message in the documentation. When you read or write data to Kafka, you do this in the form of events. Conceptually, an event has a key, value, timestamp, and optional metadata headers. Here’s an example event:

Event key: “Alice”
Event value: “Made a payment of $200 to Bob”
Event timestamp: “Jun. 25, 2020 at 2:06 p.m.”
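As a rough illustration, the four parts of an event can be modeled as a small record type. This is only a sketch: the `Event` class and its field names are hypothetical and are not part of any Kafka client library, and the UTC time zone is an assumption because the example timestamp does not name one.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class Event:
    """Hypothetical in-memory model of an event; not a Kafka client class."""
    key: str
    value: str
    timestamp: datetime
    headers: dict = field(default_factory=dict)  # optional metadata headers


# The example event from the text. UTC is assumed here for illustration.
payment = Event(
    key="Alice",
    value="Made a payment of $200 to Bob",
    timestamp=datetime(2020, 6, 25, 14, 6, tzinfo=timezone.utc),
)
```

Because headers are optional, the example leaves them out and the field defaults to an empty dictionary.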
Producers are those client applications that publish (write) events to Kafka, and consumers are those that subscribe to (read and process) these events. In Kafka, producers and consumers are fully decoupled and agnostic of each other, which is a key design element to achieve the high scalability that Kafka is known for. For example, producers never need to wait for consumers. Kafka provides various guarantees such as the ability to process events exactly-once.

Events are organized and durably stored in topics. Very simplified, a topic is similar to a folder in a filesystem, and the events are the files in that folder. An example topic name could be “payments”. Topics in Kafka are always multi-producer and multi-subscriber: a topic can have zero, one, or many producers that write events to it, as well as zero, one, or many consumers that subscribe to these events. Events in a topic can be read as often as needed—unlike traditional messaging systems, events are not deleted after consumption. Instead, you define for how long Kafka should retain your events through a per-topic configuration setting, after which old events will be discarded. Kafka’s performance is effectively constant with respect to data size, so storing data for a long time is perfectly fine.
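The retention behavior described above can be sketched with a toy append-only log. This is an in-memory stand-in, not Kafka itself: the `Topic` class, its method names, and the use of plain integer seconds for timestamps are all assumptions made for illustration.

```python
class Topic:
    """Toy append-only log standing in for a single-partition topic."""

    def __init__(self, name, retention_seconds):
        self.name = name
        self.retention_seconds = retention_seconds
        self._log = []  # entries are (offset, timestamp, value)
        self._next_offset = 0

    def append(self, value, timestamp):
        """Add an event at the end of the log and return its offset."""
        offset = self._next_offset
        self._log.append((offset, timestamp, value))
        self._next_offset += 1
        return offset

    def read_from(self, offset):
        """Return retained values at or after `offset`; reading removes nothing."""
        return [value for (off, _, value) in self._log if off >= offset]

    def expire(self, now):
        """Discard events older than the retention period."""
        cutoff = now - self.retention_seconds
        self._log = [entry for entry in self._log if entry[1] >= cutoff]


payments = Topic("payments", retention_seconds=60)
payments.append("Alice paid Bob $200", timestamp=0)
payments.append("Carol paid Dave $50", timestamp=45)

# Two independent reads see the same events: consumption does not delete.
first_read = payments.read_from(0)
second_read = payments.read_from(0)

# At time 90 the event written at time 0 is older than 60 seconds.
payments.expire(now=90)
after_expiry = payments.read_from(0)
```

The point of the sketch is that only the retention setting removes data; how often or by whom the events were read plays no role.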

Topics are partitioned, meaning a topic is spread over a number of “buckets” located on different Kafka brokers. This distributed placement of your data is very important for scalability because it allows client applications to both read and write the data from/to many brokers at the same time. When a new event is published to a topic, it is actually appended to one of the topic’s partitions. Events with the same event key (e.g., a customer or vehicle ID) are written to the same partition, and Kafka guarantees that any consumer of a given topic-partition will always read that partition’s events in exactly the same order as they were written.
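A minimal sketch of key-based partitioning follows. It assumes a hash-modulo scheme with CRC32 as a stand-in hash; Kafka's clients use their own hash function, so the partition numbers produced here will not match a real cluster. The function name `choose_partition` is hypothetical.

```python
import zlib


def choose_partition(key: str, num_partitions: int) -> int:
    """Map a key to a partition. CRC32 is a stand-in hash for illustration."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


# Four partitions, each an append-only list.
partitions = [[] for _ in range(4)]

events = [("alice", "e1"), ("bob", "e2"), ("alice", "e3"), ("bob", "e4"), ("alice", "e5")]
for key, value in events:
    partitions[choose_partition(key, len(partitions))].append((key, value))

# All events for one key land in one partition, in the order they were written.
alice_partition = partitions[choose_partition("alice", len(partitions))]
alice_values = [value for key, value in alice_partition if key == "alice"]
```

Ordering is only guaranteed within a partition, which is why events that must stay in order relative to each other should share a key.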

Figure: This example topic has four partitions P1–P4. Two different producer clients are publishing new events to the topic, independently of each other, by writing events over the network to the topic's partitions. Events with the same key (denoted by their color in the figure) are written to the same partition. Note that both producers can write to the same partition if appropriate.

To make your data fault-tolerant and highly-available, every topic can be replicated, even across geo-regions or datacenters, so that there are always multiple brokers that have a copy of the data just in case things go wrong, you want to do maintenance on the brokers, and so on. A common production setting is a replication factor of 3, i.e., there will always be three copies of your data. This replication is performed at the level of topic-partitions.
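To make the replication factor concrete, the sketch below places three copies of each partition on different brokers and then simulates the loss of one broker. The round-robin placement, the `assign_replicas` function, and the broker names are illustrative assumptions; this is not Kafka's actual replica assignment algorithm.

```python
def assign_replicas(num_partitions, brokers, replication_factor):
    """Round-robin placement of partition copies (illustrative scheme only)."""
    if replication_factor > len(brokers):
        raise ValueError("replication factor cannot exceed the number of brokers")
    assignment = {}
    for p in range(num_partitions):
        assignment[p] = [
            brokers[(p + i) % len(brokers)] for i in range(replication_factor)
        ]
    return assignment


brokers = ["broker-1", "broker-2", "broker-3", "broker-4"]
assignment = assign_replicas(num_partitions=4, brokers=brokers, replication_factor=3)

# Simulate losing one broker: every partition still has surviving copies.
failed = "broker-2"
surviving = {
    p: [b for b in replicas if b != failed] for p, replicas in assignment.items()
}
```

With a replication factor of 3, a single broker failure leaves at least two copies of every partition in this sketch.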

This primer should be sufficient for an introduction. The Design section of the documentation explains Kafka’s various concepts in full detail, if you are interested.

Kafka APIs

In addition to command line tooling for management and administration tasks, Kafka has five core APIs for Java and Scala:

The Admin API to manage and inspect topics, brokers, and other Kafka objects.
The Producer API to publish (write) a stream of events to one or more Kafka topics.
The Consumer API to subscribe to (read) one or more topics and to process the stream of events produced to them.
The Kafka Streams API to implement stream processing applications and microservices. It provides higher-level functions to process event streams, including transformations, stateful operations like aggregations and joins, windowing, processing based on event-time, and more. Input is read from one or more topics in order to generate output to one or more topics, effectively transforming the input streams to output streams.
The Kafka Connect API to build and run reusable data import/export connectors that consume (read) or produce (write) streams of events from and to external systems and applications so they can integrate with Kafka. For example, a connector to a relational database like PostgreSQL might capture every change to a set of tables. However, in practice, you typically don’t need to implement your own connectors because the Kafka community already provides hundreds of ready-to-use connectors.
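The idea behind stream processing, reading an input stream, applying a stateful operation, and producing an output stream, can be sketched in plain Python. This does not use the Kafka Streams API (which is a Java and Scala library); the `running_totals` function and the sample data are hypothetical.

```python
from collections import defaultdict


def running_totals(input_events):
    """Stateful aggregation: emit the running total per key for each input event."""
    totals = defaultdict(int)
    for key, amount in input_events:
        totals[key] += amount
        yield key, totals[key]


# Hypothetical input topic: (customer, payment amount) events in arrival order.
payments = [("alice", 200), ("bob", 50), ("alice", 25)]

# The output stream could be written to a second topic; here it is a list.
totals_topic = list(running_totals(payments))
```

Each input event yields one output event, so the input stream is effectively transformed into an output stream of per-customer totals.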

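Where to go from here
To get hands-on experience with Kafka, follow the Quickstart: http://kafka.apache.org/quickstart
To understand Kafka in more detail, read the Documentation: http://kafka.apache.org/documentation/ You also have your choice of Kafka books and academic papers: http://kafka.apache.org/books-and-papers
Browse through the Use Cases (http://kafka.apache.org/powered-by) to learn how other users in our world-wide community are getting value out of Kafka.
Join a local Kafka meetup group (http://kafka.apache.org/events) and watch talks from Kafka Summit, the main conference of the Kafka community: https://kafka-summit.org/past-events/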
