Logstash 7.4: Collecting, Parsing, and Transforming Data from Kafka Messages, Beats, and MySQL into Elasticsearch
Elasticsearch is a distributed, scalable, real-time search and data-analytics engine. How to write data from massive sources into Elasticsearch efficiently and reliably is an unavoidable question.

Logstash: Concepts and Principles

Logstash is an open-source, server-side data processing pipeline that can simultaneously and dynamically ingest, transform, and ship data from multiple sources into Elasticsearch indices, where it can then be tokenized, searched, and analyzed, regardless of format or complexity. It provides a rich library of filters: for example, Grok can derive structure from unstructured data, geographic coordinates can be decoded from IP addresses, sensitive fields can be anonymized or excluded, and the overall processing is simplified.
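The input → filter → output pipeline described above can be sketched as a minimal toy configuration (it reads lines from stdin and prints structured events to stdout; the added field name is arbitrary):

```conf
input { stdin { } }

filter {
  # any filter plugins can be chained here; mutate just adds a field to every event
  mutate { add_field => { "pipeline" => "demo" } }
}

output { stdout { codec => rubydebug } }
```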
Logstash Application Scenarios

1. Logstash acts directly as the client-side data collector, parsing, transforming, and storing the data itself (Logstash is relatively heavyweight and consumes more resources).
2. Beats collects the client-side data, and Logstash further gathers, analyzes, and transforms the data from Beats.
3. Logstash subscribes to Kafka messages and parses and transforms the data.
Solutions:

1. Data source (e.g. MySQL) → Logstash → output (Elasticsearch, file, Kafka, Redis, ...)
2. Data source → Beats (e.g. Filebeat) → Logstash → output
3. Data source → Beats → Kafka (or Redis) → Logstash → output
4. Kafka (or Redis) → Logstash → output
Logstash: Kafka message subscription, parsing, and Elasticsearch storage
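A minimal sketch of this scenario, assuming a local Kafka broker at `localhost:9092` and a hypothetical topic `app-logs` carrying JSON messages:

```conf
input {
  kafka {
    bootstrap_servers => "localhost:9092"   # assumed broker address
    topics            => ["app-logs"]       # hypothetical topic name
    group_id          => "logstash-consumer"
    codec             => "json"             # deserialize each message as JSON
  }
}

filter {
  # use an event's own timestamp field (if present) as @timestamp
  date { match => ["timestamp", "ISO8601"] }
}

output {
  elasticsearch {
    hosts => ["http://localhost:9200"]
    index => "kafka-logs-%{+YYYY.MM.dd}"    # daily index
  }
}
```

Using a `group_id` lets several Logstash instances share the consumer group, so Kafka partitions are balanced across them.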
Logstash: Filebeat data collection, cleansing, and Elasticsearch storage
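A sketch of the Beats scenario, assuming Filebeat ships Apache-style access logs to the default Beats port 5044:

```conf
input {
  beats {
    port => 5044                            # default port Filebeat ships to
  }
}

filter {
  # assumes Apache/Nginx combined-format access logs
  grok { match => { "message" => "%{COMBINEDAPACHELOG}" } }
}

output {
  elasticsearch {
    hosts => ["http://localhost:9200"]
    index => "filebeat-%{+YYYY.MM.dd}"
  }
}
```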
Logstash: MySQL data collection, parsing, and Elasticsearch storage
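A sketch of the MySQL scenario using the `jdbc` input plugin. The database name, table, credentials, and driver path below are all hypothetical placeholders:

```conf
input {
  jdbc {
    jdbc_driver_library    => "/path/to/mysql-connector-java.jar"  # hypothetical driver path
    jdbc_driver_class      => "com.mysql.cj.jdbc.Driver"
    jdbc_connection_string => "jdbc:mysql://localhost:3306/mydb"   # hypothetical database
    jdbc_user              => "root"
    jdbc_password          => "secret"
    schedule               => "* * * * *"   # poll every minute (cron syntax)
    # incremental sync: only rows changed since the last run
    statement              => "SELECT * FROM orders WHERE updated_at > :sql_last_value"
    use_column_value       => true
    tracking_column        => "updated_at"
    tracking_column_type   => "timestamp"
  }
}

output {
  elasticsearch {
    hosts       => ["http://localhost:9200"]
    index       => "mysql-orders"
    document_id => "%{id}"   # assumes an id column; repeated syncs then upsert instead of duplicating
  }
}
```

Tracking `:sql_last_value` against an `updated_at` column is what makes the poll incremental rather than a full-table scan on every run.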
Logstash's Filter Plugin Library
| Plugin | Description |
| --- | --- |
| aggregate | Aggregates information from several events originating with a single task |
| alter | Performs general alterations to fields that the mutate filter does not handle |
| bytes | Parses string representations of computer storage sizes, such as "123 MB" or "5.6gb", into their numeric value in bytes |
| cidr | Checks IP addresses against a list of network blocks |
| cipher | Applies or removes a cipher to an event |
| clone | Duplicates events |
| csv | Parses comma-separated value data into individual fields |
| date | Parses dates from fields to use as the Logstash timestamp for an event |
| de_dot | Computationally expensive filter that removes dots from a field name |
| dissect | Extracts unstructured event data into fields using delimiters |
| dns | Performs a standard or reverse DNS lookup |
| drop | Drops all events |
| elapsed | Calculates the elapsed time between a pair of events |
| elasticsearch | Copies fields from previous log events in Elasticsearch to current events |
| environment | Stores environment variables as metadata sub-fields |
| extractnumbers | Extracts numbers from a string |
| fingerprint | Fingerprints fields by replacing values with a consistent hash |
| geoip | Adds geographical information about an IP address |
| grok | Parses unstructured event data into fields |
| http | Provides integration with external web services/REST APIs |
| i18n | Removes special characters from a field |
| java_uuid | Generates a UUID and adds it to each processed event |
| jdbc_static | Enriches events with data pre-loaded from a remote database |
| jdbc_streaming | Enrich events with your database data |
| json | Parses JSON events |
| json_encode | Serializes a field to JSON |
| kv | Parses key-value pairs |
| memcached | Provides integration with external data in Memcached |
| metricize | Takes complex events containing a number of metrics and splits these up into multiple events, each holding a single metric |
| metrics | Aggregates metrics |
| mutate | Performs mutations on fields |
| prune | Prunes event data based on a list of fields to blacklist or whitelist |
| range | Checks that specified fields stay within given size or length limits |
| ruby | Executes arbitrary Ruby code |
| sleep | Sleeps for a specified time span |
| split | Splits multi-line messages into distinct events |
| syslog_pri | Parses the PRI (priority) field of a syslog message |
| threats_classifier | Enriches security logs with information about the attacker's intent |
| throttle | Throttles the number of events |
| tld | Replaces the contents of the default message field with whatever you specify in the configuration |
| translate | Replaces field contents based on a hash or YAML file |
| truncate | Truncates fields longer than a given length |
| urldecode | Decodes URL-encoded fields |
| useragent | Parses user agent strings into fields |
| uuid | Adds a UUID to events |
| xml | Parses XML into fields |
grok can parse and structure arbitrary text via regular expressions; Grok is currently the best way in Logstash to parse unstructured log data into something structured and queryable. In addition, Logstash can rename, delete, replace, and modify event fields, and can of course drop events entirely, such as debug events. Many more advanced capabilities are available beyond these.
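A small example of what grok and mutate look like together. The log format and the dropped field are illustrative assumptions:

```conf
filter {
  grok {
    # parse a line like: 55.3.244.1 GET /index.html 15824 0.043
    match => { "message" => "%{IP:client} %{WORD:method} %{URIPATHPARAM:request} %{NUMBER:bytes} %{NUMBER:duration}" }
  }
  mutate {
    # example of dropping an unwanted field after parsing
    remove_field => ["host"]
  }
}
```

After this filter, the event carries structured fields (`client`, `method`, `request`, `bytes`, `duration`) that Elasticsearch can index and query individually.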
Flume focuses on data transport, and its users must understand the entire data route very clearly. It is comparatively more reliable: its channel exists for persistence, and data is deleted only after its delivery to the next destination has been confirmed.

Logstash focuses on data pre-processing; log fields are pre-processed before being parsed.