From 8dc84535b1880339a974a17326eec791bb1a2337 Mon Sep 17 00:00:00 2001
From: Mario Molina
Date: Sat, 25 Mar 2017 14:27:08 -0600
Subject: [PATCH] Updating README

---
 README.md | 112 ++++++------------------------------------------------
 1 file changed, 12 insertions(+), 100 deletions(-)

diff --git a/README.md b/README.md
index 12de5fb..76d3961 100644
--- a/README.md
+++ b/README.md
@@ -1,113 +1,25 @@
 # Kafka Connect FileSystem Connector [![Build Status](https://travis-ci.org/mmolimar/kafka-connect-fs.svg?branch=master)](https://travis-ci.org/mmolimar/kafka-connect-fs)[![Coverage Status](https://coveralls.io/repos/github/mmolimar/kafka-connect-fs/badge.svg?branch=master)](https://coveralls.io/github/mmolimar/kafka-connect-fs?branch=master)
 
-Kafka Connect FileSystem is a Source Connector for reading data from any file system which implements
-``org.apache.hadoop.fs.FileSystem`` class from [Hadoop-Common](https://github.com/apache/hadoop-common) and writing to Kafka.
+**kafka-connect-fs** is a [Kafka Connector](http://kafka.apache.org/documentation.html#connect)
+for reading records from files in the file systems specified and load them into Kafka.
 
-## Prerequisites
+Documentation for this connector can be found [here](http://kafka-connect-fs.readthedocs.io/).
 
-- Confluent 3.1.1
-- Java 8
+## Development
 
-## Getting started
+To build a development version you'll need a recent version of Kafka. You can build
+kafka-connect-fs with Maven using the standard lifecycle phases.
 
-### Building source ###
-    mvn clean package
+## FAQ
 
-### Config the connector ###
-    name=FsSourceConnector
-    connector.class=com.github.mmolimar.kafka.connect.fs.FsSourceConnector
-    tasks.max=1
-    fs.uris=file:///data,hdfs://localhost:9001/data
-    topic=mytopic
-    policy.class=com.github.mmolimar.kafka.connect.fs.policy.SimplePolicy
-    policy.recursive=true
-    policy_regexp=^[0-9]*\.txt$
-    file_reader.class=com.github.mmolimar.kafka.connect.fs.file.reader.TextFileReader
-The ``kafka-connect-fs.properties`` file defines:
+Some frequently asked questions on Kafka Connect FileSystem Connector can be found here -
+http://kafka-connect-fs.readthedocs.io/en/latest/faq.html
 
-1. The connector name.
-2. The class containing the connector.
-3. The number of tasks the connector is allowed to start.
-4. Comma-separated URIs of the FS(s). They can be URIs pointing directly to a file in the FS.
-5. Topic in which copy data to.
-6. Policy class to apply.
-7. Flag to activate traversed recursion in subdirectories when listing files.
-8. File reader class to read files from the FS.
-9. Regular expression to filter files from the FS.
+## Contribute
 
-#### Policies ####
-
-##### SimplePolicy #####
-
-Just list files included in the corresponding URI.
-
-##### SleepyPolicy #####
-
-Simple policy with an custom sleep on each execution.
-
-```
-    policy.sleepy.sleep=200000
-    policy.sleepy.fraction=100
-    policy.sleepy.max_execs=-1
-```
-1. Max sleep time (in ms) to wait to look for files in the FS.
-2. Sleep fraction to divide the sleep time to allow interrupt the policy.
-3. Max sleep times allowed (negative to disable).
-
-##### HdfsFileWatcherPolicy #####
-
-It uses Hadoop notifications events (since Hadoop 2.6.0) and all create/append/close events will be reported as new files to be ingested.
-Just use it when your URIs start with ``hdfs://``
-
-#### File readers ####
-
-##### AvroFileReader #####
-
-Read files with [Avro](http://avro.apache.org/) format.
-
-##### ParquetFileReader #####
-
-Read files with [Parquet](https://parquet.apache.org/) format.
-
-##### SequenceFileReader #####
-
-Read [Sequence files](https://wiki.apache.org/hadoop/SequenceFile).
-
-##### DelimitedTextFileReader #####
-
-Text file reader using custom tokens to distinguish different columns on each line.
-
-```
-    file_reader.delimited.token=,
-    file_reader.delimited.header=true
-```
-1. If the file contains header or not (default false).
-2. The token delimiter for columns.
-
-##### TextFileReader #####
-
-Read plain text files. Each line represents one record.
-
-### Running in development ###
-```
-mvn clean package
-export CLASSPATH="$(find target/ -type f -name '*.jar'| grep '\-package' | tr '\n' ':')"
-$CONFLUENT_HOME/bin/connect-standalone $CONFLUENT_HOME/etc/schema-registry/connect-avro-standalone.properties config/kafka-connect-fs.properties
-```
-
-## TODO's
-
-- [ ] Add more file readers.
-- [ ] Add more policies.
-- [ ] Manages FS blocks.
-- [ ] Improve documentation.
-- [ ] Include a FS Sink Connector.
-
-## Contributing
-
-If you would like to add/fix something to this connector, you are welcome to do so!
+- Source Code: https://github.com/mmolimar/kafka-connect-fs
+- Issue Tracker: https://github.com/mmolimar/kafka-connect-fs/issues
 
 ## License
 
 Released under the Apache License, version 2.0.
-