Components are the classes that do the work within a stream. Components are assembled into pipelines and executed by a runtime. There are several core types of components, each implementing a specific Java interface:
A Processor is a component that processes data flowing through the stream: transformations, filters, and enrichments are common processors.
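To make the processor idea concrete, here is a minimal sketch of a filter and a transformation. The `Processor` interface and `SimpleDatum` class below are hypothetical stand-ins invented for illustration, not the project's actual API. The assumed contract is "one datum in, zero or more datums out", which lets the same shape express both dropping data (an empty list) and passing it on (a singleton list).

```java
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-in for a datum: just an id and a String document.
final class SimpleDatum {
    final String id;
    final String document;

    SimpleDatum(String id, String document) {
        this.id = id;
        this.document = document;
    }
}

// Hypothetical processor contract (an assumption, not the project's interface):
// one datum in, zero or more datums out.
interface Processor {
    List<SimpleDatum> process(SimpleDatum in);
}

// A filter: drops empty documents and passes everything else through unchanged.
final class DropEmptyFilter implements Processor {
    @Override
    public List<SimpleDatum> process(SimpleDatum in) {
        if (in.document == null || in.document.isEmpty()) {
            return Collections.emptyList();
        }
        return Collections.singletonList(in);
    }
}

// A transformation: emits a new datum with an upper-cased document, keeping the id.
final class UpperCaseTransform implements Processor {
    @Override
    public List<SimpleDatum> process(SimpleDatum in) {
        return Collections.singletonList(
                new SimpleDatum(in.id, in.document.toUpperCase(Locale.ROOT)));
    }
}
```

An enrichment would follow the same pattern, returning a datum whose document carries extra fields looked up from elsewhere.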
A Schema defines the expected shape of the documents that will be passed from step to step within a stream. Defining the schema for a type of document allows source files and resource files to be generated by the build process, relieving your team of the need to maintain these files by hand.
Schemas can include other schemas, whether in the same repo or available via HTTP, allowing for full or partial reuse within or across organizations.
A Datum is a single piece of data within a stream. A datum typically has an identifier, a timestamp, a document (which may be any Java object), and additional metadata, kept apart from the document, that relates to upstream or downstream processing.
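The four parts of a datum can be sketched as a small Java class. This is an illustrative shape only: the class name `Datum`, its accessors, and the `putMetadata` method are hypothetical, chosen to mirror the description above rather than any real API. The point it shows is the separation between the document (any `Object`) and the metadata map, so processing details can be recorded without touching the payload.

```java
import java.time.Instant;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical datum sketch: field and method names are illustrative, not the project's API.
final class Datum {
    private final String id;                    // identifier of this piece of data
    private final Instant timestamp;            // when the datum was produced or observed
    private final Object document;              // the payload; may be any Java object
    private final Map<String, Object> metadata; // processing details kept apart from the document

    Datum(String id, Instant timestamp, Object document) {
        this.id = Objects.requireNonNull(id, "id");
        this.timestamp = Objects.requireNonNull(timestamp, "timestamp");
        this.document = document;
        this.metadata = new HashMap<>();
    }

    String getId() { return id; }

    Instant getTimestamp() { return timestamp; }

    Object getDocument() { return document; }

    // Read-only view, so changes must go through putMetadata.
    Map<String, Object> getMetadata() { return Collections.unmodifiableMap(metadata); }

    // Record a processing detail (e.g. which upstream component produced this datum)
    // without modifying the document itself.
    void putMetadata(String key, Object value) { metadata.put(key, value); }
}
```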
Apache Streams has a preference for ActivityStreams-formatted messages. These messages may be passed using the ‘Activity’ class or one of its subclasses.
An activity has several sub-object fields:
Streams containing details of actors, objects, etc. may be created using the ‘ActivityObject’ class or one of its subclasses.
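The relationship between an activity and its sub-objects can be sketched with a few simplified classes. These are hypothetical illustrations, not the project's ‘Activity’ and ‘ActivityObject’ classes: they keep only a handful of field names taken from the ActivityStreams 1.0 vocabulary (`verb`, `actor`, `object`, `target`, `objectType`, `displayName`). The `Person` subclass and the `summary()` helper are invented here to show how a subclass can stand in wherever a generic object is expected.

```java
// Hypothetical, simplified classes; field names follow the ActivityStreams 1.0 vocabulary.
class ActivityObject {
    final String id;
    final String objectType;  // e.g. "person", "note"
    final String displayName; // human-readable label

    ActivityObject(String id, String objectType, String displayName) {
        this.id = id;
        this.objectType = objectType;
        this.displayName = displayName;
    }
}

// A subclass adds type-specific detail while remaining usable as an ActivityObject.
class Person extends ActivityObject {
    final String preferredUsername;

    Person(String id, String displayName, String preferredUsername) {
        super(id, "person", displayName);
        this.preferredUsername = preferredUsername;
    }
}

class Activity {
    final String verb;           // what happened, e.g. "post"
    final ActivityObject actor;  // who did it
    final ActivityObject object; // what it was done to
    final ActivityObject target; // optional; may be null

    Activity(String verb, ActivityObject actor, ActivityObject object, ActivityObject target) {
        this.verb = verb;
        this.actor = actor;
        this.object = object;
        this.target = target;
    }

    // Hypothetical helper: a one-line summary built only from the sub-objects.
    String summary() {
        String s = actor.displayName + " " + verb + " " + object.displayName;
        return target == null ? s : s + " to " + target.displayName;
    }
}
```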
A Pipeline is a set of collection, processing, and storage components structured in a directed graph (cycles may be permitted) which is packaged, deployed, started, and stopped together.
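Because a pipeline is a directed graph that is started and stopped as a unit, one natural question is the order in which to bring components up and down. The sketch below is a hypothetical `PipelineGraph` class (not part of the project) that wires named components together and derives an ordering with Kahn's topological-sort algorithm. It assumes an acyclic graph and rejects cycles, even though the text notes cycles may be permitted. The start/stop policy (downstream components start first, upstream components stop first) is a design assumption made for illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical pipeline sketch: components are identified by name and wired with directed edges.
final class PipelineGraph {
    // Adjacency list: component name -> names of its downstream components.
    private final Map<String, List<String>> downstream = new LinkedHashMap<>();

    PipelineGraph addComponent(String name) {
        downstream.putIfAbsent(name, new ArrayList<>());
        return this;
    }

    PipelineGraph connect(String from, String to) {
        addComponent(from);
        addComponent(to);
        downstream.get(from).add(to);
        return this;
    }

    // Kahn's algorithm: an upstream-to-downstream ordering. Only valid for acyclic graphs.
    List<String> topologicalOrder() {
        Map<String, Integer> inDegree = new LinkedHashMap<>();
        for (String n : downstream.keySet()) inDegree.put(n, 0);
        for (List<String> targets : downstream.values())
            for (String t : targets) inDegree.merge(t, 1, Integer::sum);

        Deque<String> ready = new ArrayDeque<>();
        for (Map.Entry<String, Integer> e : inDegree.entrySet())
            if (e.getValue() == 0) ready.add(e.getKey());

        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String n = ready.poll();
            order.add(n);
            for (String t : downstream.get(n))
                if (inDegree.merge(t, -1, Integer::sum) == 0) ready.add(t);
        }
        if (order.size() != downstream.size())
            throw new IllegalStateException("pipeline contains a cycle");
        return order;
    }

    // Start downstream components first, so nothing is emitted into a component that is not running.
    List<String> startOrder() {
        List<String> order = topologicalOrder();
        Collections.reverse(order);
        return order;
    }

    // Stop upstream components first, so in-flight data can drain downstream.
    List<String> stopOrder() {
        return topologicalOrder();
    }
}
```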
A Runtime is a module containing bindings that help set up and run a pipeline. Runtimes may submit pipeline binaries to an existing cluster, or may launch the process(es) to execute the stream directly.
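The second option, executing the stream directly, can be illustrated with a toy in-process runtime. The `LocalRuntime` class below is hypothetical and far simpler than a real runtime module: it runs a linear chain of processors on the calling thread, with processors modeled as plain `java.util.function.Function` values rather than any project interface. Each processor may emit zero or more documents per input, and survivors of the whole chain are handed to a sink.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

// Hypothetical in-process runtime: runs a linear pipeline synchronously on the calling thread.
final class LocalRuntime {
    private final List<Function<String, List<String>>> processors = new ArrayList<>();

    LocalRuntime addProcessor(Function<String, List<String>> p) {
        processors.add(p);
        return this;
    }

    // Pushes every document from the source through each processor in turn,
    // then hands whatever survives to the sink.
    void run(Iterable<String> source, Consumer<String> sink) {
        for (String doc : source) {
            List<String> batch = Collections.singletonList(doc);
            for (Function<String, List<String>> p : processors) {
                List<String> next = new ArrayList<>();
                for (String d : batch) next.addAll(p.apply(d));
                batch = next;
            }
            batch.forEach(sink);
        }
    }
}
```

A cluster-backed runtime would instead package the same pipeline definition and submit it for remote execution; that path is outside the scope of this sketch.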