Tech Stack for Horizontal Components
The ingestion/transformation modules are built on Kafka + Flink/Spark (with Flare) and can support complex ETL requirements. The ingestion module scales to rates of millions of records per second.
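As a minimal sketch of the ingestion path, the snippet below publishes records into Kafka with batching and compression enabled, which is how high sustained record rates are typically achieved. The broker address, topic name ("events"), and JSON payload are illustrative assumptions, not part of the actual stack configuration.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class IngestProducer {
    public static void main(String[] args) {
        // Broker address and topic name are illustrative assumptions.
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());
        // Small linger plus compression lets the producer batch records,
        // which is what sustains high throughput on the ingest side.
        props.put("linger.ms", "5");
        props.put("compression.type", "lz4");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            for (int i = 0; i < 1000; i++) {
                producer.send(new ProducerRecord<>("events", Integer.toString(i),
                        "{\"sensor\":" + i + ",\"value\":42}"));
            }
        } // close() flushes any outstanding records
    }
}
```

A Flink or Spark job would then consume this topic and apply the transformation logic downstream.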
For low-throughput but high-complexity batch jobs, we also support Talend.
The data store can be either a pure RDBMS (Postgres with the XL extensions) or a Hadoop-based tabular store with SQL support (HAWQ or Trafodion). Other stores, such as Kudu, will be added as they stabilize and the need arises. We believe a SQL interface is key to any operational data store.
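One practical payoff of the SQL interface is that application code stays portable across these stores. As a hedged sketch, the JDBC query below would run against Postgres-XL, HAWQ, or Trafodion with only the driver and connection URL changed; the "orders" table, its columns, and the credentials are hypothetical example names.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class StoreQuery {
    public static void main(String[] args) throws Exception {
        // URL shown for Postgres(-XL); swap the JDBC URL/driver for HAWQ or Trafodion.
        String url = "jdbc:postgresql://localhost:5432/ops";
        try (Connection conn = DriverManager.getConnection(url, "app", "secret");
             PreparedStatement ps = conn.prepareStatement(
                     // 'orders' and its columns are hypothetical example names.
                     "SELECT customer_id, SUM(amount) FROM orders " +
                     "WHERE order_ts >= ? GROUP BY customer_id")) {
            ps.setTimestamp(1, java.sql.Timestamp.valueOf("2024-01-01 00:00:00"));
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    System.out.println(rs.getString(1) + " -> " + rs.getBigDecimal(2));
                }
            }
        }
    }
}
```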
For log writes and time-series data, Cassandra/ScyllaDB is used.
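As a sketch of the time-series pattern on Cassandra (the same CQL works against ScyllaDB, which is protocol-compatible), the snippet below partitions rows by source and clusters them by timestamp, the canonical layout for log and time-series workloads. It uses the DataStax Java driver; the contact point, keyspace, and logs.events table are assumptions for illustration.

```java
import com.datastax.oss.driver.api.core.CqlSession;
import java.net.InetSocketAddress;
import java.time.Instant;

public class LogWriter {
    public static void main(String[] args) {
        // Contact point, datacenter, keyspace, and table are illustrative assumptions.
        try (CqlSession session = CqlSession.builder()
                .addContactPoint(new InetSocketAddress("127.0.0.1", 9042))
                .withLocalDatacenter("datacenter1")
                .build()) {
            session.execute("CREATE KEYSPACE IF NOT EXISTS logs WITH replication = "
                    + "{'class': 'SimpleStrategy', 'replication_factor': 1}");
            // Partition by source, cluster by time: writes append to a partition
            // and recent entries are read back in timestamp order.
            session.execute("CREATE TABLE IF NOT EXISTS logs.events ("
                    + "source text, ts timestamp, line text, "
                    + "PRIMARY KEY (source, ts)) WITH CLUSTERING ORDER BY (ts DESC)");
            session.execute(session.prepare(
                    "INSERT INTO logs.events (source, ts, line) VALUES (?, ?, ?)")
                    .bind("web-01", Instant.now(), "GET /index.html 200"));
        }
    }
}
```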
The idea is to map each kind of data to a store whose model fits it, so that there is no semantic mismatch.