# Architecture
The core features of ItemsAPI, such as search and indexing, are written in C++ for maximum speed. The REST API is written in Node.js (Express.js framework), which offers good performance and high development productivity.
The C++ core communicates with Node.js through a native binding.
# Goal
The goal of ItemsAPI is to be fast and easy to start on a single machine, since RAM and SSDs keep getting cheaper. It should handle tens of millions of records and thousands of indices on a relatively small machine.
# Storage
Internally, ItemsAPI uses the LMDB database to store all token indexes. LMDB is one of the fastest key-value (KV) databases. It runs in the same process as the application and stores its data in a memory-mapped file (mmap).
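LMDB itself is a C library, so the snippet below is not LMDB and does not use its API. It is a minimal sketch, using only Python's standard `mmap` module, of the memory-mapping mechanism mentioned above: the file is mapped into the application's own address space and modified as ordinary memory, with no separate database server. The function name `write_through_mapping` and the 16-byte file size are illustrative assumptions.

```python
import mmap
import os
import tempfile


def write_through_mapping(path: str, payload: bytes, size: int = 16) -> bytes:
    """Write payload into a memory-mapped file, then read it back with plain I/O."""
    # mmap cannot map an empty file, so reserve `size` zero bytes first.
    with open(path, "wb") as f:
        f.write(b"\x00" * size)

    # Map the whole file (length 0) and modify it in place like a bytearray.
    with open(path, "r+b") as f:
        with mmap.mmap(f.fileno(), 0) as mm:
            mm[: len(payload)] = payload  # slice assignment must keep the same length
            mm.flush()  # ask the OS to write dirty pages back to the file

    # Reopen with ordinary file I/O to show the change reached the file.
    with open(path, "rb") as f:
        return f.read(len(payload))


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "store.bin")
    print(write_through_mapping(path, b"hello"))
```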
ItemsAPI stores each token index, a set of integer IDs such as (1, 2, 3, 5, 10, ...), as a bitmap. It uses Roaring Bitmaps internally for strong compression. Thanks to Roaring, intersections and unions of indexes can be hundreds of times faster than with traditional arrays of integers. Intersections and unions are core operations of every search engine and are performed hundreds of times per search request.
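To make the intersection and union step concrete, here is a simplified Python sketch. It is not the Roaring library or its API: it borrows only Roaring's idea of partitioning 32-bit IDs into chunks keyed by their high 16 bits, and stores every chunk as an uncompressed bitmap (a Python `int`), whereas real Roaring chooses between sorted-array, bitmap and run containers per chunk. The helper names and example IDs are hypothetical.

```python
CHUNK_BITS = 16    # high 16 bits select the chunk, as in Roaring
LOW_MASK = 0xFFFF  # low 16 bits select the bit inside the chunk


def build(doc_ids):
    """Build a chunked bitmap: {chunk key: int whose set bits are the low 16 bits}."""
    chunks = {}
    for doc_id in doc_ids:
        key = doc_id >> CHUNK_BITS
        chunks[key] = chunks.get(key, 0) | (1 << (doc_id & LOW_MASK))
    return chunks


def intersect(a, b):
    """AND two chunked bitmaps; only chunks present in both can contribute."""
    out = {}
    for key in a.keys() & b.keys():
        bits = a[key] & b[key]
        if bits:  # drop chunks that became empty
            out[key] = bits
    return out


def union(a, b):
    """OR two chunked bitmaps chunk by chunk."""
    out = dict(a)
    for key, bits in b.items():
        out[key] = out.get(key, 0) | bits
    return out


def to_ids(bitmap):
    """Decode a chunked bitmap back into a sorted list of IDs."""
    ids = []
    for key in sorted(bitmap):
        bits = bitmap[key]
        while bits:
            lowest = bits & -bits  # isolate the lowest set bit
            ids.append((key << CHUNK_BITS) | (lowest.bit_length() - 1))
            bits ^= lowest  # clear that bit and continue
    return ids


if __name__ == "__main__":
    # Hypothetical token indexes: which document IDs contain each token.
    red = build([1, 2, 3, 5, 10, 70000])
    shoes = build([2, 5, 7, 10, 70000, 140000])
    print(to_ids(intersect(red, shoes)))  # [2, 5, 10, 70000]
    print(to_ids(union(red, shoes)))      # [1, 2, 3, 5, 7, 10, 70000, 140000]
```

Chunks that exist in only one operand are skipped outright during an intersection, and the remaining work is word-wide AND/OR rather than element-by-element comparison, which is one reason bitmap operations tend to beat merging two sorted integer arrays.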
# Multi-tenancy
ItemsAPI supports multi-tenancy: a single instance can host even thousands of indices. Each index works as a separate LMDB database. Search performance scales linearly, and there can be only one writer per instance.
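The constraints above (many independent indices, one writer per instance) can be modelled with a short sketch. This is a hypothetical Python model, not ItemsAPI code: a plain `dict` per index stands in for a separate LMDB database, and the class and method names are assumptions. One lock shared by all indices serializes writes, while reads skip the lock, loosely mirroring LMDB, where readers do not block on the writer.

```python
import threading


class Instance:
    """Hypothetical model of one instance hosting many indices.

    Each index has its own store (a dict standing in for a separate LMDB
    database). One lock shared by every index models the rule that an
    instance accepts only one writer at a time; reads do not take the lock.
    """

    def __init__(self):
        self._stores = {}
        self._write_lock = threading.Lock()

    def write(self, index_name, key, value):
        with self._write_lock:  # serializes writes across all indices
            self._stores.setdefault(index_name, {})[key] = value

    def read(self, index_name, key, default=None):
        store = self._stores.get(index_name)
        return default if store is None else store.get(key, default)

    def index_names(self):
        return sorted(self._stores)


if __name__ == "__main__":
    inst = Instance()
    threads = [
        threading.Thread(target=inst.write, args=(f"tenant{i}", "doc1", {"id": i}))
        for i in range(3)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(inst.index_names())            # ['tenant0', 'tenant1', 'tenant2']
    print(inst.read("tenant1", "doc1"))  # {'id': 1}
```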
# JSON Parsing
ItemsAPI uses simdjson to parse JSON data. It is written in C++ and is several times faster than competing parsers.
This is particularly helpful when indexing large JSON files in batches.
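A minimal sketch of batch ingestion, assuming newline-delimited JSON (one document per line) as the input format. Python's standard `json` module stands in for simdjson here, so this shows only the batching pattern, not simdjson's speed or API. `iter_batches` and `batch_size` are hypothetical names.

```python
import io
import json


def iter_batches(stream, batch_size):
    """Yield lists of parsed documents, at most batch_size per list."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch = []
    for line in stream:
        line = line.strip()
        if not line:  # tolerate blank lines between records
            continue
        batch.append(json.loads(line))
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # flush the final, possibly smaller, batch
        yield batch


if __name__ == "__main__":
    data = io.StringIO(
        '{"id": 1, "name": "a"}\n'
        '{"id": 2, "name": "b"}\n'
        "\n"
        '{"id": 3, "name": "c"}\n'
    )
    for batch in iter_batches(data, batch_size=2):
        print([doc["id"] for doc in batch])
```

Processing fixed-size batches keeps memory use bounded by the batch size rather than by the size of the whole file.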
# Tokenization
ItemsAPI tokenizes all provided text by splitting it on spaces, on the ASCII punctuation characters `` !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~ `` and on the control characters `\n`, `\v`, `\f` and `\r`.
For example, the string `domain.com http://www.domain2.com` is tokenized into the array `['domain', 'com', 'http', 'www', 'domain2', 'com']`.
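A minimal Python sketch of the splitting rule described above. It reproduces the documented example but is not ItemsAPI's C++ tokenizer; treating the space as a separator is inferred from the example, and the names `SEPARATORS` and `tokenize` are hypothetical. Case folding, tabs and non-ASCII characters are not covered by the description, so the sketch leaves them untouched.

```python
import string

# Separators: the 32 ASCII punctuation characters listed above (identical to
# string.punctuation), the control characters \n \v \f \r, and the space,
# which the example shows also splits tokens.
SEPARATORS = string.punctuation + " \n\v\f\r"
_TO_SPACE = str.maketrans(SEPARATORS, " " * len(SEPARATORS))


def tokenize(text: str) -> list[str]:
    """Map every separator to a space, split on spaces, and drop empty tokens."""
    return [token for token in text.translate(_TO_SPACE).split(" ") if token]


if __name__ == "__main__":
    print(tokenize("domain.com http://www.domain2.com"))
    # ['domain', 'com', 'http', 'www', 'domain2', 'com']
```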