# Architecture

The core features of ItemsAPI, such as search and indexing, are written in C++ for maximum speed. The REST API is written in Node.js (using the Express.js framework), which offers good speed and development productivity.

The C++ side communicates with Node.js through a native binding.

# Goal

The goal of ItemsAPI is to be fast and easy to start on a single machine (as RAM and SSDs are getting cheaper). It should work with tens of millions of records and thousands of indices on a relatively small machine.

# Storage

Internally, ItemsAPI uses the LMDB database to store all token indexes. LMDB is one of the fastest key-value (KV) databases. It runs in the same process as the application and saves data to a memory-mapped file (mmap).

ItemsAPI stores all indexes, such as (1, 2, 3, 5, 10, ...), as bitmaps. It uses Roaring Bitmap internally for great compression. Thanks to Roaring, intersections and unions of indexes can be hundreds of times faster than with traditional arrays of integers. Intersections and unions are the core operations of every search engine, and they are used hundreds of times in each search request.
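To make the bitmap idea concrete, here is a minimal Python sketch. It is not the ItemsAPI implementation and it does not use Roaring: it packs item ids into a plain, uncompressed integer bitmap, only to show why an intersection becomes a bitwise AND and a union a bitwise OR. The names `to_bitmap`, `to_ids`, `red` and `large` are hypothetical.

```python
def to_bitmap(ids):
    """Pack non-negative item ids into one integer used as a bitmap."""
    bitmap = 0
    for item_id in ids:
        bitmap |= 1 << item_id  # set the bit at position item_id
    return bitmap


def to_ids(bitmap):
    """Unpack a bitmap back into a sorted list of item ids."""
    ids = []
    position = 0
    while bitmap:
        if bitmap & 1:
            ids.append(position)
        bitmap >>= 1
        position += 1
    return ids


# Two hypothetical token indexes: the ids of items containing each token.
red = to_bitmap([1, 2, 3, 5, 10])
large = to_bitmap([2, 3, 4, 10, 11])

both = to_ids(red & large)    # intersection: items matching both tokens
either = to_ids(red | large)  # union: items matching at least one token

print(both)    # [2, 3, 10]
print(either)  # [1, 2, 3, 4, 5, 10, 11]
```

Roaring applies the same set operations, but stores the bitmap in compressed chunks so that sparse and dense ranges both stay small.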

# Multi-tenancy

ItemsAPI supports multi-tenancy: a single instance can serve even thousands of indices. Each index works as a separate LMDB database. Search performance scales linearly, and there can be only one writer per instance.

# JSON Parsing

ItemsAPI uses simdjson for parsing JSON data. It is written in C++ and is a few times faster than competing solutions.
It is particularly helpful when indexing very large JSON files in batch.

# Tokenization

ItemsAPI tokenizes all provided text by splitting it on the following ASCII punctuation and whitespace characters: !\"#$%&'()*+,-./:;<=>?@\[\\]^_``{|}~\n\v\f\r. For example, the string domain.com http://www.domain2.com is tokenized into the array ['domain', 'com', 'http', 'www', 'domain2', 'com'].
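The splitting rule above can be sketched in a few lines of Python. This is an illustration of the described behaviour, not the C++ tokenizer itself. The function name `tokenize` is hypothetical, and treating the space and tab characters as delimiters is an assumption inferred from the example, since they are not spelled out in the character list.

```python
import re
import string

# ASCII punctuation (!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~) plus whitespace.
# Assumption: space and tab are delimiters too, as the example implies.
DELIMITERS = string.punctuation + " \t\n\v\f\r"

# One or more consecutive delimiters act as a single split point.
SPLIT_PATTERN = re.compile("[" + re.escape(DELIMITERS) + "]+")


def tokenize(text):
    """Split text on delimiter runs and drop empty tokens."""
    return [token for token in SPLIT_PATTERN.split(text) if token]


print(tokenize("domain.com http://www.domain2.com"))
# ['domain', 'com', 'http', 'www', 'domain2', 'com']
```

Collapsing runs of delimiters (such as `://`) into a single split point avoids producing empty tokens between adjacent punctuation characters.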
