MariaDB is one of the most popular open-source relational database servers, and with its built-in vector support it is now a powerful platform for AI-driven applications. MariaDB Vector enables storage and similarity search of high-dimensional vector embeddings directly within the database, making it an excellent choice for Retrieval-Augmented Generation (RAG) pipelines and semantic search.
GlobalSolutions has deep expertise in building end-to-end RAG pipelines and vector content ingestion workflows using MariaDB Vector DB. We can help you design, build, and deploy production-ready pipelines that embed, store, and retrieve content at scale — enabling your AI applications to ground responses in your own data.
MariaDB is fast and scalable with a rich ecosystem of storage engines and plugins, providing a full SQL interface for accessing both relational and vector data side by side.
Replace yourpemfile.pem and <public-ip-of-your-instance> with your own values, then connect over SSH:
ssh -i yourpemfile.pem ubuntu@<public-ip-of-your-instance>
This logs you in as the ubuntu user. For more information on connecting to EC2 instances, refer to: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/connect-linux-inst-ssh.html
| Category | Package | Version | Location / Command (Ubuntu) |
|---|---|---|---|
| Database | MariaDB Server | 11.x | Binary: /usr/sbin/mysqld; Data: /var/lib/mysql |
| Config | MariaDB Config | N/A | /etc/mysql/mariadb.conf.d/ |
| Service | systemd service | N/A | systemctl status mariadb |
| Client | MariaDB Client | N/A | /usr/bin/mysql |
Default MariaDB login:

| Username | Password |
|---|---|
| root | Your EC2 instance ID |
Where to find your Instance ID: in the AWS console, select your instance. The instance ID appears in the details pane in the lower half of the page, along with the other instance information. Alternatively, once logged in via SSH, you can run:
curl -s http://169.254.169.254/latest/meta-data/instance-id
The GlobalSolutions Maria VectorDB offering comes preinstalled and ready to use. Once you have connected to the instance over SSH, log in to MariaDB with the following command:
mysql -u root -p
When prompted for the password, enter your EC2 Instance ID.
Once connected, create a dedicated database for your vector data:
CREATE DATABASE vectordb;
USE vectordb;
Create a table with a vector column to store your embeddings. The example below uses 1536 dimensions, suitable for OpenAI text-embedding-ada-002 or similar models:
CREATE TABLE embeddings (
id INT AUTO_INCREMENT PRIMARY KEY,
content TEXT NOT NULL,
source VARCHAR(255),
created_at DATETIME DEFAULT NOW(),
embedding VECTOR(1536) NOT NULL,
VECTOR INDEX (embedding)
);
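To insert a document along with its vector embedding (generated by your pipeline), use the following statement. The "..." marks where the remaining values go. The complete literal must contain exactly 1536 numbers to match the VECTOR(1536) column.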
INSERT INTO embeddings (content, source, embedding)
VALUES (
'Your document text goes here',
'source-identifier',
VEC_FromText('[0.012, -0.045, 0.331, ...]')
);
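When your pipeline inserts rows from application code, bind the text and the serialized vector as parameters instead of building SQL strings by hand. The Python sketch below assumes a DB-API 2.0 cursor that uses `?` placeholders, such as the one MariaDB Connector/Python (the `mariadb` package) provides. The helper names `to_vector_text` and `insert_document` are hypothetical and exist only for this illustration.

```python
import json
import math
from typing import Sequence

EMBEDDING_DIM = 1536  # must match VECTOR(1536) in the embeddings table


def to_vector_text(embedding: Sequence[float], dim: int = EMBEDDING_DIM) -> str:
    """Serialize an embedding as the '[x, y, ...]' text that VEC_FromText() parses."""
    values = [float(x) for x in embedding]
    if len(values) != dim:
        raise ValueError(f"expected {dim} values, got {len(values)}")
    if not all(math.isfinite(v) for v in values):
        raise ValueError("embedding contains NaN or infinity")
    return json.dumps(values)


def insert_document(cursor, content: str, source: str,
                    embedding: Sequence[float], dim: int = EMBEDDING_DIM) -> None:
    """Insert one document with its embedding, using bound '?' parameters.

    `cursor` is assumed to be a DB-API cursor with qmark paramstyle
    (hypothetical helper, not part of any library).
    """
    vector_text = to_vector_text(embedding, dim)  # validate before touching the DB
    cursor.execute(
        "INSERT INTO embeddings (content, source, embedding) "
        "VALUES (?, ?, VEC_FromText(?))",
        (content, source, vector_text),
    )
```

For example, after conn = mariadb.connect(user="root", password="<your-instance-id>", database="vectordb"), call insert_document(conn.cursor(), "Your document text goes here", "source-identifier", vector). Then call conn.commit() unless autocommit is enabled on the connection.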
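To retrieve the documents most semantically similar to a query embedding, use the query below. VEC_DISTANCE uses the same distance metric as the column's vector index (Euclidean by default). Lower values indicate closer matches.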
SELECT id, content, source,
VEC_DISTANCE(embedding, VEC_FromText('[0.012, -0.045, ...]')) AS distance
FROM embeddings
ORDER BY distance ASC
LIMIT 5;
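You can issue the same retrieval from application code. This sketch again assumes a DB-API cursor with `?` placeholders, and `search_similar` is a hypothetical helper name. The code validates the LIMIT value as a positive integer and writes it into the SQL text rather than binding it. This avoids depending on placeholder support inside LIMIT.

```python
import json
from typing import Any, List, Sequence


def search_similar(cursor, query_embedding: Sequence[float], k: int = 5) -> List[Any]:
    """Return up to k rows (id, content, source, distance), nearest first.

    Hypothetical helper; `cursor` is assumed to be a qmark-style DB-API cursor.
    """
    k = int(k)
    if k <= 0:
        raise ValueError("k must be a positive integer")
    query_text = json.dumps([float(x) for x in query_embedding])
    cursor.execute(
        "SELECT id, content, source, "
        "VEC_DISTANCE(embedding, VEC_FromText(?)) AS distance "
        "FROM embeddings ORDER BY distance ASC "
        f"LIMIT {k}",  # k is a validated int, so formatting it in is safe
        (query_text,),
    )
    return cursor.fetchall()
```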
To help you get started quickly, GlobalSolutions has pre-built a sample vector dataset on this instance. The example below demonstrates a complete end-to-end workflow — creating a database, inserting product descriptions with their vector embeddings, and performing a semantic similarity search. Study this example to understand how to structure your own RAG pipelines and vector ingestion workflows.
Create and select a dedicated database for your AI workloads:
CREATE DATABASE IF NOT EXISTS global_ai_db;
USE global_ai_db;
Define a table with a VECTOR(3) column and a vector index. In production, replace 3 with the actual dimensionality of your embedding model (e.g. 1536 for OpenAI text-embedding-ada-002, 768 for nomic-embed-text).
CREATE TABLE IF NOT EXISTS ai_product_catalog (
id INT AUTO_INCREMENT PRIMARY KEY,
product_name VARCHAR(100),
description TEXT,
embedding VECTOR(3) NOT NULL,
VECTOR INDEX (embedding)
);
Each record pairs a piece of text content with its vector embedding using VEC_FromText(). In a real pipeline, these embeddings would be generated by your embedding model (e.g. Ollama, OpenAI) before being inserted here.
INSERT INTO ai_product_catalog (product_name, description, embedding) VALUES
('Waterproof Hiking Boots',
'Durable, weather-resistant boots for mountain trails.',
VEC_FromText('[0.12, 0.85, -0.44]')),
('Smart Fitness Watch',
'Tracks steps, heart rate, and sleep metrics with GPS.',
VEC_FromText('[0.91, -0.11, 0.32]')),
('Trail Running Shoes',
'Lightweight athletic sneakers built for rugged off-road tracks.',
VEC_FromText('[0.15, 0.79, -0.41]'));
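The hard-coded vectors above stand in for real model output. A batch ingestion step might look like the following sketch. Here `embed` is a hypothetical callable that wraps your embedding model (for example an Ollama or OpenAI client), and `ingest_products` is an illustrative name. The sketch assumes the same `?`-placeholder DB-API cursor as before and sends all rows in a single `executemany` call.

```python
import json
from typing import Callable, Iterable, List, Sequence, Tuple

# Hypothetical: any function that maps a text to its embedding vector.
EmbedFn = Callable[[str], Sequence[float]]


def ingest_products(cursor, products: Iterable[Tuple[str, str]],
                    embed: EmbedFn, dim: int = 3) -> int:
    """Embed each (product_name, description) pair and insert all rows in one batch.

    dim=3 matches the VECTOR(3) demo table; use your model's dimensionality
    in production. Returns the number of rows sent to the database.
    """
    rows: List[Tuple[str, str, str]] = []
    for name, description in products:
        vector = [float(x) for x in embed(description)]
        if len(vector) != dim:
            # Fail before any INSERT is issued, so a bad batch writes nothing.
            raise ValueError(f"{name!r}: expected {dim} values, got {len(vector)}")
        rows.append((name, description, json.dumps(vector)))
    if rows:
        cursor.executemany(
            "INSERT INTO ai_product_catalog (product_name, description, embedding) "
            "VALUES (?, ?, VEC_FromText(?))",
            rows,
        )
    return len(rows)
```

The function validates every vector before the single executemany call, so a dimension mismatch aborts the batch before anything is written. This sketch embeds only the description. Embedding the product name and description together is an equally reasonable choice.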
This is the core of a RAG retrieval step. Provide a query vector (generated from the user's question by your embedding model) and MariaDB returns the closest matching records ranked by Euclidean distance. The lower the distance, the more semantically similar the result.
SELECT
product_name,
description,
VEC_DISTANCE_EUCLIDEAN(embedding, VEC_FromText('[0.14, 0.82, -0.43]')) AS distance
FROM ai_product_catalog
ORDER BY distance ASC
LIMIT 2;
Running the query above returns the two products closest to the query vector [0.14, 0.82, -0.43]: Waterproof Hiking Boots and Trail Running Shoes. Both lie at a Euclidean distance of about 0.037 from the query. Because the two distances are effectively equal, their relative order is not guaranteed. Their proximity reflects that both are outdoor footwear, the category the query vector represents. Smart Fitness Watch is about 1.42 away, because it belongs to a different semantic category, so it falls outside the top two.
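You can check this ranking outside the database with a brute-force search over the three sample vectors. The sketch below uses `math.dist` (Python 3.8+), which computes the same Euclidean (L2) distance as VEC_DISTANCE_EUCLIDEAN. `top_k` is a hypothetical helper name.

```python
import math
from typing import Dict, List, Sequence, Tuple

# The three sample rows inserted into ai_product_catalog.
CATALOG: Dict[str, Tuple[float, float, float]] = {
    "Waterproof Hiking Boots": (0.12, 0.85, -0.44),
    "Smart Fitness Watch": (0.91, -0.11, 0.32),
    "Trail Running Shoes": (0.15, 0.79, -0.41),
}
QUERY = (0.14, 0.82, -0.43)


def top_k(query: Sequence[float], catalog: Dict[str, Tuple[float, float, float]],
          k: int) -> List[Tuple[str, float]]:
    """Brute-force nearest neighbours by Euclidean (L2) distance, nearest first."""
    scored = [(name, math.dist(vec, query)) for name, vec in catalog.items()]
    scored.sort(key=lambda item: item[1])
    return scored[:k]


if __name__ == "__main__":
    for name, distance in top_k(QUERY, CATALOG, k=3):
        print(f"{name:<25} {distance:.4f}")
```

Both footwear rows come out at about 0.0374 and the watch at about 1.4214. MariaDB stores vector components as 32-bit floats, so distances returned by the SQL query may differ in the last digits.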
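To turn this into a RAG pipeline, generate the query vector from the user's question. Use the same embedding model you used at ingestion. Substitute that vector into the VEC_DISTANCE_EUCLIDEAN call, and feed the top-ranked description values into your LLM as context. GlobalSolutions can help you build this full pipeline end-to-end. Reach out at support@theglobalsolutions.net.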
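The final context-assembly step can be sketched as follows. `build_rag_prompt` is a hypothetical helper. It takes rows shaped like the catalog query's output (product_name, description, distance), nearest first, and formats them as numbered context for whatever LLM client you use. The prompt wording and the character budget are assumptions to tune for your model.

```python
from typing import Iterable, Tuple


def build_rag_prompt(question: str,
                     rows: Iterable[Tuple[str, str, float]],
                     max_chars: int = 4000) -> str:
    """Format retrieved (product_name, description, distance) rows as LLM context.

    Rows are assumed nearest-first (ORDER BY distance ASC). Snippets are added in
    order and the loop stops once the next one would push their combined length
    past max_chars, so the closest matches are the ones kept.
    """
    snippets = []
    used = 0
    for rank, (name, description, _distance) in enumerate(rows, start=1):
        snippet = f"[{rank}] {name}: {description}"
        if used + len(snippet) > max_chars:
            break
        snippets.append(snippet)
        used += len(snippet)
    context = "\n".join(snippets) if snippets else "(no matching context found)"
    return (
        "Answer the question using only the context below. "
        "If the context is not sufficient, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
    )
```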
Our other popular offering is the AWS Cost Optimizer, also known as CloudInsider, available in AWS Marketplace. This service has helped our customers significantly reduce their AWS and other cloud spending. Subscribing is easy, and you can see the savings within minutes.
Please contact us at support@theglobalsolutions.net with any questions about this offering in AWS Marketplace.