Maria VectorDB with Sample Data Powered by GlobalSolutions

MariaDB is one of the most popular open-source relational database servers, and with its built-in vector support it is now a powerful platform for AI-driven applications. MariaDB Vector enables storage and similarity search of high-dimensional vector embeddings directly within the database, making it an excellent choice for Retrieval-Augmented Generation (RAG) pipelines and semantic search.

GlobalSolutions has deep expertise in building end-to-end RAG pipelines and vector content ingestion workflows using MariaDB Vector. We can help you design, build, and deploy production-ready pipelines that embed, store, and retrieve content at scale, so that your AI applications can ground responses in your own data.

MariaDB is fast and scalable with a rich ecosystem of storage engines and plugins, providing a full SQL interface for accessing both relational and vector data side by side.

Note: The image has been hardened against known vulnerabilities.

Why Subscribe to Our Offering in AWS Marketplace

We regularly update the software to the latest version to address security issues.
Customers can kick-start their core work right away with our pre-packaged AMIs.
Production-ready application stacks optimised for vector workloads.
GlobalSolutions expertise available to help build your RAG pipeline from day one.

How to Access Our AMIs from AWS Marketplace

Subscribe: Purchase the Maria VectorDB AMI directly from AWS Marketplace.
Connect via SSH:
- Go to the AWS Console, select your instance, and note the public IP address.
- Make sure port 22 is open in your instance's Security Group.
- Connect using the following command, replacing yourpemfile.pem and <public-ip-of-your-instance> with your values:
```
ssh -i yourpemfile.pem ubuntu@<public-ip-of-your-instance>
```
- Once logged in you will land in the home directory of the ubuntu user.

For more information on connecting to EC2 instances please refer to: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/connect-linux-inst-ssh.html

Installation Location

| Category | Package | Version | Location / command (Ubuntu) |
| --- | --- | --- | --- |
| Database | MariaDB Server | 11.x | `/usr/sbin/mysqld`; data directory: `/var/lib/mysql` |
| Config | MariaDB Config | - | `/etc/mysql/mariadb.conf.d/` |
| Service | systemd service | - | `systemctl status mariadb` |
| Client | MariaDB Client | - | `/usr/bin/mysql` |

MariaDB Login

| Username | Password |
| --- | --- |
| root | Your EC2 instance ID |

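Where to find your Instance ID: The instance ID is shown in the AWS console; select your instance and it appears with the other instance details in the lower half of the page. Alternatively, once logged in via SSH, you can query the instance metadata service. The command below uses the plain (IMDSv1) request form, which is rejected on instances configured to require IMDSv2 session tokens: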

curl -s http://169.254.169.254/latest/meta-data/instance-id

Getting Started

The Maria VectorDB offering of GlobalSolutions comes prepackaged and ready to use. Once you have SSH'd into the instance, connect to MariaDB using the following command:

mysql -u root -p

When prompted for the password, enter your EC2 Instance ID.

Creating Vector Databases and Tables

Once connected, create a dedicated database for your vector data:

CREATE DATABASE vectordb;
USE vectordb;

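Create a table with a vector column to store your embeddings. The example below uses 1536 dimensions, which matches OpenAI text-embedding-ada-002. Set the dimension to whatever your embedding model outputs, since every inserted vector must have exactly that many components: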

CREATE TABLE embeddings (
  id          INT AUTO_INCREMENT PRIMARY KEY,
  content     TEXT NOT NULL,
  source      VARCHAR(255),
  created_at  DATETIME DEFAULT NOW(),
  embedding   VECTOR(1536) NOT NULL,
  VECTOR INDEX (embedding)
);

Ingesting Embedded Content

To insert a document along with its vector embedding (generated by your pipeline), use a statement of the following form, where ... stands for the remaining components of the vector:

INSERT INTO embeddings (content, source, embedding)
VALUES (
  'Your document text goes here',
  'source-identifier',
  VEC_FromText('[0.012, -0.045, 0.331, ...]')
);
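An embedding pipeline usually produces a list of floats, while VEC_FromText() expects a JSON array as text. The Python sketch below shows one way to do that conversion. The helper name `to_vec_text` and its `expected_dim` check are illustrative assumptions, not part of MariaDB or of this AMI.

```python
import json
import math


def to_vec_text(embedding, expected_dim=1536):
    """Serialize an embedding as the JSON array text that VEC_FromText() parses.

    to_vec_text and expected_dim are hypothetical names used for this sketch.
    """
    values = [float(x) for x in embedding]
    if len(values) != expected_dim:
        raise ValueError(f"expected {expected_dim} dimensions, got {len(values)}")
    if not all(math.isfinite(v) for v in values):
        raise ValueError("embedding contains NaN or infinite values")
    return json.dumps(values)


# Three-dimensional toy vector, prepared as a bound parameter
# rather than pasted into the SQL text.
params = (
    "Your document text goes here",
    "source-identifier",
    to_vec_text([0.012, -0.045, 0.331], expected_dim=3),
)
```

The third element of `params` would be bound to the argument of VEC_FromText() in the INSERT statement. The placeholder syntax depends on the database driver you use.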

Querying — Similarity Search (RAG Retrieval)

To retrieve the most semantically similar documents to a query embedding:

SELECT id, content, source,
       VEC_DISTANCE_EUCLIDEAN(embedding, VEC_FromText('[0.012, -0.045, ...]')) AS distance
FROM   embeddings
ORDER  BY distance ASC
LIMIT  5;
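From application code, this retrieval is usually issued as a parameterized statement. The sketch below only builds the SQL text and the parameter tuple and does not open a connection. `build_search` is a hypothetical helper. The `%s` placeholder is an assumption that fits drivers such as PyMySQL; other drivers use `?`. The sketch calls VEC_DISTANCE_EUCLIDEAN, the explicit Euclidean function used in the sample workflow later in this guide.

```python
import json


def build_search(query_embedding, top_k=5, placeholder="%s"):
    """Return (sql, params) for a nearest-neighbour lookup on the embeddings table."""
    top_k = int(top_k)
    if top_k < 1:
        raise ValueError("top_k must be a positive integer")
    sql = (
        "SELECT id, content, source, "
        f"VEC_DISTANCE_EUCLIDEAN(embedding, VEC_FromText({placeholder})) AS distance "
        "FROM embeddings "
        "ORDER BY distance ASC "
        f"LIMIT {top_k}"
    )
    # The query vector travels as a bound parameter in JSON array form.
    params = (json.dumps([float(x) for x in query_embedding]),)
    return sql, params


sql, params = build_search([0.012, -0.045, 0.331], top_k=5)
# With a DB-API cursor: cursor.execute(sql, params); rows = cursor.fetchall()
```

Binding the vector as a parameter keeps user-derived data out of the SQL text. The row limit is validated as an integer and then formatted into the statement.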

Tip: GlobalSolutions can help you build the full pipeline — from chunking and embedding your content (using OpenAI, Ollama, or other models) to storing vectors in MariaDB and wiring up retrieval into your RAG application. Contact us at support@theglobalsolutions.net to get started.

Sample Vector Data — Learn by Example

To help you get started quickly, GlobalSolutions has pre-built a sample vector dataset on this instance. The example below demonstrates a complete end-to-end workflow — creating a database, inserting product descriptions with their vector embeddings, and performing a semantic similarity search. Study this example to understand how to structure your own RAG pipelines and vector ingestion workflows.

Step 1 — Set Up the Database

Create and select a dedicated database for your AI workloads:

CREATE DATABASE IF NOT EXISTS global_ai_db;
USE global_ai_db;

Step 2 — Create the Table with Vector Support

Define a table with a VECTOR(3) column and a vector index. In production, replace 3 with the actual dimensionality of your embedding model (e.g. 1536 for OpenAI text-embedding-ada-002, 768 for nomic-embed-text).

CREATE TABLE IF NOT EXISTS ai_product_catalog (
    id           INT AUTO_INCREMENT PRIMARY KEY,
    product_name VARCHAR(100),
    description  TEXT,
    embedding    VECTOR(3) NOT NULL,
    VECTOR INDEX (embedding)
);

Step 3 — Insert Sample Records

Each record pairs a piece of text content with its vector embedding using VEC_FromText(). In a real pipeline these embeddings would be generated by your embedding model (e.g. Ollama, OpenAI) before being inserted here.

INSERT INTO ai_product_catalog (product_name, description, embedding) VALUES
('Waterproof Hiking Boots',
 'Durable, weather-resistant boots for mountain trails.',
 VEC_FromText('[0.12, 0.85, -0.44]')),

('Smart Fitness Watch',
 'Tracks steps, heart rate, and sleep metrics with GPS.',
 VEC_FromText('[0.91, -0.11, 0.32]')),

('Trail Running Shoes',
 'Lightweight athletic sneakers built for rugged off-road tracks.',
 VEC_FromText('[0.15, 0.79, -0.41]'));

Step 4 — Semantic Similarity Search

This is the core of a RAG retrieval step. Provide a query vector (generated from the user's question by your embedding model) and MariaDB returns the closest matching records ranked by Euclidean distance. The lower the distance, the more semantically similar the result.

SELECT
    product_name,
    description,
    VEC_DISTANCE_EUCLIDEAN(embedding, VEC_FromText('[0.14, 0.82, -0.43]')) AS distance
FROM ai_product_catalog
ORDER BY distance ASC
LIMIT 2;

Understanding the Results

Running the query above will return the two most semantically similar products to the query vector [0.14, 0.82, -0.43]. In this example, Waterproof Hiking Boots and Trail Running Shoes will rank closest because their embeddings are geometrically near the query — reflecting that they are both outdoor footwear, similar to what the query vector represents. Smart Fitness Watch will rank further away as it belongs to a different semantic category.
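The ranking can be checked by hand outside the database. The sketch below recomputes the Euclidean distances for the three sample vectors in plain Python; the `euclidean` helper is an illustrative name, not a MariaDB function.

```python
import math


def euclidean(a, b):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


catalog = {
    "Waterproof Hiking Boots": [0.12, 0.85, -0.44],
    "Smart Fitness Watch": [0.91, -0.11, 0.32],
    "Trail Running Shoes": [0.15, 0.79, -0.41],
}
query = [0.14, 0.82, -0.43]

distances = {name: euclidean(vec, query) for name, vec in catalog.items()}
ranked = sorted(distances, key=distances.get)

for name in ranked:
    print(f"{name}: {distances[name]:.4f}")
```

Both footwear items sit at a distance of roughly 0.037 from the query, and the watch at roughly 1.42. The two footwear distances are equal to within floating-point rounding, so the order between them carries no meaning. MariaDB stores vector components as 32-bit floats, so the values it reports may differ slightly in the last digits.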

Key Takeaway: In a production RAG pipeline you would replace the hardcoded vectors with embeddings generated dynamically — embed the user's query at runtime, pass the resulting vector into the VEC_DISTANCE_EUCLIDEAN call, and feed the top-ranked description values into your LLM as context. GlobalSolutions can help you build this full pipeline end-to-end. Reach out at support@theglobalsolutions.net.
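The final step of that flow, turning retrieved rows into LLM context, can be sketched as follows. `build_prompt`, `max_context_chars`, and the prompt wording are all assumptions made for illustration. The rows mirror the columns returned by the Step 4 query, with rounded distances.

```python
def build_prompt(question, rows, max_context_chars=4000):
    """Assemble an LLM prompt from retrieved rows.

    rows: (product_name, description, distance) tuples, already ordered by
    ascending distance, in the column order of the Step 4 query.
    """
    context_lines = []
    used = 0
    for product_name, description, _distance in rows:
        line = f"- {product_name}: {description}"
        if used + len(line) > max_context_chars:
            break  # stop once the context budget is used up
        context_lines.append(line)
        used += len(line)
    context = "\n".join(context_lines)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


# Rows as they might come back from the similarity search (distances rounded).
rows = [
    ("Waterproof Hiking Boots",
     "Durable, weather-resistant boots for mountain trails.", 0.0374),
    ("Trail Running Shoes",
     "Lightweight athletic sneakers built for rugged off-road tracks.", 0.0374),
]
prompt = build_prompt("Which products suit a rocky mountain hike?", rows)
print(prompt)
```

The character budget is a simple stand-in for the token limit of whichever model you use. A real pipeline would typically count tokens with that model's own tokenizer.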


Support

Please contact us at support@theglobalsolutions.net for any questions on this offering in AWS Marketplace.