Dataprep Microservice with MariaDB Vector¶
🚀1. Start Microservice with Docker¶
1.1 Build Docker Image¶
cd GenAIComps
docker build -t opea/dataprep:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/dataprep/src/Dockerfile .
1.2 Run Docker with CLI (Option A)¶
1.2.1 Start MariaDB Server¶
Please refer to this readme.
1.2.2 Start the data preparation service¶
export HOST_IP=$(hostname -I | awk '{print $1}')
# If you've configured the server with the default env values then:
export MARIADB_CONNECTION_URL: mariadb+mariadbconnector://dbuser:password@${HOST_IP}$:3306/vectordb
docker run -d --rm --name="dataprep-mariadb-vector" -p 5000:5000 --ipc=host -e MARIADB_CONNECTION_URL=$MARIADB_CONNECTION_URL -e DATAPREP_COMPONENT_NAME="OPEA_DATAPREP_MARIADBVECTOR" opea/dataprep:latest
1.3 Run with Docker Compose (Option B)¶
cd comps/dataprep/deployment/docker_compose
docker compose -f compose.yaml up dataprep-mariadb-vector -d
🚀2. Consume Microservice¶
2.1 Consume Upload API¶
Once the data preparation microservice for MariaDB Vector is started, one can use the below command to invoke the microservice to convert documents/links to embeddings and save them to the vector store.
export document="/path/to/document"
curl -X POST \
-H "Content-Type: application/json" \
-d '{"path":"${document}"}' \
http://localhost:6007/v1/dataprep/ingest
2.2 Consume get API¶
To get the structure of the uploaded files, use the get
API endpoint:
curl -X POST \
-H "Content-Type: application/json" \
http://localhost:6007/v1/dataprep/get
A JSON formatted response similar to the one below will follow:
[
{
"name": "uploaded_file_1.txt",
"id": "uploaded_file_1.txt",
"type": "File",
"parent": ""
},
{
"name": "uploaded_file_2.txt",
"id": "uploaded_file_2.txt",
"type": "File",
"parent": ""
}
]
2.3 Consume delete API¶
To delete uploaded files/links, use the delete
API endpoint.
The file_path
is the id
returned by the /v1/dataprep/get
API.
# delete link
curl -X POST "http://${HOST_IP}:5000/v1/dataprep/delete"
-H "Content-Type: application/json" \
-d '{"file_path": "https://www.ces.tech/.txt"}'
# delete file
curl -X POST "http://${HOST_IP}:5000/v1/dataprep/delete"
-H "Content-Type: application/json" \
-d '{"file_path": "uploaded_file_1.txt"}'
# delete all files and links
curl -X POST "http://${HOST_IP}:5000/v1/dataprep/delete"
-H "Content-Type: application/json" \
-d '{"file_path": "all"}'