Multimodal Dataprep Microservice with VDMS
For the dataprep microservice, we currently provide one framework: LangChain.
🚀1. Start Microservice with Python (Option 1)
1.1 Install Requirements
Option 1: Install the single-process version (for processing 1-10 files):
apt-get update
apt-get install -y default-jre tesseract-ocr libtesseract-dev poppler-utils
pip install -r requirements.txt
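Before starting the service, it can help to confirm that the packages installed above actually put their tools on PATH. The bash sketch below defines a hypothetical helper, check_deps, which is not part of the project; it assumes java, tesseract and pdftoppm are the commands provided by default-jre, tesseract-ocr and poppler-utils respectively.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the dataprep service).
# Report every named command that is not on PATH; return 1 if any is missing.
check_deps() {
  local missing=0 cmd
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing: $cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `check_deps java tesseract pdftoppm pip` prints one line per absent command and returns non-zero if anything is missing.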
1.2 Start VDMS Server
docker run -d --name="vdms-vector-db" -p 55555:55555 intellabs/vdms:latest
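docker run -d returns as soon as the container is created, which may be before VDMS accepts connections on the published port 55555. A minimal bash sketch for waiting on that port follows; wait_for_port is a hypothetical helper, and it relies on bash's /dev/tcp pseudo-device rather than any VDMS-specific health check.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of VDMS or the dataprep service).
# Try to open a TCP connection to HOST:PORT up to TIMEOUT times, one second
# apart. Returns 0 as soon as a connection succeeds, 1 otherwise.
wait_for_port() {
  local host="$1" port="$2" timeout="${3:-30}" attempt=0
  while (( attempt < timeout )); do
    # bash opens a TCP socket when redirecting to /dev/tcp/HOST/PORT.
    if (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null; then
      return 0
    fi
    sleep 1
    attempt=$((attempt + 1))
  done
  return 1
}
```

For example, `wait_for_port localhost 55555 30` blocks for up to about 30 seconds until the VDMS container is reachable.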
1.3 Set Up Environment Variables
export http_proxy=${your_http_proxy}
export https_proxy=${your_https_proxy}
export host_ip=$(hostname -I | awk '{print $1}')
export VDMS_HOST=${host_ip}
export VDMS_PORT=55555
export INDEX_NAME="rag-vdms"
export your_hf_api_token="{your_hf_token}"
export PYTHONPATH=${path_to_comps}
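A typo in one of these exports usually shows up only later, as a connection failure. The hypothetical bash helper require_env below is one way to check that variables are set and non-empty before continuing. Exactly which variables the service reads is not stated here, so the list in the usage line is an assumption based on the exports above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the dataprep service).
# Return 1 if any of the named environment variables is unset or empty.
require_env() {
  local name status=0
  for name in "$@"; do
    # ${!name} is bash indirect expansion: the value of the variable named $name.
    if [[ -z "${!name:-}" ]]; then
      echo "not set: $name" >&2
      status=1
    fi
  done
  return "$status"
}
```

For example: `require_env VDMS_HOST VDMS_PORT INDEX_NAME PYTHONPATH`.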
1.4 Start Data Preparation Microservice for VDMS with Python Script
Start the data preparation microservice for VDMS with the command below:
python ingest_videos.py
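The script runs in the foreground, so from a second terminal you may want to wait until it answers HTTP requests before uploading anything. This sketch assumes the Python service listens on port 6007, the port used by the curl examples in section 4; wait_for_http is a hypothetical helper that treats any HTTP status code as "up".

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the dataprep service).
# Request URL up to TIMEOUT times, one second apart. As soon as the server
# returns any HTTP status, print that code and return 0; otherwise return 1.
wait_for_http() {
  local url="$1" timeout="${2:-60}" code attempt=0
  while (( attempt < timeout )); do
    # curl's %{http_code} is 000 when no HTTP response arrived (e.g. connection refused).
    code=$(curl -s -o /dev/null --max-time 2 -w '%{http_code}' "$url" 2>/dev/null)
    if [[ -n "$code" && "$code" != "000" ]]; then
      echo "$code"
      return 0
    fi
    sleep 1
    attempt=$((attempt + 1))
  done
  return 1
}
```

For example: `wait_for_http http://localhost:6007/v1/dataprep/get_videos 60`.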
🚀2. Start Microservice with Docker (Option 2)
2.1 Start VDMS Server
docker run -d --name="vdms-vector-db" -p 55555:55555 intellabs/vdms:latest
2.2 Set Up Environment Variables
export http_proxy=${your_http_proxy}
export https_proxy=${your_https_proxy}
export host_ip=$(hostname -I | awk '{print $1}')
export VDMS_HOST=${host_ip}
export VDMS_PORT=55555
export INDEX_NAME="rag-vdms"
export your_hf_api_token="{your_hf_token}"
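2.3 Build Docker Image
Build the Docker image. Run the build from the repository root (the directory that contains comps/), because both the -f Dockerfile path and the "." build context are relative to it.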
cd ../../../
docker build -t opea/dataprep-vdms:latest --network host \
  --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy \
  -f comps/dataprep/vdms/multimodal_langchain/Dockerfile .
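Because the -f path and the build context are both relative, running the build from the wrong directory fails with a missing-Dockerfile error. The hypothetical bash check below tests for the Dockerfile at the path used above before you build.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the project).
# Return 0 if the current directory contains the Dockerfile at the path
# passed to `docker build -f`, i.e. if it can serve as the build context.
check_build_context() {
  local dockerfile="comps/dataprep/vdms/multimodal_langchain/Dockerfile"
  if [[ -f "$dockerfile" ]]; then
    return 0
  fi
  echo "not at repository root: $PWD has no $dockerfile" >&2
  return 1
}
```

For example: `check_build_context && docker build ...` only starts the build when run from the right directory.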
2.4 Run Docker Compose
docker compose -f comps/dataprep/vdms/multimodal_langchain/docker-compose-dataprep-vdms.yaml up -d
🚀3. Check Microservice Status
docker container logs -f dataprep-vdms-server
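The -f flag follows the log stream until you interrupt it. For a one-off summary, you can dump the current logs once and filter them. scan_logs is a hypothetical helper, and the keywords it matches (error, exception, traceback) are a heuristic assumption, not a documented log format. docker container logs replays the container's stderr on its own stderr, hence the 2>&1 in the usage line.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the dataprep service).
# Read log text on stdin and print, with line numbers, only the lines that
# look like errors or Python tracebacks (case-insensitive). Like grep, it
# returns 1 when nothing matches.
scan_logs() {
  grep -inE 'error|exception|traceback'
}
```

For example: `docker container logs dataprep-vdms-server 2>&1 | scan_logs`.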
🚀4. Consume Microservice
Once the data preparation microservice for VDMS is running, you can use the commands below to invoke it. The microservice converts the uploaded videos to embeddings and saves them to the database.
Make sure the file path after files=@ is correct.
Single file upload
curl -X POST \
  -H "Content-Type: multipart/form-data" \
  -F "files=@./file1.mp4" \
  http://localhost:6007/v1/dataprep
Multiple file upload
curl -X POST \
  -H "Content-Type: multipart/form-data" \
  -F "files=@./file1.mp4" \
  -F "files=@./file2.mp4" \
  -F "files=@./file3.mp4" \
  http://localhost:6007/v1/dataprep
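The single-file and multi-file commands differ only in how many files=@ form fields they repeat. The sketch below builds those fields from a list of paths, checks that each file exists first (following the note about files=@ above), and supports a dry run. upload_videos and DRY_RUN are hypothetical names. The sketch leaves out the explicit -H header because curl already sends Content-Type multipart/form-data whenever -F is used.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the dataprep service).
# Upload one or more local files in a single multipart POST request.
# Usage: upload_videos URL FILE [FILE...]
upload_videos() {
  local url="$1"; shift
  if (( $# == 0 )); then
    echo "usage: upload_videos URL FILE [FILE...]" >&2
    return 2
  fi
  local args=() f
  for f in "$@"; do
    # Fail before sending anything if a path is wrong.
    if [[ ! -f "$f" ]]; then
      echo "missing file: $f" >&2
      return 1
    fi
    # One repeated "files" field per file, as in the curl examples above.
    args+=(-F "files=@${f}")
  done
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    # Print the command (space-separated, unquoted) instead of running it.
    printf '%s ' curl -X POST "${args[@]}" "$url"; echo
  else
    curl -X POST "${args[@]}" "$url"
  fi
}
```

For example: `upload_videos http://localhost:6007/v1/dataprep ./file1.mp4 ./file2.mp4 ./file3.mp4`.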
List of uploaded files
curl -X GET http://localhost:6007/v1/dataprep/get_videos
Download uploaded files
Replace ${filename} with a file name returned by the list request above.
curl -X GET http://localhost:6007/v1/dataprep/get_file/${filename}
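Substituting ${filename} directly breaks if the name contains spaces or other reserved URL characters. Also, curl prints the downloaded bytes to the terminal unless told otherwise. The sketch below percent-encodes the name and saves the response with curl -o. urlencode and download_file are hypothetical helpers, and the encoder handles ASCII names only. It also assumes the server decodes percent-encoded path segments, as common HTTP frameworks do.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of the dataprep service).

# Percent-encode an ASCII string for use as one URL path segment.
# RFC 3986 unreserved characters pass through; everything else becomes %XX,
# using printf's "'c" form to get the character's numeric code.
urlencode() {
  local s="$1" out="" c i
  for (( i = 0; i < ${#s}; i++ )); do
    c="${s:i:1}"
    case "$c" in
      [a-zA-Z0-9.~_-]) out+="$c" ;;
      *) out+=$(printf '%%%02X' "'$c") ;;
    esac
  done
  printf '%s\n' "$out"
}

# Download one uploaded file by name and save it under the same name.
# BASE_URL is the service root, e.g. http://localhost:6007.
download_file() {
  local base_url="$1" name="$2"
  # -f: exit non-zero on HTTP errors; -o: write the body to a file, not stdout.
  curl -f -o "$name" "${base_url}/v1/dataprep/get_file/$(urlencode "$name")"
}
```

For example, `download_file http://localhost:6007 "my clip.mp4"` requests .../get_file/my%20clip.mp4 and writes the result to my clip.mp4.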