Multimodal Dataprep Microservice with VDMS

The dataprep microservice currently supports one framework: LangChain.

🚀1. Start Microservice with Python (Option 1)

1.1 Install Requirements

  • Install the single-process version (suitable for processing 1-10 files)

    apt-get update
    apt-get install -y default-jre tesseract-ocr libtesseract-dev poppler-utils
    pip install -r requirements.txt
    

1.2 Start VDMS Server

docker run -d --name="vdms-vector-db" -p 55555:55555 intellabs/vdms:latest
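
Before moving on, it can help to confirm that the container is accepting connections on the published port. The following is a minimal sketch, not part of the microservice: `port_is_open` is a hypothetical helper, and it only checks TCP reachability, not whether VDMS itself answers queries.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


if __name__ == "__main__":
    # 55555 is the port published by the docker run command above.
    print("VDMS reachable:", port_is_open("localhost", 55555))
```

A `False` result right after `docker run` may only mean the server is still starting, so retrying after a few seconds is reasonable.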

1.3 Setup Environment Variables

export http_proxy=${your_http_proxy}
export https_proxy=${your_http_proxy}
export host_ip=$(hostname -I | awk '{print $1}')
export VDMS_HOST=${host_ip}
export VDMS_PORT=55555
export INDEX_NAME="rag-vdms"
export your_hf_api_token="{your_hf_token}"
export PYTHONPATH=${path_to_comps}
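
A quick way to catch a forgotten export is to check the environment before launching the script. This is a sketch under stated assumptions: the variable names come from the export commands above, but whether the service treats each one as mandatory is assumed here, and `missing_vars` is a hypothetical helper.

```python
import os

# Names taken from the export commands above; whether the service treats
# each one as mandatory is an assumption of this sketch.
EXPECTED_VARS = ("VDMS_HOST", "VDMS_PORT", "INDEX_NAME")


def missing_vars(env=None, names=EXPECTED_VARS):
    """Return the names that are unset or empty in env (default: os.environ)."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


if __name__ == "__main__":
    missing = missing_vars()
    if missing:
        print("Unset variables:", ", ".join(missing))
    else:
        print("All expected variables are set.")
```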

1.4 Start Data Preparation Microservice for VDMS with Python Script

Start the data preparation microservice for VDMS with the following command.

python ingest_videos.py

🚀2. Start Microservice with Docker (Option 2)

2.1 Start VDMS Server

docker run -d --name="vdms-vector-db" -p 55555:55555 intellabs/vdms:latest

2.2 Setup Environment Variables

export http_proxy=${your_http_proxy}
export https_proxy=${your_http_proxy}
export host_ip=$(hostname -I | awk '{print $1}')
export VDMS_HOST=${host_ip}
export VDMS_PORT=55555
export INDEX_NAME="rag-vdms"
export your_hf_api_token="{your_hf_token}"

2.3 Build Docker Image

  • Build docker image

    cd ../../../
    docker build -t opea/dataprep-vdms:latest --network host --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/dataprep/vdms/multimodal_langchain/Dockerfile .
    

2.4 Run Docker Compose

docker compose -f comps/dataprep/vdms/multimodal_langchain/docker-compose-dataprep-vdms.yaml up -d

🚀3. Check Microservice Status

docker container logs -f dataprep-vdms-server

🚀4. Consume Microservice

Once the data preparation microservice for VDMS is running, use the commands below to invoke it. The service converts the uploaded videos to embeddings and saves them to the database.

Make sure the file path after files=@ is correct.

  • Single file upload

    curl -X POST \
         -H "Content-Type: multipart/form-data" \
         -F "files=@./file1.mp4" \
         http://localhost:6007/v1/dataprep
    
  • Multiple file upload

    curl -X POST \
         -H "Content-Type: multipart/form-data" \
         -F "files=@./file1.mp4" \
         -F "files=@./file2.mp4" \
         -F "files=@./file3.mp4" \
         http://localhost:6007/v1/dataprep
    
  • List of uploaded files

    curl -X GET http://localhost:6007/v1/dataprep/get_videos
    
  • Download uploaded files

    Use a file name returned by the list command above:

    curl -X GET http://localhost:6007/v1/dataprep/get_file/${filename}
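
The same requests can be issued from Python. The sketch below uses only the standard library and mirrors the curl calls above: the endpoint paths and the `files` field name come from those commands, while `encode_multipart`, `upload_videos`, and `file_url` are hypothetical helper names, and the response format is not assumed. Each file is read fully into memory, so this suits small test clips rather than large videos.

```python
import mimetypes
import uuid
from pathlib import Path
from urllib import parse, request

BASE_URL = "http://localhost:6007/v1/dataprep"  # host and port from the curl examples


def encode_multipart(paths, field="files"):
    """Build a multipart/form-data body with one `field` part per file."""
    boundary = uuid.uuid4().hex
    chunks = []
    for path in map(Path, paths):
        ctype = mimetypes.guess_type(path.name)[0] or "application/octet-stream"
        header = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{field}"; filename="{path.name}"\r\n'
            f"Content-Type: {ctype}\r\n\r\n"
        )
        # read_bytes raises FileNotFoundError if the path is wrong.
        chunks.append(header.encode() + path.read_bytes() + b"\r\n")
    body = b"".join(chunks) + f"--{boundary}--\r\n".encode()
    return body, f"multipart/form-data; boundary={boundary}"


def upload_videos(paths, base_url=BASE_URL):
    """POST the files to the dataprep endpoint; return (status, response text)."""
    body, content_type = encode_multipart(paths)
    req = request.Request(
        base_url,
        data=body,
        method="POST",
        headers={"Content-Type": content_type},
    )
    with request.urlopen(req) as resp:
        return resp.status, resp.read().decode()


def file_url(filename, base_url=BASE_URL):
    """URL of the get_file endpoint, with the file name percent-encoded."""
    return f"{base_url}/get_file/{parse.quote(filename, safe='')}"
```

For example, `upload_videos(["./file1.mp4", "./file2.mp4"])` corresponds to the multiple file upload above, and `file_url("file1.mp4")` yields the address used by the download command.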