Text to knowledge graph (text2kg) microservice

The Text to Knowledge Graph (text2kg) microservice converts unstructured text into structured data by extracting graph triplets. Once a complex task, this has become far more accessible with the rise of Large Language Models (LLMs), making it a mainstream approach to data extraction. This microservice uses a decoder-only model and can run on CPU or HPU; instructions for both are given below.

Decoder-Only Models

Decoder-only models are optimized for fast inference by skipping the encoding step. They work well for tasks where input-output mappings are relatively simple, or when multitasking is required. These models are ideal when computational efficiency and prompt-based output generation are priorities. However, decoder-only models may struggle with tasks that require deep contextual understanding or when input-output structures are highly complex or varied.

Features

Input Formats: Accepts text from documents, text files, or strings.

Output: An answer to the query asked by the user.

🚀 1. Start individual microservices using docker cli (Option 1)

Update the environment_setup.sh file with your device and user information, then source it:

source comps/text2kg/src/environment_setup.sh

If you skip this step, export the variables for each service individually, as described in the respective microservice sections below.
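If you export the variables manually, the names used later in this guide can be set as follows. This is a sketch with placeholder values (the token and model ID shown are examples, not the project's defaults); substitute your own:

```shell
# Placeholder values -- substitute your own.
export HF_TOKEN=hf_xxxxxxxxxxxx                           # Hugging Face access token (placeholder)
export LLM_MODEL_ID=meta-llama/Meta-Llama-3-8B-Instruct   # model served by TGI (example value)
export TEXT2KG_PORT=8090                                  # port the text2kg service listens on
export DATA_DIRECTORY=$(pwd)                              # directory containing input documents
```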

1. TGI

Refer to this link to start and verify the TGI microservice.

2. Neo4J

Refer to this link to start and verify the neo4j microservice.

export DATA_DIRECTORY=$(pwd)
export ENTITIES="PERSON,PLACE,ORGANIZATION"
export RELATIONS="HAS,PART_OF,WORKED_ON,WORKED_WITH,WORKED_AT"
export VALIDATION_SCHEMA='{
    "PERSON": ["HAS", "PART_OF", "WORKED_ON", "WORKED_WITH", "WORKED_AT"],
    "PLACE": ["HAS", "PART_OF", "WORKED_AT"],
    "ORGANIZATION": ["HAS", "PART_OF", "WORKED_WITH"]
}'
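Because VALIDATION_SCHEMA is a multi-line JSON string, a typo (missing comma, stray quote) will only surface when the service tries to parse it. A quick sanity check, using only python3, is:

```shell
# Sanity-check that VALIDATION_SCHEMA parses as JSON before starting the service.
export VALIDATION_SCHEMA='{
    "PERSON": ["HAS", "PART_OF", "WORKED_ON", "WORKED_WITH", "WORKED_AT"],
    "PLACE": ["HAS", "PART_OF", "WORKED_AT"],
    "ORGANIZATION": ["HAS", "PART_OF", "WORKED_WITH"]
}'
echo "$VALIDATION_SCHEMA" | python3 -m json.tool > /dev/null && echo "schema OK"
```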

3. Text2kg

cd comps/text2kg/src/
export TEXT2KG_PORT=8090

Build the text2kg docker image

docker build -f Dockerfile -t opea/text2kg:latest ../../../

Launch the docker container

docker run -it --net=host --ipc=host -p ${TEXT2KG_PORT}:${TEXT2KG_PORT} -e HF_TOKEN=${HF_TOKEN} -e LLM_MODEL_ID=${LLM_MODEL_ID} -v data:/home/user/comps/text2kg/src/data opea/text2kg:latest /bin/bash

🚀 2. Start text2kg and dependent microservices with docker-compose (Option 2)

cd comps/text2kg/deployment/docker_compose/

Export service name and log path

export service_name="text2kg"
export LOG_PATH=$PWD

Export the NEO4J variables (refer to section 1.2.b). Launch using the following command to run on CPU:

docker compose -f compose.yaml -f custom-override.yml up ${service_name} -d > ${LOG_PATH}/start_services_with_compose.log

Launch using the following command to run on Gaudi (HPU):

docker compose -f compose.yaml up ${service_name} -d > ${LOG_PATH}/start_services_with_compose.log

🚀 3. Check the service using the API endpoint

curl -X 'POST' \
  "http://localhost:${TEXT2KG_PORT}/v1/text2kg?input_text=Who%20is%20paul%20graham%3F" \
  -H 'accept: application/json' \
  -d ''

• Make sure your input document/string contains the information you expect to be extracted.
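If your query contains spaces or punctuation, it must be percent-encoded before being passed as the input_text query parameter. A small helper using python3 (the question text below is just an example) builds the encoded URL:

```shell
# Percent-encode a free-text question for use as the input_text query parameter.
QUESTION="Who is paul graham?"
ENCODED=$(python3 -c 'import sys, urllib.parse; print(urllib.parse.quote(sys.argv[1]))' "$QUESTION")
echo "http://localhost:${TEXT2KG_PORT:-8090}/v1/text2kg?input_text=${ENCODED}"
```

The printed URL can then be passed to curl as in the command above.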