# Text to Knowledge Graph (text2kg) Microservice

The Text to Knowledge Graph (text2kg) microservice converts unstructured text into structured data by generating graph triplets. This task, which can be complex, has become more accessible with the rise of Large Language Models (LLMs), making it a mainstream solution for data extraction. This microservice uses a decoder-only model and can run on CPU or HPU; instructions for both are given below.

## Decoder-Only Models

Decoder-only models are optimized for fast inference because they skip the encoding step. They work well when input-output mappings are relatively simple or when multitasking is required, and they are a good fit when computational efficiency and prompt-based output generation are priorities. However, decoder-only models may struggle with tasks that require deep contextual understanding or where input-output structures are highly complex or varied.

## Features

- Input formats: accepts text from documents, text files, or strings.
- Output: the answer to the query asked by the user.

## 🚀 1. Start individual microservices using Docker CLI (Option 1)

Update the `environment_setup.sh` file with your device and user information, then source it:

```bash
source comps/text2kg/src/environment_setup.sh
```

If you skip this step, export the variables required by each service individually, as described in the corresponding microservice instructions.

### 1.1 TGI

Refer to [this link](/GenAIComps/comps/third_parties/tgi/README.md) to start and verify the TGI microservice.

### 1.2 Neo4j

Refer to [this link](/GenAIComps/comps/third_parties/neo4j/src/README.md) to start and verify the Neo4j microservice.

```bash
export DATA_DIRECTORY=$(pwd)
export ENTITIES="PERSON,PLACE,ORGANIZATION"
export RELATIONS="HAS,PART_OF,WORKED_ON,WORKED_WITH,WORKED_AT"
export VALIDATION_SCHEMA='{
    "PERSON": ["HAS", "PART_OF", "WORKED_ON", "WORKED_WITH", "WORKED_AT"],
    "PLACE": ["HAS", "PART_OF", "WORKED_AT"],
    "ORGANIZATION": ["HAS", "PART_OF", "WORKED_WITH"]
}'
```

### 1.3 Text2kg

```bash
cd comps/text2kg/src/
export TEXT2KG_PORT=8090
```

Build the text2kg Docker image:

```bash
docker build -f Dockerfile -t opea/text2kg:latest ../../../
```

Launch the Docker container:

```bash
docker run -i -t --net=host --ipc=host -p ${TEXT2KG_PORT}:${TEXT2KG_PORT} -e HF_TOKEN=${HF_TOKEN} -e LLM_MODEL_ID=${LLM_MODEL_ID} -v data:/home/user/comps/text2kg/src/data opea/text2kg:latest /bin/bash
```

## 🚀 2. Start text2kg and dependent microservices with docker-compose (Option 2)

```bash
cd comps/text2kg/deployment/docker_compose/
```

Export the service name and log path:

```bash
export service_name="text2kg"
export LOG_PATH=$PWD
```

Export the Neo4j variables described in section 1.2.

Launch with the following command to run on CPU:

```bash
docker compose -f compose.yaml -f custom-override.yml up ${service_name} -d > ${LOG_PATH}/start_services_with_compose.log
```

Launch with the following command to run on Gaudi:

```bash
docker compose -f compose.yaml up ${service_name} -d > ${LOG_PATH}/start_services_with_compose.log
```

## 3. Check the service using the API endpoint

```bash
curl -X 'POST' \
  "http://localhost:${TEXT2KG_PORT}/v1/text2kg?input_text=Who%20is%20paul%20graham%3F" \
  -H 'accept: application/json' \
  -d ''
```

- Make sure your input document/string contains the information you want extracted.
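
For programmatic access, the curl call above can be reproduced with a small Python client. The sketch below is a minimal example under the assumption that the service listens on `TEXT2KG_PORT` (8090 in section 1.3) and returns a JSON body; the exact response schema is not specified here, so adjust the handling to your deployment.

```python
# Minimal sketch of calling the text2kg endpoint from Python.
# Assumptions: the service listens on TEXT2KG_PORT (default 8090) and
# returns JSON; the exact response schema may differ in your deployment.
import os
import requests

port = os.environ.get("TEXT2KG_PORT", "8090")
url = f"http://localhost:{port}/v1/text2kg"

# The endpoint takes the input text as a query parameter,
# mirroring the curl example above.
response = requests.post(
    url,
    params={"input_text": "Who is paul graham?"},
    headers={"accept": "application/json"},
)
response.raise_for_status()
print(response.json())
```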