deeplake.core.vectorstore.deep_memory

DeepMemory

class deeplake.core.vectorstore.deep_memory.deep_memory.DeepMemory
__init__(dataset: Dataset, path: Union[str, Path], logger: Logger, embedding_function: Optional[Any] = None, token: Optional[str] = None, creds: Optional[Union[Dict, str]] = None)

Base Deep Memory class to train and evaluate models on the DeepMemory managed service.

Parameters
  • dataset (Dataset) – deeplake dataset object or path.

  • path (Union[str, pathlib.Path]) – Path to the dataset.

  • logger (logging.Logger) – Logger object.

  • embedding_function (Optional[Any], optional) – Embedding function class used to convert queries/documents to embeddings. Defaults to None.

  • token (Optional[str], optional) – API token for the DeepMemory managed service. Defaults to None.

  • creds (Optional[Dict[str, Any]], optional) – Credentials to access the dataset. Defaults to None.

Raises

ImportError – If indra is not installed.

cancel(job_id: str)

Cancel a training job on the DeepMemory managed service.

Examples

>>> cancelled: bool = vectorstore.deep_memory.cancel(job_id)
Parameters

job_id (str) – job_id of the training job.

Returns

True if job was cancelled successfully, False otherwise.

Return type

bool

delete(job_id: str)

Delete a training job on the DeepMemory managed service.

Examples

>>> deleted: bool = vectorstore.deep_memory.delete(job_id)
Parameters

job_id (str) – job_id of the training job.

Returns

True if job was deleted successfully, False otherwise.

Return type

bool

evaluate(relevance: List[List[Tuple[str, int]]], queries: List[str], embedding_function: Optional[Callable[[...], List[ndarray]]] = None, embedding: Optional[Union[List[ndarray], List[List[float]]]] = None, top_k: List[int] = [1, 3, 5, 10, 50, 100], qvs_params: Optional[Dict[str, Any]] = None) Dict[str, Dict[str, float]]

Evaluate a model using the DeepMemory managed service.

Examples

>>> # 1. Evaluate a model using an embedding function:
>>> relevance = [[("doc_id_1", 1), ("doc_id_2", 1)], [("doc_id_3", 1)]]
>>> queries = ["What is the capital of India?", "What is the capital of France?"]
>>> embedding_function = openai_embedding.embed_documents
>>> vectorstore.deep_memory.evaluate(
...     relevance=relevance,
...     queries=queries,
...     embedding_function=embedding_function,
... )
>>> # 2. Evaluate a model with precomputed embeddings:
>>> embeddings = [[-1.2, 12, ...], ...]
>>> vectorstore.deep_memory.evaluate(
...     relevance=relevance,
...     queries=queries,
...     embedding=embeddings,
... )
>>> # 3. Evaluate a model with precomputed embeddings and log queries:
>>> vectorstore.deep_memory.evaluate(
...     relevance=relevance,
...     queries=queries,
...     embedding=embeddings,
...     qvs_params={"log_queries": True},
... )
>>> # 4. Evaluate with precomputed embeddings, log queries, and a custom branch:
>>> vectorstore.deep_memory.evaluate(
...     relevance=relevance,
...     queries=queries,
...     embedding=embeddings,
...     qvs_params={
...         "log_queries": True,
...         "branch": "queries",
...     }
... )
Parameters
  • queries (List[str]) – Queries for model evaluation.

  • relevance (List[List[Tuple[str, int]]]) – Relevant documents and scores for each query. - Outer list: matches the queries. - Inner list: pairs of doc_id and relevance score. - doc_id: Document ID from the corpus dataset, found in the id tensor. - relevance_score: Between 0 (not relevant) and 1 (relevant).

  • embedding (Optional[Union[List[np.ndarray], List[List[float]]]], optional) – Query embeddings. Defaults to None.

  • embedding_function (Optional[Callable[..., List[np.ndarray]]], optional) – Function to convert queries into embeddings. Defaults to None.

  • top_k (List[int], optional) – Ranks for model evaluation. Defaults to [1, 3, 5, 10, 50, 100].

  • qvs_params (Optional[Dict], optional) – Parameters to initialize the queries vectorstore. When specified, creates a new vectorstore to track evaluation queries, the Deep Memory response, and the naive vector search results. Defaults to None.

Returns

Recalls for each rank.

Return type

Dict[str, Dict[str, float]]

Raises
  • ImportError – If indra is not installed.

  • ValueError – If no embedding_function is provided either during initialization or evaluation.
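The shape of the relevance argument and the recall figures that evaluate returns can be illustrated with a small self-contained sketch. This is plain Python with no service calls; recall_at_k is a hypothetical helper showing a common definition of recall@k, not necessarily the service's exact computation:

```python
from typing import Dict, List, Tuple

def recall_at_k(
    relevance: List[List[Tuple[str, int]]],
    retrieved: List[List[str]],
    top_k: List[int],
) -> Dict[int, float]:
    """Fraction of relevant doc_ids found among the top-k retrieved ids,
    averaged over queries (a common definition of recall@k)."""
    scores: Dict[int, float] = {}
    for k in top_k:
        per_query = []
        for rel, ret in zip(relevance, retrieved):
            relevant_ids = {doc_id for doc_id, score in rel if score == 1}
            hits = relevant_ids & set(ret[:k])
            per_query.append(len(hits) / len(relevant_ids))
        scores[k] = sum(per_query) / len(per_query)
    return scores

# Two queries; each inner list pairs a doc_id with a relevance score (0 or 1).
relevance = [[("doc_id_1", 1), ("doc_id_2", 1)], [("doc_id_3", 1)]]
# Hypothetical ranked search results for the same two queries.
retrieved = [["doc_id_1", "doc_id_9"], ["doc_id_7", "doc_id_3"]]
print(recall_at_k(relevance, retrieved, top_k=[1, 2]))  # {1: 0.25, 2: 0.75}
```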

get_model()

Get the name of the model currently being used by the DeepMemory managed service.

list_jobs(debug=False)

List all training jobs on the DeepMemory managed service.

set_model(model_name: str)

Set model.npy to use model_name instead of the default model.

Parameters

model_name (str) – name of the model to use.

status(job_id: str)

Get the status of a training job on the DeepMemory managed service.

Examples

>>> vectorstore.deep_memory.status(job_id)
--------------------------------------------------------------
|                  6508464cd80cab681bfcfff3                  |
--------------------------------------------------------------
| status                     | pending                       |
--------------------------------------------------------------
| progress                   | None                          |
--------------------------------------------------------------
| results                    | not available yet             |
--------------------------------------------------------------
Parameters

job_id (str) – job_id of the training job.
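Since a job starts in the pending state, callers often poll until it reaches a terminal state. A minimal polling sketch, using a generic status callable; the stand-in fetcher and the state names here are assumptions for illustration, and a real fetcher would wrap vectorstore.deep_memory.status(job_id):

```python
import time
from typing import Callable, Iterable

def wait_for_job(
    get_status: Callable[[], str],
    done_states: Iterable[str] = ("completed", "failed"),
    poll_seconds: float = 0.0,
    max_polls: int = 100,
) -> str:
    """Poll get_status until it reports a terminal state."""
    for _ in range(max_polls):
        state = get_status()
        if state in done_states:
            return state
        time.sleep(poll_seconds)
    raise TimeoutError("job did not reach a terminal state")

# Stand-in for a real status fetcher (hypothetical state sequence).
fake_states = iter(["pending", "pending", "completed"])
print(wait_for_job(lambda: next(fake_states)))  # completed
```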

train(queries: List[str], relevance: List[List[Tuple[str, int]]], embedding_function: Optional[Callable[[str], ndarray]] = None, token: Optional[str] = None) str

Train a model on the DeepMemory managed service.

Examples

>>> queries: List[str] = ["What is the capital of India?", "What is the capital of France?"]
>>> relevance: List[List[Tuple[str, int]]] = [[("doc_id_1", 1), ("doc_id_2", 1)], [("doc_id_3", 1)]]
>>> # doc_id_1, doc_id_2, doc_id_3 are the ids of the documents in the corpus dataset that are relevant to the queries. They are stored in the `id` tensor of the corpus dataset.
>>> job_id: str = vectorstore.deep_memory.train(queries, relevance)
Parameters
  • queries (List[str]) – List of queries to train the model on.

  • relevance (List[List[Tuple[str, int]]]) – List of relevant documents for each query with their respective relevance scores. The outer list corresponds to the queries, and each inner list contains (doc_id, relevance_score) pairs for that query. doc_id is the document id in the corpus dataset, stored in the id tensor of the corpus dataset. relevance_score is either 0 or 1, where 0 stands for not relevant (unknown relevance) and 1 stands for relevant. Currently, only pairs with a score of 1 contribute to training, so there is no reason to provide examples with a relevance of 0.

  • embedding_function (Optional[Callable[[str], np.ndarray]], optional) – Embedding function used to convert queries to embeddings. Defaults to None.

  • token (str, optional) – API token for the DeepMemory managed service. Defaults to None.

Returns

job_id of the training job.

Return type

str

Raises

ValueError – If embedding_function is not specified either during initialization or during training.
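train expects queries and relevance as parallel lists. A minimal sketch for assembling them from flat (query, relevant_doc_id) records; build_training_pairs and the sample records are hypothetical, not part of the deeplake API:

```python
from collections import defaultdict
from typing import Dict, List, Tuple

def build_training_pairs(
    labeled: List[Tuple[str, str]],
) -> Tuple[List[str], List[List[Tuple[str, int]]]]:
    """Group (query, relevant_doc_id) records into the parallel
    queries / relevance lists that train expects. Every pair gets
    relevance score 1, since only score-1 examples contribute."""
    grouped: Dict[str, List[Tuple[str, int]]] = defaultdict(list)
    for query, doc_id in labeled:
        grouped[query].append((doc_id, 1))
    queries = list(grouped)
    relevance = [grouped[q] for q in queries]
    return queries, relevance

labeled = [
    ("What is the capital of India?", "doc_id_1"),
    ("What is the capital of India?", "doc_id_2"),
    ("What is the capital of France?", "doc_id_3"),
]
queries, relevance = build_training_pairs(labeled)
# queries and relevance stay aligned: relevance[i] belongs to queries[i]
```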