deeplake.core.vectorstore.deep_memory
DeepMemory
- class deeplake.core.vectorstore.deep_memory.deep_memory.DeepMemory
- __init__(dataset: Dataset, path: Union[str, Path], logger: Logger, embedding_function: Optional[Any] = None, token: Optional[str] = None, creds: Optional[Union[Dict, str]] = None)
Base Deep Memory class to train and evaluate models on the DeepMemory managed service.
- Parameters
dataset (Dataset) – deeplake dataset object or path.
path (Union[str, pathlib.Path]) – Path to the dataset.
logger (logging.Logger) – Logger object.
embedding_function (Optional[Any], optional) – Embedding function class used to convert queries/documents to embeddings. Defaults to None.
token (Optional[str], optional) – API token for the DeepMemory managed service. Defaults to None.
creds (Optional[Dict[str, Any]], optional) – Credentials to access the dataset. Defaults to None.
- Raises
ImportError – If indra is not installed.
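The embedding_function accepted above is expected to map a batch of texts to a list of vectors. The sketch below shows a minimal compatible callable; the exact input/output shapes are an assumption drawn from the signatures in this reference, and dummy_embedding_function is a hypothetical stand-in, not part of the library.

```python
from typing import List

import numpy as np

def dummy_embedding_function(texts: List[str]) -> List[np.ndarray]:
    """Toy embedder: deterministically maps each text to an
    8-dimensional vector. A real setup would call an embedding
    model (e.g. an OpenAI embeddings wrapper) here instead."""
    vectors = []
    for text in texts:
        # Seed from the text so the same input always yields the same vector.
        seed = sum(ord(ch) for ch in text) % (2**32)
        rng = np.random.default_rng(seed)
        vectors.append(rng.random(8))
    return vectors

# The callable can then be passed wherever an embedding_function is
# accepted, e.g. (hypothetical usage):
# vectorstore.deep_memory.evaluate(..., embedding_function=dummy_embedding_function)
```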
- cancel(job_id: str)
Cancel a training job on the DeepMemory managed service.
Examples
>>> cancelled: bool = vectorstore.deep_memory.cancel(job_id)
- Parameters
job_id (str) – job_id of the training job.
- Returns
True if job was cancelled successfully, False otherwise.
- Return type
bool
- delete(job_id: str)
Delete a training job on the DeepMemory managed service.
Examples
>>> deleted: bool = vectorstore.deep_memory.delete(job_id)
- Parameters
job_id (str) – job_id of the training job.
- Returns
True if job was deleted successfully, False otherwise.
- Return type
bool
- evaluate(relevance: List[List[Tuple[str, int]]], queries: List[str], embedding_function: Optional[Callable[[...], List[ndarray]]] = None, embedding: Optional[Union[List[ndarray], List[List[float]]]] = None, top_k: List[int] = [1, 3, 5, 10, 50, 100], qvs_params: Optional[Dict[str, Any]] = None) → Dict[str, Dict[str, float]]
Evaluate a model using the DeepMemory managed service.
Examples
>>> # 1. Evaluate a model using an embedding function:
>>> relevance = [[("doc_id_1", 1), ("doc_id_2", 1)], [("doc_id_3", 1)]]
>>> queries = ["What is the capital of India?", "What is the capital of France?"]
>>> embedding_function = openai_embedding.embed_documents
>>> vectorstore.deep_memory.evaluate(
...     relevance=relevance,
...     queries=queries,
...     embedding_function=embedding_function,
... )
>>> # 2. Evaluate a model with precomputed embeddings:
>>> embeddings = [[-1.2, 12, ...], ...]
>>> vectorstore.deep_memory.evaluate(
...     relevance=relevance,
...     queries=queries,
...     embedding=embeddings,
... )
>>> # 3. Evaluate a model with precomputed embeddings and log queries:
>>> vectorstore.deep_memory.evaluate(
...     relevance=relevance,
...     queries=queries,
...     embedding=embeddings,
...     qvs_params={"log_queries": True},
... )
>>> # 4. Evaluate with precomputed embeddings, log queries, and a custom branch:
>>> vectorstore.deep_memory.evaluate(
...     relevance=relevance,
...     queries=queries,
...     embedding=embeddings,
...     qvs_params={
...         "log_queries": True,
...         "branch": "queries",
...     },
... )
- Parameters
queries (List[str]) – Queries for model evaluation.
relevance (List[List[Tuple[str, int]]]) – Relevant documents and scores for each query.
- Outer list: one entry per query, in the same order as queries.
- Inner list: (doc_id, relevance_score) pairs for that query.
- doc_id: Document ID from the corpus dataset, stored in its id tensor.
- relevance_score: 0 (not relevant) or 1 (relevant).
embedding (Optional[Union[List[np.ndarray], List[List[float]]]], optional) – Query embeddings. Defaults to None.
embedding_function (Optional[Callable[..., List[np.ndarray]]], optional) – Function to convert queries into embeddings. Defaults to None.
top_k (List[int], optional) – Ranks for model evaluation. Defaults to [1, 3, 5, 10, 50, 100].
qvs_params (Optional[Dict], optional) – Parameters to initialize the queries vectorstore. When specified, creates a new vectorstore to track evaluation queries, the Deep Memory response, and the naive vector search results. Defaults to None.
- Returns
Recalls for each rank.
- Return type
Dict[str, Dict[str, float]]
- Raises
ImportError – If indra is not installed.
ValueError – If no embedding_function is provided either during initialization or evaluation.
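The returned dictionary holds recall values per rank. As an illustration of what recall at rank k means here (a sketch of the metric's standard definition, not the service's exact implementation), a local computation might look like this; recall_at_k and retrieved are hypothetical names introduced for the example:

```python
from typing import Dict, List, Tuple

def recall_at_k(
    relevance: List[List[Tuple[str, int]]],
    retrieved: List[List[str]],
    top_k: List[int] = [1, 3, 5, 10],
) -> Dict[int, float]:
    """For each rank k: the fraction of relevant doc_ids that appear
    among the first k retrieved results, averaged over queries."""
    recalls = {}
    for k in top_k:
        per_query = []
        for rel, hits in zip(relevance, retrieved):
            relevant_ids = {doc_id for doc_id, score in rel if score == 1}
            if not relevant_ids:
                continue  # skip queries with no relevant documents
            found = relevant_ids & set(hits[:k])
            per_query.append(len(found) / len(relevant_ids))
        recalls[k] = sum(per_query) / len(per_query) if per_query else 0.0
    return recalls

relevance = [[("d1", 1), ("d2", 1)], [("d3", 1)]]
retrieved = [["d1", "d9", "d2"], ["d7", "d3"]]  # search results per query
scores = recall_at_k(relevance, retrieved, top_k=[1, 3])
# scores[1] is 0.25, scores[3] is 1.0
```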
- get_model()
Get the name of the model currently used by the DeepMemory managed service.
- list_jobs(debug=False)
List all training jobs on the DeepMemory managed service.
- set_model(model_name: str)
Set model.npy to use model_name instead of the default model.
- Parameters
model_name (str) – Name of the model to use.
- status(job_id: str)
Get the status of a training job on the DeepMemory managed service.
Examples
>>> vectorstore.deep_memory.status(job_id)
--------------------------------------------------------------
|                  6508464cd80cab681bfcfff3                  |
--------------------------------------------------------------
| status                     | pending                       |
--------------------------------------------------------------
| progress                   | None                          |
--------------------------------------------------------------
| results                    | not available yet             |
--------------------------------------------------------------
- Parameters
job_id (str) – job_id of the training job.
- train(queries: List[str], relevance: List[List[Tuple[str, int]]], embedding_function: Optional[Callable[[str], ndarray]] = None, token: Optional[str] = None) → str
Train a model on the DeepMemory managed service.
Examples
>>> queries: List[str] = ["What is the capital of India?", "What is the capital of France?"]
>>> relevance: List[List[Tuple[str, int]]] = [[("doc_id_1", 1), ("doc_id_2", 1)], [("doc_id_3", 1)]]
>>> # doc_id_1, doc_id_2, and doc_id_3 are IDs of corpus documents relevant
>>> # to the queries, stored in the `id` tensor of the corpus dataset.
>>> job_id: str = vectorstore.deep_memory.train(queries, relevance)
- Parameters
queries (List[str]) – List of queries to train the model on.
relevance (List[List[Tuple[str, int]]]) – List of relevant documents for each query, with their relevance scores. The outer list corresponds to the queries; each inner list contains (doc_id, relevance_score) pairs for one query. doc_id is the document ID in the corpus dataset, stored in its id tensor. relevance_score is either 0 or 1, where 0 means not relevant (unknown relevance) and 1 means relevant. Currently only pairs with a score of 1 contribute to training, so there is no reason to provide examples with a relevance of 0.
embedding_function (Optional[Callable[[str], np.ndarray]], optional) – Embedding function used to convert queries to embeddings. Defaults to None.
token (Optional[str], optional) – API token for the DeepMemory managed service. Defaults to None.
- Returns
job_id of the training job.
- Return type
str
- Raises
ValueError – If embedding_function is not specified either during initialization or during training.
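Since malformed queries/relevance inputs only surface as errors after a job is submitted, it can help to sanity-check them locally first. The following is a minimal sketch under the format described above; validate_training_data is a hypothetical helper, not part of the library.

```python
from typing import List, Tuple

def validate_training_data(
    queries: List[str],
    relevance: List[List[Tuple[str, int]]],
) -> None:
    """Lightweight checks on train() inputs: lengths must match,
    every query needs at least one pair, and scores must be 0 or 1."""
    if len(queries) != len(relevance):
        raise ValueError(
            f"queries ({len(queries)}) and relevance ({len(relevance)}) "
            "must have the same length"
        )
    for i, pairs in enumerate(relevance):
        if not pairs:
            raise ValueError(f"query {i} has no (doc_id, score) pairs")
        for doc_id, score in pairs:
            if score not in (0, 1):
                raise ValueError(
                    f"query {i}: relevance score must be 0 or 1, got {score}"
                )

queries = ["What is the capital of India?", "What is the capital of France?"]
relevance = [[("doc_id_1", 1), ("doc_id_2", 1)], [("doc_id_3", 1)]]
validate_training_data(queries, relevance)  # raises on malformed input
# job_id = vectorstore.deep_memory.train(queries, relevance)  # then submit
```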