Consolidate EmbeddingMetadataStorage into VectorDB #68


Open · wants to merge 2 commits into base: master
49 changes: 19 additions & 30 deletions README.md
@@ -8,25 +8,16 @@
</picture>
</p>


<h3 align="center">
Reliable and Efficient Semantic Prompt Caching
</h3>
<br>

Semantic caching reduces LLM latency and cost by returning cached model responses for semantically similar prompts (not just exact matches). **vCache** is the first verified semantic cache that **guarantees user-defined error rate bounds**. vCache replaces static thresholds with **online-learned, embedding-specific decision boundaries**—no manual fine-tuning required. This enables reliable cached response reuse across any embedding model or workload.

> [!NOTE]
> vCache is currently in active development. Features and APIs may change as we continue to improve the system.

## 🚀 Quick Install

Install vCache in editable mode:
@@ -40,6 +31,7 @@ Then, set your OpenAI key:
```bash
export OPENAI_API_KEY="your_api_key_here"
```

(Note: vCache uses OpenAI by default for both LLM inference and embedding generation, but you can configure any other backend.)

Finally, use vCache in your Python code:
@@ -53,16 +45,14 @@
```python
print(f"Response: {response}")
```
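
The collapsed hunk above hides most of this snippet. As a rough end-to-end sketch, assuming `VCache()` applies the defaults listed below and exposes an `infer` entry point (both assumptions, not confirmed by this diff):

```python
from vcache import VCache

# Default stack: OpenAI inference + embeddings, HNSWLib vector DB,
# and the verified decision policy with a 2% error bound.
vcache = VCache()

# Hypothetical entry point: embeds the prompt, checks the cache, and
# either returns a cached response or falls back to the LLM.
response: str = vcache.infer("Is the sky blue?")
print(f"Response: {response}")
```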

By default, vCache uses:

- `OpenAIInferenceEngine`
- `OpenAIEmbeddingEngine`
- `HNSWLibVectorDB`
- `InMemoryEmbeddingMetadataStorage`
- `NoEvictionPolicy`
- `StringComparisonSimilarityEvaluator`
- `VerifiedDecisionPolicy` with a maximum failure rate of 2%

## ⚙️ Advanced Configuration

vCache is modular and highly configurable. Below is an example showing how to customize key components:
@@ -75,12 +65,12 @@
<details>

```python
from vcache.main import VCache
from vcache.config import VCacheConfig
from vcache.inference_engine.strategies.open_ai import OpenAIInferenceEngine
from vcache.vcache_core.cache.embedding_engine.strategies.open_ai import OpenAIEmbeddingEngine
from vcache.vcache_core.cache.embedding_store.embedding_metadata_storage.strategies.in_memory import InMemoryEmbeddingMetadataStorage
from vcache.vcache_core.similarity_evaluator.strategies.string_comparison import StringComparisonSimilarityEvaluator
from vcache.vcache_policy.strategies.dynamic_local_threshold import VerifiedDecisionPolicy
from vcache.vcache_policy.vcache_policy import VCachePolicy
from vcache.vcache_core.cache.embedding_store.vector_db import HNSWLibVectorDB, SimilarityMetricType
from vcache.vcache_core.cache.vector_db import HNSWLibVectorDB, SimilarityMetricType
```

</details>

```python
@@ -92,7 +82,6 @@
vcache_config: VCacheConfig = VCacheConfig(
similarity_metric_type=SimilarityMetricType.COSINE,
max_capacity=100_000,
),
embedding_metadata_storage=InMemoryEmbeddingMetadataStorage(),
similarity_evaluator=StringComparisonSimilarityEvaluator,
)
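
# --- Sketch: completing the collapsed example above ---
# Assumed from this PR's tests (test.py, benchmarks/benchmark.py):
# VCache(config, policy) and VerifiedDecisionPolicy(delta=<bound>).
vcache_policy = VerifiedDecisionPolicy(delta=0.02)  # 2% maximum failure rate
vcache = VCache(vcache_config, vcache_policy)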
```

@@ -117,16 +106,16 @@ Semantic caching reduces LLM latency and cost by returning cached model response
### Architecture Overview

1. **Embed & Store**
   Each prompt is converted to a fixed-length vector (an “embedding”) and stored in a vector database along with its LLM response.

2. **Nearest-Neighbor Lookup**
   When a new prompt arrives, the cache embeds it and finds its most similar stored prompt using a similarity metric (e.g., cosine similarity).

3. **Similarity Score**
   The system computes a score between 0 and 1 that quantifies how “close” the new prompt is to the retrieved entry.

4. **Decision: Exploit vs. Explore**
   - **Exploit (cache hit):** If the similarity is above a confidence bound, return the cached response.
   - **Explore (cache miss):** Otherwise, query the LLM for a response, add its embedding and answer to the cache, and return it.
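
Taken together, the four steps form a small decision loop. The sketch below is illustrative only; `embed`, `nearest`, and `confidence_bound` are placeholder callables, not vCache APIs:

```python
def cached_inference(prompt, cache, embed, llm, confidence_bound):
    """Illustrative exploit/explore loop; not the actual vCache implementation."""
    query_vec = embed(prompt)                  # 1. embed the incoming prompt
    hit = cache.nearest(query_vec)             # 2. nearest-neighbor lookup
    if hit is not None:
        score = hit.similarity                 # 3. similarity score in [0, 1]
        if score >= confidence_bound(hit):     # 4. exploit: confident cache hit
            return hit.response
    response = llm(prompt)                     # 4. explore: query the model,
    cache.add(query_vec, response)             #    then grow the cache
    return response
```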

<p align="left">
@@ -139,6 +128,7 @@ The system computes a score between 0 and 1 that quantifies how “close” the
</p>

### Why Fixed Thresholds Fall Short

Existing semantic caches rely on a **global static threshold** to decide whether to reuse a cached response (exploit) or invoke the LLM (explore). If the similarity score exceeds this threshold, the cache reuses the response; otherwise, it invokes the model. This strategy is fundamentally limited.
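
Concretely, the static policy reduces to a single global comparison. A minimal sketch (the `0.8` mirrors the `BenchmarkStaticDecisionPolicy(threshold=0.8)` used in this PR's tests), with its limitations listed below:

```python
STATIC_THRESHOLD = 0.8  # one value for every prompt, workload, and embedding model

def static_cache_hit(similarity_score: float) -> bool:
    # Exploit (reuse the cached response) iff the score clears the fixed bar.
    return similarity_score >= STATIC_THRESHOLD
```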

- **Uniform threshold, diverse prompts:** A fixed threshold assumes all embeddings are equally distributed—ignoring that similarity is context-dependent.
@@ -161,11 +151,11 @@ vCache overcomes these limitations with two ideas:
### Benefits

- **Reliability**
  Formally bounds the rate of incorrect cache hits to your chosen tolerance.
- **Performance**
  Matches or exceeds static-threshold systems in cache hit rate and end-to-end latency.
- **Simplicity**
  Plug in any embedding model; vCache learns and adapts automatically at runtime.

<p align="left">
<picture>
@@ -178,25 +168,24 @@

Please refer to the [vCache paper](https://arxiv.org/abs/2502.03771) for further details.


## 🛠 Developer Guide

For advanced usage and development setup, see the [Developer Guide](ReadMe_Dev.md).

## 📊 Benchmarking vCache

vCache includes a benchmarking framework to evaluate:

- **Cache hit rate**
- **Error rate**
- **Latency improvement**
- **...**

We provide three open benchmarks:
- **SemCacheLmArena** (chat-style prompts) - [Dataset ↗](https://huggingface.co/datasets/vCache/SemBenchmarkLmArena)
- **SemCacheClassification** (classification queries) - [Dataset ↗](https://huggingface.co/datasets/vCache/SemBenchmarkClassification)
- **SemCacheSearchQueries** (real-world search logs) - [Dataset ↗](https://huggingface.co/datasets/vCache/SemBenchmarkSearchQueries)
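
To peek at one of these datasets, a minimal sketch using the Hugging Face `datasets` library (the `train` split name is an assumption):

```python
from datasets import load_dataset

# Dataset IDs are taken from the links above; the split name is assumed.
benchmark = load_dataset("vCache/SemBenchmarkLmArena", split="train")
print(benchmark[0])  # inspect one chat-style prompt record
```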

See the [Benchmarking Documentation](benchmarks/ReadMe.md) for instructions.

@@ -211,4 +200,4 @@ If you use vCache for your research, please cite our [paper](https://arxiv.org/a
journal={arXiv preprint arXiv:2502.03771},
year={2025}
}
```
14 changes: 5 additions & 9 deletions benchmarks/benchmark.py
@@ -23,16 +23,13 @@
from vcache.vcache_core.cache.embedding_engine.strategies.benchmark import (
BenchmarkEmbeddingEngine,
)
from vcache.vcache_core.cache.embedding_store.embedding_metadata_storage import (
InMemoryEmbeddingMetadataStorage,
)
from vcache.vcache_core.cache.embedding_store.embedding_metadata_storage.embedding_metadata_obj import (
EmbeddingMetadataObj,
)
from vcache.vcache_core.cache.embedding_store.vector_db import (
from vcache.vcache_core.cache.vector_db import (
HNSWLibVectorDB,
SimilarityMetricType,
)
from vcache.vcache_core.cache.vector_db.embedding_metadata_obj import (
EmbeddingMetadataObj,
)
from vcache.vcache_core.similarity_evaluator import SimilarityEvaluator
from vcache.vcache_core.similarity_evaluator.strategies.llm_comparison import (
LLMComparisonSimilarityEvaluator,
@@ -396,7 +393,7 @@ def dump_results_to_json(self):
var_ts_dict = {}

metadata_objects: List[EmbeddingMetadataObj] = (
self.vcache.vcache_config.embedding_metadata_storage.get_all_embedding_metadata_objects()
self.vcache.vcache_config.vector_db.get_all_embedding_metadata_objects()
)

for metadata_object in metadata_objects:
@@ -486,7 +483,6 @@ def __run_baseline(
similarity_metric_type=SimilarityMetricType.COSINE,
max_capacity=MAX_VECTOR_DB_CAPACITY,
),
embedding_metadata_storage=InMemoryEmbeddingMetadataStorage(),
similarity_evaluator=similarity_evaluator,
)
vcache: VCache = VCache(vcache_config, vcache_policy)
6 changes: 1 addition & 5 deletions test.py
@@ -4,10 +4,7 @@
from vcache.vcache_core.cache.embedding_engine.strategies.open_ai import (
OpenAIEmbeddingEngine,
)
from vcache.vcache_core.cache.embedding_store.embedding_metadata_storage.strategies.in_memory import (
InMemoryEmbeddingMetadataStorage,
)
from vcache.vcache_core.cache.embedding_store.vector_db import (
from vcache.vcache_core.cache.vector_db import (
HNSWLibVectorDB,
SimilarityMetricType,
)
@@ -27,7 +24,6 @@
similarity_metric_type=SimilarityMetricType.COSINE,
max_capacity=100000,
),
embedding_metadata_storage=InMemoryEmbeddingMetadataStorage(),
similarity_evaluator=StringComparisonSimilarityEvaluator,
)
vcache: VCache = VCache(vcache_config, vcache_policy)
6 changes: 3 additions & 3 deletions tests/integration/test_concurrency.py
@@ -8,7 +8,6 @@

from vcache import (
HNSWLibVectorDB,
InMemoryEmbeddingMetadataStorage,
LangChainEmbeddingEngine,
StringComparisonSimilarityEvaluator,
VCache,
@@ -46,7 +45,6 @@ def answers_similar(a, b):
model_name="sentence-transformers/all-mpnet-base-v2"
),
vector_db=HNSWLibVectorDB(),
embedding_metadata_storage=InMemoryEmbeddingMetadataStorage(),
similarity_evaluator=similarity_evaluator,
)

@@ -93,7 +91,9 @@ def do_inference(prompt):
time.sleep(1.5)
executor.map(do_inference, concurrent_prompts_chunk_2)

all_metadata_objects = vcache.vcache_config.embedding_metadata_storage.get_all_embedding_metadata_objects()
all_metadata_objects = (
vcache.vcache_config.vector_db.get_all_embedding_metadata_objects()
)
final_observation_count = len(all_metadata_objects)

for i, metadata_object in enumerate(all_metadata_objects):
2 changes: 0 additions & 2 deletions tests/integration/test_dynamic_threshold.py
@@ -4,7 +4,6 @@

from vcache import (
HNSWLibVectorDB,
InMemoryEmbeddingMetadataStorage,
LangChainEmbeddingEngine,
OpenAIInferenceEngine,
VCache,
@@ -25,7 +24,6 @@ def create_default_config_and_policy():
model_name="sentence-transformers/all-mpnet-base-v2"
),
vector_db=HNSWLibVectorDB(),
embedding_metadata_storage=InMemoryEmbeddingMetadataStorage(),
system_prompt="Please answer in a single word with the first letter capitalized. Example: London",
)
policy = VerifiedDecisionPolicy(delta=0.05)
2 changes: 0 additions & 2 deletions tests/integration/test_static_threshold.py
@@ -5,7 +5,6 @@
from vcache import (
BenchmarkStaticDecisionPolicy,
HNSWLibVectorDB,
InMemoryEmbeddingMetadataStorage,
LangChainEmbeddingEngine,
OpenAIInferenceEngine,
VCache,
@@ -25,7 +24,6 @@ def create_default_config_and_policy():
model_name="sentence-transformers/all-mpnet-base-v2"
),
vector_db=HNSWLibVectorDB(),
embedding_metadata_storage=InMemoryEmbeddingMetadataStorage(),
)
policy = BenchmarkStaticDecisionPolicy(threshold=0.8)
return config, policy
32 changes: 0 additions & 32 deletions tests/unit/EmbeddingMetadataStrategy/test_embedding_metadata.py

This file was deleted.

6 changes: 2 additions & 4 deletions tests/unit/VCachePolicyStrategy/test_vcache_policy.py
@@ -3,7 +3,7 @@
from unittest.mock import MagicMock, patch

from vcache.config import VCacheConfig
from vcache.vcache_core.cache.embedding_store.embedding_metadata_storage.embedding_metadata_obj import (
from vcache.vcache_core.cache.vector_db import (
EmbeddingMetadataObj,
)
from vcache.vcache_policy.strategies.verified import (
@@ -48,11 +48,9 @@ def update_metadata(embedding_id, embedding_metadata):
mock_config = MagicMock(spec=VCacheConfig)
mock_config.inference_engine = self.mock_inference_engine
mock_config.similarity_evaluator = self.mock_similarity_evaluator
# Add all required attributes for Cache creation
mock_config.embedding_engine = MagicMock()
mock_config.embedding_metadata_storage = MagicMock()
mock_config.vector_db = MagicMock()
mock_config.eviction_policy = MagicMock()
mock_config.embedding_engine = MagicMock()

self.policy = VerifiedDecisionPolicy()
self.policy.setup(mock_config)