Quick Refresh: Cache using EhCache, Spring and Hibernate
Introduction to Cache:
¡ What is Cache?
} A store of things that will be required in future and can be retrieved rapidly.
¡ Why does a cache make an application fast?
} Locality of reference – retrieval of data is fast because it is in memory.
} 80:20 Rule – If 20% of objects are used 80% of the time and a way can be found to reduce the cost of obtaining that 20%, then system performance will improve.
} Fewer system-of-record calls.
¡ How does a cache work?
} Application code consults the cache first
} If cache contains the data, then return the data directly
} Otherwise, the application code must fetch the data from the system-of-record, store the data in cache, then return.
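The three steps above can be sketched in plain Java. This is only an illustration of the pattern and is not tied to any caching library; the class name CacheAsideSketch and the systemOfRecord loader are hypothetical names introduced for the example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Minimal cache-aside sketch; not tied to any caching library. */
class CacheAsideSketch<K, V> {
    private final Map<K, V> cache = new HashMap<>();
    private final Function<K, V> systemOfRecord; // hypothetical loader, e.g. a database call
    private int loads = 0;                       // counts system-of-record calls

    CacheAsideSketch(Function<K, V> systemOfRecord) {
        this.systemOfRecord = systemOfRecord;
    }

    V get(K key) {
        V cached = cache.get(key);            // 1. consult the cache first
        if (cached != null) {
            return cached;                    // 2. hit: return the data directly
        }
        V loaded = systemOfRecord.apply(key); // 3. miss: fetch from the system-of-record
        loads++;
        if (loaded != null) {
            cache.put(key, loaded);           //    store it in the cache, then return
        }
        return loaded;
    }

    int loadCount() {
        return loads;
    }
}
```

Calling get twice with the same key reaches the system-of-record only once; the second call is served from the map.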
¡ Available since 2003, EhCache is a modern, modular family of caching tools
¡ The most widely used Java platform Cache
¡ It can be used under Open Source Apache 2.0 License
¡ Embedded in most popular Java frameworks/apps
} Hibernate (as Second-Level Cache Provider), Spring
} Transactional caches & cache consistency modes
} Fast, Lightweight
} Less than 1 MB
} Easy to use API
} Minimum coding. Grows with your application with only two lines of configuration
} JSR107 (JCACHE) Support
EhCache Configuration:
¡ If the configuration file is not provided, a default configuration is always loaded at runtime.
¡ ehcache.xml
<cache name="currencyCache"
    maxEntriesLocalHeap="200"
    eternal="false"
    timeToLiveSeconds="1800"
    maxEntriesLocalDisk="250"
    memoryStoreEvictionPolicy="LRU">
    <persistence strategy="localTempSwap" synchronousWrites="false" />
</cache>
¡ memoryStoreEvictionPolicy attribute - Legal values are LRU (default), LFU and FIFO.
¡ Configuring the Disk Store
¡ Temporary store (localTempSwap) - The localTempSwap persistence strategy allows the memory store to overflow to disk when it becomes full. This option makes the disk a temporary store because overflow data does not survive restarts or failures. When the node is restarted, any existing data on disk is cleared because it is not designed to be reloaded. On startup, the localTempSwap disk store creates a data file for each cache called "<cache_name>.data".
¡ Persistent store (localRestartable) - This option implements a restartable store for all in-memory data. After any restart, the data set is automatically reloaded from disk to the in-memory stores.
¡ Disk Store Expiry and Eviction –
¡ Expired elements are eventually evicted to free up disk space. The element is also removed from the in-memory index of elements.
¡ One thread per cache is used to remove expired elements. The optional attribute diskExpiryThreadIntervalSeconds sets the interval between runs of the expiry thread.
¡ maxEntriesLocalDisk – The maximum total number of elements (cache entries) allowed on the disk tier for a cache. If this target is exceeded, eviction occurs to bring the count within the allowed target. The default value is 0, which means that no eviction of the cache's entries takes place (infinite size is allowed); consequently, the node can run out of disk space.
¡ eternal –
} Eternal elements and caches do not expire
} The eternal attribute, when set to “true”, overrides timeToLive and timeToIdle so that no expiration can take place.
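To picture how an entry limit such as maxEntriesLocalHeap interacts with the LRU policy, here is a minimal sketch built on java.util.LinkedHashMap. It is a stand-in for the idea only: Ehcache's own memory store is more elaborate, and the class name LruStoreSketch is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Bounded map that evicts the least recently used entry; a stand-in for the memory store. */
class LruStoreSketch<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries; // plays the role of maxEntriesLocalHeap

    LruStoreSketch(int maxEntries) {
        super(16, 0.75f, true); // accessOrder = true: order runs from least to most recently used
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called after each put; returning true drops the least recently used entry.
        return size() > maxEntries;
    }
}
```

With a limit of 2, putting USD and EUR, reading USD, and then putting GBP evicts EUR, because EUR is the entry that was used least recently.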
EhCache Configuration - Dynamically Changing:
¡ After a cache has been started, the settings listed below can be changed at runtime. In ehcache.xml, you can disable dynamic configuration by setting the <ehcache> element's dynamicConfig attribute to “false”.
¡ Expiration settings
} timeToLive – The maximum number of seconds an element can exist in the cache regardless of access. The element expires at this limit and will no longer be returned from the cache. The default value is 0, which means no TTL eviction takes place (infinite lifetime).
} timeToIdle – The maximum number of seconds an element can exist in the cache without being accessed. The element expires at this limit and will no longer be returned from the cache. The default value is 0, which means no TTI eviction takes place (infinite lifetime).
¡ Local sizing attributes
} maxEntriesLocalHeap
} maxBytesLocalHeap
} maxEntriesLocalDisk
} maxBytesLocalDisk.
¡ memory-store eviction policy.
¡ CacheEventListeners can be added and removed dynamically.
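As a rough model of what the expiration settings mean, the sketch below encodes the rules stated in this document: a value of 0 disables the corresponding check, and eternal overrides both timeToLive and timeToIdle. The class ExpirySketch and its parameter names are hypothetical, and the exact boundary behaviour inside Ehcache may differ.

```java
/** Models the expiry rules for timeToLive, timeToIdle and eternal; all times are in seconds. */
final class ExpirySketch {
    private ExpirySketch() {
    }

    static boolean isExpired(long now, long createdAt, long lastAccessedAt,
                             long timeToLive, long timeToIdle, boolean eternal) {
        if (eternal) {
            return false; // eternal overrides both TTL and TTI
        }
        // A setting of 0 means the corresponding limit is not applied.
        boolean ttlExceeded = timeToLive > 0 && now - createdAt >= timeToLive;
        boolean ttiExceeded = timeToIdle > 0 && now - lastAccessedAt >= timeToIdle;
        return ttlExceeded || ttiExceeded;
    }
}
```

For example, with timeToLive set to 1800 and timeToIdle set to 300, an element created at time 0 and last read at time 100 counts as expired at time 500, because it has been idle for 400 seconds.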
Enable Caching in Java App using EHCache API:
¡ Step 1. Add the Ehcache Java Archive (JAR) to the application classpath
} If using a Maven project, add the dependency below
<dependency>
    <groupId>net.sf.ehcache</groupId>
    <artifactId>ehcache</artifactId>
    <version>2.8.2</version>
</dependency>
¡ Step 2. Place configuration in ehcache.xml and add it to application classpath
¡ Step 3. Create a CacheManager
¡ Step 4. Reference a Cache
¡ Step 5. Putting an Element in Cache
¡ Step 6. Getting an Element from Cache
¡ Step 7. Removing an Element from Cache
¡ Step 8. Shutdown CacheManager.
Sample Program:
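import net.sf.ehcache.Cache;
import net.sf.ehcache.CacheManager;
import net.sf.ehcache.Element;

public class EhCacheTest
{
    public static void main(String[] args)
    {
        // Step 3: create a CacheManager from the ehcache.xml found on the classpath
        CacheManager manager = CacheManager.newInstance();
        // Step 4: "sampleCache" must be defined in ehcache.xml, otherwise getCache returns null
        Cache cache = manager.getCache("sampleCache");
        cache.put(new Element("key", "value"));                      // Step 5
        System.out.println("Value=" + cache.get("key").getValue());  // Step 6
        cache.remove("key");                                         // Step 7
        manager.shutdown();                                          // Step 8
    }
}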
Spring Caching (Method cache) using EhCache:
¡ Spring's caching support is in spring-context.jar; to support Ehcache, you need to include spring-context-support.jar as well.
¡ Enabling caching annotations – add the following entry to the application context XML file.
<cache:annotation-driven cache-manager="postTradeFissEhCacheManager" />
¡ @Cacheable
¡ @CachePut
¡ @CacheEvict
@Cacheable(value = "companyInstrumentCache")
public CompanyInstrument getCompanyInstrument(long cmpyNum, long instNum)
{
    // some logic
}
@CacheEvict(value = "companyInstrumentCache")
public void evictCompanyInstrumentCacheEntry(long cmpyNum, long instNum)
{
    // Intentionally blank
}
¡ Creating Method Level Cache using @Cacheable
@Cacheable(value = "currencyCache")
public List<String> retrieveActiveCurrencies()
{
    List<String> activeCurrencies = _currencyRepo.findByActiveFlag("Y");
    return activeCurrencies;
}
@Cacheable(value = "personGenCodesTranslationServiceCache")
public List<GenCodesTranslation> getPersonGenCodesTranslation(String fromSystem)
{
    return findGenCodesTranslation(fromSystem, "RIM", "PID", "PID");
}
¡ Default key generation:
} If no params are given, return 0.
} If only one param is given and is primitive, return that instance.
} Else return a key computed from the hashes of all parameters.
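The three default rules can be mirrored in a few lines of plain Java. This is a sketch of the rules as listed here, not Spring's actual key generator class; the name DefaultKeySketch and the use of Arrays.hashCode to combine the hashes are assumptions made for the illustration.

```java
import java.util.Arrays;

/** Sketch of the three default key rules; not Spring's actual key generator. */
final class DefaultKeySketch {
    private DefaultKeySketch() {
    }

    static Object generateKey(Object... params) {
        if (params.length == 0) {
            return 0;                   // no parameters: a constant key
        }
        if (params.length == 1 && params[0] != null) {
            return params[0];           // one parameter: the argument itself is the key
        }
        return Arrays.hashCode(params); // otherwise: one key computed from all parameter hashes
    }
}
```

Because the multi-parameter case reduces all arguments to a single hash, two different argument lists can in principle produce the same key. That is one reason to declare an explicit key for methods with several parameters.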
¡ Custom Key Generation Declaration
} @Cacheable annotation allows the user to specify how the key is generated through its key attribute. The developer can also use SpEL to pick the arguments of interest (or their nested properties), perform operations or even invoke arbitrary methods without having to write any code or implement any interface.
¡ Adding data to cache manually using @CachePut
@Cacheable(value = "instrumentCache", key = "#instType + #instCode")
public List<Instrument> getInstrumentDetails(String instType, String instCode)
{
    return _instrumentRepo.findByInstTypeAndInstCode(instType, instCode);
}
// Stores the same type that getInstrumentDetails returns, because both methods share one cache and key.
@CachePut(value = "instrumentCache", key = "#instType + #instCode")
public List<Instrument> addInstrumentToCache(String instType, String instCode, List<Instrument> instruments)
{
    return instruments;
}
Cache in Hibernate:
¡ Hibernate supports following caches: 1st Level Cache, 2nd Level Cache, Query Cache
¡ The first level cache holds Hibernate entities and is scoped to a particular open Session.
¡ The Hibernate second level cache is an application level cache that is shared across sessions. The second level cache is divided into regions of four types: entity, collection, query, and timestamp.
¡ Entity and collection regions cache the data of entities and of their relationships.
¡ The query cache is a separate cache that stores query results only.
¡ The timestamp cache keeps track of the last update timestamp for each table (this timestamp is updated for any table modification). If query caching is on, there is exactly one timestamp cache and it is used by all query cache instances. Any time the query cache is checked for a query, the timestamp cache is checked for all tables in the query. If the timestamp of the last update on a table is greater than the time the query results were cached, then the entry is removed and the lookup is a miss.
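The timestamp check can be modelled in plain Java as follows. This is a conceptual sketch only and not Hibernate's implementation; the class TimestampCheckSketch and its method names are hypothetical, and treating a missing timestamp as "no modification" is an assumption of the model.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Conceptual model of the timestamp check; not Hibernate's implementation. */
class TimestampCheckSketch {
    private final Map<String, Long> lastUpdate = new HashMap<>(); // table name -> last update time

    /** Called for every insert, update or delete on a table. */
    void tableModified(String table, long now) {
        lastUpdate.put(table, now);
    }

    /** A cached result is usable only if none of its tables changed after it was cached. */
    boolean isUpToDate(List<String> tables, long cachedAt) {
        for (String table : tables) {
            Long updatedAt = lastUpdate.get(table);
            if (updatedAt != null && updatedAt > cachedAt) {
                return false; // stale: the lookup is treated as a miss
            }
        }
        return true; // assumption: no recorded timestamp means no modification
    }
}
```

Note that the check works per table, not per row: any modification to a table invalidates every cached query result that involves that table.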
How does the Hibernate second-level cache work?
¡ The second level cache stores the entity data, but NOT the entities themselves. The data is stored in a 'dehydrated' format which looks like a hash map where the key is the entity Id, and the value is a list of primitive values.
¡ The second level cache gets populated when an object is loaded by Id from the database, using for example entityManager.find(), or when traversing lazy initialized relations.
How does the query cache work?
¡ The query cache looks conceptually like a hash map where the key is composed of the query string itself and any bind parameters passed with the query, and the value is a list of entity Ids that match the query. When a query is made that hits the query cache, that set of entity identifiers can be retrieved and then resolved through the first or second level caches instead of retrieving those entities from the database.
¡ Some queries don't return entities, instead they return only primitive values. In those cases the values themselves will be stored in the query cache. The query cache gets populated when a cacheable JPQL/HQL query gets executed.
What is the relation between the two caches?
¡ If a query under execution has previously cached results, then no SQL statement is sent to the database. Instead the query results are retrieved from the query cache, and then the cached entity identifiers are used to access the second level cache.
¡ If the second level cache contains data for a given Id, it re-hydrates the entity and returns it. If the second level cache does not contain the results for that particular Id, then an SQL query is issued to load the entity from the database.
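The interplay of the two caches can be sketched with two hash maps. This is a conceptual model under simplifying assumptions, not Hibernate code: the class TwoCacheLookupSketch, its fields, and the loadFromDb function are hypothetical, and dehydrated entity data is represented as a plain Object array.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Conceptual model of a cached query resolved through the second level cache. */
class TwoCacheLookupSketch {
    final Map<String, List<Long>> queryCache = new HashMap<>();   // query + parameters -> entity Ids
    final Map<Long, Object[]> secondLevelCache = new HashMap<>(); // entity Id -> dehydrated values
    int databaseLoads = 0;

    /** Resolves cached Ids; loadFromDb is a hypothetical stand-in for an SQL load by Id. */
    List<Object[]> resolve(String queryKey, Function<Long, Object[]> loadFromDb) {
        List<Long> ids = queryCache.get(queryKey);
        if (ids == null) {
            return null; // query cache miss: the query itself would be sent to the database
        }
        List<Object[]> rows = new ArrayList<>();
        for (Long id : ids) {
            Object[] state = secondLevelCache.get(id);
            if (state == null) {          // entity data not cached: load it by Id
                state = loadFromDb.apply(id);
                databaseLoads++;
                secondLevelCache.put(id, state);
            }
            rows.add(state);              // a real implementation would re-hydrate an entity here
        }
        return rows;
    }
}
```

The sketch also shows why a query cache without an entity cache can be costly: every Id that is missing from the second level cache turns into its own database load.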
Ehcache as Hibernate Second Level/Query Cache:
¡ Configuring persistence.xml
} Second Level Cache
<property name="hibernate.cache.use_second_level_cache" value="true"/>
<property name="hibernate.cache.region.factory_class" value="net.sf.ehcache.hibernate.EhCacheRegionFactory"/>
Note1: The first property turns on the second level cache in general.
Note2: Prefer using EhCacheRegionFactory instead of SingletonEhCacheRegionFactory. Using EhCacheRegionFactory means that Hibernate will create separate cache regions for Hibernate caching, instead of trying to reuse cache regions defined elsewhere in the application.
} Query Cache
<property name="hibernate.cache.use_query_cache" value="true"/>
Note: If this is set to false, the query and timestamp cache regions are not created or used.
Cache Concurrency Strategy:
¡ The caching strategy can be one of the following types:
} none : No caching will happen.
} read-only : If your application needs to read, but not modify, instances of a persistent class, a read-only cache can be used.
} read-write : If the application needs to update data, a read-write cache might be appropriate.
} nonstrict-read-write : If the application only occasionally needs to update data (i.e. if it is extremely unlikely that two transactions would try to update the same item simultaneously), and strict transaction isolation is not required, a nonstrict-read-write cache might be appropriate.
} transactional : The transactional cache strategy provides support for fully transactional cache providers such as JBoss TreeCache. Such a cache can only be used in a JTA environment.
Using the second level cache (Cache Entity):
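¡ In order to cache entities in the second level cache, annotate them with @org.hibernate.annotations.Cache
package com.somecompany.someproject.domain;

import java.io.Serializable;
import javax.persistence.Entity;
import javax.persistence.Table;
import org.hibernate.annotations.Cache;
import org.hibernate.annotations.CacheConcurrencyStrategy;

@Entity
@Table(name = "V_PRIORITY_CATEGORIES")
@Cache(usage = CacheConcurrencyStrategy.READ_ONLY)
public class CategoryPriority implements Serializable
{
}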
¡ Ehcache Settings for Entity Objects: Hibernate bases the names of Domain Object caches on the fully qualified name of Domain Objects.
<cache name="com.somecompany.someproject.domain.CategoryPriority"
    maxEntriesLocalHeap="10000"
    eternal="false"
    timeToIdleSeconds="300"
    timeToLiveSeconds="600">
    <persistence strategy="localTempSwap"/>
</cache>
Using the second level cache (Cache Collection):
¡ Associations can also be cached by the second level cache, but by default this is not done. In order to enable caching of an association, we need to apply @Cache to the association itself:
@Entity
public class SomeEntity {
    @OneToMany
    @Cache(usage = CacheConcurrencyStrategy.READ_ONLY, region = "yourCollectionRegion")
    private Set<OtherEntity> other;
}
¡ Ehcache Settings for Collections: Hibernate creates collection cache names based on the fully qualified name of the Domain Object followed by "." and the collection field name. If a region is given explicitly in @Cache, that region name is used as the cache name instead.
<cache name="com.somecompany.someproject.domain.SomeEntity.other"
    maxEntriesLocalHeap="450"
    eternal="false"
    timeToLiveSeconds="600">
    <persistence strategy="localTempSwap"/>
</cache>
Using the second level cache (Query Cache):
¡ After configuring the query cache, by default no queries are cached yet. Queries need to be marked as cached explicitly:
@NamedQuery(name="account.queryName",
    query="select acct from Account ...",
    hints={ @QueryHint(name="org.hibernate.cacheable", value="true") }
)
¡ And this is how to mark an HQL query as cached:
final Query query = session.createQuery("select st.id, st.name from StreetType st where st.country.id = :countryId order by st.sortOrder desc, st.name");
query.setLong("countryId", country.getId().longValue());
query.setCacheable(true); // caches the query
query.setCacheRegion("query.StreetTypes"); // sets the name of the Query Cache
query.list();
¡ Ehcache Settings for Queries
} StandardQueryCache: This cache is used if you use a query cache without setting a name.
} UpdateTimestampsCache: Tracks the timestamps of the most recent updates to particular tables. It is important that the cache timeout of the underlying cache implementation be set to a higher value than the timeouts of any of the query caches. In fact, it is recommended that the underlying cache not be configured for expiry at all.
} Named Query Caches: In addition, a QueryCache can be given a specific name in Hibernate using Query.setCacheRegion(String name). The name of the cache in ehcache.xml is then the name given in that method. The name can be whatever you want, but by convention you should use “query.” followed by a descriptive name.
More caching is better, right?
I think many users who turn on the second level cache also turn on the query cache because they don’t know what the query cache does. (I certainly turned it on the first time I tried second level cache.) Well, as it turns out, that’s a bad idea. The Hibernate query cache is actually not only not helpful but downright harmful to latency and scalability in many common scenarios.
If I were to summarize one piece of advice from this whole article it would be: turn off the query cache unless a) you know why you’re turning it on and b) you can measure a real improvement in a realistic load. Let’s look at some reasons why…
1. The primary reason that the query cache is often not useful is that the results are constantly being invalidated by table modifications. As I mentioned earlier, any table modification causes the timestamp cache to be updated. When you do a lookup that returns any entity through the query cache, it’s possible it is invalidated by an insert or update totally unrelated to that entity. For example, you might be doing a query by social security number on a Person. If any other Person in your db has been inserted or updated since the last hit on that ssn, the result will be thrown out (even though it’s likely still valid).
It is quite common to set up a query cache but find that the hit rate is very low as the data is constantly being invalidated. But fear not – there is a way around this issue which is discussed in the next section.
2. Probably the next most common query cache problem is memory usage. Query cache is notorious for gobbling up your heap, mainly because of its keys. First you have the query string itself, which is quite commonly hundreds of characters long and is frequently repeated with different bind parameters.
Second, the bind parameters were until recently the actual objects passed in to the Query or Criteria (if that’s how you constructed the query). That meant that it was quite easy to put session-scoped entities and other complex objects into your cache where they would sit forever, possibly holding references to collections and large chunks of your session cache. This issue is fixed in the Hibernate 3.2.7 and 3.3.2 releases, which convert those entities to identifiers.
3. Another issue we found internally in testing at Terracotta (but which I haven’t seen mentioned anywhere else) is that turning on the query cache introduces a new source of lock contention in the timestamp cache. This cache has a single coarse lock that is locked for all inserts/updates/deletes to update the timestamp of a table and also for every lookup that occurs through the query cache. Under load, this lock can easily become a bottleneck.
Interestingly, the first part can become an issue even if NO queries are cached! One place you might see that is when doing multi-threaded table loading – every insert into any table across all threads must obtain the same lock in the timestamp cache, even if no queries are being cached. This is where it can be quite dangerous to turn on query cache if you aren’t actually using it.
4. Not really a problem per se, but a common misconfiguration that can occur is with the eviction settings for the timestamp cache. Because you want your query cache to always be able to find the last update timestamp in the timestamp cache, you want to be sure that the timestamp cache does not evict entries before the query cache! In general, it is recommended that the timestamp cache be eternal or evicted on a “time to live” longer than the query cache “time to live”. The timestamp cache should not use “time to idle” based eviction.
You might wonder what happens in the case where a timestamp is evicted too early. I think if I’m reading the code right in UpdateTimestampsCache.isUpToDate(), this situation is treated as the timestamp cache saying no modifications have occurred and allowing a query cache entry to stand. It is possible in that case for updates to have occurred before or after timestamp eviction that would not be noticed and you might actually see stale values. If so, that would be a pretty subtle bug.
When is the query cache useful?
The one case that seems to be a sweet spot for using the query cache is when you frequently need to look up an entity based on a natural key. A natural key is a field or fields that form a unique way to identify an entity in a table but is not the primary key. For example, you might have a Person table with an auto-generated primary key in the database. But several identifiers for a Person might form a natural key, such as social security number or email address.
If you frequently have a user or external input providing you a natural key for lookup, your normal second level entity cache is NOT helpful because it caches based on primary key. In this case, you can use a Criteria-based cached query lookup on an immutable natural key column to do that first hop from natural key to primary key and remember that step for the future.
That query (including the bind parameter holding the natural key value) will be mapped to a result of one primary key id. You can then leverage the normal second level entity cache to do the primary key lookup.
If you noticed in the workflow up above, there is a special check made for whether the lookup is on a natural immutable key. This hint can only be supplied by a Criteria, not by a Query or other means. The columns must be marked as natural keys and flagged as immutable in your mapping file.
If you do this, you skip the check on the timestamp cache! This is because for an *immutable* natural key, it’s impossible for a table modification to change the mapping of natural to primary key. Skipping that check makes the query cache higher performance and avoids the invalidation problem, yielding much higher hit rates on your query cache.
Note that this still doesn’t avoid the lock contention created in the timestamp cache synchronization.
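The two-hop lookup described above can be modelled in plain Java. This is a conceptual sketch rather than Hibernate code: the class NaturalKeyLookupSketch and the findId and loadById functions are hypothetical stand-ins, with one map playing the role of the query cache and the other the role of the entity cache.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Conceptual two-hop lookup: natural key -> primary key -> entity data. */
class NaturalKeyLookupSketch {
    private final Map<String, Long> naturalKeyToId = new HashMap<>(); // plays the query cache
    private final Map<Long, String> entityCache = new HashMap<>();    // plays the entity cache
    int queriesRun = 0;

    /** findId and loadById are hypothetical stand-ins for database access. */
    String findByNaturalKey(String naturalKey,
                            Function<String, Long> findId,
                            Function<Long, String> loadById) {
        Long id = naturalKeyToId.get(naturalKey);
        if (id == null) {
            id = findId.apply(naturalKey);      // first hop, run once per natural key
            queriesRun++;
            if (id == null) {
                return null;                    // no entity has this natural key
            }
            naturalKeyToId.put(naturalKey, id); // safe to keep: the natural key is immutable
        }
        return entityCache.computeIfAbsent(id, loadById); // second hop by primary key
    }
}
```

Because the mapping from an immutable natural key to its primary key never changes, the first map needs no invalidation, which is the property that lets the timestamp check be skipped.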