Semantic Caching Demo on SparkSQL and HDFS

Overview: A cache-management system for SparkSQL (Apache Spark) running on top of HDFS.

Requirements: SparkSQL, Apache Spark, HDFS, data caching algorithms, and English skills.

Motivation: We currently use HDFS to store and manage our data, and we run our data analytics with SparkSQL in Apache Spark.

Normally, a Spark application loads the distributed data from HDFS into memory, processes it, and finally writes the results back to HDFS. Sometimes the results of previous queries can be reused, once or several times, by later queries. We therefore want to keep these results in memory long enough to serve those later queries, using a mechanism called semantic caching: each cached result is stored together with a description of the query that produced it, so that a new query whose answer is contained in a cached result can be served from memory instead of being recomputed from HDFS.
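The reuse described above can be illustrated with a minimal, Spark-free sketch. It assumes the simplest possible query shape, a range predicate `lo <= value <= hi` over integers, and a hypothetical `source` function that stands in for the expensive SparkSQL query against HDFS. The class name `SemanticRangeCache` and all of its members are invented for illustration; in a real Spark program the cached rows would more likely be a cached DataFrame, and the containment check would have to work on SQL predicates rather than on two integers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;

/** Illustrative semantic cache for range queries of the form lo <= value <= hi. */
class SemanticRangeCache {

    /** A cached result stored together with the predicate (range) that produced it. */
    private static final class Entry {
        final int lo;
        final int hi;
        final List<Integer> rows;

        Entry(int lo, int hi, List<Integer> rows) {
            this.lo = lo;
            this.hi = hi;
            this.rows = rows;
        }
    }

    private final List<Entry> entries = new ArrayList<>();
    // Hypothetical stand-in for the expensive query against the underlying storage.
    private final BiFunction<Integer, Integer, List<Integer>> source;
    private int sourceCalls = 0;

    SemanticRangeCache(BiFunction<Integer, Integer, List<Integer>> source) {
        this.source = source;
    }

    /** Returns all values in [lo, hi], reusing a cached result when one covers the range. */
    List<Integer> query(int lo, int hi) {
        for (Entry e : entries) {
            // Semantic hit: the requested range is contained in a cached range,
            // so the answer is a filtered subset of the cached rows.
            if (e.lo <= lo && hi <= e.hi) {
                List<Integer> out = new ArrayList<>();
                for (int r : e.rows) {
                    if (r >= lo && r <= hi) {
                        out.add(r);
                    }
                }
                return out;
            }
        }
        // Miss: run the query against the source and remember result plus predicate.
        sourceCalls++;
        List<Integer> rows = new ArrayList<>(source.apply(lo, hi));
        entries.add(new Entry(lo, hi, rows));
        return new ArrayList<>(rows);
    }

    /** Number of times the underlying source had to be queried. */
    int sourceCalls() {
        return sourceCalls;
    }
}
```

With this sketch, a query for the range 10 to 40 goes to the source once, and a later query for 15 to 30 is answered from the cached rows without touching the source again. The sketch only handles full containment and never evicts entries; a fuller design would presumably also split a partially overlapping query into a part answered from the cache and a remainder sent to the source, and bound the memory it uses.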

What we want: an implementation of a semantic caching program in Apache Spark. The program should be written in Scala or Java; Scala is preferred.
