Application Development, Product to Market
  • Home
  • Notes
  • Projects
  • Resources
    • Discovery Log
    • Books I Read
    • Blogs I Visit
    • Tools I Use
  • Home
  • Notes
  • Projects
  • Resources
    • Discovery Log
    • Books I Read
    • Blogs I Visit
    • Tools I Use

Simple Database Challenge

9/13/2015

7 Comments

 

the challenge

Earlier last week, I was asked to implement an in-memory database similar to Redis (name-value pairs).  This post is about code style, algorithmic performance, and thought process. For simplicity's sake, my program to the challenge will receive commands via stdin, and should write appropriate responses to stdout.
For a full description of the challenge and problem description, simply google "simple database challenge" or go to http://www.thumbtack.com/challenges/simple-database.

thought process & data structure

Simple Database Challenge
I'll be using HashMap to meet the performance requirement of O(log N) or better.

To support transaction commands - BEGIN, ROLLBACK, COMMIT - I chose to use a linked list to hold incremental state changes over time within each transaction scope. When a new transaction scope starts (BEGIN command), a block is created and added to the end of the linked list. Each block will only hold incremental state changes pertaining to the transaction scope the command is issued in. 

The changes within a transaction scope is always held in its own block, making ROLLBACK simple and straightforward. On a rollback, the last block can simply be abandoned and eliminated from the end of the  linked list.

The 'state changes' recorded in each block are simply in the form of name-value pairs. Only name-value that changes are recorded in the latest block of the linked list. When a name is set to a value within a transaction scope, the new name-value pair is noted in the latest block of the linked list. When the same name is set to another value in that transaction scope, the name-value in the latest block of the linked list is then updated accordingly. The name-value setting approach applies to both SET and UNSET commands. UNSET is treated as if a SET command setting a name to null value.

To retrieve a value of a given name for GET command, I traverse backward through the blocks in the linked list. Starting from the last block, if the name is found in the block, the value is the answer. If the name is not present in the block, continue to traverse backward to the previous block for the value of the name. Repeat the traversal until a value is found for the given name, or until depleting the block traversal.

To support NUMEQUALTO command efficiently, I maintain a value-counter pairs for each block similar to the way I trace the 'state change' of the name-value pairs. The value-counter pairs is a bit more complicated to maintain because when a name is set to a different value in a transaction block, it may actually involves changing multiple value-counter pairs in that transaction block. For example, when we 'SET a 10' followed by 'SET a 20', the 'SET a 20' actually involves changing value-counter pairs for both value 10 and 20. The NUMEQUALTO counter returned do not need to be tracked for names with null value, since 'NUMEQUALTO null' is not a valid syntax.

To retrieve a counter of a given value for NUMEQUALTO command, the same approach works. That is to traverse backward through the blocks in the linked list. Start from the last block, if the value is found in the block, the counter is the answer. If the value is not present in the block, continue to traverse backward to the previous block for the counter of the value. Repeat the traversal until a counter is found for the given value, or until depleting the block traversal.

When maintaining the name-value pairs and value-counter pairs, I watched out for the edge conditions of (0, null, etc.) to avoid subtle bugs from happening.

Lastly, the COMMIT command will close all open transaction blocks simply by flattening all blocks in the linked list and condense them all into one single block.

source code

You'll find two classes - Database and TransactionBlock - in the source code.

Database class - A DB contains a sequence (linked list) of uncommitted Transaction Blocks. Each block is a TransactionBlock object. These blocks (TransactionBlock objects) are formed in lined list. Extra block is attached to the end of the linked list when a transaction command BEGIN is issued. The last block of the linked list is abandoned when a transaction command ROLLBACK is issued.

TransactionBlock class - A transaction block conceptually includes (points to) all past uncommitted transactions accessible through traversing to the previous transaction block.

execute the solution written in java

To download my solution implemented in Java 1.7:

  1. download simpledb.zip attached to end of this post
  2. expand simpledb.zip into SimpleDB folder
simpledb.zip
File Size: 9 kb
File Type: zip
Download File

To run the solution in interactive mode:
  1. cd into SimpleDB folder 
  2. cd into bin folder under SimpleDB folder
  3. java SimpleDB/Database

To run the only test case come with the original problem:
  1. cd into SimpleDB folder 
  2. cd into bin folder under SimpleDB folder
  3. java SimpleDB/Database < test_input/adv1.txt

discussion

One question I have is toward the UNSET command. The challenge is not clear about whether the UNSET command means the variable is never set, or never set in the context of the existing transaction scope. In other words, whether the UNSET can be rollback or not is not clear to me. To avoid confusion, it is best to apply test cases through which the spec may be well defined.

Do you have a complete different thought process? Are you aware of things to improve looking over my code style? Please leave your reply below and I'll get back to you. Thanks for reading!
7 Comments
Sachin
9/5/2016 07:15:36 pm

Runtime limitation given for this is O(Log(N)) where N is total number of variables stored in database. But for get operation, your description indicates, you'll travel till first block. so Complexity would be O(T) where T is number of transaction and T >> N

Reply
Hsufeng Ko link
9/5/2016 08:38:56 pm

You're right, Sachin.

To fix and meed the performance requirement for GET operation, simply make the name_value HashMap maintained by each transaction scope not only contains the updates for those name-value pairs happens in the current transaction scope, but also carry over the name_value HashMap of the previous transaction scope.

By trading memory for speed, as a result, a GET lookup no longer need to traverse back to the previous block, and the GET lookup will be O(Log(N)) where N is the total number of variables stored in the database.

Reply
Sachin Doiphode
9/6/2016 07:42:14 am

Yes, I did the same.
But yes need to trade memory also one thing. I'm copying previous transaction blocks map's to new block. doesn't copying map at each begin trasaction increase its complexity to O(N) ? as we I'm copying the variables.

Hsufeng Ko link
9/6/2016 03:46:06 pm

How about finding a way to speed up HashMap copy? For example,

1. Fast copy hashmap by way of serialization. (http://stackoverflow.com/questions/998376/clone-utility-for-hashmap-in-java). Or,

2. Apply some sort of copy-on-write HashMap. (http://concurrently.blogspot.com/2007/07/implementing-copy-on-write-hashmap-in.html)

N
2/11/2018 05:03:45 pm

Hi,
Your current code shows that the name_value pair in EACH block contains all the variables that reside in the database. because you copy the table from the previous block and so it accumulates all the keys that are inserted in the course of time. So, when does GET require to traverse back? I mean, if the key is not part of the name_value in the current block so FOR SURE it is not available in the previous blocks. Am I missing sth?

Tony
9/29/2016 06:08:42 pm

In unset, you are setting to null. But what if null is a valid string value ?

Reply
Hsufeng Ko
11/9/2016 04:52:55 pm

A command like "UNSET a" would unset variable 'a'. To implement this, the code simply nullify variable 'a'. null is a special literal in Java to signify uninitialized state, not to be confused with a string with value "null".

Reply



Leave a Reply.

    Categories

    All
    Algorithm
    Concurrency
    CQ
    Data Structure
    Design Pattern
    Developer Tool
    Dynamic Programming
    Entrepreneur
    Functional Programming
    IDE
    Java
    JMX
    Marketing
    Marklogic
    Memory
    OSGI
    Performance
    Product
    Product Management
    Security
    Services
    Sling
    Social Media Programming
    Software Development

    Feed Widget

    Archives

    May 2020
    March 2020
    April 2018
    March 2018
    February 2018
    December 2017
    March 2017
    November 2016
    June 2016
    May 2016
    April 2016
    October 2015
    September 2015
    August 2015
    September 2014
    July 2014
    June 2014
    May 2014
    March 2014
    January 2014
    December 2013
    November 2013
    October 2013
    September 2013
    August 2013
    July 2013
    June 2013

    RSS Feed

in loving memory of my mother  and my 4th aunt
Photos used under Creative Commons from net_efekt, schani, visnup, Dan Zen, gadl, bobbigmac, Susana López-Urrutia, jwalsh, Philippe Put, michael pollak, oskay, Creative Tools, Violentz, Kyknoord, mobilyazilar