the challenge
Earlier last week, I was asked to implement an in-memory database similar to Redis (name-value pairs). This post is about code style, algorithmic performance, and thought process. For simplicity's sake, my program to the challenge will receive commands via stdin, and should write appropriate responses to stdout.
For a full description of the challenge and problem description, simply google "simple database challenge" or go to http://www.thumbtack.com/challenges/simple-database.
thought process & data structure
I'll be using HashMap to meet the performance requirement of O(log N) or better.
To support transaction commands - BEGIN, ROLLBACK, COMMIT - I chose to use a linked list to hold incremental state changes over time within each transaction scope. When a new transaction scope starts (BEGIN command), a block is created and added to the end of the linked list. Each block will only hold incremental state changes pertaining to the transaction scope the command is issued in. The changes within a transaction scope is always held in its own block, making ROLLBACK simple and straightforward. On a rollback, the last block can simply be abandoned and eliminated from the end of the linked list. The 'state changes' recorded in each block are simply in the form of name-value pairs. Only name-value that changes are recorded in the latest block of the linked list. When a name is set to a value within a transaction scope, the new name-value pair is noted in the latest block of the linked list. When the same name is set to another value in that transaction scope, the name-value in the latest block of the linked list is then updated accordingly. The name-value setting approach applies to both SET and UNSET commands. UNSET is treated as if a SET command setting a name to null value. To retrieve a value of a given name for GET command, I traverse backward through the blocks in the linked list. Starting from the last block, if the name is found in the block, the value is the answer. If the name is not present in the block, continue to traverse backward to the previous block for the value of the name. Repeat the traversal until a value is found for the given name, or until depleting the block traversal. To support NUMEQUALTO command efficiently, I maintain a value-counter pairs for each block similar to the way I trace the 'state change' of the name-value pairs. The value-counter pairs is a bit more complicated to maintain because when a name is set to a different value in a transaction block, it may actually involves changing multiple value-counter pairs in that transaction block. For example, when we 'SET a 10' followed by 'SET a 20', the 'SET a 20' actually involves changing value-counter pairs for both value 10 and 20. The NUMEQUALTO counter returned do not need to be tracked for names with null value, since 'NUMEQUALTO null' is not a valid syntax. To retrieve a counter of a given value for NUMEQUALTO command, the same approach works. That is to traverse backward through the blocks in the linked list. Start from the last block, if the value is found in the block, the counter is the answer. If the value is not present in the block, continue to traverse backward to the previous block for the counter of the value. Repeat the traversal until a counter is found for the given value, or until depleting the block traversal. When maintaining the name-value pairs and value-counter pairs, I watched out for the edge conditions of (0, null, etc.) to avoid subtle bugs from happening. Lastly, the COMMIT command will close all open transaction blocks simply by flattening all blocks in the linked list and condense them all into one single block. source code
You'll find two classes - Database and TransactionBlock - in the source code.
Database class - A DB contains a sequence (linked list) of uncommitted Transaction Blocks. Each block is a TransactionBlock object. These blocks (TransactionBlock objects) are formed in lined list. Extra block is attached to the end of the linked list when a transaction command BEGIN is issued. The last block of the linked list is abandoned when a transaction command ROLLBACK is issued. TransactionBlock class - A transaction block conceptually includes (points to) all past uncommitted transactions accessible through traversing to the previous transaction block. execute the solution written in java
To download my solution implemented in Java 1.7:
![]()
To run the solution in interactive mode:
To run the only test case come with the original problem:
discussion
One question I have is toward the UNSET command. The challenge is not clear about whether the UNSET command means the variable is never set, or never set in the context of the existing transaction scope. In other words, whether the UNSET can be rollback or not is not clear to me. To avoid confusion, it is best to apply test cases through which the spec may be well defined.
Do you have a complete different thought process? Are you aware of things to improve looking over my code style? Please leave your reply below and I'll get back to you. Thanks for reading!
8 Comments
Sachin
9/5/2016 07:15:36 pm
Runtime limitation given for this is O(Log(N)) where N is total number of variables stored in database. But for get operation, your description indicates, you'll travel till first block. so Complexity would be O(T) where T is number of transaction and T >> N
Reply
9/5/2016 08:38:56 pm
You're right, Sachin.
Reply
Sachin Doiphode
9/6/2016 07:42:14 am
Yes, I did the same. 9/6/2016 03:46:06 pm
How about finding a way to speed up HashMap copy? For example,
N
2/11/2018 05:03:45 pm
Hi,
Tim
1/6/2025 10:53:13 am
Where O(logN) is coming from?
Tony
9/29/2016 06:08:42 pm
In unset, you are setting to null. But what if null is a valid string value ?
Reply
Hsufeng Ko
11/9/2016 04:52:55 pm
A command like "UNSET a" would unset variable 'a'. To implement this, the code simply nullify variable 'a'. null is a special literal in Java to signify uninitialized state, not to be confused with a string with value "null".
Reply
Leave a Reply. |
Categories
All
Archives
May 2020
|