Design a Log Storage System

Innovative Log Storage
How do trees connect to the Internet?
They log on.

You are given several logs that each log contains a unique id and timestamp. Timestamp is a string that has the following format: Year:Month:Day:Hour:Minute:Second, for example, 2017:01:01:23:59:59. All domains are zero-padded decimal numbers.

Design a log storage system to implement the following functions:

void Put(int id, string timestamp): Given a log’s unique id and timestamp, store the log in your storage system.

int[] Retrieve(String start, String end, String granularity): Return the id of logs whose timestamps are within the range from start to end. Start and end all have the same format as timestamp. However, granularity means the time level for consideration. For example, start = “2017:01:01:23:59:59”, end = “2017:01:02:23:59:59”, granularity = “Day”, it means that we need to find the logs within the range from Jan. 1st 2017 to Jan. 2nd 2017.

Example 1:

put(1, "2017:01:01:23:59:59");
put(2, "2017:01:01:22:59:59");
put(3, "2016:01:01:00:00:00");
retrieve("2016:01:01:01:01:01","2017:01:01:23:00:00","Year"); // return [1,2,3], because you need to return all logs within 2016 and 2017.
retrieve("2016:01:01:01:01:01","2017:01:01:23:00:00","Hour"); // return [1,2], because you need to return all logs start from 2016:01:01:01 to 2017:01:01:23, where log 3 is left outside the range.

Problem Source: leetcode.com

Key:
There could be multiple constraints we can come up with data structure required for log storage problem.
We will make some assumptions:
1. simple
2. most important is scalable log storage system
Since it is log system, put is going to be append operation mostly.
For retrieve, for fast retrieval from balanced tree is faster since it can give first entry and then we can extract until the end time.
However, trees lends themselves to shard and put on multiple machines for distributed workload.

Since for retrieve, lists render themselves easy for putting in disks and also splitting to multiple machines based on ranges, as compared to red-black tree based C++ STL map, we will go with list.

class LogSystem {
private:
    vector<pair<string, int>> log;
    
    // Given that we have fixed timestamp format
    // indices of ":" in given timestamp decides the granularity
    // Therefore for comparison based on granularity, we store mapping of 
    // index of each granularity to get sub-string quickly
    unordered_map<string, int> granularityIndexMap;

public:
    LogSystem() {
       granularityIndexMap = 
            { {"Year", 4}, {"Month", 7}, {"Day", 10}, 
              {"Hour", 13}, {"Minute", 16}, {"Second", 19}
            };
    }
    
    void put(int id, string timestamp) {
        put_in_list(id, timestamp);
    }
    
    vector<int> retrieve(string s, string e, string gra) {
        return retrieve_from_list(s, e, gra);
    }
    
    inline void put_in_list(int id, string ts) {
        log.push_back({ts, id});
    }
    
    vector<int> retrieve_from_list(string s, string e, string g) {
        vector<int> out;
        auto idx = granularityIndexMap[g];
        auto sg = s.substr(0, idx);
        auto eg = e.substr(0, idx);
        for (auto& l : log) {
            // based on granularity truncate the time.
            auto tmpLog = l.first.substr(0, idx);
            // lexicographical string comparison - "2017:02" is < "2017:03"
            if ( tmpLog >= sg && tmpLog <= eg) {
                out.push_back(l.second);
            }
        }
        return out;
    }
};

/**
 * Your LogSystem object will be instantiated and called as such:
 * LogSystem* obj = new LogSystem();
 * obj->put(id,timestamp);
 * vector<int> param_2 = obj->retrieve(s,e,gra);
 */

Published by Jeet

A software developer by profession. In spare time, fancy non-fiction mostly, related to history, finance and politics. A self-proclaimed movie buff - genre, time and era is no bar, only a sumptuous movie is!

Leave a comment