OP: liuxf666

[Study Notes] Foundations of Data Systems - Log-structured storage

  • Log-structured storage
    • Hash Indexes
      • use hash indexes for key-value data
      • The simplest possible indexing strategy:
        • keep an in-memory hash map where every key is mapped to a byte offset in the data file - the location at which the value can be found.
        • Whenever you append a new key-value pair to the file, also update the hash map to reflect the offset of the data you just wrote (this works both for inserting new keys and for updating existing keys).
        • When you want to look up a value, use the hash map to find the offset in the data file, seek to that location, and read the value (this is how Bitcask - the default storage engine in Riak - works).
      • NOTE:
        • A storage engine like Bitcask is well suited to situations where the value for each key is updated frequently. For example, the key might be the URL of a cat video, and the value might be the number of times it has been played (incremented every time someone hits the play button). In this kind of workload, there are a lot of writes, but there are not too many distinct keys - you have a large number of writes per key, but it’s feasible to keep all keys in memory.
        • The log is append-only; break it into segments by closing a segment file when it reaches a certain size, and directing subsequent writes to a new segment file.
        • Perform compaction on these segments - Compaction means throwing away duplicate keys in the log, and keeping only the most recent update for each key. The merging and compaction of frozen segments can be done in a background thread, and while it is going on, we can still continue to serve read and write requests as normal, using the old segment files. After the merging process is complete, we switch read requests to using the new merged segment instead of the old segments - and then the old segment files can simply be deleted.
        • Lots of detail goes into making this simple idea work in practice. Briefly, some of the issues that are important in a real implementation are:
      • File format: CSV is not the best format for a log. It’s faster and simpler to use a binary format that first encodes the length of a string in bytes, followed by the raw string (without need for escaping).
      • Deleting records: If you want to delete a key and its associated value, you have to append a special deletion record to the data file (sometimes called a tombstone). When log segments are merged, the tombstone tells the merging process to discard any previous values for the deleted key.
      • Crash recovery: If the database is restarted, the in-memory hash maps are lost. In principle, you can restore each segment’s hash map by reading the entire segment file from beginning to end and noting the offset of the most recent value for every key as you go along. However, that might take a long time if the segment files are large, which would make server restarts painful. Bitcask speeds up recovery by storing a snapshot of each segment’s hash map on disk, which can be loaded into memory more quickly.
      • Partially written records: The database may crash at any time, including halfway through appending a record to the log. Bitcask files include checksums, allowing such corrupted parts of the log to be detected and ignored.
      • Concurrency control: As writes are appended to the log in a strictly sequential order, a common implementation choice is to have only one writer thread. Data file segments are append-only and otherwise immutable, so they can be read concurrently by multiple threads.



        • Limitations:
          • The hash table must fit in memory.
          • Range queries are not efficient.
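The compaction step described above - keeping only the most recent update per key and discarding tombstoned keys - can be sketched over one frozen segment. Representing a segment as a list of (key, value) pairs in append order, with None standing in for a tombstone, is an assumption of this sketch:

```python
def compact(records):
    """Scan a frozen segment in append order, keep only the most recent
    value per key, and drop keys whose latest record is a tombstone."""
    latest = {}
    for key, value in records:  # later entries overwrite earlier ones
        latest[key] = value
    # tombstoned keys are discarded entirely from the merged output
    return [(k, v) for k, v in latest.items() if v is not None]


segment = [("a", "1"), ("b", "1"), ("a", "2"), ("b", None), ("a", "3")]
print(compact(segment))  # -> [('a', '3')]
```

Because segments are immutable once written, this can run in a background thread while reads continue against the old files, exactly as the note describes.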





      • SSTables (Sorted String Table) - The log-structured indexes break the database down into variable-size segments, typically several megabytes or more in size, and always write a segment sequentially.
        • Change the format of our segment files: require that the sequence of key-value pairs is sorted by key.
        • Pros: can use a sparse in-memory index (no need to keep all keys in memory), and range queries are more efficient.
        • Constructing and maintaining SSTables:
          • how do you get your data to be sorted by key in the first place?
            • When a write comes in, add it to an in-memory balanced tree data structure (for example, a red-black tree). This in-memory tree is sometimes called a memtable.
            • When the memtable gets bigger than some threshold - typically a few megabytes  - write it out to disk as an SSTable file. This can be done efficiently because the tree already maintains the key-value pairs sorted by key. The new SSTable file becomes the most recent segment of the database. While the SSTable is being written out to disk, writes can continue to a new memtable instance.
            • In order to serve a read request, first try to find the key in the memtable, then in the most recent on-disk segment, then in the next-older segment, etc.
            • From time to time, run a merging and compaction process in the background to combine segment files and to discard overwritten or deleted values.
            • NOTE: it only suffers from one problem: if the database crashes, the most recent writes (which are in the memtable but not yet written out to disk) are lost. In order to avoid that problem, we can keep a separate log on disk to which every write is immediately appended - only for recovery purpose.
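The memtable-plus-SSTables scheme above can be sketched as follows. A plain dict sorted at flush time stands in for the red-black tree, flushed segments are sorted lists of (key, value) pairs, and the class and parameter names are illustrative:

```python
import bisect


class MiniLSM:
    """Sketch of the memtable / SSTable scheme: writes go to an
    in-memory table; when it exceeds a threshold it is written out as a
    sorted run, and reads check the memtable first, then segments from
    newest to oldest."""

    def __init__(self, memtable_limit=4):
        self.memtable = {}
        self.segments = []  # newest segment first
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # write the memtable out as a sorted run (an "SSTable")
        sstable = sorted(self.memtable.items())
        self.segments.insert(0, sstable)
        self.memtable = {}

    def get(self, key):
        # read path: memtable first, then segments newest to oldest
        if key in self.memtable:
            return self.memtable[key]
        for seg in self.segments:
            i = bisect.bisect_left(seg, (key,))
            if i < len(seg) and seg[i][0] == key:
                return seg[i][1]
        return None
```

Note how the sorted segments allow binary search within each run; a real engine would also keep the write-ahead log mentioned above so an unflushed memtable survives a crash.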
      • LSM-Trees (Log-Structured Merge-Trees) - Making an LSM-tree out of SSTables
        • The basic idea of LSM-trees - keeping a cascade of SSTables that are merged in the background - is simple and effective. Even when the dataset is much bigger than the available memory it continues to work well. Since data is stored in sorted order, you can efficiently perform range queries (scanning all keys above some minimum and up to some maximum), and because the disk writes are sequential the LSM-tree can support remarkably high write throughput.
        • Performance optimizations:
          • the LSM-tree algorithm can be slow when looking up keys that do not exist in the database: you have to check the memtable, then the segments all the way back to the oldest (possibly having to read from disk for each one) before you can be sure that the key does not exist.
          • In order to optimize this kind of access, storage engines often use additional Bloom filters. (A Bloom filter is a memory-efficient data structure for approximating the contents of a set. It can tell you if a key does not appear in the database, and thus saves many unnecessary disk reads for nonexistent keys.)
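A Bloom filter of the kind described can be sketched with k independent hash functions each setting one bit per key; the sizes and the use of SHA-256 to derive the hashes are assumptions of this sketch:

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter sketch: k hashes set k bit positions per
    added key. If any of a key's positions is unset, the key was
    definitely never added, so the storage engine can skip reading every
    segment for that key. A fully-set pattern may be a false positive."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits)  # one byte per bit, for clarity

    def _positions(self, key: str):
        # derive num_hashes positions by salting the key with an index
        for i in range(self.num_hashes):
            h = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.size

    def add(self, key: str):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key: str) -> bool:
        # False is definitive; True may occasionally be a false positive
        return all(self.bits[pos] for pos in self._positions(key))
```

This is why the filter saves disk reads only for absent keys: a negative answer is certain, while a positive answer still requires consulting the segments.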

#2 笑开心, posted 2019-3-20 09:04:36
Thanks

#3 充实每一天, posted 2019-3-20 09:40:27
Liked!

#4 hifinecon, posted 2019-3-20 10:19:22

#5 苏亮480, posted 2019-3-20 14:31:15
Thanks for sharing - excellent study material!

#7 artra2012, posted 2019-3-20 14:46:33
A like for you!

#8 sulight, posted 2019-3-20 22:27:34
Thanks for sharing.