Keep an in-memory hash map where every key is mapped to a byte offset in the data file - the location at which the value can be found.
Whenever you append a new key-value pair to the file, also update the hash map to reflect the offset of the data you just wrote (this works both for inserting new keys and for updating existing keys).
When you want to look up a value, use the hash map to find the offset in the data file, seek to that location, and read the value. (This is the approach used by Bitcask, the default storage engine in Riak.)
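A minimal sketch of the idea, assuming a single data file and a toy `key,value` text encoding where keys and values contain no commas or newlines (the class name and format are illustrative, not Bitcask's actual layout):

```python
import os

class HashIndexedLog:
    """Append-only data file plus an in-memory hash map of byte offsets."""

    def __init__(self, path):
        self.index = {}                      # key -> byte offset of latest record
        self.f = open(path, "ab+")           # append-only writes, random reads

    def set(self, key, value):
        self.f.seek(0, os.SEEK_END)
        offset = self.f.tell()               # where this record will start
        self.f.write(f"{key},{value}\n".encode("utf-8"))
        self.f.flush()
        self.index[key] = offset             # insert and update look the same

    def get(self, key):
        offset = self.index.get(key)
        if offset is None:
            return None
        self.f.seek(offset)                  # jump straight to the record
        _, value = self.f.readline().decode("utf-8").rstrip("\n").split(",", 1)
        return value
```

Note that every read costs at most one disk seek: the hash map takes you straight to the record (and no seek at all if the data is already in the filesystem cache).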
NOTE:
A storage engine like Bitcask is well suited to situations where the value for each key is updated frequently. For example, the key might be the URL of a cat video, and the value might be the number of times it has been played (incremented every time someone hits the play button). In this kind of workload, there are a lot of writes, but there are not too many distinct keys - you have a large number of writes per key, but it’s feasible to keep all keys in memory.
The log is append-only; break it into segments by closing a segment file when it reaches a certain size, and making subsequent writes to a new segment file.
Perform compaction on these segments - Compaction means throwing away duplicate keys in the log, and keeping only the most recent update for each key. The merging and compaction of frozen segments can be done in a background thread, and while it is going on, we can still continue to serve read and write requests as normal, using the old segment files. After the merging process is complete, we switch read requests to using the new merged segment instead of the old segments - and then the old segment files can simply be deleted.
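A sketch of merging and compacting frozen segments, under the same toy `key,value` text format as above. Processing segments from oldest to newest means later records simply overwrite earlier ones in the dict:

```python
def compact_segments(segment_paths, out_path):
    """Merge frozen segments, keeping only the latest value for each key.

    segment_paths must be ordered oldest to newest so newer values win.
    """
    latest = {}
    for path in segment_paths:
        with open(path, "rb") as f:
            for line in f:
                key, value = line.decode("utf-8").rstrip("\n").split(",", 1)
                latest[key] = value          # later records overwrite earlier ones
    with open(out_path, "wb") as f:
        for key, value in latest.items():
            f.write(f"{key},{value}\n".encode("utf-8"))
```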
Lots of detail goes into making this simple idea work in practice. Briefly, some of the issues that are important in a real implementation are:
File format: CSV is not the best format for a log. It’s faster and simpler to use a binary format that first encodes the length of a string in bytes, followed by the raw string (without the need for escaping) - see the encoding sketch after this list.
Deleting records: If you want to delete a key and its associated value, you have to append a special deletion record to the data file (sometimes called a tombstone). When log segments are merged, the tombstone tells the merging process to discard any previous values for the deleted key.
Crash recovery: If the database is restarted, the in-memory hash maps are lost. In principle, you can restore each segment’s hash map by reading the entire segment file from beginning to end and noting the offset of the most recent value for every key as you go along. However, that might take a long time if the segment files are large, which would make server restarts painful. Bitcask speeds up recovery by storing a snapshot of each segment’s hash map on disk, which can be loaded into memory more quickly.
Partially written records: The database may crash at any time, including halfway through appending a record to the log. Bitcask files include checksums, allowing such corrupted parts of the log to be detected and ignored.
Concurrency control: As writes are appended to the log in a strictly sequential order, a common implementation choice is to have only one writer thread. Data file segments are append-only and otherwise immutable, so they can be read concurrently by multiple threads.
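A sketch pulling three of these points together: a length-prefixed binary record with a CRC-32 checksum, where a missing value marks a tombstone. The field layout here is an assumption for illustration, not Bitcask's actual on-disk format:

```python
import struct
import zlib

TOMBSTONE = 0xFFFFFFFF                       # reserved value_len marking a deletion
HEADER = struct.Struct("<III")               # crc32, key_len, value_len

def encode_record(key, value):
    """key: bytes; value: bytes, or None to write a tombstone."""
    value_len = TOMBSTONE if value is None else len(value)
    body = struct.pack("<II", len(key), value_len) + key + (value or b"")
    return struct.pack("<I", zlib.crc32(body)) + body

def decode_record(buf):
    """buf: the bytes of exactly one record. Raises on corruption."""
    crc, key_len, value_len = HEADER.unpack_from(buf)
    if zlib.crc32(buf[4:]) != crc:           # e.g. a partially written tail
        raise ValueError("corrupt record")
    key = buf[HEADER.size:HEADER.size + key_len]
    if value_len == TOMBSTONE:
        return key, None                     # tells the merge to drop this key
    return key, buf[HEADER.size + key_len:HEADER.size + key_len + value_len]
```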
Limitations:
The hash table must fit in memory.
Range queries are not efficient.
SSTables (Sorted String Tables) - the log-structured indexes above break the database down into variable-size segments, typically several megabytes or more in size, and always write a segment sequentially.
Change to the format of our segment files: require that the sequence of key-value pairs is sorted by key.
Pros: you can use a sparse in-memory index (no need to keep all keys in memory), and range queries are more efficient.
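A sketch of how the sparse index works, with the segment represented as an in-memory sorted list for brevity; in a real SSTable the index entries would point to byte offsets in the file:

```python
import bisect

def build_sparse_index(records, every=64):
    """records: list of (key, value) sorted by key. Index every Nth key only."""
    return [(records[i][0], i) for i in range(0, len(records), every)]

def sparse_get(records, index, key):
    keys = [k for k, _ in index]
    i = bisect.bisect_right(keys, key) - 1   # last indexed key <= target
    if i < 0:
        return None                          # smaller than every indexed key
    for k, v in records[index[i][1]:]:       # scan at most one index gap
        if k == key:
            return v
        if k > key:
            break                            # data is sorted: safe to give up
    return None
```

Range queries fall out of the same structure: find the start position via the sparse index, then scan forward until the key exceeds the upper bound.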
Constructing and maintaining SSTables:
How do you get your data to be sorted by key in the first place?
When a write comes in, add it to an in-memory balanced tree data structure (for example, a red-black tree). This in-memory tree is sometimes called a memtable.
When the memtable gets bigger than some threshold - typically a few megabytes - write it out to disk as an SSTable file. This can be done efficiently because the tree already maintains the key-value pairs sorted by key. The new SSTable file becomes the most recent segment of the database. While the SSTable is being written out to disk, writes can continue to a new memtable instance.
In order to serve a read request, first try to find the key in the memtable, then in the most recent on-disk segment, then in the next-older segment, etc.
From time to time, run a merging and compaction process in the background to combine segment files and to discard overwritten or deleted values.
NOTE: this scheme suffers from only one problem: if the database crashes, the most recent writes (which are in the memtable but not yet written out to disk) are lost. To avoid that, we can keep a separate log on disk to which every write is immediately appended; it is used only for recovery purposes.
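A sketch of the whole scheme, including the recovery log. A plain dict sorted at flush time stands in for the balanced tree (Python has no built-in red-black tree), and the file naming and JSON-lines segment format are assumptions:

```python
import json
import os

class MemtableStore:
    def __init__(self, dirpath, threshold=4 * 1024 * 1024):
        self.dir = dirpath
        self.threshold = threshold
        self.memtable, self.size = {}, 0
        self.segments = []                               # oldest first
        self.wal = open(os.path.join(dirpath, "wal.log"), "a")

    def set(self, key, value):
        self.wal.write(json.dumps([key, value]) + "\n")  # log before applying
        self.wal.flush()
        self.memtable[key] = value
        self.size += len(key) + len(value)
        if self.size > self.threshold:
            self.flush()

    def flush(self):
        path = os.path.join(self.dir, f"segment-{len(self.segments)}.sst")
        with open(path, "w") as f:
            for key in sorted(self.memtable):            # written out in key order
                f.write(json.dumps([key, self.memtable[key]]) + "\n")
        self.segments.append(path)
        self.memtable, self.size = {}, 0
        self.wal.truncate(0)                             # flushed: log can go

    def get(self, key):
        if key in self.memtable:                         # 1. check memtable
            return self.memtable[key]
        for path in reversed(self.segments):             # 2. newest segment first
            with open(path) as f:
                for k, v in (json.loads(line) for line in f):
                    if k == key:
                        return v
        return None
```

The read path here scans each segment linearly for simplicity; since segments are sorted, a real implementation would binary-search or use a sparse index as above.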
LSM-Trees (Log-Structured Merge-Trees) - making an LSM-tree out of SSTables
The basic idea of LSM-trees - keeping a cascade of SSTables that are merged in the background - is simple and effective. Even when the dataset is much bigger than the available memory, it continues to work well. Since data is stored in sorted order, you can efficiently perform range queries (scanning all keys above some minimum and up to some maximum), and because the disk writes are sequential, the LSM-tree can support remarkably high write throughput.
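A sketch of the mergesort-like merge step, with segments as in-memory sorted lists ordered oldest to newest; when the same key appears in several segments, the copy from the newest one wins, and tombstones (None values) are dropped:

```python
import heapq

def tagged(segment, age):
    # Tag each entry so that equal keys sort newest-segment-first.
    for key, value in segment:
        yield (key, -age), value

def merge_segments(segments):
    """segments: list of [(key, value), ...] sorted by key, oldest first."""
    iters = [tagged(seg, age) for age, seg in enumerate(segments)]
    merged, last_key = [], object()          # sentinel: matches no real key
    for (key, _), value in heapq.merge(*iters):
        if key != last_key:                  # first occurrence is the newest
            last_key = key
            if value is not None:            # discard deleted keys entirely
                merged.append((key, value))
    return merged
```

For example, `merge_segments([[("a", 1)], [("a", 2), ("b", None)]])` returns `[("a", 2)]`: the newer value for "a" wins and the tombstoned "b" disappears.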
Performance optimizations:
The LSM-tree algorithm can be slow when looking up keys that do not exist in the database: you have to check the memtable, then the segments all the way back to the oldest (possibly having to read from disk for each one) before you can be sure that the key does not exist.
In order to optimize this kind of access, storage engines often use additional Bloom filters. (A Bloom filter is a memory-efficient data structure for approximating the contents of a set. It can tell you if a key does not appear in the database, and thus saves many unnecessary disk reads for nonexistent keys.)
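A minimal Bloom filter sketch, using double hashing to derive several bit positions from one SHA-256 digest; the sizes and hash choice are illustrative only:

```python
import hashlib

class BloomFilter:
    def __init__(self, num_bits=8192, num_hashes=5):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8)

    def _positions(self, key):
        digest = hashlib.sha256(key).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:16], "little")
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits  # double hashing

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        # False means definitely absent (skip all disk reads);
        # True means only "possibly present".
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))
```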