楼主: liuxf666
684 6

[学习笔记] 【学习笔记】Unique ID generation in distributed systems [推广有奖]

  • 1关注
  • 3粉丝

已卖:70份资源

学科带头人

54%

还不是VIP/贵宾

-

威望
0
论坛币
13005 个
通用积分
409.9229
学术水平
109 点
热心指数
112 点
信用等级
103 点
经验
71218 点
帖子
1079
精华
0
在线时间
1538 小时
注册时间
2016-7-19
最后登录
2024-6-8

楼主
liuxf666 发表于 2019-4-27 09:31:59 |AI写论文

+2 论坛币
k人 参与回答

经管之家送您一份

应届毕业生专属福利!

求职就业群
赵安豆老师微信:zhaoandou666

经管之家联合CDA

送您一个全额奖学金名额~ !

感谢您参与论坛问题回答

经管之家送您两个论坛币!

+2 论坛币
Before starting out, we listed out what features were essential in our system:
  • * Generated IDs should be sortable by time
  • IDs should ideally be 64 bits (for smaller indexes, and better storage in systems like Redis)

The basis for this is the initial bits(40) represent timestamp and the rest of the bit is formed based on other info like – node-id, machine-id.

Each of our IDs consists of:
–   41 bits for time in milliseconds (gives us 41 years of IDs with a custom epoch)
–  13 bits that represent the logical shard ID(can be used id )
–  10 bits that represent an auto-incrementing sequence, modulus 1024. This means we can generate 1024 IDs, per shard, per millisecond

Solution at hand –

1. Using UUIDIndex size is a key consideration if uuid is used as index. Some UUID types are completely random and have no natural sort.Pro – Each application thread generates IDs independently, minimizing points of failure and contention for ID generation. If you use a timestamp as the first component of the ID, the IDs remain time-sortable.
Cons – Generally requires more storage space (96 bits or higher) to make reasonable uniqueness guarantees. Some UUID types are completely random and have no natural sort.
2. Using a Ticket Server – This is one of the very famous approaches where you can simply maintain a table to store just the latest generated ID and every time a node asks for ID they make a ‘select for update’ on this table, update the value with a incremented value and use the selected value as the next ID.
This approach is resilient and distributed in nature. The ID generation can be separated from the actual data store. However there is a risk of Single Point of Failure as all the nodes rely on this table for the next ID and if this service goes down your app may stop functioning properly.
Additionally  MySQL shards are built as master-master replicant pairs for resiliency. This means we need to be able to guarantee uniqueness within a shard in order to avoid key collisions. We’d love to go on using MySQL auto-incrementing columns for primary keys like everyone else, but MySQL can’t guarantee uniqueness across physical and logical databases.
Also this approach might not be suitable in case where the writes per second are very high because that will overload the Ticket Server and also degrade your app performance.

Cons – Can eventually become a write bottleneck (though Flickr reports that, even at huge scale, it’s not an issue). An additional couple of machines (or EC2 instances) to admin. If using a single DB, becomes single point of failure. If using multiple DBs, can no longer guarantee that they are sortable over time.

3. Twitter Snowflake –
Snowflake is a service used to generate unique IDs for objects within Twitter (Tweets, Direct Messages, Users, Collections, Lists etc.). These IDs are unique 64-bit unsigned integers, which are based on time, instead of being sequential. The full ID is composed of a timestamp, a worker number, and a sequence number.
This approach tackles the problem of SPOF as well as the latency issues.

– Here the ID is generated as a concatenation of timestamp, node ID and Sequence number. 41 bits are allotted to timestamp. This also allows the higher bit to be sorted and so allows somewhat sorted data.
– Node ID can be assigned to any physical node when during its startup and it can be retrieved from a shared cache in the cluster. Node ID can occupy next 10 bits. This number are coordinated by Zookeeper.
– The Sequence number can be a monotonically increasing 12 bit number.
Twitter has Snowflake service which is open source.

Pros:
– Snowflake IDs are 64-bits, half the size of a UUID
– Can use time as first component and remain sortable
– Distributed system that can survive nodes dying
Cons:
Would introduce additional complexity and more ‘moving parts’ (ZooKeeper, Snowflake servers) into our architecture


二维码

扫码加我 拉你入群

请注明:姓名-公司-职位

以便审核进群资格,未注明则拒绝

关键词:distributed Generation Systems tribute System

已有 1 人评分论坛币 学术水平 热心指数 信用等级 收起 理由
经管之家编辑部 + 100 + 3 + 3 + 3 精彩帖子

总评分: 论坛币 + 100  学术水平 + 3  热心指数 + 3  信用等级 + 3   查看全部评分

本帖被以下文库推荐

沙发
经管之家编辑部 在职认证  发表于 2019-4-27 10:44:46
为您点赞!

藤椅
artra2012 在职认证  发表于 2019-4-27 11:25:52
为您点赞!!!

板凳
充实每一天 发表于 2019-4-27 13:14:06 来自手机
点赞

报纸
珍惜点滴 学生认证  发表于 2019-4-27 14:42:30
感谢分享,向您学习,赞!

地板
从1万到一亿 在职认证  发表于 2019-4-27 16:59:13

7
胡明敏 发表于 2019-5-2 22:03:12
谢谢分享

您需要登录后才可以回帖 登录 | 我要注册

本版微信群
jg-xs1
拉您进交流群
GMT+8, 2026-1-3 15:32