[Study Notes] Unique ID generation in distributed systems

liuxf666 posted on 2019-4-27 09:31:59

Before starting out, we listed the features that were essential in our system:

* Generated IDs should be sortable by time
* IDs should ideally be 64 bits (for smaller indexes, and better storage in systems like Redis)

The basic idea is that the leading (high-order) bits, roughly 40 of them, hold a timestamp, and the remaining bits are filled from other information such as a node ID or machine ID.

Each of our IDs consists of:
– 41 bits for time in milliseconds since a custom epoch (2^41 ms is about 69.7 years of IDs)
– 13 bits that represent the logical shard ID (this field plays the role of the node ID mentioned above)
– 10 bits that represent an auto-incrementing sequence, modulo 1024. This means we can generate 1024 IDs per shard per millisecond.
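The layout above is plain bit packing. Below is a minimal Python sketch of it, not the production code behind these notes; `CUSTOM_EPOCH_MS` (set to 2011-01-01 UTC here), `make_id` and `split_id` are hypothetical names and values chosen for illustration. The `YEARS_OF_IDS` constant checks the capacity arithmetic for 41 bits of milliseconds.

```python
import time

# Hypothetical custom epoch: 2011-01-01 00:00:00 UTC in milliseconds.
# Any fixed instant in the past works; it only shifts where the 41-bit range starts.
CUSTOM_EPOCH_MS = 1293840000000

TIME_BITS, SHARD_BITS, SEQ_BITS = 41, 13, 10  # 41 + 13 + 10 = 64

# 2**41 ms expressed in years (about 69.7).
YEARS_OF_IDS = (1 << TIME_BITS) / (1000 * 60 * 60 * 24 * 365.25)


def make_id(shard_id, seq, now_ms=None):
    """Pack (ms since custom epoch, shard ID, sequence mod 1024) into one 64-bit int."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    elapsed = now_ms - CUSTOM_EPOCH_MS
    if not 0 <= elapsed < (1 << TIME_BITS):
        raise ValueError("timestamp outside the 41-bit range of the custom epoch")
    if not 0 <= shard_id < (1 << SHARD_BITS):
        raise ValueError("shard_id must fit in 13 bits")
    # Time goes in the high bits, so numeric order follows time order.
    return (
        (elapsed << (SHARD_BITS + SEQ_BITS))
        | (shard_id << SEQ_BITS)
        | (seq % (1 << SEQ_BITS))  # the sequence wraps modulo 1024
    )


def split_id(id_):
    """Recover (unix ms, shard ID, sequence) from an ID."""
    seq = id_ & ((1 << SEQ_BITS) - 1)
    shard_id = (id_ >> SEQ_BITS) & ((1 << SHARD_BITS) - 1)
    elapsed = id_ >> (SHARD_BITS + SEQ_BITS)
    return elapsed + CUSTOM_EPOCH_MS, shard_id, seq


if __name__ == "__main__":
    i = make_id(shard_id=5, seq=1025, now_ms=CUSTOM_EPOCH_MS + 1000)
    print(split_id(i))  # (1293840001000, 5, 1)
```

Note that the sequence is only unique per shard per millisecond, so a shard that needs more than 1024 IDs in one millisecond would have to wait for the next one.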

Solutions at hand –

1. Using UUIDs – Index size is a key consideration if a UUID is used as an index key.
Pros – Each application thread generates IDs independently, minimizing points of failure and contention for ID generation. If you use a timestamp as the first component of the ID, the IDs remain time-sortable.
Cons – Generally requires more storage space (96 bits or higher; a standard UUID is 128 bits) to make reasonable uniqueness guarantees. Some UUID types are completely random and have no natural sort.
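To make the UUID trade-off concrete, the sketch below uses Python's standard `uuid` module for a random (version 4) UUID and contrasts it with a hypothetical time-prefixed 128-bit ID. The helper `time_prefixed_id` is an illustration only: it uses the same 48-bit millisecond timestamp plus 80 random bits split as ULID, but not ULID's string encoding.

```python
import os
import time
import uuid

# Random (version 4) UUID: 128 bits wide, 122 of them random, so
# consecutive values say nothing about creation order.
random_id = uuid.uuid4()


def time_prefixed_id(now_ms=None):
    """Hypothetical 128-bit ID: 48-bit ms timestamp in the high bits, 80 random bits below."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    rand80 = int.from_bytes(os.urandom(10), "big")  # 10 bytes = 80 random bits
    return (now_ms << 80) | rand80


if __name__ == "__main__":
    print(random_id, random_id.version)
    ids = [time_prefixed_id(t) for t in (1700000000002, 1700000000000, 1700000000001)]
    # Sorting the integers sorts them by the embedded timestamp.
    print([i >> 80 for i in sorted(ids)])  # [1700000000000, 1700000000001, 1700000000002]
```

IDs created in the same millisecond are ordered only by their random bits, so time-sortability here is to millisecond precision, and the ID is still twice the size of a 64-bit key.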
2. Using a Ticket Server – This is one of the best-known approaches: you maintain a table that stores just the latest generated ID. Every time a node asks for an ID, it runs a ‘SELECT ... FOR UPDATE’ on this table, writes back the incremented value, and uses the selected value as the next ID.
ID generation is thus separated from the actual data store. However, there is a risk of a single point of failure: all nodes rely on this table for the next ID, and if this service goes down your app may stop functioning properly.
Additionally, MySQL shards are built as master-master replica pairs for resiliency. This means we need to be able to guarantee uniqueness within a shard in order to avoid key collisions. We’d love to go on using MySQL auto-incrementing columns for primary keys like everyone else, but MySQL can’t guarantee uniqueness across physical and logical databases.
Also, this approach may not be suitable when the number of writes per second is very high, because that load will overload the ticket server and degrade your app’s performance.

Cons – Can eventually become a write bottleneck (though Flickr reports that, even at huge scale, it’s not an issue). A couple of additional machines (or EC2 instances) to administer. If using a single DB, it becomes a single point of failure. If using multiple DBs, you can no longer guarantee that the IDs are sortable over time.
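A ticket server boils down to "lock, read, increment, commit". The sketch below is a stand-in under stated assumptions: it uses SQLite from Python's standard library instead of MySQL, and `BEGIN IMMEDIATE` (which takes SQLite's write lock before the read) plays the role of `SELECT ... FOR UPDATE`. The hypothetical `start`/`step` parameters show one way to run two servers without collisions by handing out odd and even numbers; Flickr's setup did this with MySQL's `auto_increment_increment` and `auto_increment_offset` settings. The `TicketServer` class and its parameters are illustrative names, not a real library.

```python
import sqlite3


class TicketServer:
    """Hypothetical single-row ticket table.

    SQLite's BEGIN IMMEDIATE takes the database write lock before we read,
    standing in for MySQL's SELECT ... FOR UPDATE.
    """

    def __init__(self, path=":memory:", start=1, step=1):
        # isolation_level=None: no implicit transactions; we issue BEGIN/COMMIT ourselves.
        self.conn = sqlite3.connect(path, isolation_level=None)
        self.step = step
        self.conn.execute("CREATE TABLE IF NOT EXISTS tickets (id INTEGER NOT NULL)")
        if self.conn.execute("SELECT COUNT(*) FROM tickets").fetchone()[0] == 0:
            self.conn.execute("INSERT INTO tickets (id) VALUES (?)", (start,))

    def next_id(self):
        # Lock first, so two callers sharing the database never read the same value.
        self.conn.execute("BEGIN IMMEDIATE")
        try:
            (current,) = self.conn.execute("SELECT id FROM tickets").fetchone()
            self.conn.execute("UPDATE tickets SET id = ?", (current + self.step,))
            self.conn.execute("COMMIT")
        except Exception:
            self.conn.execute("ROLLBACK")
            raise
        return current


if __name__ == "__main__":
    single = TicketServer()
    print([single.next_id() for _ in range(3)])  # [1, 2, 3]
    # Two servers handing out odd and even numbers never collide.
    odd, even = TicketServer(start=1, step=2), TicketServer(start=2, step=2)
    print([odd.next_id() for _ in range(3)], [even.next_id() for _ in range(3)])  # [1, 3, 5] [2, 4, 6]
```

With two servers, the IDs a client sees interleave according to which server each request happens to hit, which is why IDs from multiple ticket DBs are no longer sortable over time. Every ID also costs a round trip plus a write lock on one row, which is where the write bottleneck comes from.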

3. Twitter Snowflake –
Snowflake is a service used to generate unique IDs for objects within Twitter (Tweets, Direct Messages, Users, Collections, Lists etc.). These IDs are unique 64-bit unsigned integers, which are based on time, instead of being sequential. The full ID is composed of a timestamp, a worker number, and a sequence number.
This approach tackles the problem of SPOF as well as the latency issues.

– Here the ID is generated as a concatenation of timestamp, node ID and sequence number. 41 bits are allotted to the timestamp. Because the timestamp occupies the high-order bits, sorting by ID sorts the data roughly by creation time.
– A node ID can be assigned to each physical node during its startup, and it can be retrieved from a shared cache in the cluster. The node ID occupies the next 10 bits. These numbers are coordinated by ZooKeeper.
– The sequence number is a monotonically increasing 12-bit counter within each millisecond (it restarts when the millisecond changes), giving up to 4096 IDs per node per millisecond. Together with one unused sign bit, 1 + 41 + 10 + 12 = 64 bits.
Twitter open-sourced its Snowflake service.
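The three bullets above translate into a small generator. This is a minimal Python sketch of the Snowflake idea, not Twitter's implementation: the node ID is passed in directly (in the setup described above it would come from ZooKeeper or a shared cache), `EPOCH_MS` reuses Twitter's published Snowflake epoch purely as an example value, and the names `SnowflakeGenerator` and `decode` are hypothetical. When the 12-bit sequence wraps within one millisecond it waits for the next millisecond, and it refuses to generate IDs if the clock moves backwards.

```python
import threading
import time

EPOCH_MS = 1288834974657  # example epoch (Twitter's Snowflake epoch), in unix ms
NODE_BITS, SEQ_BITS = 10, 12
MAX_NODE = (1 << NODE_BITS) - 1
SEQ_MASK = (1 << SEQ_BITS) - 1


class SnowflakeGenerator:
    """Layout: 1 unused sign bit | 41-bit time | 10-bit node ID | 12-bit sequence."""

    def __init__(self, node_id, clock=lambda: int(time.time() * 1000)):
        if not 0 <= node_id <= MAX_NODE:
            raise ValueError("node_id must fit in 10 bits")
        self.node_id = node_id
        self.clock = clock  # injectable so the logic can be tested with a fake clock
        self.last_ms = -1
        self.seq = 0
        self.lock = threading.Lock()  # one generator may be shared by many threads

    def next_id(self):
        with self.lock:
            now = self.clock()
            if now < self.last_ms:
                raise RuntimeError("clock moved backwards; refusing to generate IDs")
            if now == self.last_ms:
                self.seq = (self.seq + 1) & SEQ_MASK
                if self.seq == 0:  # all 4096 sequence values used in this millisecond
                    while now <= self.last_ms:  # spin until the next millisecond
                        now = self.clock()
            else:
                self.seq = 0  # new millisecond: restart the sequence
            self.last_ms = now
            return (
                ((now - EPOCH_MS) << (NODE_BITS + SEQ_BITS))
                | (self.node_id << SEQ_BITS)
                | self.seq
            )


def decode(id_):
    """Split an ID back into (unix ms, node_id, sequence)."""
    return (
        (id_ >> (NODE_BITS + SEQ_BITS)) + EPOCH_MS,
        (id_ >> SEQ_BITS) & MAX_NODE,
        id_ & SEQ_MASK,
    )


if __name__ == "__main__":
    gen = SnowflakeGenerator(node_id=7)
    ids = [gen.next_id() for _ in range(3)]
    print(ids == sorted(ids), [decode(i)[1:] for i in ids])
```

Because the timestamp sits in the high bits, sorting these IDs numerically also sorts them by generation time, to millisecond precision and only as far as the nodes' clocks agree. The clock-backwards check matters because a node whose clock jumps back could otherwise reissue an ID it has already handed out.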

Pros:
– Snowflake IDs are 64 bits, half the size of a 128-bit UUID
– Can use time as the first component and remain sortable
– Distributed system that can survive nodes dying
Cons:
– Would introduce additional complexity and more ‘moving parts’ (ZooKeeper, Snowflake servers) into our architecture