摘要翻译:
在现有的许多时间序列距离测度中,动态时间规整(DTW)距离因其在序列比对中的灵活性而被公认为是最准确和最合适的距离测度之一。然而,DTW距离计算的计算量很大。特别是在超大型的时间序列数据库中,由于时间序列数据的高维性会导致I/O开销极高,因此即使使用索引结构进行随机访问,对整个数据库进行顺序扫描也是不切实际的。更具体地说,顺序结构消耗高CPU但低I/O成本,而索引结构需要低CPU但高I/O成本。因此,本文提出了一种新的索引序列结构TWIST(Time Warping In indexed sequential structure,tindexed sequential structure,Time Warping)。当查询序列发出时,TWIST计算一组候选序列与查询序列之间的较低边界距离,并预先确定数据访问顺序,从而减少了大量的顺序访问和随机访问。令人印象深刻的是,我们的索引序列结构在查询过程中实现了几个数量级的显著加速。此外,该方法在查询处理时间、页面访问次数、存储需求等方面均优于现有的同类方法,且不保证错误删除。
---
英文标题:
《Exact Indexing for Massive Time Series Databases under Time Warping
Distance》
---
作者:
Vit Niennattrakul, Pongsakorn Ruengronghirunya, Chotirat Ann
Ratanamahatana
---
最新提交年份:
2009
---
分类信息:
一级分类:Computer Science 计算机科学
二级分类:Databases 数据库
分类描述:Covers database management, datamining, and data processing. Roughly includes material in ACM Subject Classes E.2, E.5, H.0, H.2, and J.1.
涵盖数据库管理、数据挖掘和数据处理。大致包括ACM学科类E.2、E.5、H.0、H.2和J.1中的材料。
--
一级分类:Computer Science 计算机科学
二级分类:Artificial Intelligence 人工智能
分类描述:Covers all areas of AI except Vision, Robotics, Machine Learning, Multiagent Systems, and Computation and Language (Natural Language Processing), which have separate subject areas. In particular, includes Expert Systems, Theorem Proving (although this may overlap with Logic in Computer Science), Knowledge Representation, Planning, and Uncertainty in AI. Roughly includes material in ACM Subject Classes I.2.0, I.2.1, I.2.3, I.2.4, I.2.8, and I.2.11.
涵盖了人工智能的所有领域,除了视觉、机器人、机器学习、多智能体系统以及计算和语言(自然语言处理),这些领域有独立的学科领域。特别地,包括专家系统,定理证明(尽管这可能与计算机科学中的逻辑重叠),知识表示,规划,和人工智能中的不确定性。大致包括ACM学科类I.2.0、I.2.1、I.2.3、I.2.4、I.2.8和I.2.11中的材料。
--
一级分类:Computer Science 计算机科学
二级分类:Information Retrieval 信息检索
分类描述:Covers indexing, dictionaries, retrieval, content and analysis. Roughly includes material in ACM Subject Classes H.3.0, H.3.1, H.3.2, H.3.3, and H.3.4.
涵盖索引,字典,检索,内容和分析。大致包括ACM主题课程H.3.0、H.3.1、H.3.2、H.3.3和H.3.4中的材料。
--
---
英文摘要:
Among many existing distance measures for time series data, Dynamic Time Warping (DTW) distance has been recognized as one of the most accurate and suitable distance measures due to its flexibility in sequence alignment. However, DTW distance calculation is computationally intensive. Especially in very large time series databases, sequential scan through the entire database is definitely impractical, even with random access that exploits some index structures since high dimensionality of time series data incurs extremely high I/O cost. More specifically, a sequential structure consumes high CPU but low I/O costs, while an index structure requires low CPU but high I/O costs. In this work, we therefore propose a novel indexed sequential structure called TWIST (Time Warping in Indexed Sequential sTructure) which benefits from both sequential access and index structure. When a query sequence is issued, TWIST calculates lower bounding distances between a group of candidate sequences and the query sequence, and then identifies the data access order in advance, hence reducing a great number of both sequential and random accesses. Impressively, our indexed sequential structure achieves significant speedup in a querying process by a few orders of magnitude. In addition, our method shows superiority over existing rival methods in terms of query processing time, number of page accesses, and storage requirement with no false dismissal guaranteed.
---
PDF链接:
https://arxiv.org/pdf/0906.2459


雷达卡



京公网安备 11010802022788号







