Title : Effective Blocking for Combining Multiple Entity Resolution Systems
Authors : Aye Chan Mon, Mie Mie Su Thwin
Keywords : entity resolution, data integration, data reduction, indexing, pre-processing
Issue Date : July 2013
Abstract : An important aspect of maintaining information quality in data repositories is determining which sets of records refer to the same real-world entity. This so-called entity resolution (ER) problem arises frequently in data cleaning and in many information integration applications. The ER process identifies duplicate records that refer to the same real-world entity and derives composite information about that entity. The cost of the ER process is high. In the proposed approach, input data is split according to the blocking variables. As no comparisons are conducted between different blocks, each block can be processed independently from all others. Blocks can contain different numbers of records, which results in varying processing times. We propose an effective blocking method for combining multiple entity resolution systems.
Page(s) : 126-136
Source : Vol. 2, No. 4
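
The blocking step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the record fields, the choice of `zip` as the blocking variable, and the helper names `build_blocks` and `candidate_pairs` are all hypothetical, since the abstract does not specify them. The sketch only shows the general idea that records are compared within a block and never across blocks.

```python
from collections import defaultdict
from itertools import combinations


def build_blocks(records, blocking_key):
    """Group records by the value of a blocking variable (hypothetical helper)."""
    blocks = defaultdict(list)
    for rec in records:
        blocks[blocking_key(rec)].append(rec)
    return dict(blocks)


def candidate_pairs(blocks):
    """Yield record pairs within each block; no pair ever spans two blocks."""
    for block in blocks.values():
        yield from combinations(block, 2)


# Illustrative data only; the field names are assumptions, not from the paper.
records = [
    {"id": 1, "name": "John Smith", "zip": "11011"},
    {"id": 2, "name": "Jon Smith", "zip": "11011"},
    {"id": 3, "name": "Mary Jones", "zip": "11012"},
    {"id": 4, "name": "J. Smith", "zip": "11011"},
    {"id": 5, "name": "Mary Jone", "zip": "11012"},
]

blocks = build_blocks(records, blocking_key=lambda r: r["zip"])
pairs = [(a["id"], b["id"]) for a, b in candidate_pairs(blocks)]
block_sizes = {key: len(block) for key, block in blocks.items()}
```

With this sample data, blocking reduces the comparisons from 10 (all pairs of five records) to 4. The two blocks hold 3 and 2 records, which mirrors the abstract's point that unequal block sizes lead to unequal processing times when blocks are handled independently.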