Gleaning Information from Multiple Haystacks: Unstructured Data Analysis
Written By: An IntelliGenesis Senior All-Source Intelligence Analyst

This is what most of our days consist of: large, unruly sets of nonsense that we must make sense of. Or at least that used to be the case. Nowadays we are more likely looking at multiple haystacks of data. Some of those will be kind to us and have some form of structure; most will not. It is said that by the year 2020, the amount of digital data will exceed 40 zettabytes.

We all know some form of structured data, from banking and medical records to inventory stock lists. The data within each cell is formatted in a single fashion (think of a UPC code or serial number), and it can only ever be in that format. Unstructured data, on the other hand, is structured data’s unruly cousin Eddy. It will contain some bit of information that you need, but it will not be in a well-defined field, nor will it be consistently formatted throughout. Now imagine taking the structured data and trying to link it via one of its edges to the table of cousin Eddy data. You would end up just as crazy as Clark.
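To make the linking problem concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the inventory table, the "SN-" plus six digits serial format, the sample notes, and the helper names `extract_serials` and `link_notes` are all hypothetical. It shows one simple way to pull an inconsistently written identifier out of free text, normalize it, and match it against a structured table. Real unstructured data is rarely this cooperative.

```python
import re

# Hypothetical structured table: every serial number follows one format,
# "SN-" followed by exactly six digits.
inventory = {
    "SN-004512": "field radio",
    "SN-019877": "laptop",
}

# Hypothetical unstructured notes: the same serial numbers appear,
# but never in a fixed field and never formatted the same way twice.
notes = [
    "Picked up the radio (sn 4512) from the depot on Tuesday.",
    "Laptop S/N: 019877 needs a new battery",
    "No equipment mentioned in this one.",
]

# Loose pattern: "sn" or "s/n" in any case, optional separators, then 1-6 digits.
SERIAL_PATTERN = re.compile(r"\bs/?n\b[\s:#-]*(\d{1,6})\b", re.IGNORECASE)


def extract_serials(text):
    """Return serial numbers found in free text, normalized to SN-000000."""
    return [f"SN-{int(digits):06d}" for digits in SERIAL_PATTERN.findall(text)]


def link_notes(notes, inventory):
    """Pair each note with the inventory items whose serial numbers it mentions."""
    links = []
    for note in notes:
        for serial in extract_serials(note):
            if serial in inventory:
                links.append((serial, inventory[serial], note))
    return links


for serial, item, note in link_notes(notes, inventory):
    print(f"{serial} ({item}) <- {note}")
```

The pattern here only covers the handful of spellings in the sample notes. Each new variation in the cousin Eddy data (a missing space, a typo, a different label) needs another rule, which is exactly why this kind of linking gets painful at scale.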