Cross-sectional data

Cross-sectional data are data on one or more variables collected at a single point in time. For example, the data might be on:

● A poll of usage of Internet stockbroking services
● A cross-section of stock returns on the New York Stock Exchange (NYSE)
● A sample of bond credit ratings for UK banks.

Problems that could be tackled using cross-sectional data:

● The relationship between company size and the return to investing in its shares
● The relationship between a country’s GDP level and the probability that the government will default on its sovereign debt.
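As a rough illustration of the first problem above (company size and share returns), the sketch below fits a straight line by ordinary least squares to a cross-section of firms observed at one date. The helper name `ols_fit`, the variable names and every number are hypothetical, chosen only to show the shape of the calculation. They are not real market data, and a simple bivariate regression is only one of several ways such a relationship might be examined.

```python
from statistics import mean


def ols_fit(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    x_bar, y_bar = mean(x), mean(y)
    # Sum of squared deviations of x, and sum of cross-deviations of x and y.
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical cross-section: one observation per firm, all at the same date.
log_size = [1.0, 2.0, 3.0, 4.0, 5.0]       # made-up log market capitalisation
returns = [0.12, 0.10, 0.09, 0.07, 0.05]   # made-up annual share returns

intercept, slope = ols_fit(log_size, returns)
print(f"intercept={intercept:.4f}, slope={slope:.4f}")
```

In this made-up sample a negative slope would indicate that larger firms earned lower returns. With real data, the sign and size of the slope are what the analysis sets out to estimate.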

Big data defined

As far back as 2001, industry analyst Doug Laney (currently with Gartner) articulated the now-mainstream definition of big data as the three Vs: volume, velocity and variety [1].

Volume. Many factors contribute to the increase in data volume: transaction-based data stored through the years, unstructured data streaming in from social media, and increasing amounts of sensor and machine-to-machine data being collected. In the past, excessive data volume was a storage issue. With decreasing storage costs, other issues emerge, including how to determine relevance within large data volumes and how to use analytics to create value from relevant data.

Velocity. Data is streaming in at unprecedented speed and must be dealt with in a timely manner. RFID tags, sensors and smart metering are driving the need to deal with torrents of data in near-real time. Reacting quickly enough to deal with data velocity is a challenge for most organizations.

Variety. Data today comes in all types of formats: structured, numeric data in traditional databases; information created from line-of-business applications; and unstructured text documents, email, video, audio, stock ticker data and financial transactions. Managing, merging and governing different varieties of data is something many organizations still grapple with.