BITSEM

NK10 - Resource: Data

Which approaches address the different challenges of big data?

Hadoop is not a type of database, but rather a software ecosystem that allows for massively parallel computing. It is an enabler of certain types of NoSQL distributed databases (such as HBase), which allow data to be spread across thousands of servers with little reduction in performance. NoSQL achieves this efficiency because, unlike relational databases, which are highly structured, NoSQL databases impose little or no fixed schema and trade off stringent consistency requirements for speed and agility. NoSQL centers around the concept of distributed databases, where unstructured data may be stored across multiple processing nodes, and often across multiple servers. This distributed architecture makes NoSQL databases horizontally scalable: as data volumes continue to grow, more hardware can be added to keep up, with little loss in performance. NoSQL distributed database infrastructure has been the solution for handling some of the largest data stores on the planet, for example those of Google, Amazon, and the CIA.
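To make the idea of horizontal scaling more concrete, the following is a minimal sketch of hash-based partitioning, one common way to spread records across nodes. The function names (node_for_key, distribute) and the "user:N" keys are hypothetical and chosen only for illustration; this is not how Hadoop or HBase is implemented internally.

```python
import hashlib
from collections import defaultdict


def node_for_key(key: str, num_nodes: int) -> int:
    """Map a record key to a node index using a stable hash (illustrative helper)."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_nodes


def distribute(keys, num_nodes):
    """Group keys by the node that would store them."""
    placement = defaultdict(list)
    for key in keys:
        placement[node_for_key(key, num_nodes)].append(key)
    return dict(placement)


# Hypothetical record keys; adding nodes shrinks the share each node must hold.
keys = [f"user:{i}" for i in range(1000)]
for n in (2, 4, 8):
    placement = distribute(keys, n)
    largest = max(len(shard) for shard in placement.values())
    print(f"{n} nodes: largest shard holds {largest} keys")
```

Note that plain modulo hashing moves most keys to a different node whenever the node count changes; production systems typically use schemes such as consistent hashing to limit that reshuffling.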
 
Big data streaming is a process in which large streams of real-time data are processed with the aim of extracting insights and useful trends from them. A continuous stream of unstructured data is analyzed in memory before it is stored on disk. This happens across a cluster of servers. Speed matters most in big data streaming: the value of data decreases over time if it is not processed quickly.

Real-time streaming data analysis is a single-pass analysis. Analysts cannot choose to reanalyze the data once it is streamed.
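The single-pass constraint can be illustrated with a running mean and variance that are updated one value at a time, without ever storing or revisiting earlier values. This is a minimal sketch using Welford's online algorithm; the RunningStats class, the sensor_stream generator, and the sample readings are hypothetical and stand in for a real data source.

```python
class RunningStats:
    """Single-pass mean and population variance (Welford's online algorithm)."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x: float) -> None:
        """Fold one new value into the statistics; the value is not kept."""
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        return self._m2 / self.count if self.count else 0.0


def sensor_stream():
    """Hypothetical stream; each value is yielded once and then gone."""
    for value in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
        yield value


stats = RunningStats()
for reading in sensor_stream():
    stats.update(reading)

print(stats.count, stats.mean, stats.variance)
```

Because only three numbers are kept in memory, the same approach works regardless of how long the stream runs.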

An in-memory database is a database management system that does not store its data on conventional hard disk storage but instead uses main memory (RAM) directly. This allows considerably higher access speeds.
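As a small illustration of the principle, Python's standard sqlite3 module can create a database that lives entirely in RAM by connecting to the special name ":memory:". SQLite is used here only because it ships with Python, not because it is representative of large in-memory database products; the table and values are made up for the example.

```python
import sqlite3

# ":memory:" tells SQLite to keep the whole database in RAM (no file on disk).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (sensor TEXT, value REAL)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [("a", 1.5), ("a", 2.5), ("b", 4.0)],  # hypothetical sample data
)

rows = conn.execute(
    "SELECT sensor, AVG(value) FROM readings GROUP BY sensor ORDER BY sensor"
).fetchall()
print(rows)

# Closing the connection discards the data, since nothing was written to disk.
conn.close()
```

The trade-off is visible in the last line: the data does not survive the connection unless it is persisted separately.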

Discussion