Storage Developer Conference

#14: Instantly finding a Needle of data in a Haystack of large-scale NFS environment

Synopsis

Intel's design environment depends heavily on a large-scale NFS infrastructure holding tens of petabytes of data. A global namespace lets users navigate this large environment uniformly from 60,000 compute servers. But what if a user doesn't know where the data they are looking for is located? Our customers used to spend hours waiting for recursive "grep" commands to complete, or simply didn't bother with less critical queries.

In this talk, we'll cover how Intel IT identified an opportunity to provide a faster way to search for information within this large-scale NFS environment. We'll review the open source solutions we considered and explain how we decided to combine a home-grown, scalable NFS crawler with the open source Elasticsearch engine to index parts of our NFS environment. We'll also discuss the challenges we faced and how we mitigated them, including:
# Crawler scalability required to index large amounts of dynamically changing data within pre-defined