
Modern Data Management

September 9th, 2011 admin

Modern data management is rapidly changing to accommodate the economic downturn and the growth of new technology. To reduce expenses, many IT shops are reusing legacy storage devices in addition to taking advantage of pay-as-you-go, cloud-based services. However, these distributed systems must be managed effectively to provide viable, affordable solutions to data management.

The Exciting Challenges of the New Infrastructure
This new strategy brings both challenges and opportunities. Today’s system designers must determine how to fully leverage the strengths of on-demand hardware to build the best data management platforms for their IT shop. At a minimum, these solutions must:
• Provide a high degree of scalability and a low level of latency by taking full advantage of parallel processing and memory capabilities.
• Provide fast and easy methods to expand and contract resources as demand changes.
• Provide exceptional uptime with minimal outages. The system should be designed to expect errors and recover from them without affecting the end user.
• Create a global experience spanning both time zones and geographical boundaries to unite business systems and partners.
• Support a variety of workload types including transactional, analytic, pull, and push.
• Increase effectiveness, efficiency, and affordability while promoting growth.

The CAP Theorem
The CAP Theorem states that it is not possible for a distributed storage system to be “consistent, available, and partition tolerant” at the same time. At any given point, only two of these goals are achievable. In practice, network partitions cannot be ruled out in a distributed system, so the real choice during a partition is between consistency and availability. Because of this, tradeoffs must be made when distributed systems are designed and implemented.
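To make the tradeoff concrete, here is a minimal Python sketch of a toy two-replica store. Everything in it (the `Replica` class, the "CP" and "AP" mode names, the `partition` helper) is hypothetical and exists only for illustration; it is not how any particular product behaves. When the replicas cannot reach each other, a "CP" replica refuses writes, while an "AP" replica accepts them and lets the copies diverge.

```python
class Replica:
    """One copy of a single value in a toy two-replica store (hypothetical)."""

    def __init__(self, mode):
        self.mode = mode          # "CP" favors consistency, "AP" favors availability
        self.value = None
        self.peer = None
        self.partitioned = False  # True when this replica cannot reach its peer

    def write(self, value):
        if self.partitioned:
            if self.mode == "CP":
                # Refuse the write: stay consistent, give up availability.
                raise RuntimeError("unavailable: cannot reach peer")
            # Accept the write locally: stay available, let the copies diverge.
            self.value = value
            return
        # No partition: update both copies together.
        self.value = value
        self.peer.value = value


def make_pair(mode):
    """Create two replicas that know about each other."""
    a, b = Replica(mode), Replica(mode)
    a.peer, b.peer = b, a
    return a, b


def partition(a, b):
    """Simulate a network partition between the two replicas."""
    a.partitioned = b.partitioned = True


if __name__ == "__main__":
    a, b = make_pair("AP")
    a.write("v1")
    partition(a, b)
    a.write("v2")
    print(a.value, b.value)  # the two copies now disagree
```

Switching the mode to "CP" makes the second write raise instead, which is the other side of the same tradeoff.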

The Eventually-Consistent Design Strategy
Some web developers are trading consistency for uptime when designing their applications. Anticipating that the network will be partitioned as the system grows, they relax consistency requirements in order to guarantee a higher degree of availability during and after a partition. This means that an individual network outage may result in stale data or other minor problems instead of a nonfunctional website. These “eventually consistent platforms” were inspired by online icons like Google, Microsoft, and Amazon; many cloud-based services and open-source projects offer products that use this design.
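The "stale now, correct later" behavior can be sketched in a few lines of Python. This example assumes a last-write-wins rule keyed on a version number, which is only one of several possible reconciliation strategies; the `EventualReplica` class and its methods are hypothetical names, not taken from any of the platforms mentioned above.

```python
class EventualReplica:
    """A replica that accepts writes locally and reconciles later (hypothetical)."""

    def __init__(self):
        self.store = {}  # key -> (version, value)

    def put(self, key, value, version):
        # Last-write-wins: keep the entry with the highest version number.
        current = self.store.get(key)
        if current is None or version > current[0]:
            self.store[key] = (version, value)

    def get(self, key):
        entry = self.store.get(key)
        return None if entry is None else entry[1]

    def sync_from(self, other):
        # Pull every entry from the other replica; put() drops older versions.
        for key, (version, value) in other.store.items():
            self.put(key, value, version)


if __name__ == "__main__":
    east, west = EventualReplica(), EventualReplica()
    east.put("price", 10, version=1)
    west.sync_from(east)
    east.put("price", 12, version=2)   # west has not heard about this yet
    print(west.get("price"))           # stale read
    west.sync_from(east)
    print(west.get("price"))           # converged after sync
```

Between the second write and the second sync, readers of `west` see the old price. That window of staleness is the cost these designs accept in exchange for staying online.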

A Different Approach: Enterprise Data Fabric
Although the eventually-consistent design is acceptable for many applications, it’s not a viable solution for any process where consistency is a key concern. For example, inconsistent processing in a financial system could spell disaster, with inaccurate data spreading to multiple downstream systems. There will always be some form of CAP tradeoff in a distributed system, but a new approach called EDF, or enterprise data fabric, promises to provide a better solution for core business functions.

EDF solutions use a shared-nothing approach to scalability. Partitioning uses nodes that are connected to create a seamless and expandable “fabric” that can span application, geographic, and machine boundaries. To scale the available storage space horizontally, EDF simply connects additional machine nodes. Within these data partitions, entries are composed of key/value pairs with an exceptional level of thread-based consistency.
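As a rough illustration of shared-nothing partitioning over key/value entries, the Python sketch below routes each key to exactly one node and grows the fabric by adding a node. The `Fabric` class is hypothetical, and the routing rule (a CRC32 hash modulo the node count) is an assumption chosen for brevity; the article does not say how an EDF product actually assigns keys to nodes.

```python
import zlib


class Fabric:
    """A toy key/value fabric where each node owns its own partition (hypothetical)."""

    def __init__(self, node_count):
        # Each node is an independent dict; nodes share no state.
        self.nodes = [{} for _ in range(node_count)]

    def _owner(self, key):
        # Deterministic hash so a key always maps to the same node.
        return zlib.crc32(key.encode("utf-8")) % len(self.nodes)

    def put(self, key, value):
        self.nodes[self._owner(key)][key] = value

    def get(self, key):
        return self.nodes[self._owner(key)].get(key)

    def add_node(self):
        # Scale out: add one node, then reassign every entry to its new owner.
        entries = [item for node in self.nodes for item in node.items()]
        self.nodes = [{} for _ in range(len(self.nodes) + 1)]
        for key, value in entries:
            self.put(key, value)


if __name__ == "__main__":
    fabric = Fabric(node_count=2)
    for i in range(100):
        fabric.put(f"order-{i}", i)
    fabric.add_node()
    print([len(node) for node in fabric.nodes])  # entries spread over three nodes
```

Simple modulo routing moves most keys whenever a node is added. Schemes such as consistent hashing move far fewer, which matters once the partitions hold real data.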

By isolating data, related partitions can be organized and grouped into service entities. This larger unit is deployed on a single storage device where it can be accessed transactionally with complete independence from other service entities. This approach allows the EDF to create fault tolerance using a partial failure mode with fault isolation.
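The idea of independent service entities with isolated failures can be sketched as follows. The `ServiceEntity` class, its `healthy` flag, and the `transact` method are hypothetical names invented for this example. Each entity applies a change as an all-or-nothing step on its own data, and an outage in one entity does not stop another from working.

```python
class ServiceEntity:
    """A group of related data that is updated transactionally on its own (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.healthy = True  # set to False to simulate a failed entity

    def transact(self, operation):
        """Run operation against a staged copy; commit only if it succeeds."""
        if not self.healthy:
            raise RuntimeError(f"{self.name} is down")
        staged = dict(self.data)  # shallow copy is enough for flat values
        operation(staged)         # may raise; self.data is untouched if it does
        self.data = staged        # commit


if __name__ == "__main__":
    orders = ServiceEntity("orders")
    billing = ServiceEntity("billing")

    orders.transact(lambda d: d.update({"o1": "open"}))
    billing.healthy = False  # billing fails...
    orders.transact(lambda d: d.update({"o2": "open"}))  # ...orders keeps working
    print(orders.data)
```

Because no transaction spans two entities, a failure stays inside the entity where it happened, which is the partial failure mode described above.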

EDF-based systems exploit the variable nature of data by building flexible configurations that allow for consistency, partition-tolerance, and availability tradeoffs based on when and where the application workflow processes the information. When implemented correctly, EDF strategies allow businesses to reach all three CAP goals, but not at the same time or in the same place.
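One way to picture "different tradeoffs in different places" is a per-dataset policy table. The Python sketch below is purely illustrative: the `Priority` enum, the `RegionPolicy` class, the dataset names, and the `accepts_write` function are all assumptions made for this example and do not reflect any specific EDF product's configuration format.

```python
from dataclasses import dataclass
from enum import Enum


class Priority(Enum):
    CONSISTENCY = "consistency"
    AVAILABILITY = "availability"


@dataclass(frozen=True)
class RegionPolicy:
    """What a dataset should favor when its replicas cannot reach each other."""
    name: str
    priority: Priority


# Hypothetical datasets: money must be exact, the catalog must stay online.
POLICIES = {
    "ledger": RegionPolicy("ledger", Priority.CONSISTENCY),
    "product_catalog": RegionPolicy("product_catalog", Priority.AVAILABILITY),
}


def accepts_write(region, peers_reachable):
    """Decide whether a write to the given dataset is accepted right now."""
    if peers_reachable:
        return True  # no partition, so nothing has to be given up
    # During a partition, only availability-first datasets keep taking writes.
    return POLICIES[region].priority is Priority.AVAILABILITY


if __name__ == "__main__":
    for region in POLICIES:
        print(region, accepts_write(region, peers_reachable=False))
```

The same system ends up consistent where it must be and available where it can be, which matches the "not at the same time or in the same place" caveat.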

With the right approach, data management across a distributed system can be an effective and affordable solution for modern IT departments. Before choosing a strategy, consider the benefits and potential issues that each one brings to the table.
