Big Data and Analytics Archives | eWEEK
https://www.eweek.com/big-data-and-analytics/

Veritas’s Matt Waxman on Data Protection Strategies
https://www.eweek.com/big-data-and-analytics/veritass-matt-waxman-data-protection-strategies/ | Thu, 19 Oct 2023

I spoke with Matt Waxman, SVP and GM, Data Protection at Veritas, about essential methods for protecting against cyberattacks.

  • As you survey the cybersecurity market, what’s the current biggest trend?
  • You’ve said that “It’s a matter of when, not if, a cyberattack slips past perimeter defenses, so they must have the strategies in place to respond to a successful breach quickly and effectively.” So what is that strategy, in a nutshell?
  • How is Veritas addressing the security needs of its clients? What’s the Veritas advantage?
  • You’ve also said that “Resilience is a team sport: No one vendor can solve an organization’s entire cyber resilience challenge.” How should companies evaluate complementary IT partners to provide end-to-end cyber resilience?

Pure Storage’s Justin Emerson on Analytics Performance and Flash Storage
https://www.eweek.com/big-data-and-analytics/pure-storages-analytics-performance-and-flash-storage/ | Wed, 11 Oct 2023

I spoke with Justin Emerson, Principal Product Manager and Technical Evangelist for Pure Storage, about how high-speed flash storage enables better data analytics performance.

Among the topics we discussed:

  • What’s a major challenge that companies face with their data analytics practice? For all the effort and expense, what’s still holding companies back?
  • How can companies solve this issue? Is there industry momentum toward any solutions?
  • How does Pure Storage support the analytics practices of its clients?
  • What about ESG and the issue of data storage, with the shift from hard disk to flash? What are the implications for issues like power use and sustainability?

Senzing CEO Jeff Jonas on AI, Data Analytics and Entity Resolution
https://www.eweek.com/big-data-and-analytics/senzing-ceo-jeff-jonas-on-ai-data-analytics-and-entity-resolution/ | Wed, 04 Oct 2023

I spoke with Jeff Jonas, CEO of Senzing, about how to get more from an enterprise data analytics practice, the importance of entity resolution, and the role of AI in data mining.

Among the topics we discussed: 

  • Data analytics with 2024 around the corner: do you see artificial intelligence as the savior of those companies that struggle with data analytics?
  • What advice do you give to companies to get more from their analytics practice?
  • How is Senzing addressing the analytics needs of its clients? Why does entity resolution matter?
  • The future of data analytics, AI and entity resolution? What do you predict?

Vanguard Chief Data Analytics Officer Ryan Swann on Boosting Analytics Performance
https://www.eweek.com/big-data-and-analytics/vanguard-chief-data-analytics-officer-boosting-analytics-performance/ | Wed, 27 Sep 2023

I spoke with Ryan Swann, Chief Data Analytics Officer at Vanguard, who discussed the advantages of using a centralized, co-located data and analytics team.

Among the topics we discussed: 

  • As you speak with other data analytics professionals, what are a couple of key trends and/or challenges you see with how companies are currently using data analytics?
  • You’ve spoken about the importance of a “centralized, co-located data and analytics team.” First, let’s define this: what does it mean to be both centralized and co-located?
  • How does this improve functionality for data analytics teams?
  • Does this help develop talent among the data analytics staff? Any other advantages?
  • Do you have a sense of future directions for data analytic practices in the corporate setting? What do you foresee?

Expert Panel on Data Mesh: Capital One Software and Pitney Bowes
https://www.eweek.com/big-data-and-analytics/expert-panel-data-mesh/ | Wed, 23 Aug 2023

I spoke with an expert panel about the advantages of data mesh, which is a decentralized data architecture that organizes data by specific business domain. Among other advantages, data mesh allows self-service data access across an organization – along with governance – which can enable significant competitive advantage.

The panelists:

  • Ana Matei, Capital One Software
  • Vishal Shah, Pitney Bowes

This transcript has been edited for length and clarity.

eWeek: Why are companies moving toward data mesh architecture?

Ana Matei: I think there’s a lot to unpack here. I would like to start by setting the stage with what data mesh is at its core. In the most simple terms, it is an architectural concept, but also a paradigm shift that distributes the handling of data in an organization to the individual lines of business or domains or individual teams.

It is a new approach to managing and distributing data within large organizations. In that sense, data mesh departs from the traditional approach of centralizing data responsibilities under one large data team and instead allows companies to access and analyze data at scale.

So the core idea here is decentralizing data ownership and management across the organization by treating that data as a product.

Data mesh, in my opinion, has emerged as an important framework to help companies scale in a well-managed cloud data ecosystem, in a complex data environment in which volumes and sources of data are growing exponentially by the day.

There are four main principles that support the data mesh architecture that I would like to run through at a super high level:

Data as a product: So think of this as data teams applying product thinking to their data sets. In other words, an organization can assign a product owner to a data set and apply the same rigorous product principles to data assets to provide real value to its end consumers.

These data products should be developed, versioned and managed as software would be.

Data ownership: This translates into data ownership being federated among domain experts who are responsible for producing assets for analysis and business intelligence.

Self-service data platforms principle: This would essentially be a platform that would handle the underlying infrastructure needs of processing data while providing the tools necessary for domain-specific autonomy.

Federated computational governance principle of data mesh: This essentially translates into a universal set of centrally defined standards that ensure conformity to data quality and security policy compliance across all these different data domains and data owners.
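
To make these four principles concrete, here is a minimal sketch – the descriptor fields and governance rules below are hypothetical, not drawn from any particular data mesh platform – of how a versioned data product might be checked against centrally defined standards before publication:

```python
from dataclasses import dataclass, field

# Hypothetical data product descriptor; field names are illustrative,
# not taken from any specific data mesh platform.
@dataclass
class DataProduct:
    name: str                 # e.g., "payments.settled_transactions"
    domain: str               # the owning line of business
    owner: str                # the accountable product owner
    version: str              # versioned like software, e.g., "1.0.0"
    schema: dict              # column name -> type
    quality_checks: list = field(default_factory=list)

# Federated computational governance: centrally defined standards that
# every domain-owned product must satisfy before it can be published.
CENTRAL_STANDARDS = [
    lambda p: p.owner != "",              # data as a product: a named owner
    lambda p: p.version.count(".") == 2,  # semantic versioning
    lambda p: len(p.quality_checks) > 0,  # quality gates are declared
]

def can_publish(product: DataProduct) -> bool:
    """Domains publish autonomously; conformance is checked centrally."""
    return all(rule(product) for rule in CENTRAL_STANDARDS)

product = DataProduct(
    name="payments.settled_transactions",
    domain="payments",
    owner="jane.doe",
    version="1.0.0",
    schema={"txn_id": "string", "amount": "decimal"},
    quality_checks=["txn_id is unique", "amount >= 0"],
)
print(can_publish(product))  # True
```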

eWeek: What are three reasons companies are moving toward a data mesh architecture?

Ana Matei: In no particular order, I would say scalability and agility, as the domain teams within this construct can respond quickly to changing business requirements without relying on centralized data teams for every related task, which typically adds delays and bottlenecks.

Another main reason is reduced data silos, which have historically plagued teams in traditional architectures. So by promoting a data-as-a-product approach, each domain can take ownership of the data and make it available for consumption by other teams.

The third reason is the empowerment of domain experts to manage all of the products in their business units independently without having to rely on centralized data engineering teams.

Vishal Shah: Complementing what Ana said, even on a very basic level for people to understand data mesh, a lot of people are moving to the cloud and normally the journey to the cloud is centralized – all the data is in one place so that people can use it.

But very soon they realize that centralizing the data in one place means that there’s one team which is managing all the data. So when those teams are contacted for various kinds of analytic solutions, operational solutions, there’s a resource constraint.

That team alone cannot do everything. That team does not know how the data is being used. That team does not know what business this data is going to be used for. So when data mesh came in, the whole purpose was to basically take that ownership from one central team doing all the data-related activity and create a cross-functional team, which will consist of the actual data staff.

There’ll be a BI person who is a business analyst, a dashboarding person who actually takes the data and creates dashboards, and data owners – the people who are actually generating the data.

And what this cross-functional team will then do is come together and come up with a data set or data product, as Ana mentioned, and make it available for people to use.

And the whole beauty of that is that any person in the team now knows what the data is, where it is coming from, how it is being used, and what is the impact on it.

So there’s knowledge sharing within the team and there’s also responsibility, like ownership of the whole team instead of one person on one team owning it.

So at a very basic level, that’s what data mesh provides. It gives you the control of what people are consuming, the control on how you will publish something and how people consume it and what is the impact it’s going to create.

And the data governance portion that Ana mentioned assists this whole piece by ensuring that there’s full audit control on what is being generated and what has been published, by documenting it. It sets up observability and data quality metrics, so that governance helps people make sure that what you’re publishing is always audited and becomes a trustworthy system.

eWeek: It seems like a big advantage of data mesh is the self-service aspect. Data mesh democratizes access to the data and enables a company to move much faster.

Vishal Shah: Absolutely. That’s a very, very good point. It does help democratize the data because now, when you make the data a product… you are basically ensuring that the data being published is complete.

It’s fully trustworthy, it is documented and it has full audit control on it. If something goes wrong, people are automatically notified, there’s full automation. And then there’s also the self-serve piece of that data product, which makes this data product available to other people without even contacting someone – you can actually subscribe to the data product and start using it.

So that is the beauty of that self-service. Definitely a big, big piece of the data mesh.
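
To picture that self-service flow, here is a deliberately simplified, hypothetical catalog API – publish once, subscribe without filing a ticket, and keep an auditable record of who consumes what:

```python
# Deliberately simplified, hypothetical catalog API: the function names,
# product names and endpoint are invented for illustration.
CATALOG = {}

def publish(product_name: str, owner: str, endpoint: str) -> None:
    """A domain team publishes its data product to the shared catalog."""
    CATALOG[product_name] = {"owner": owner, "endpoint": endpoint, "subscribers": []}

def subscribe(product_name: str, consumer: str) -> str:
    """Self-service: no central team in the loop, but every subscription
    is recorded, so usage and impact stay auditable."""
    entry = CATALOG[product_name]
    entry["subscribers"].append(consumer)
    return entry["endpoint"]

publish("marketing.campaign_touches",
        owner="marketing-domain",
        endpoint="s3://analytics/campaign_touches/v2/")
print(subscribe("marketing.campaign_touches", consumer="finance-reporting"))
```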

eWeek: Is data mesh an optimal solution for every company?  

Ana Matei: Data mesh right now may not be the right answer for everyone. I think when considering data mesh as an approach, I would encourage companies to think about a few key aspects to make that determination.

They should understand how complex and how vast their data inventory is because data mesh is best suited for companies dealing with significant data complexity and scale. So I’m talking about organizations where the data landscape is growing rapidly, with diverse data sources, complex data models and multiple data consumers across the various domains as well as data producers.

Vishal Shah: I would like to add one more thing. It also is determined by the size of the company. If you have a very small engineering team supporting a few data sets, it does not make sense to actually create that whole architecture because you’re not going to get that kind of benefit. And then sometimes in a smaller company, having this separated out, as Ana was mentioning, people may not always be open to it because it is definitely a different way of working than what you’re used to.

For example, an engineer told me that he had been doing only engineering work, but now he’s told he’s no longer an engineer alone: he has to work with the analyst to create the aggregation, and he has to work with the reporting process to make sure the reports come out. So he needs to know other things.

And that change, as Ana was mentioning, is very important. If people are open to that change, then it becomes easier to implement. And with a smaller company or even a startup, they don’t have that big a team to actually utilize the benefit. I’m not saying they cannot do it, it’s just that they may not get the same kind of benefit that data mesh provides.

eWeek: What are the common issues and challenges that companies encounter as they operationalize data mesh? How do you recommend addressing these challenges?

Vishal Shah: So as I said in my earlier conversation, basically you need to see first that is the data that you are publishing. You have lots of data that multiple teams want to use it. That is the first criteria you take: this data is going to be used by my five different teams and these are kind of solutions they’re building on top of it.

Once you know what that use case is on that data and what is the business impact of that data, that’s where you can decide whether that’s where you can decide how to create that domain centric data.

Now the initial challenges that they might face is basically, first of all, have a cross-functional team across different business unit. Because a lot of times in very large companies, different business unit don’t talk to each other as a small company.

Ana Matei: I’m going to choose the three key challenges that I think are most prevalent.

So one is around data governance and standardization. It’s also something we have learned through our journey at Capital One, and I’ve heard other companies talk about it as well: how to ensure that there is consistency across the different domains in how governance is established, standardized and enforced across the board.

To address this challenge, I would recommend companies ensure that they have a clearly established governance framework that all the different stakeholders and teams involved have aligned on through collaboration. And they should also have a regular means of communication, through governance committees or other shared methodologies of communication or documentation.

Another challenge is monitoring and observability, right? Data flows have always been an issue across the board, and it’s hard to build really robust data flows. In a data mesh architecture, those issues can become amplified: if you have multiple data products and domain teams involved, the complexity just keeps adding to the challenge.

So to address that, I think sometimes an investment is needed in a centralized monitoring solution that can provide visibility and insights into your data product usage, your data pipeline performance, your data quality, system health, you name it.

So I think this should help identify those bottlenecks and other challenges early enough to ensure smooth operations across the data mesh.
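
Ana’s point about centralized monitoring can be sketched in a few lines. The example below is illustrative only – it assumes each domain reports per-run row counts to a shared store, and all figures are invented – but it shows how a central observability layer might flag a troubled data product early:

```python
import statistics

# Illustrative only: assumes each domain reports per-run row counts
# to a shared store that the central monitoring layer can read.
run_history = {
    "payments.settled_transactions": [102_000, 99_500, 101_200, 100_800],
}

def flag_anomalous_run(product: str, rows_loaded: int, threshold: float = 3.0) -> bool:
    """Flag a run whose row count deviates sharply from the product's history."""
    history = run_history.get(product, [])
    if len(history) < 3:
        return False  # not enough history to judge
    mean = statistics.mean(history)
    stdev = statistics.stdev(history) or 1.0  # guard against zero deviation
    return abs(rows_loaded - mean) / stdev > threshold

# A run that loads only half the usual rows is surfaced early,
# before downstream dashboards quietly go stale.
print(flag_anomalous_run("payments.settled_transactions", 51_000))  # True
```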

And finally, the data product lifecycle management. As the number of products in an organization grows, it can naturally be challenging to manage the versioning, deprecation, retirement and upkeep of all these different data products.

To address this challenge, I think companies would need to foster an ability for their data teams to operate in a similar fashion to DevOps teams. I find that those types of principles also work very well with product management.

eWeek: What key takeaways do you want companies to be aware of about data mesh?

Vishal Shah: If companies have multiple teams using the same data to create different solutions, it makes sense to use a data mesh architecture so that they have one way to look at the data.

Because what we saw, which gave us a big benefit, was people looking at the data and creating a report. Two teams were looking at the same data and creating a report, but both were giving different results – same data, but different results.

Now, we cannot say either one of them is wrong, because both of them are applying their own logic to the data. So technically both of the reports are right, but the executive asks: okay, which number should I trust?

With the data mesh piece, those teams all come together to generate the product, which is eventually used on the dashboard. That product will then have a single result going out with all the information. So it’s one single source of truth for everything. Data mesh allows you to get a single source of truth on your data, which is trustworthy.
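
A tiny, invented example makes the problem Vishal describes – and the data product fix – concrete; the revenue rules and figures below are purely illustrative:

```python
# Invented figures and rules, purely to illustrate the point.
orders = [
    {"amount": 120.0, "status": "completed"},
    {"amount": 80.0,  "status": "refunded"},
    {"amount": 200.0, "status": "completed"},
]

# Team A counted refunds as revenue; Team B excluded them. Both are
# defensible, but the executive sees two different numbers.
def team_a_revenue(rows):
    return sum(r["amount"] for r in rows)

def team_b_revenue(rows):
    return sum(r["amount"] for r in rows if r["status"] == "completed")

# The data product ships one canonical definition that everyone consumes.
def product_revenue(rows):
    """Canonical revenue: completed orders only, owned by the domain team."""
    return sum(r["amount"] for r in rows if r["status"] == "completed")

print(team_a_revenue(orders), team_b_revenue(orders))  # 400.0 320.0 -- two answers
print(product_revenue(orders))                         # 320.0 -- the single source of truth
```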

Ana Matei: I think companies can take away three key things to focus on if they’re looking to implement data mesh.

First and foremost, they have to determine if data mesh is truly the right approach for their organization. And we talked about a few ways they can do that earlier in our conversation.

If they determine it’s the right approach for them, I would recommend starting early, because starting early reduces some of the complexity and level of effort required to retrofit existing products into this new paradigm.

And last, I would definitely recommend a two-pronged approach similar to Capital One’s, building central policy and central tooling that then enable federated data management. Because data mesh remains just a concept unless organizations can provide self-service tools and automated workflows to operationalize this federated ownership of data.

5 Mistakes to Avoid In a Data Storage Refresh
https://www.eweek.com/big-data-and-analytics/data-storage-refresh/ | Wed, 09 Aug 2023

As data storage technology has evolved with more choice and options for different use cases—the flavor of today is AI-ready storage—determining the right path for a data storage refresh requires a data-driven approach.

Decisions for new data storage must also factor in user and business needs across performance, availability and security. Forrester found that 83 percent of decision-makers are hampered in their ability to leverage data effectively due to challenges like outdated infrastructure, teams overwhelmed and drowning in data, and lack of effective data management across on-premises and cloud storage silos. Leveraging cloud storage and cloud computing, where AI and ML technologies are maturing fastest, is another prime consideration.

Given the unprecedented growth in unstructured data and the growing demand to harness this data for analytical insight and AI, the need to get it right has never been more essential. This article provides guidance on that topic by highlighting what not to do when performing a data storage refresh.

Also see: Top Cloud Companies

Mistake 1: Making Decisions without Holistic Data Visibility

When IT managers discover that they need more storage, it’s easy to simply buy more than they need. But this may lead to waste and/or the wrong storage technology later.

A majority (80%) of data is typically cold and not actively used within months of creation yet consumes expensive storage and backup resources. Plus, given that you can now purchase additional storage instantly and on-demand in the cloud and with storage-as-a-service on-premises, there’s no reason to overprovision.

To avoid this common conundrum, get insights on all your data across all storage environments. Understand data volumes, data growth rates, storage costs and how quickly data ages and becomes suitable for archives or a data lake for future data analytics.

These basic metrics can help guide more accurate decisions, especially when combined with a FinOps tool for cost modeling different options. The need to manage increasing volumes of unstructured data across multiple technologies and environments, for many different purposes, is leading to data-centric rather than storage-centric decision-making across IT infrastructure.
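
As a rough sketch of what those basic metrics might look like in practice – the file inventory, cold-data threshold and per-TB prices below are all invented for illustration – consider:

```python
from datetime import date

# Invented inventory, threshold and per-TB prices, for illustration only.
files = [
    {"size_tb": 40, "last_access": date(2021, 5, 1)},
    {"size_tb": 25, "last_access": date(2023, 7, 20)},
    {"size_tb": 35, "last_access": date(2022, 1, 15)},
]

HOT_COST_PER_TB = 23.0      # assumed monthly $/TB on primary storage
ARCHIVE_COST_PER_TB = 4.0   # assumed monthly $/TB on an archive tier
COLD_AFTER_DAYS = 180       # data untouched this long counts as cold

today = date(2023, 8, 9)
cold_tb = sum(f["size_tb"] for f in files
              if (today - f["last_access"]).days > COLD_AFTER_DAYS)
total_tb = sum(f["size_tb"] for f in files)

print(f"cold share of data: {cold_tb / total_tb:.0%}")  # 75%
print(f"monthly savings if tiered: ${cold_tb * (HOT_COST_PER_TB - ARCHIVE_COST_PER_TB):,.0f}")
```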

Mistake 2: Choosing One-Size-Fits-All Storage

Storage solutions come in many shapes and forms – from cloud object storage to all-Flash NAS, scale-out on-prem systems, SAN arrays and beyond. Each type of storage offers different tradeoffs when it comes to cost, performance and security.

As a result, different workloads are best supported by different types of storage. An on-prem app that processes sensitive data might be easier to secure using on-prem storage, for instance, while an app with highly unpredictable storage requirements might be better suited by cloud-based storage that can scale quickly.

This again points to the need to analyze, segment and understand your data. The ability to search across data assets for file types or metadata tags can identify data and better inform its management. Avoid the one-size-fits-all approach by provisioning multiple types of storage solutions that reflect your different needs.

Also, less than 25% of data costs are in storage: the bulk of the costs are in the ongoing backup, disaster recovery and protection of the data. So, consider the right storage type and tier as well as the appropriate data protection mechanisms through the lifecycle of data.
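
A quick back-of-the-envelope calculation shows how that ratio can play out; the dollar figures are assumptions, not vendor pricing:

```python
# All dollar figures are assumptions, not vendor pricing.
primary_storage   = 20_000  # annual $: the line item people budget for
backup            = 35_000  # backup software, media and capacity
disaster_recovery = 25_000  # replicated copies, standby infrastructure
protection        = 10_000  # ransomware vaulting, scanning, audits

total = primary_storage + backup + disaster_recovery + protection
print(f"storage share of total data cost: {primary_storage / total:.0%}")  # 22%
```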

For more information, also see: Best Data Analytics Tools

Mistake 3: Becoming Locked into One Vendor

Acquiring all your storage from one vendor may be the simplest approach, but it’s almost never the most cost-effective or flexible.

You can likely build more cost-effective storage infrastructure if you select from the offerings of multiple vendors. Doing so also helps protect you against risks like a vendor’s decision to raise its prices substantially or to discontinue a storage product you depend on.

If you have other vendors in the mix, you can pivot more easily when unexpected changes occur. Using a data management solution that is independent of any storage technology is also a way to prevent vendor lock-in, by ensuring that you can move data from platform to platform without the need to rehydrate it first.

Mistake 4: Moving Too Fast

A sense of urgency tends to accompany any major IT migration or update, storage refreshes included. Yet, while it’s good to move as efficiently as you can, it’s a mistake to move so fast that you don’t fully prepare for the major changes that accompany a storage refresh.

Instead, take time to collect the data you need to identify the greatest pain points in your current storage strategy and determine which changes to your storage solutions will deliver the greatest business benefits. Be sure, too, to collect the metrics you need to make informed decisions about how to improve your data management capabilities.

Mistake 5: Ignoring Future Storage Needs

You can’t predict the future, but you can prepare for it by anticipating which new requirements your storage solutions may need to support in the future. At present, trends like AI, sustainability and growing adoption of data services mean that the storage needs of the typical business today are likely to change in the coming year.

To train AI models, for example, you may need storage that can stream data more quickly than traditional solutions. Likewise, implementing data services in order to support FinOps goals might mean finding ways to consolidate and share storage solutions more efficiently across different business units.

Conclusion: The Importance of a Storage Refresh

As organizations move from storage-centric to data-centric management, IT and storage architects will need to change the way they evaluate and procure new storage technologies.

The ability to analyze data to make nuanced versus one-size-fits-all storage decisions will help IT organizations navigate many changes ahead – be they cloud, edge, AI or something else still on the horizon.

Read next: What is Data Visualization

About the author:

Krishna Subramanian is COO, President & Cofounder of Komprise.

Privacera CEO Balaji Ganesan on Democratizing Data Access
https://www.eweek.com/big-data-and-analytics/privacera-democratizing-data-access/ | Thu, 03 Aug 2023

I spoke with Balaji Ganesan, CEO of Privacera, about the unavoidable conflict between democratizing data and governing data; he offers tips for companies grappling with this issue.

Among the topics we covered: 

  • As you survey how companies are handling data access, what are some common issues or challenges you see?
  • How can companies optimize their data access practices? That is, how can they fully democratize data access while still maintaining data security and governance?
  • How is Privacera addressing the data security needs of its clients?
  • The future of data access and data security? What are some key milestones we can expect in the years ahead?

eWEEK TweetChat, August 15: Next Generation Data Analytics
https://www.eweek.com/big-data-and-analytics/next-generation-data-analytics/ | Tue, 01 Aug 2023

On Tuesday, August 15 at 11 AM PT, @eWEEKNews will host its monthly #eWEEKChat. The topic will be Next Generation Data Analytics, and it will be moderated by James Maguire, eWEEK’s Editor-in-Chief.

We’ll discuss – using Twitter – the evolving trends and strategies for using data analytics in pursuit of competitive advantage. Our ultimate goal: to offer guidance to companies about how to get the most from data analytics – now and going forward.

See below for:

  • Participant list for this month’s eWeek Tweetchat on data analytics
  • Questions we’ll discuss in this month’s eWeek Tweetchat
  • How to Participate in the Tweetchat
  • Tentative Schedule: Upcoming eWeek Tweetchats

Participants List: Next Generation Data Analytics

The list of experts for this month’s Tweetchat currently includes the following – please check back for additional expert guests:

Tweetchat Questions: Next Generation Data Analytics

The questions we’ll tweet about will include the following – check back for more/revised questions:

  1. First, let’s look back: How would you describe the evolution of data analytics in the enterprise over the last few years? Do most companies have an effective strategy?
  2. Here in 2023, to what extent is data analytics living up to its promise as the great competitive tool?
  3. Clearly, enterprise data analytics strategy is constantly in flux. What’s driving it forward (or backward) in 2023?
  4. What about the trend toward the “democratization of data” – easier access to data. Is it real? Is it working?
  5. We know that AI and generative AI is revolutionizing enterprise tech. How do you expect it to change data analytics practice?
  6. Looking ahead, what do you expect to be the toughest challenges facing the effective use of data analytics?
  7. How do you recommend addressing this most difficult data analytics challenge?
  8. What forward-looking Best Practices advice would you give to companies to grow their data analytics usage?
  9. A last Big Thought about data analytics – what else should managers/buyers/providers know about preparing for the future of data analytics?

How to Participate in the Tweetchat

The chat begins promptly at 11 AM PT on August 15. To participate:

  1. Open Twitter in your browser. You’ll use this browser to Tweet your replies to the moderator’s questions.

  2. Open Twitter in a second browser. On the menu to the left, click on Explore. In the search box at the top, type in #eweekchat. This will open a column that displays all the questions and all the panelists’ replies.

Remember: you must manually include the hashtag #eweekchat for your replies to be seen by that day’s tweetchat panel of experts.

That’s it — you’re ready to go. Be ready at 11 AM PT on August 15 to participate in the tweetchat.

NOTE: There is sometimes a few seconds of delay between when you tweet and when your tweet shows up in the #eWeekchat column.

#eWEEKchat Tentative Schedule for 2023*

July 25: Optimizing Generative AI: Guide for Companies
August 15: Next Generation Data Analytics
September 12: AI in the Enterprise
October 17: Future of Cloud Computing
November 14: Edge Computing Trends
December 12: Tech in 2024: Predictions and Wild Guesses

*all topics subject to change

Capital One Software’s Patrick Barch on Cloud Data Cost Optimization
https://www.eweek.com/big-data-and-analytics/capital-one-softwares-cloud-data-cost-optimization/ | Thu, 22 Jun 2023

I spoke with Patrick Barch, Sr. Director of Product Management at Capital One Software, about best practices for cloud data cost optimization; he also highlighted the benefits of Capital One Software’s data management solution, Slingshot.

An edited transcript of our conversation:

Why is cloud data cost optimization so crucial for companies these days?

Well, the cloud is really a story of more: more power, more flexibility, more speed of adoption; it puts more data into the hands of more people to make better business decisions.

But if you lean too heavily into that story of more, you risk spending way more than you intended, way faster than you thought. And so by putting the right governance, the right controls, the right practices in place, as you scale up your usage of the cloud, you can really achieve all of the benefits – the promise of the cloud. And do it in a way where you’re wringing every last ounce of efficiency out of every last dollar of spend.

What are some key best practices for cloud data cost optimization?

The cloud has really introduced a new model of paying for data. In the past, your data spend was a capital expense: you spent some time at each budget cycle deciding how much data capacity you needed for the following year, and that was what you bought.

Today’s model is more pay-as-you-go. And so your bill could be a surprise at the end of the month. So the first thing that you really have to do is think differently about how you allocate your spend and the processes by which you engage your business teams.

At Capital One we call this a federated approach. We’ve engaged our teams, and what we do is we say, “Hey, business team, we don’t want our shared services group, we don’t want our central IT team to be a bottleneck, to stand in the way of you doing your job. The promise of the cloud is to do more faster. But we need you to adhere to some best practices.”

And so we built some centralized tools in a centralized platform with some centralized governance applied that really enable our teams to move at their own pace while ensuring that they’re doing it in a cost-controlled, effective, governed, well-managed way. So that’s number one: engage your teams.

Second, while you no longer have to manage the physical servers in a data center, you do have to right-size and optimize your infrastructure. There are very few reasons why a dev or a QA environment needs a large or a two-extra-large compute. Testing probably doesn’t require that kind of horsepower.

Make sure that you’re spinning up the right infrastructure for the job to be done at the moment. And this is where some of the policies that we’ve created internally come into play. We build all of those rules into our central platform so that our teams don’t have to keep all of that straight. But piece of advice number two is: make sure that you’re provisioning the right compute at the right time and at the right size.

And your teams are always going to bias toward getting their jobs done faster. Their first priority is not creating a well-managed data environment. Their first priority is driving more value for your business, serving their customers. And so it’s on you as a shared services team or a central IT group to enable them to do that at their own pace, but do it in a well-managed way.
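
A guardrail of the kind Barch describes might look like the following sketch; the warehouse sizes and per-environment caps are hypothetical, not Capital One’s actual policy:

```python
# Hypothetical provisioning guardrail; the sizes and per-environment
# caps are illustrative, not Capital One's actual policy.
SIZES = ["XS", "S", "M", "L", "XL", "2XL"]
MAX_SIZE_BY_ENV = {"dev": "S", "qa": "M", "prod": "2XL"}

def right_size_request(env: str, requested: str) -> str:
    """Clamp a compute request to the largest size the environment allows."""
    cap = MAX_SIZE_BY_ENV.get(env, "S")  # default to small for unknown envs
    if SIZES.index(requested) > SIZES.index(cap):
        return cap  # right-size the request instead of blocking the team
    return requested

print(right_size_request("dev", "2XL"))  # "S" -- testing rarely needs that horsepower
```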

The third thing is keep a close eye on your queries and your workloads that are running on your infrastructure. In the old days, when you had a fixed set of compute capacity on-prem, the consequences of a poorly written or bad query were really just taking up so much of that fixed capacity that other people also using the system noticed performance degradation.

In the cloud, that poorly written query can keep a warehouse up, it can keep a cluster up, it can make a cluster scale in ways that you don’t expect it to. That incurs more cost. And so now for the first time, bad queries, bad workloads mean unoptimized spend. And so look for ways to identify problem queries, identify potentially problematic users, and look for ways to optimize how they’re using the tools.
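
As one illustration of that advice – not a description of how Slingshot itself works – the sketch below flags long-running or scan-heavy queries from Snowflake’s ACCOUNT_USAGE.QUERY_HISTORY view; the thresholds are arbitrary starting points:

```python
# Thresholds are arbitrary starting points; Slingshot's actual query
# analysis is proprietary and not shown here. QUERY_HISTORY is Snowflake's
# standard account-usage view (elapsed time is reported in milliseconds).
PROBLEM_QUERIES_SQL = """
SELECT user_name,
       query_id,
       total_elapsed_time / 1000 AS seconds,
       bytes_scanned / POWER(1024, 3) AS gb_scanned
FROM snowflake.account_usage.query_history
WHERE start_time > DATEADD('day', -7, CURRENT_TIMESTAMP())
  AND (total_elapsed_time > 600000          -- ran longer than 10 minutes
       OR bytes_scanned > 1000000000000)    -- or scanned roughly a terabyte
ORDER BY total_elapsed_time DESC
LIMIT 50;
"""

def heavy_users(rows, min_flagged: int = 5):
    """Group flagged queries by user to spot who may need query-tuning help.

    rows: iterable of (user_name, query_id, seconds, gb_scanned) tuples,
    e.g., the result set of PROBLEM_QUERIES_SQL.
    """
    counts = {}
    for user_name, *_ in rows:
        counts[user_name] = counts.get(user_name, 0) + 1
    return [user for user, n in counts.items() if n >= min_flagged]
```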

Let’s talk about Slingshot — the enterprise software created by Capital One Software. How does Slingshot help companies?

Capital One uses Slingshot to do all of the things that I just described. We’ve been using the tool internally for a number of years, and it really helps companies put in place the right kind of governance from the start so that you can create a well-managed Snowflake environment down the line.

And so we enable you to, through a single pane of glass, manage all of your Snowflake accounts – regardless of what region they’re in, what account they’re a part of, what business unit they’re under. You can apply a common set of standards to all of those accounts, and enable the users of those accounts to operate within a shared framework of governance.

Internally, we’ve seen a reduction in our overall Snowflake spend. We’ve seen improved efficiency as measured by cost per query. And we’ve really seen a reduction in the amount of manual effort that our shared services teams spend in one-off meetings with our business groups, provisioning infrastructure, answering questions, that normal back and forth that goes into provisioning new platforms.

The key is really making sure that if you’re spending more, you’re driving more business value, you’re not spending more because there’s wastage in the system. And so really rather than looking at this as a cost reduction play, you should be thinking about it as: how do you wring every last ounce of efficiency out of every dollar of spend?

News: What’s happening with Slingshot currently?

So this week we’re super excited to announce all of the new functionality that’s being released as part of the product. We spent the last year really working with our customer base, talking to prospects, getting feedback, learning how we can add more value. And that’s really taking shape in three major themes.

One is that the product is way easier to use and has way more customization and flexibility. Our dashboards are easier to navigate and drill into. The insights where we surface potentially problematic inefficiencies in your ecosystem are bubbling up to the top of the experience.

You can now allocate spend for all Snowflake cost drivers by whatever custom entity makes sense for your business – whether it’s line of business, whether it’s project, or whether you want to charge back to your own customers. So just the ease of use, the flexibility – we’ve made a big push there.

The second big announcement we’re making this week is around increasing the breadth of our recommendation engine. Up till this point, our recommendations have really focused on helping you reduce spend and save money.

But there are times when you have a really important business process or a really important workload, and you want to spend a little bit more to make sure that things finish faster or within an SLA. And so rather than just telling you when you can save money, we’re also highlighting opportunities where you might want to spend a little bit more to give your users increased performance. And then we’re giving you all the information that you need to make the right decision for your business.

We’ve made it really easy to find and apply the highest value recommendations, whether that’s in terms of reducing cost or boosting performance. And so we’re excited about the additional value we can add there.

And then third, as I mentioned before, you really want to keep an eye on your queries. And so we’re launching the first version of what we call our query advisor. It’s really cool technology backed by a couple of patents. And what it enables you to do is take some of your more inefficient problematic queries, run them through some technology, and get recommendations for how those queries could be improved.

And so we have a long roadmap, we have a lot of runway left to go with how we optimize workloads. We’re excited to get the first version of this tool into customers’ hands, start getting some feedback, and really start driving some value.

What is it that really differentiates Slingshot in the market?

This is the only product that I know of that enables companies to implement that federated model of data ecosystem management that I mentioned before. We built it that way because we had some really aggressive deadlines for when we needed to move to the cloud. And so if you’re looking to operate in a way that the cloud demands, you need Slingshot.

Second, the breadth of our recommendation engine is something that I am particularly proud of. Not only do we look at where you can save money, but we’re getting smarter about where you may have an important process where it might make sense for your spend to actually go up, because efficiency is all about running at the right performance for the right cost.

And so we’re the only company, at least that I know of at the moment, that’s looking to help you strike that perfect balance and wring all of the efficiency out of your spend.

And then third, our query advisor tech is pretty cool. It’s generated a ton of savings internally. We’ve reduced our cost per query by about 43%.

There’s a core challenge that companies face with cloud data cost optimization: maximizing efficiency. And there’s a number of strategies for that. How do you see this issue?

One thing that we’ve learned from talking to lots of different customers and prospects is the word efficiency can have different meanings. So for some companies, efficiency just means running at the lowest possible cost at all times — user experience, forget about it.

For some companies, efficiency means make sure my really important jobs with really important SLAs finish by, let’s just say nine o’clock in the morning, come hell or high water. I don’t care how much I have to spend, these jobs have to finish on time because I get charged for some reason if they don’t. And I would say that maybe 40% of cases fall into those two groups.

The bulk is really about striking that perfect balance between user experience and cost optimization: running with the best performance possible at the lowest possible cost. And that’s a tricky one, because what does that mean? I can’t answer that question. A product can help you get to the answer, but really it’s about how you can leverage our tool to strike the right balance for your company. Because that’s a situation unique to you.

To learn more:

CapitalOne.com/software

Teradata CEO Steve McMillan on Data Analytics and the Cloud
https://www.eweek.com/big-data-and-analytics/teradata-ceo-steve-mcmillan-on-data-analytics-and-the-cloud/ | Fri, 16 Jun 2023

I spoke with Steve McMillan, CEO of Teradata, about how enterprise users need to leverage the power of cloud computing to enable their data analytics practice to remain competitive.

Among the topics we discussed:

  • As you look at the data analytics sector, especially how it’s influenced by multi-cloud use, what are the key trends?
  • What advice do you give to companies to optimize their data analytics practice, especially as it relates to cloud?
  • How is Teradata addressing the data analytics needs of its clients? What’s the Teradata advantage?
  • The future of data analytics as it relates to the cloud landscape? What are your near- or mid-term predictions?
