DevOps in 2019 (Part 4)

Given the speed with which technology is changing, we thought it would be interesting to ask IT professionals to share their thoughts on their predictions for 2019. Here’s more of what they are thinking about DevOps:

Stefano Bellasio, CEO, Cloud Academy

Original Link

Exploring AWS Lambda Deployment Limits

In one of our last articles, we explored how to deploy Machine Learning models using AWS Lambda. Deploying ML models with AWS Lambda is best suited to early-stage projects, as Lambda functions come with certain limitations. However, this is no reason to worry if you need to use AWS Lambda to its full potential for your Machine Learning project. When working with Lambda functions, the size of the deployment package is a constant concern for developers.

Let’s first have a look at the AWS Lambda deployment limits and address the 50 MB package size limit stated in the official AWS documentation, which can be misleading, since you can make larger deployments of uncompressed files.
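Those limits can be checked before an upload ever fails. Below is a minimal sketch; the 50 MB zipped / 250 MB unzipped figures reflect the documentation of the time (check the current docs), and the helper name is ours:

```python
import os
import zipfile

# Documented AWS Lambda limits at the time of writing:
# 50 MB for a zipped direct upload, 250 MB for the unzipped package.
ZIPPED_LIMIT_MB = 50
UNZIPPED_LIMIT_MB = 250

def check_package(zip_path):
    """Return (zipped_mb, unzipped_mb, ok) for a Lambda deployment zip."""
    zipped_mb = os.path.getsize(zip_path) / 1024 ** 2
    with zipfile.ZipFile(zip_path) as zf:
        unzipped_mb = sum(info.file_size for info in zf.infolist()) / 1024 ** 2
    ok = zipped_mb <= ZIPPED_LIMIT_MB and unzipped_mb <= UNZIPPED_LIMIT_MB
    return zipped_mb, unzipped_mb, ok
```

Running this locally before `aws lambda update-function-code` saves a failed round trip to the API.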

Original Link

Need of the Hour: The Paradigm Shift in Technology

There comes a time when we need to shift from an existing level to the next to stay ahead in the race, or to save ourselves from becoming obsolete. This is known as a paradigm shift: switching from what you have to a new but feasible level.

In the world of technology, this shift is mandatory, as every day we see new, faster, more error-proof, and more cost-effective technologies. In this article, I would like to highlight some areas where we are shifting rapidly. This list might grow, and I urge you to keep adding to it in the comments!

Original Link

Using AIOps for DevOps Workflows

Every DevOps support team has to deal with large amounts of monitoring data and logs in order to take care of their cloud infrastructure. AIOps is the practice of applying AI to make use of that data.

We have explained what AIOps is and the benefits it provides to any DevOps workflow. There are multiple DevOps tools used at various stages of software delivery, from iterations through code versioning, building, testing, pushing to production, and monitoring the performance of the finished product. There are also various parameters to take into consideration while monitoring these software development lifecycle stages, from CPU/RAM usage to disk volume and bandwidth usage, to the number of app sessions, etc.
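As a toy illustration of the idea, the sketch below flags metric samples that sit far from the mean. Real AIOps tooling uses far richer models; all names here are illustrative:

```python
from statistics import mean, stdev

def flag_anomalies(samples, threshold=3.0):
    """Flag metric samples (e.g. CPU %) more than `threshold` standard
    deviations from the mean -- a crude stand-in for what AIOps tooling
    does with far richer models and far more signals."""
    mu, sigma = mean(samples), stdev(samples)
    if sigma == 0:
        return []
    return [(i, x) for i, x in enumerate(samples)
            if abs(x - mu) > threshold * sigma]
```

Fed a stream of CPU readings, it returns the (index, value) pairs worth a human's attention.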

Original Link

QA for Machine Learning Models With the PDCA Cycle

The primary goal of establishing and implementing Quality Assurance (QA) practices for machine learning/data science projects or projects using machine learning models is to achieve consistent and sustained improvements in business processes, making use of underlying ML predictions. This is where the idea of the PDCA cycle (Plan-Do-Check-Act) is applied to establish a repeatable process ensuring that high-quality machine learning (ML)-based solutions are served to the clients in a consistent and sustained manner.
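As a rough sketch of how the PDCA cycle maps onto an ML delivery loop (the function names and quality bar below are hypothetical, not part of the article):

```python
def pdca_cycle(train, evaluate, deploy, target_score, max_iterations=5):
    """Plan-Do-Check-Act as a repeatable loop: plan a candidate model,
    do (train it), check it against the quality bar, and act
    (deploy, or feed the findings into the next planning round)."""
    for i in range(max_iterations):
        model = train(iteration=i)      # Plan + Do
        score = evaluate(model)         # Check
        if score >= target_score:       # Act: release
            deploy(model)
            return model, score
        # Act: carry the findings into the next planning round
    return None, None
```

Each pass through the loop is one PDCA cycle; the quality bar (`target_score`) is what makes the "Check" step repeatable rather than ad hoc.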

The following diagram represents the details:

Original Link

Extracting Text from Images: Google a Notch Better than Azure and AWS!

Extracting text from images has been worked on for many years now and finds applications in many domains like banking, legal, healthcare, education, and entertainment!

With the advent of machine learning, text extraction from images is being offered as a Cognitive API by many AI/ML providers like AWS Rekognition, Azure Computer Vision, and Google CloudVision.

While all three do a good job at default text detection, we used the Cognitive API Integrator to compare the responses of these three major cognitive API providers on three parameters for the English language:

  • Different orientation
  • Different fonts
  • Reverse order text

While there are no clear winners here, Google does perform a notch better than Azure and AWS on the three parameters we compared them on.
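For readers who want to run a comparison of their own, a sketch of the scoring side is below. The provider calls themselves (e.g. Rekognition's `detect_text` or Cloud Vision's `text_detection`) are omitted, and the helper names are ours:

```python
import re

def normalize(text):
    """Lowercase and collapse whitespace so responses from different
    OCR providers can be compared fairly."""
    return re.sub(r"\s+", " ", text).strip().lower()

def score_provider(extracted, expected):
    """Fraction of expected words that a provider's response recovered.
    `extracted` would be the text returned by the provider's OCR API;
    the API calls themselves are omitted from this sketch."""
    got = set(normalize(extracted).split())
    want = normalize(expected).split()
    if not want:
        return 0.0
    return sum(w in got for w in want) / len(want)
```

Scoring each provider's raw response against the known text of the test image gives a simple, comparable number per parameter.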

Here is a brief summary:

  • Google does a great job at detecting vertical text, irrespective of top-down or bottom-up orientation.
  • Google and Azure both give reverse-order (upside-down) text a good shot, whereas AWS is never able to decipher it.
  • AWS does a great job detecting text written in different fonts.
  • Azure needs handwritten mode turned on in order to detect different fonts.

Let’s take a look at a few examples.

Example 1: Vertical Text in Bottom-Up Orientation

[Image: vertical text example]

  • AWS totally misses detecting the vertical text
  • Google and Azure are able to detect the text correctly.

Example 2: Vertical Text in Top Down Orientation

[Image: top-down vertical text example]

  • Google gives the best result
  • AWS again gives it a miss
  • Azure is also unable to read vertical text in top-down orientation.

Example 3: Bottom-Up Text

[Images: bottom-up and upside-down text examples]

  • Clearly Google does the best job here
  • Azure gives it a try and AWS misses it completely

Example 4: Mixed Orientation

[Images: mixed orientation examples]

  • While none of the three providers is able to handle mixed orientations correctly, Google plays it safe and reads only one orientation, but reads that one correctly.
  • Azure tries to read all the orientations and reads one of the two orientations incorrectly.
  • AWS can only read the default orientation correctly.

Example 5: Mixed Fonts

[Images: mixed font examples]

  • While all providers detect different fonts, AWS seems to do a better job than the other two!

Check out the Findings page for various similar conclusions drawn by the community while working with these APIs. Send us your findings and feedback at daksh@cennest.com.

About the Cognitive API Integrator

The Cognitive API Integrator aggregates cognitive services across major providers (currently Microsoft Azure, Amazon Web Services & Google Cloud). Use it to compare responses for various Cognitive APIs before making your selection of which provider you will integrate with.

Note: The Cognitive API Integrator does not aim to promote or downplay any Cognitive API Provider. Cognitive Analysis is a machine learning exercise where results are bound to improve with more data and usage. Conclusions drawn here can be subjective and users are encouraged to use the tool to form their own conclusions.

Original Link


DevOps on AWS Radio: Big Data — Robert Murphy (Episode 15) [Podcast]

In this episode, Paul Duvall and Brian Jakovich cover recent DevOps on AWS news along with a discussion with Robert Murphy, who is a Senior DevOps Automation Engineer at Stelligent.

Here are the show notes:

DevOps on AWS News

Episode Topics

  1. Description of Big Data
  2. AWS resource considerations for Big Data
  3. Incorporating CI/CD into software systems using Big Data and Machine Learning
  4. Reducing deployment lead time from 30 hours to 34 minutes
  5. Description of different deployment pipelines: AMI provisioning, model training, and microservices
  6. Deployment pattern usage
  7. Using Twistlock Defender
  8. How they reduced deployment lead times to 34 minutes
  9. Lessons learned with CI/CD for Big Data and Machine Learning

About DevOps on AWS Radio

On DevOps on AWS Radio, we cover topics around applying DevOps principles and practices such as Continuous Delivery on the Amazon Web Services cloud. This is what we do at Stelligent for our customers. We’ll bring listeners into our roundtables and speak with engineers who’ve recently published on our blog and we’ll also be reaching out to the wider DevOps on AWS community to get their thoughts and insights.

The overall vision of this podcast is to describe how listeners can create a one-click (or “no click”) implementation of their software systems and infrastructure in the Amazon Web Services cloud so that teams can deliver software to users whenever there’s a business need to do so. The podcast will delve into the cultural, process, tooling, and organizational changes that can make this possible including:

  • Automation of
    • Networks (e.g. VPC)
    • Compute (EC2, Containers, Serverless, etc.)
    • Storage (e.g. S3, EBS, etc.)
    • Database and Data (RDS, DynamoDB, etc.)
  • Organizational and Team Structures and Practices
  • Team and Organization Communication and Collaboration
  • Cultural Indicators
  • Version control systems and processes
  • Deployment Pipelines
    • Orchestration of software delivery workflows
    • Execution of these workflows
  • Application/service Architectures – e.g. Microservices
  • Automation of Build and deployment processes
  • Automation of testing and other verification approaches, tools and systems
  • Automation of security practices and approaches
  • Continuous Feedback systems
  • Many other Topics…
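As one small illustration of the "automation of networks" item above, the sketch below plans one subnet per availability zone from a VPC CIDR using only the standard library. Feeding the result into CloudFormation or boto3 is left out, and the function name is ours:

```python
import ipaddress

def plan_subnets(vpc_cidr, azs, prefix=24):
    """Carve one subnet per availability zone out of a VPC CIDR block.
    The resulting map would feed a CloudFormation template or boto3
    create_subnet calls; the AWS API calls are omitted from this sketch."""
    vpc = ipaddress.ip_network(vpc_cidr)
    subnets = list(vpc.subnets(new_prefix=prefix))
    if len(subnets) < len(azs):
        raise ValueError("VPC CIDR too small for the requested AZs")
    return {az: str(subnet) for az, subnet in zip(azs, subnets)}
```

Keeping the subnet math in code (rather than hand-picked CIDRs) is exactly the kind of network automation the list refers to.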

Original Link

The Future of Automated Testing


Topics:

automated testing, ai, ml, devops, test automation

Original Link

The Future of Containers

Topics:

containers, faas, modernization, serverless, kubernetes, cloud, ai, ml

Original Link

Where’s Big Data Going?

To gather insights on the state of big data in 2018, we talked to 22 executives from 21 companies who are helping clients manage and optimize their data to drive business value. We asked them, “Where do you think the biggest opportunities are in the continued evolution of big data?” Here’s what they told us:

AI/ML

  • Cognitive and AI/ML services accessible through public cloud providers are going to provide higher-level services that add business value for clients. Use a vendor’s AI and apply it to business needs. Nine months ago, Microsoft supported 35 languages; today, they support 52. Cognitive services for speech recognition can now identify who is speaking.
  • We are yet to realize the problems we’ll solve with big data in healthcare. Disease diagnosis with AI/ML will be able to detect patterns humans never could.
  • More data provides more opportunities to use AI/ML to augment resources. Embrace those technologies to provide automated insights.
  • Static data goes away; data is always in motion. Files are less important as streams become more important. The infrastructure gets figured out. Storage and analytics are led by Google and Facebook’s work with AI/ML.
  • There are more cases of insights from large data to make decisions using AI/ML. There are also more use cases with insights from big data. 
  • Intelligence via ML. Smart homes become smarter. Cars get smarter. Start leveling business and personal lives by making lives easier.
  • Big data is here to stay. It’s the biggest technological revolution along with the internet and mega-computing. You can see the true value with Amazon’s and Netflix’s recommendation engines. There will be a natural evolution of AI/ML voice interfaces changing how people operate and interact with machines to reduce friction. This creates positive change in how we work with each other and how companies operate. Big data has come to represent a new and significant step in the evolution of computers; prior steps are represented by the invention of automated computers in the ’50s, the design of computer communications and the internet in the ’70s, the commercial web in the ’90s, and the social media revolution in the 2000s. Automated systems can generate lifelike images and create written content comparable in quality to what an expert writer would generate, and we can interact with systems using just our voice (if you don’t believe me, ask Alexa). In short, big data, together with Artificial Intelligence (AI) and Machine Learning (ML), represents a unique transformational opportunity for humanity.

Real-Time

  • We believe stream processing is the next big thing in big data. Businesses can no longer compete in today’s environment if they are waiting to receive daily, weekly, or monthly “reports” on the health of their business. It’s an untenable situation, and the companies who are winning, and will continue to win, are the data-savvy companies like Netflix, Alibaba, and Uber. They understand instantly what is happening with their business and how to react to a changing reality. With stream processing, data is processed instantly, which means businesses can react to changing dynamics and new situations in the moment to detect fraud, spot supply chain issues before they impact the customer (and bottom line), provide more personalized service to customers to keep them happy and build loyalty, and so forth. The impact of this can’t be overstated.
  • Big data initiatives must be driven by business outcomes. Cloud should be used more to optimize IT spending and above all allow the company to focus on business problems to solve. It still takes time today to analyze the data and get actionable insights. The big data evolution will be on increased real-time views while data protection and privacy by design are fully integrated. 
  • I will start by characterizing Big Data Analytics as the processing of the maximum possible amount and types of data in the time allotted for making decisions. Given that characterization, the biggest opportunities will be:
    • 1) Continually exploiting hardware advances to make better decisions faster using even more data in the allotted time. Examples include a) persistent memory (3D XPoint/Optane/HPE Persistent Memory); b) multicore CPUs with hardware transactions and SIMD instructions; c) Many Integrated Core (MIC) and general accelerated computing (think Intel Xeon Phi, NVIDIA GPUs, and FPGAs). This cannot be overstated, in my opinion: too many software solutions don’t truly exploit the concurrency and parallelism available in modern hardware. And finally, d) exploiting faster connectivity and interconnectivity options, both locally on servers and between connected servers.
    • 2) Leveraging Big Data Analytics for predicting outcomes or behavior in order to increase beneficial opportunities or mitigate risk.
    • 3) Skilled individuals with the broad technical capabilities to achieve the aforementioned.
  • Greater speed and reusability of data management processes with greater trust in the data. Less manpower required to manage big data.
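The hardware-exploitation point above boils down to a map-reduce pattern. The sketch below shows the chunking side with a thread pool to keep it portable; for genuinely CPU-bound work you would swap in a process pool or vectorized/SIMD kernels (all names here are illustrative):

```python
from concurrent.futures import ThreadPoolExecutor

def chunks(data, n):
    """Split `data` into `n` roughly equal, contiguous chunks
    for parallel workers."""
    k, m = divmod(len(data), n)
    out, start = [], 0
    for i in range(n):
        end = start + k + (1 if i < m else 0)
        out.append(data[start:end])
        start = end
    return out

def parallel_sum(data, workers=4):
    """Map partial sums over a worker pool, then reduce. The same
    split/map/reduce shape is what SIMD instructions and accelerators
    exploit at a much finer grain."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(sum, chunks(data, workers)))
```

The point of the sketch is the shape, not the speed: once work is expressed as independent chunks, swapping the executor (threads, processes, GPU kernels) is an engineering detail.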

Integration

  • Data is continuing and will continue to grow. Everything will be integrated providing sets of data that can solve specific business problems. 
  • Businesses’ ability to unify analysis and operations while adding intelligence. 90% of success with ML is data management. For developers, a data fabric exposes interfaces so they can move containers anywhere and access data as if it were local. Microservices work the same way. Publish and subscribe are part of the same fabric.
  • We’re at a tipping point with the performance and architecture perspective. More real-time, automated intelligence and analysis will be in applications. There’s an opportunity to innovate more and faster by converging microservices into big data. Decentralized analytics and transactions together in one platform. More data-driven microservices. 
  • Making it easy to unify data no matter where it may be stored and run analytics in real-time at memory speeds with any analytics framework.
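The publish/subscribe pattern mentioned above can be sketched in a few lines. This in-memory toy is only a stand-in for a real data fabric, and the class name is ours:

```python
from collections import defaultdict

class Fabric:
    """A toy publish/subscribe fabric: services publish events to named
    topics, and any number of subscribers receive them. A real data
    fabric adds persistence, distribution, and delivery guarantees."""
    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self._subscribers[topic].append(handler)

    def publish(self, topic, event):
        for handler in self._subscribers[topic]:
            handler(event)
```

Because publishers and subscribers only share a topic name, microservices can be added or moved without touching each other, which is the decoupling the quote is pointing at.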

Other

  • It becomes more pervasive so companies can continue to become more responsive to customers in real time. Next generation apps will continue to scale even faster.
  • We want everything to be self-service at work and in our personal lives. Bringing self-service to a more sophisticated user. Respecting the security controls of the business.
  • Big data is just becoming data. There will be no differentiation. 1) Strata Hadoop is now just Strata Data. According to Gartner, Hadoop is obsolete. Big data technology is changing rapidly. Now integrated into enterprise-grade solutions. Data lake technology will evolve to store and analyze later. 2) Data management becomes more important – governance, management, individual distributed processing while storing the data everywhere across a diverse landscape.
  • Big data is just getting bigger and faster. If you’re not already involved in big data, you’re late and in jeopardy of being passed by your competitors. Data is the number one business driver in every industry. Investment in data management technology will increase over time. ML, DL need GPU databases to solve problems.

Here’s who we spoke to:

  • Emma McGrattan, S.V.P. of Engineering, Actian
  • Neena Pemmaraju, VP, Products, Alluxio Inc.
  • Tibi Popp, Co-founder and CTO, Archive360
  • Laura Pressman, Marketing Manager, Automated Insights
  • Sébastien Vugier, SVP, Ecosystem Engagement & Vertical Solutions, Axway
  • Kostas Tzoumas, Co-founder and CEO, Data Artisans
  • Shehan Akmeemana, CTO, Data Dynamics
  • Peter Smails, V.P. of Marketing and Business Development, Datos IO
  • Tomer Shiran, Founder and CEO and Kelly Stirman, CMO, Dremio
  • Ali Hodroj, Vice President Products and Strategy, GigaSpaces
  • Flavio Villanustre, CISO and V.P. of Technology, HPCC Systems
  • Fangjin Yang, Co-founder and CEO, Imply
  • Murthy Mathiprakasam, Director of Product Marketing, Informatica
  • Iran Hutchinson, Product Manager & Big Data Analytics Software/Systems Architect, InterSystems
  • Dipti Borkar, V.P. of Products, Kinetica
  • Adnan Mahmud, Founder and CEO, LiveStories
  • Jack Norris, S.V.P. Data and Applications, MapR
  • Derek Smith, Co-founder and CEO, Naveego
  • Ken Tsai, Global V.P., Head of Cloud Platform and Data Management, SAP
  • Clarke Patterson, Head of Product Marketing, StreamSets
  • Seeta Somagani, Solutions Architect, VoltDB

Original Link

Inside Flipkart’s Monster-Cruncher: How It Gleans Insights from a Petabyte of Data Daily

Flipkart CEO Kalyan Krishnamurthy with his team. Photo credit: Flipkart Stories

Gender is complicated for the data scientists at Flipkart, India’s leading ecommerce site. It’s not enough, for example, to know that the shopper is female. What if she’s shopping for her husband today? So, apart from the label she chose while signing up for the account, there is also a behavioral gender.

This is gender based on your behavior, whether you tend to shop like a female or a male when using your account. The behavior can also vary from session to session, so there’s a third in-session gender.

“We have to compute the label gender when we send you survey results so that we can address you properly and all. But when we have to show you what you want, we will piggyback on the behavioral gender and [adjust that] as soon as we know a little bit about this session [and] what your mood is today,” says Sandeep Kohli, Flipkart’s senior director of engineering.

All that work is just for one user. To see the scale at which Flipkart is doing this, consider the following numbers:

Data credit: Flipkart; chart by Tech in Asia

And gender is just one parameter for customer insights. All kinds of behavioral, demographic, and usage data go into making product recommendations or trying to ensure a customer doesn’t drop out without completing an intended purchase.

During the recent Big Billion Day sales, for example, pageviews in a day rose five-fold to nearly a billion, says Kohli. “You have to accommodate that [spike] without degrading the experience because people don’t have too much patience on such days. So every second of delay costs us a few hundred customers who could drop off at that point.”

Survival of the smartest

To manage that kind of data and do advanced analytics and personalization, Flipkart has built its own data center – the only Indian internet company to have done so. And that takes a lot of hardware. “We have 5 petabytes in RAM, 120 petabytes of disk storage, and [a] tremendous amount of cross-sectional bandwidth, because anything can become a bottleneck at that kind of scale,” says Kohli. “It is extremely important to know about your customers and serve them better, because at this point in the highly competitive ecommerce industry, it’s a survival game.”


Every action on the site involves analytics. Take for example a simple search query: how far did the user have to scroll down? If the search results are relevant, the user should be able to find the product they’re looking for without needing to scroll too far down. Now imagine getting the search right for 100 million users.
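That "how far did they scroll?" question is commonly quantified with rank-based metrics. A minimal sketch (the function name is ours; averaged over many queries, this becomes mean reciprocal rank):

```python
def reciprocal_rank(results, wanted):
    """1/rank of the first relevant result, or 0.0 if it never appears.
    Higher is better: a user who finds the product at position 1 never
    scrolls, while a low score means they scrolled a long way."""
    for rank, item in enumerate(results, start=1):
        if item == wanted:
            return 1.0 / rank
    return 0.0
```

Tracked across 100 million users, a drop in this kind of metric is an early signal that search relevance has regressed.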

Efficiency also plays a vital part in customer experience. Where a product order comes from can help with inventory planning, delivery time, and so on. Then there are insights that are important for running the business, such as figuring out if product ratings are genuine or whether somebody is gaming the system for sales offers.

Flipkart gets 10 terabytes of user data each day from browsing, searching, buying or not buying, as well as behavior and location. This jumps to 50 terabytes on Big Billion Day sales days. There’s also order data, shipping data, and other forms of data captured by different systems. All this is mixed together and correlated for meaningful insights. “We actually process more than a petabyte of data every day in order to make sense of what is happening at our scale,” says Kohli.

A petabyte, by the way, is one thousand terabytes. And a terabyte is a million megabytes. If you record HD video 24/7 for 3.4 years, you would reach a petabyte.
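The arithmetic checks out under an assumed recording rate; the ~9.3 MB/s figure below is reverse-engineered from the 3.4-year claim, not a number from the article:

```python
# Sanity-checking the units, using decimal prefixes as storage vendors do.
MB = 10 ** 6
TB = 10 ** 6 * MB      # a terabyte is a million megabytes
PB = 1000 * TB         # a petabyte is a thousand terabytes

SECONDS_PER_YEAR = 365.25 * 24 * 3600
rate_bytes_per_s = 9.3 * MB   # assumed HD recording rate, not from the article
years_to_petabyte = PB / rate_bytes_per_s / SECONDS_PER_YEAR
```

At that assumed rate, `years_to_petabyte` works out to roughly 3.4, matching the article's illustration.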

Flipsters – as Flipkart calls its team – on a break just before the countdown to the Big Billion sale on September 20, 2017. Photo credit: Flipkart Stories

Following the breadcrumbs

Flipkart has over 60 machine learning models running on any given day to generate insights for its sales and business teams. These insights are served on over 6,000 real-time terminals that help business leaders make decisions.

How a sale is going, which deals are working or not working, at which point users are dropping off, what the real-time funnel is – the next time you go shopping online, think of all the footprints you’re leaving for data scientists to figure you out.

According to Kohli, the popular perception of a data scientist’s job is that it’s jazzy and romantic. “I think it’s not,” he says. “Most times, the data scientists in an organization are an angry lot, because either they’re not able to find the right data or the data that has been thrown [at] them is not of great quality, and they’re unable to achieve what they want.” Kohli is an Indian Institute of Science post-grad who worked earlier at IBM.

Typically, 80 percent of a data scientist’s job goes to cleaning up the data and other mundane stuff rather than modelling or analytics, adds Kohli. So his focus is on making the data scientist’s job more efficient. The aim is to ensure that the quality of data going into decision-making is up to the mark.

“When we decided to build our own data platform to serve our AI and ML and analytics needs, the first thing we decided was that it will be a typed data system; it will not be some kind of a big data system where you can dump in any data. That is like having a big hard disk with no file system and throwing everything in the root directory,” explains Kohli.
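A typed ingestion gate in that spirit can be sketched in a few lines; the schema and field names below are illustrative, not Flipkart's:

```python
# A toy "typed" ingestion check: events must match a declared schema
# before they enter the platform, in contrast to a dump-anything store.
SCHEMA = {"user_id": int, "event": str, "timestamp": float}

def validate(record, schema=SCHEMA):
    """Reject records whose fields are missing, extra, or mistyped."""
    if set(record) != set(schema):
        return False
    return all(isinstance(record[k], t) for k, t in schema.items())
```

Rejecting malformed events at the door is what spares data scientists the cleanup work Kohli describes, since downstream consumers can trust the declared types.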

Ganapathy Poojary, who delivered 342 orders for Flipkart in a single day. Photo credit: Flipkart Stories

Doing it at scale

Another aspect is the sheer scale. Kohli likes to separate the data science part of what his team does from the engineering requirements. “Scaling is a pure engineering job and we do not want data scientists to be spending their time trying to figure out load balancers, fault tolerance, and other things. So we built a machine learning platform to actually let data scientists automatically deploy the model. The platform takes care of scaling these models.”

This underlying layer is what enables Flipkart to ingest a petabyte of data and digest it. It’s designed to ensure that the right data is captured and the analytics on it works at scale, even when 13 million users land on the Flipkart site daily. Apart from its own private cloud, Flipkart has also teamed up with Microsoft for its AI-powered Azure public cloud.

“Combining Microsoft’s cloud platform and AI capabilities with Flipkart’s existing services and data assets will enable Flipkart to deliver new customer experiences,” Microsoft CEO Satya Nadella said earlier this year when the deal was announced.

If everything is well in the system, what the user gets are relevant search results, product recommendations, and even ads. Checkouts are easier, inventory is managed better, delivery is more efficient, and marketing is more targeted. It all comes from how the data is handled.

As online shoppers in India become more experienced, their expectations also grow. They have less patience with not finding what they need, being stuck, or not getting delivery when and where they want. And they’re spoilt for choice between Flipkart, Amazon, and Alibaba-backed Paytm.

See: No robots, please, we’re Indian – the lowdown on Amazon’s localization strategy

Flipkart raised US$4 billion this year from SoftBank, Tencent, and Microsoft. Amazon is on track to surpass the US$5 billion Jeff Bezos pledged to its Indian unit last year. Discount wars are back after a lull in 2016. But if customer experience is the ultimate decider, it may boil down to those petabytes in the war of data going forward into 2018.

Original Link