For data-obsessed technologists and business professionals, Strata is the conference to attend. AI and machine learning have taken quantum leaps in the last few years, so it is no surprise that this year’s Strata sessions focus heavily on these fields.
Organizations today collect data about everything at a scale unseen until the last decade. However, collecting data and making sense of it to drive business decisions are two entirely different undertakings.
Hence, this year’s Strata focuses on sessions that solve specific data challenges, ranging from architecting an enterprise big data analytics platform to more hands-on topics such as deploying machine learning models for production use.
With an agenda that spans three days, here are 13 sessions that I’m looking forward to.
Hosted by two industry veterans, this session looks at the design principles for creating a multi-use data infrastructure. More importantly, it focuses on data infrastructure that is forward-looking and not held back by past constraints. Most organizations acquire technology piecemeal and largely ignore the broader IT landscape within which that technology resides.
Meant for architects, system designers, and managers, this session will walk attendees through a reference multi-use data and analytics architecture.
Why this session interests me: For a real competitive advantage, enterprises need to adopt a robust big data analytics system. As a consultant, I advise my clients on how to set up their infrastructure for the production use of analytics, so this session should help me learn the fundamental design principles for doing that. The reference architecture walkthrough will be particularly useful.
Split across two parts, the first half of this Strata session covers the considerations for successfully running a data analytics pipeline in the cloud. In the second half, the presenters will lead a live exercise in which attendees get hands-on and set up a cloud data analytics pipeline that integrates with data engineering and warehousing workflows.
Why this session interests me: Cloud infrastructure, especially in the serverless paradigm, allows efficient use of resources to achieve analytics and other business goals. However, with today’s large data volumes, moving data from one location to another is a significant challenge, and it is not always possible to take the compute to where the data lives. I am hoping this session will offer expert opinions on solutions to these challenges.
Data scientists understand all too well the hassles of deploying their ML models into production. These challenges include incompatible languages and environments, erroneous processes, and poor communication.
Targeted at data scientists, DevOps professionals, and machine learning engineers, this session aims to help attendees understand common deployment architectures for machine learning models in the real world.
Why this session interests me: I need to advise clients on how to set up production infrastructure for the models they have built. We have multiple engagements where we are helping our clients move from an in-house developed model to a production pipeline. This deployment is not a straightforward exercise as there are several complications. I’m hoping this session will help me glean insights on how to accelerate deployments of these machine learning models.
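To make the deployment gap concrete, here is a minimal, hedged sketch of the common "train here, serve there" pattern in plain Python: the training side serializes a validated model artifact, and the serving side loads it behind a single predict entry point. The class and function names are my own illustrations, not the session's material.

```python
import pickle

class LinearModel:
    """A stand-in for a trained model (illustrative, not a real framework)."""
    def __init__(self, weights, bias):
        self.weights = weights
        self.bias = bias

    def predict(self, features):
        return sum(w * x for w, x in zip(self.weights, features)) + self.bias

# Training environment: serialize the model artifact once it is validated.
artifact = pickle.dumps(LinearModel(weights=[0.5, -0.2], bias=1.0))

# Serving environment: load the artifact and expose one predict entry point.
model = pickle.loads(artifact)

def serve(features):
    # In production this function would sit behind an HTTP endpoint,
    # with input validation, logging, and versioning around it.
    return model.predict(features)

print(serve([2.0, 1.0]))  # 0.5*2.0 + (-0.2)*1.0 + 1.0 = 1.8
```

Incompatible languages and environments bite precisely at this artifact boundary, which is why dedicated model-serving architectures exist.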
AI- and ML-powered applications are growing rapidly in number. With this growth, organizations often wonder about deployment options for these applications. Large enterprises lean towards multi-cloud hybrid platforms, while smaller companies have to decide between a full and a hybrid cloud environment.
This session covers some of these challenges and their possible solutions.
Why this session interests me: Continuing on the topic of deploying AI/ML-based applications, not all deployment solutions are the same, and it is dangerous to lock into a cloud vendor that lacks the required features. Hence, this session should give me a proper perspective on the nuances of this decision from an experienced presenter.
Big data is everywhere in the world’s largest ride-hailing company, but moving this data between sources is expensive and time-consuming. Hence, analysts and engineers at Uber need a way to run SQL queries in real time on all data sources without copying the data.
This Strata session elaborates on Uber’s engineering efforts to run live SQL queries in real time on all data sources, without any data copies, using Hadoop, Spark, and Presto.
Why this session interests me: Big data processing has evolved beyond the basic MapReduce paradigm to a SQL paradigm. However, not all SQL adapters for big data sources are the same, and there is no clear winner for now, so there are good lessons to learn here.
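The idea of one SQL query spanning several sources without a copy step can be illustrated with SQLite's `ATTACH` in the standard library; this is only a loose analogy for what Presto does against real heterogeneous systems, and the table and column names are made up.

```python
import sqlite3

# Two independent "sources" (in-memory here; Presto federates real systems).
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS warehouse")

conn.execute("CREATE TABLE trips (city TEXT, fare REAL)")
conn.executemany("INSERT INTO trips VALUES (?, ?)",
                 [("SF", 12.0), ("NY", 20.0), ("SF", 8.0)])

conn.execute("CREATE TABLE warehouse.cities (city TEXT, region TEXT)")
conn.executemany("INSERT INTO warehouse.cities VALUES (?, ?)",
                 [("SF", "west"), ("NY", "east")])

# One SQL query spanning both sources, with no intermediate copy step.
rows = conn.execute("""
    SELECT c.region, SUM(t.fare)
    FROM trips t JOIN warehouse.cities c ON t.city = c.city
    GROUP BY c.region ORDER BY c.region
""").fetchall()
print(rows)  # [('east', 20.0), ('west', 20.0)]
```

The hard part Uber's engineers face is doing this join across systems with different storage formats and latencies, which is exactly what the session covers.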
Originally developed at Airbnb, Apache Superset is an enterprise-ready business intelligence web application. It can be used to craft beautiful dashboards with the ability to slice and dice data.
In this session, the presenter offers an overview of Superset. He is scheduled to discuss its underlying technology, open source development dynamics, and the essential items on its roadmap.
Why this session interests me: The end goal of any analytics initiative is to visualize data, and we now have a solution specifically designed for this. Some of our projects exist solely to build such visualizations, so I’ll keep an eye on Superset and its development.
Production machine learning models require constant monitoring to ensure that their inferences remain correct based on feedback and labels. However, labels are not always available, or are too expensive and resource-intensive to obtain.
This session discusses the design and implementation of a real-time automated system to monitor production machine learning models. The idea is to detect system anomalies such as spurious false positives, gradual drifts, and the inability of the model to grasp the target concept.
Why this session interests me: Loss of recommendation quality over time, bias creep, and model poisoning are significant challenges for production ML models. A real-time automated monitoring system would ensure that a model can be fine-tuned more rapidly and stay on track with its inferences.
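A very simple form of such monitoring can be sketched in a few lines: compare the model's recent accuracy, computed from whatever delayed labels trickle in, against a baseline and flag a drop. This is my own toy illustration with invented thresholds, far simpler than the presenters' system.

```python
from collections import deque

class DriftMonitor:
    """Flags when a model's recent accuracy (from delayed labels) falls
    well below its baseline. All thresholds here are illustrative."""
    def __init__(self, baseline=0.90, tolerance=0.10, window=100):
        self.baseline = baseline
        self.tolerance = tolerance
        self.window = deque(maxlen=window)

    def record(self, prediction, label):
        self.window.append(1.0 if prediction == label else 0.0)

    def drifting(self):
        if len(self.window) < self.window.maxlen:
            return False  # not enough feedback yet to judge
        recent = sum(self.window) / len(self.window)
        return recent < self.baseline - self.tolerance

monitor = DriftMonitor(baseline=0.90, tolerance=0.10, window=50)
# Simulate feedback: the model is right 70% of the time, below the 80% floor.
for i in range(50):
    monitor.record(prediction=1, label=1 if i % 10 < 7 else 0)
print(monitor.drifting())  # True
```

The session's harder problem is detecting drift when labels are mostly absent, which a naive accuracy window like this cannot do.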
Connected devices today have computational and storage capabilities. Using federated learning, these devices can securely learn ML models while retaining all data locally. Federated learning puts the user back in control of the data while letting developers build intelligent applications with it. It also provides a path to personalization at scale while reducing the costs and risks associated with handling sensitive data.
This Strata session aims to introduce federated learning, how it differs from the traditional centralized approach, and the use cases it fits best. The presenter will also demonstrate a federated learning model in TensorFlow.
Why this session interests me: Federated machine learning takes the model to the data source, while traditional machine learning algorithms work only with aggregated data. This is new territory for me and hence worth learning.
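To make the contrast concrete, here is a toy sketch of federated averaging in plain Python: each client takes a gradient step on a shared one-parameter model using data it never shares, and the server only averages the resulting weights. The data, learning rate, and round count are all illustrative; real systems (and the session's TensorFlow demo) add secure aggregation, sampling, and much larger models.

```python
def local_update(weights, data, lr=0.1):
    """One gradient step on a client's private data
    (toy 1-D least squares: fit y ~ w * x). Raw data never leaves."""
    w = weights
    grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
    return w - lr * grad

def federated_round(global_w, clients):
    """The server averages the locally updated weights (FedAvg in miniature)."""
    updates = [local_update(global_w, data) for data in clients]
    return sum(updates) / len(updates)

# Each client holds its own (x, y) pairs; only weights travel to the server.
clients = [[(1.0, 2.0), (2.0, 4.0)], [(1.0, 2.2), (3.0, 6.0)]]
w = 0.0
for _ in range(50):
    w = federated_round(w, clients)
print(round(w, 2))  # converges near 2.0, the slope shared by both clients
```

Only the scalar `w` ever crosses the network, which is the core privacy argument for the federated approach.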
In the past, stream processing frameworks mostly provided Java- or Scala-based APIs. Today, however, stream processing is rapidly gaining enterprise adoption, and it can now be done using SQL. SQL-based processing makes stream processing accessible to non-programmers, enabling them to carry out everyday tasks.
Why this session interests me: Similar to the session on bringing SQL to everything, stream processing has evolved beyond the basic MapReduce paradigm to a SQL paradigm. Not all SQL adapters for big data sources are the same, and there is no clear winner for now. Apache Flink has been around for a while, and it is evolving quickly. This is intriguing at the very least.
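As a rough feel for what a streaming SQL engine does under the hood, here is a tumbling-window count in plain Python, the incremental equivalent of a `GROUP BY` over fixed time windows. The event shape, window size, and SQL in the comment are my own illustration, not Flink's actual syntax guarantees.

```python
from collections import defaultdict

def tumbling_window_counts(events, window_seconds=60):
    """Incrementally computes roughly what streaming SQL like
      SELECT window_start, user, COUNT(*)
      FROM events GROUP BY TUMBLE(ts, 60 s), user
    would produce, one event at a time, without storing the stream."""
    counts = defaultdict(int)
    for ts, user in events:
        window_start = (ts // window_seconds) * window_seconds
        counts[(window_start, user)] += 1
    return dict(counts)

# (timestamp_seconds, user) pairs arriving in order.
events = [(5, "a"), (30, "a"), (59, "b"), (61, "a"), (119, "b")]
print(tumbling_window_counts(events))
# {(0, 'a'): 2, (0, 'b'): 1, (60, 'a'): 1, (60, 'b'): 1}
```

Engines like Flink add the hard parts this sketch ignores: out-of-order events, watermarks, and fault-tolerant state.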
The number of data tools has risen exponentially in recent years. While powerful, they have not been able to interconnect, especially through legacy interfaces such as ODBC, JDBC, and REST. Apache Arrow attempts to solve this problem by enabling these systems to exchange a common representation of data through in-process and near-process communication.
For more complex and distributed topologies, Arrow Flight enables communication of data in large parallel streams. Using a high-performance protocol and a set of libraries, Arrow Flight lets engineers quickly build data services that move data between systems very quickly.
This Strata session gives a detailed walkthrough of Arrow Flight, including its components, performance benchmarks, and live use cases. It also showcases its opportunities for growth and its role in the emerging concept of data microservices.
Why this session interests me: Not all approaches to moving data between big data systems are the same, and there is no clear winner for now. Apache Arrow has been around for a while, and it is evolving quickly. That is intriguing at the very least.
Bullet is an open source query system that can query any streaming data set without storing it. It is forward-looking and can run queries on unbounded data sets. In this Strata session, the presenters will discuss Bullet’s innovative architecture, including a live demo on a real, high-volume data set. The session also covers common implementation challenges and how Sketches fit in.
Why this session interests me: Querying data streams in real time has always been a source of grief for developers – until now. Bullet aims to filter, project, and aggregate data in transit, which is very interesting and new to me.
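Generator pipelines give a feel for that filter/project/aggregate-in-transit idea: each record is consumed as it flows by and nothing is materialized. This is only my own analogy for Bullet's model, and the field names below are invented.

```python
def query_in_transit(stream):
    """Filter, project, and aggregate records as they flow past,
    without ever storing the stream (the essence of Bullet's approach)."""
    # Filter: keep only completed events.
    matched = (e for e in stream if e["status"] == "done")
    # Project: keep only the one field the query needs.
    values = (e["latency_ms"] for e in matched)
    # Aggregate: a running count and sum, one record at a time.
    count = total = 0
    for v in values:
        count += 1
        total += v
    return {"count": count, "avg_latency_ms": total / count if count else None}

stream = iter([
    {"status": "done", "latency_ms": 120},
    {"status": "error", "latency_ms": 999},
    {"status": "done", "latency_ms": 80},
])
print(query_in_transit(stream))  # {'count': 2, 'avg_latency_ms': 100.0}
```

For unbounded streams, exact aggregates like this eventually need bounded-memory approximations, which is where the Sketches mentioned above come in.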
Data migrations are hardly ever painless and smooth, especially when downtime is not an option and the database is a 200-node cluster holding hundreds of terabytes of data. In this session, the presenters discuss an active/passive solution that makes this migration possible using an extensible database client.
Aimed at DevOps professionals and engineers, this session presents lessons learned from such a migration.
Why this session interests me: Ever since companies started collecting data from every possible source, data storage requirements and solutions have kept evolving. Cost and performance remain significant factors in such migrations. It will be great to understand the lessons learned and any novel solutions for a typical data migration.
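One common shape for a zero-downtime active/passive migration is a client wrapper that dual-writes to both clusters and flips reads once the new cluster has caught up. The sketch below uses plain dicts as stand-in databases and is my own guess at the pattern, not the presenters' actual implementation.

```python
class MigratingClient:
    """Database client that writes to both clusters during a migration
    and reads from whichever cluster is currently active."""
    def __init__(self, old_db, new_db):
        self.old_db, self.new_db = old_db, new_db
        self.read_from_new = False  # flip once the new cluster is caught up

    def write(self, key, value):
        # Dual-write keeps the passive cluster in sync with zero downtime.
        self.old_db[key] = value
        self.new_db[key] = value

    def read(self, key):
        active = self.new_db if self.read_from_new else self.old_db
        return active.get(key)

old, new = {"k1": "v1"}, {}
client = MigratingClient(old, new)
client.write("k2", "v2")       # lands in both clusters
new.update(old)                 # backfill of pre-existing keys (simplified)
client.read_from_new = True     # cutover: reads now hit the new cluster
print(client.read("k1"))  # v1
```

The session's "extensible database client" presumably hides exactly this kind of switch behind the application's existing database interface.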
The number of ML models making business decisions is proliferating. However, it is hard to tell why a model made a particular decision, whether it is biased, or whether its dataset is being slowly poisoned.
This Strata session talks about these challenges and more in the context of scaling machine learning models.
Why this session interests me: Machine learning models are increasingly used to influence, and in some cases make, decisions that impact human lives, not just profit. It is therefore necessary to maintain a strict practice of monitoring and tuning machine learning models running in production. This talk promises to be exciting, as experienced practitioners share their views on the subject.