In this blog post, Alexander Cramer explains how we incorporate the core principles of agile development in data science projects using dual-track agile.
He is a senior product manager in the data domain who joined sennder in May 2020 and works with machine learning and data science teams to build tools and solutions around pricing and network optimization.
Agile is a common mindset in the tech industry for software projects. At the same time, its applicability to data science projects is still actively discussed. At sennder, we are utilizing the core principles of agile development while leaving room for the special needs of our data science projects.
What is agile development?
Agile development as an approach to building software is now roughly 20 years old, but its influence on how most development teams work is still growing every day.
The basics of agile development were famously envisioned in Snowbird, Utah, and still stand today. Its values, as written in the Agile Manifesto, are:
- Individuals and interactions over processes and tools
- Working software over comprehensive documentation
- Customer collaboration over contract negotiation
- Responding to change over following a plan
The main goal of agile development is to cooperatively work as a team consisting of developers and stakeholders to produce valuable, high-quality software for the end-user while being able to respond to change.
Data science development vs. traditional software development
The differences between data science and traditional software development are what make a direct application of agile to the former sometimes a bit more challenging. Most software projects share the same general phases of development:
- An initial investigation phase
- A development phase
- A release
- A subsequent support phase
Naturally, each phase includes several substeps, and the whole process is iterated more than once during the lifespan of a software solution.
While data science projects share the same general phases, the form of each phase differs significantly. One of the main reasons is that in classic software development you build a machine, but in machine learning projects you build a machine that then builds a machine. This completely changes the requirements and potential pitfalls.
Differences between traditional and data science development by phase:
The agile manifesto in data science projects at sennder
While one group proclaims that agile’s focus on delivering iterations in an agreed-upon time span does not fit with the research-related uncertainty of data science projects, others believe that the focus on users and quicker iterations can benefit the development process within data science teams. So what are we doing to get the best of both worlds?
“Customer collaboration” and “individuals and interactions”
A normal data science project at sennder starts with someone on the business side identifying a problem the data science team could help solve. What follows are focus groups with the whole development team, including all involved software developers and product managers, to make sure that all needs and problems are properly captured. In cooperation with the interviewed user group, a first basic prototype is then specced, which can be delivered quickly, sometimes within a few days, to test the first hypotheses. Here we have had good results with fast iterations, either using Google Sheets as a basic user interface hooked up to some back-end code, or using frameworks like Streamlit for quick prototyping without leaving Python, the preferred language of our machine learning and analytics teams.
The focus is to be able to deliver something quickly while also having a platform that can be used for ongoing testing of different underlying models.
“Working software” and “responding to change”
While the first feedback on a prototype is being collected, more sophisticated approaches can be built and tested quickly, as we just need to replace the underlying code while the front-end stays the same. At this point, we can also decide if further development will be worthwhile or not. If we have a green light, one of the core parts of our approach to data science projects comes into play.
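The idea of replacing the underlying code while the front-end stays the same can be sketched in Python. Note that the model names, interface, and pricing logic below are illustrative assumptions for this post, not sennder's actual code; the point is only that the prototype UI depends on a stable interface, so the model behind it can be swapped between iterations.

```python
# Hypothetical sketch: PriceModel, BaselineModel, TunedModel, and quote()
# are invented for illustration. The front-end (a Google Sheet or a
# Streamlit app) only ever calls quote(); which model sits behind it
# can change between iterations without touching the UI.
from typing import Protocol


class PriceModel(Protocol):
    def predict(self, distance_km: float) -> float:
        ...


class BaselineModel:
    """First prototype: a flat rate per kilometre."""

    def predict(self, distance_km: float) -> float:
        return 1.2 * distance_km


class TunedModel:
    """Later iteration: adds a fixed pickup cost on top of a lower rate."""

    def predict(self, distance_km: float) -> float:
        return 50.0 + 1.1 * distance_km


def quote(model: PriceModel, distance_km: float) -> float:
    # Stable entry point used by the prototype front-end.
    return round(model.predict(distance_km), 2)
```

Swapping `BaselineModel` for `TunedModel` changes the results the users see, but nothing about how they interact with the prototype, which is what makes ongoing model testing cheap.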
As the team covers the required skills in depth, including front-end development, we can deliver complete solutions from inception to final delivery. This ensures that we cooperate with our users along the way and never just throw results over an imaginary wall for someone else to deal with the nitty-gritty details of implementation.
The direct exchange during development and first rollouts also opens the way for fast feedback between end-users and developers. It regularly happens that a request voiced by a user in a Slack channel is immediately picked up by a developer and solved the same day in close collaboration between the team and stakeholders. This close interaction between individuals is a strong way of keeping development on track and as close to continuous delivery as possible.
Wait…I know what you are talking about
The main approach used at sennder is dual-track agile. It helps us to stay close to user needs via ongoing user research while building and shipping software at the same time. The first discovery track, which collects information about the underlying problem and user feedback, runs alongside the second development track, which builds more complex solutions.
Dual Track Agile at sennder:
As our individual data science teams are rather small and the first set of prototypes is purposely very basic, the distinction between the tracks is often not as clear as the picture makes it look. Also, due to the type of solutions we are building, our iteration speed might start at a slower pace as we first need to clarify the available data, pick up during the first testing and iteration phases, and then slow down again as the solutions become more sophisticated and we integrate directly into already existing systems.
Example of implementation
One example of this process at work is the development of an interface to collect empty truck locations. It also shows how classical development and data science topics depend on each other.
The idea for this feature came up while interviewing a group of partner managers, who work directly with our network of carriers, about their daily work to get an idea of how they match carriers to loads. We learned that a number of them kept track of available trucks in the form of handwritten notes. It became clear to the team that this was a set of valuable data points being lost for further analysis and use. Furthermore, deeper integration of the workflow into our core products would lead to efficiency gains due to less tool switching.
We followed up with additional interview sessions to clearly understand how the partner managers work with their written notes in combination with our already existing tool suite. Similar to the initial focus groups, these user interviews are always attended by the whole development team to make sure we are getting different perspectives on the topic. Furthermore, we record these meetings for internal use so nothing is lost. This allows us to get back to certain key conversations at a later stage if things are unclear or we want to dig deeper. Immediately after the sessions we also create a dedicated Slack channel between the development team and our users. This allows us to get quick feedback on ideas or to clarify open questions.
With a better understanding of the problem and the mostly manual solution at hand, we went in two directions:
- Building an initial prototype to test a recommendation of loads based on user input
- Investigating what would be needed to integrate the solution into our existing system and how we can leverage it further
The initial prototype was based on a predesigned Google Sheet that allowed a partner manager to enter the information they had about truck locations. The needed inputs were defined together with the partner managers to make sure that all deciding factors were captured.
At the same time, we were building the first basic back-end service needed to match loads to entered data points. Within a few days, we were able to get back to our users with a first version that sent out automatic emails whenever we found a load that matched with the truck they had entered.
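The first version of such a matching service can be sketched in a few lines of Python. The field names, the haversine distance, and the 50 km threshold are assumptions made for illustration, not the actual sennder implementation; the sketch only shows the kind of basic matching a first back-end version might do before any recommendation model exists.

```python
# Illustrative sketch, not sennder's actual service: find loads whose
# pickup location is within a radius of a truck a partner manager entered.
from dataclasses import dataclass
from math import asin, cos, radians, sin, sqrt


@dataclass
class Truck:
    carrier: str
    lat: float
    lon: float


@dataclass
class Load:
    load_id: str
    pickup_lat: float
    pickup_lon: float


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points in kilometres."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371 * asin(sqrt(a))


def match_loads(truck: Truck, loads: list[Load], max_km: float = 50.0) -> list[Load]:
    """Return loads whose pickup is within max_km of the truck's location."""
    return [
        load
        for load in loads
        if haversine_km(truck.lat, truck.lon, load.pickup_lat, load.pickup_lon) <= max_km
    ]
```

A notification job could then simply e-mail the partner manager whenever `match_loads` returns a non-empty list for one of their trucks, which is roughly the behaviour the first version delivered.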
Over the following weeks, we iterated over the matching of loads, added the needed infrastructure to keep track of trucks the users entered, and once we verified that the solution was of use, we started working with our product designer to sketch out a version that could be integrated into our existing tools. This way, we could start pushing for direct integration into our existing tools.
Fast-forwarding a few weeks, the feature is now part of our main systems and collecting the resulting data points.
But the work isn’t done here; it has actually just started.
On the one hand, we are still iterating on improvements to the user interface with our front-end developer and users. On the other, from the data science perspective, we are now heading back into a research and testing phase to use the generated data for load recommendations in several different systems and forms. This means restarting the interview and research processes with core user groups and stakeholders and building on top of our already existing basic recommender models. At the same time, we are working with other business units, like our customer-facing marketplace, to leverage the data points they generate for a centralized recommendation system that can feed multiple use cases throughout the company and its systems to optimize our operations.
Our goal for our data science team is to stay as close to the customer and user as possible to build valuable software in short iterations, and a hint of agile allows us to do exactly that.