As a machine learning engineer, I have to stay current on the languages used to build ML models. I came across Julia through discussions with peers and my own reading. In this post, I share a brief comparison of Julia and Python for implementing machine learning models.
Julia and Python
Julia is a high-level, high-performance, dynamic programming language. While it is a general-purpose language and can be used to write any application, Julia shines in numerical analysis and computational science.
Python is a powerful general-purpose programming language. Developers use it for web development, data science, software prototyping, and similar purposes. With its easy-to-learn syntax, Python is a popular choice for beginners.
Several popular products of the tech age, such as Dropbox, Spotify, Instagram, Reddit, and Uber, use Python extensively.
Comparison use case: APS Failure at Scania Trucks Data Set
The air pressure system (APS) plays a critical role in heavy Scania trucks. APS generates pressurized air that is used in various critical functions such as braking and gear changing.
Accurate prediction of the failure status of APS based on the measurements of truck mechanical system attributes can significantly reduce the operational cost of the truck fleet.
Data Set Description
The dataset consists of sensor measurements from Scania trucks in everyday use. The system in focus is the APS, which generates pressurized air used in functions such as braking and gear changes.
The positive class consists of failures of a specific component of the APS. The negative class consists of trucks with failures in components not related to the APS. The data was selected by experts and is a subset of all available data.
The training set contains 60,000 examples in total, of which 59,000 are negative and 1,000 are positive. The test set contains 16,000 examples. Each record has 171 attributes.
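The class counts above make the imbalance concrete. A quick calculation using only the numbers stated here shows that positives are about 1.7% of the training set, so a trivial model that always predicts "negative" would already reach about 98.3% accuracy on the training data. This baseline is worth keeping in mind when reading accuracy figures later in the post.

```python
# Class counts for the training set, as stated in the dataset description.
n_total = 60_000
n_negative = 59_000
n_positive = 1_000

positive_rate = n_positive / n_total        # share of APS-related failures
majority_baseline = n_negative / n_total    # accuracy of always predicting "negative"
imbalance_ratio = n_negative / n_positive   # negatives per positive

print(f"positive rate:     {positive_rate:.2%}")      # positive rate:     1.67%
print(f"majority baseline: {majority_baseline:.2%}")  # majority baseline: 98.33%
print(f"imbalance ratio:   {imbalance_ratio:.0f}:1")  # imbalance ratio:   59:1
```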
For proprietary reasons, the attribute names have been anonymized. The attributes consist of single numerical counters and histograms, where each histogram is made up of bins covering different conditions. Usually, the bins at both ends of a histogram are open-ended. For example, if we measure the ambient temperature “T”, the histogram could be defined with four bins, the first collecting all values below some threshold and the last collecting all values above another. The attributes are the class label followed by the anonymized operational data. The operational data have an identifier and a bin id, such as “Identifier_Bin”. Of the 171 attributes, 7 are histogram variables. Missing values are denoted by “na”.
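As a rough sketch of how these conventions might be handled in Python, the snippet below parses “na” as a missing value and groups “Identifier_Bin” columns by identifier. The column names, the sample row, the "pos"/"neg" class labels, and the rule "more than one bin means a histogram" are all illustrative assumptions, not details taken from the actual file.

```python
from collections import defaultdict

# Hypothetical header and row following the dataset's conventions:
# a class label, then anonymized attributes named "Identifier_Bin".
header = ["class", "aa_000", "ag_000", "ag_001", "ag_002", "ag_003"]
row = ["neg", "120", "0", "na", "35", "8"]

def parse_value(raw):
    """Convert a raw field to float, treating the "na" marker as missing (None)."""
    return None if raw == "na" else float(raw)

def group_by_identifier(columns):
    """Group attribute columns by the identifier part of "Identifier_Bin"."""
    groups = defaultdict(list)
    for name in columns:
        identifier, _bin_id = name.split("_", 1)
        groups[identifier].append(name)
    return dict(groups)

label = 1 if row[0] == "pos" else 0  # assumed "pos"/"neg" label values
features = {name: parse_value(v) for name, v in zip(header[1:], row[1:])}
groups = group_by_identifier(header[1:])

# Assumption: identifiers with more than one bin are histogram variables.
histograms = {k: v for k, v in groups.items() if len(v) > 1}

print(label)       # 0
print(histograms)  # {'ag': ['ag_000', 'ag_001', 'ag_002', 'ag_003']}
```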
The goal of the model is to accurately predict APS failure.
Python vs Julia: Implementation details and results
This is a binary classification problem. In machine learning, classification is the task of assigning a class label to each example in the problem domain.
Here, we classify whether a truck's failure is related to the APS (positive) or not (negative).
Some algorithms, such as logistic regression and support vector machines, are designed specifically for binary classification and do not natively support more than two classes. The dataset contains up to 82% missing values per attribute, and many attributes also contain outliers. We replace the missing values with the mean of each attribute.
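To illustrate why logistic regression is inherently a binary classifier: it passes a weighted sum of the features through the sigmoid function to get the probability of the positive class, then applies a threshold. The minimal sketch below uses made-up weights, a made-up bias, and a 0.5 threshold purely for illustration; in a real model the weights are learned from the data.

```python
import math

def sigmoid(z):
    """Map any real number to (0, 1), written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def predict_proba(weights, bias, x):
    """Probability of the positive class (here: an APS-related failure)."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return sigmoid(z)

def predict(weights, bias, x, threshold=0.5):
    """Binary decision: 1 (positive) if the probability reaches the threshold."""
    return 1 if predict_proba(weights, bias, x) >= threshold else 0

# Hypothetical "learned" parameters for a two-feature model.
weights = [0.8, -0.4]
bias = -1.0

print(predict(weights, bias, [3.0, 1.0]))  # z = 1.0, p ~ 0.73 -> 1
print(predict(weights, bias, [0.5, 2.0]))  # z = -1.4, p ~ 0.20 -> 0
```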
To fit a logistic regression model on the dataset, we tune its hyperparameters with grid search. Grid search systematically builds and evaluates a model for each combination of hyperparameter values specified in the grid.
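The idea of grid search can be sketched in a few lines: enumerate every combination of candidate values, score a model for each, and keep the best. The parameter names `C` and `penalty` mirror common logistic regression options, but the `evaluate` function below is a hypothetical stand-in for actually training and validating a model. scikit-learn provides a full implementation as `GridSearchCV`.

```python
from itertools import product

# Candidate hyperparameter values; every combination will be evaluated.
param_grid = {
    "C": [0.01, 0.1, 1.0, 10.0],  # inverse regularization strength
    "penalty": ["l1", "l2"],
}

def evaluate(params):
    """Hypothetical stand-in for fitting a model and returning a validation score.

    In a real pipeline this would train logistic regression with `params`
    and return, e.g., a cross-validated metric.
    """
    bonus = 0.01 if params["penalty"] == "l2" else 0.0
    return 0.9 - abs(params["C"] - 1.0) * 0.01 + bonus

def grid_search(grid, score_fn):
    """Exhaustively score every parameter combination and return the best one."""
    names = list(grid)
    results = []
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        results.append((score_fn(params), params))
    best_score, best_params = max(results, key=lambda r: r[0])
    return best_params, best_score, len(results)

best_params, best_score, n_evaluated = grid_search(param_grid, evaluate)
print(n_evaluated)  # 8 combinations (4 values of C x 2 penalties)
print(best_params)  # {'C': 1.0, 'penalty': 'l2'}
```

Because every combination is trained, the cost grows multiplicatively with each added parameter, which partly explains the long fit times reported below. In practice each score should come from validation on training data, not from the test set.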
The following image shows how we fit this model on the APS dataset.
In Python, the fitted logistic regression model reaches 99% accuracy. The results and the comparison between the two languages are shown below. Fitting this model took 146.85 seconds and used 1.5 MB of memory. Now let's look at the implementation of the same model in Julia.
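The post does not state how fit time and memory were measured. One way to take such measurements in Python is to wrap the fit call with `time.perf_counter` for wall-clock time and `tracemalloc` for peak traced memory. The `placeholder_fit` workload below is hypothetical and stands in for the real model fit; note that `tracemalloc` only sees allocations that go through Python's tracked allocators, so memory used by some native libraries may not be fully counted.

```python
import time
import tracemalloc

def measure(fn, *args, **kwargs):
    """Run fn once and return (result, elapsed seconds, peak traced bytes)."""
    tracemalloc.start()
    start = time.perf_counter()
    try:
        result = fn(*args, **kwargs)
    finally:
        elapsed = time.perf_counter() - start
        _current, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
    return result, elapsed, peak

def placeholder_fit(n):
    """Hypothetical stand-in for model.fit(): builds and sums a list of squares."""
    data = [i * i for i in range(n)]
    return sum(data)

result, seconds, peak_bytes = measure(placeholder_fit, 100_000)
print(f"fit time: {seconds:.3f} s, peak memory: {peak_bytes / 1e6:.2f} MB")
```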
The Julia implementation uses the same dataset and methods to fit the model. Again, we use mean imputation to preprocess the data and replace missing values: for each variable, it computes the mean of the observed (non-missing) values and substitutes that mean for every missing entry. The dataset is also heavily imbalanced, with very few positive records.
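The mean-imputation step described above can be sketched as follows: for each column, average the observed values and substitute that mean wherever a value is missing. The small table is made up for illustration, and missing entries are represented as `None` (for example, after parsing “na”). For real datasets, scikit-learn's `SimpleImputer(strategy="mean")` offers the same behavior.

```python
def impute_mean(rows):
    """Replace None in each column with the mean of that column's observed values."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [row[j] for row in rows if row[j] is not None]
        # A column with no observed values has no mean; leave it as None.
        means.append(sum(observed) / len(observed) if observed else None)
    # Build new rows rather than mutating the input.
    filled = [
        [means[j] if value is None else value for j, value in enumerate(row)]
        for row in rows
    ]
    return filled, means

# Hypothetical data: 4 trucks x 3 attributes, with some missing readings.
data = [
    [1.0, None, 10.0],
    [3.0, 4.0, None],
    [None, 8.0, 30.0],
    [2.0, 6.0, 20.0],
]

filled, column_means = impute_mean(data)
print(column_means)  # [2.0, 6.0, 20.0]
print(filled[2])     # [2.0, 8.0, 30.0]
```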
We used the same method in both Julia and Python. The following image shows the implementation details of the logistic regression model in Julia.
As expected, the model built with Julia gives the same results as the Python implementation, but it differs noticeably in time and memory, as the following image shows. Julia takes less time than Python for this example: 126.85 seconds compared to Python's 146.85 seconds. This is one of the significant differences between Julia and Python. Julia is faster largely because it is not interpreted; it is compiled just-in-time (JIT), at run time, using the LLVM compiler framework.
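To put the reported timings in perspective, a quick calculation with the two fit times above gives the relative speedup and the share of Python's fit time saved. These figures describe this single benchmark only.

```python
# Fit times reported in this post (seconds).
python_seconds = 146.85
julia_seconds = 126.85

speedup = python_seconds / julia_seconds                        # times faster
time_saved = (python_seconds - julia_seconds) / python_seconds  # fraction saved

print(f"speedup: {speedup:.2f}x")       # speedup: 1.16x
print(f"time saved: {time_saved:.1%}")  # time saved: 13.6%
```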
Conclusion
- Speed. In the example above, Julia fits the model faster than Python, and Julia is designed to approach the speed of C.
- Community. Python is older and more popular than Julia and has greater community support.
- Code Conversion. Julia is easy to write, and code from C codebases converts to Julia more easily than to Python.