The aim of this study was to read and search through a large CSV file containing around 152,644 entries (rows) using three different programming languages. We implemented our queries in Python, C++ and a Unix Bash shell script. The task was to read the CSV file and search it for specific cities, days, months and years.
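
As a rough illustration of the task, the search can be sketched in Python with the standard csv module. The column names (city, day, month, year), the sample rows and the function name search_rows are assumptions made for this example; the actual headers in data.csv may differ.

```python
import csv
import io


def search_rows(csv_file, city=None, day=None, month=None, year=None):
    """Return the rows that match every criterion that is not None.

    The column names are assumed for illustration, not taken from data.csv.
    """
    wanted = {"city": city, "day": day, "month": month, "year": year}
    # Drop criteria the caller did not specify; compare everything as strings.
    wanted = {key: str(value) for key, value in wanted.items() if value is not None}

    matches = []
    for row in csv.DictReader(csv_file):
        # A missing column or field counts as an empty string.
        if all((row.get(key) or "").strip() == value for key, value in wanted.items()):
            matches.append(row)
    return matches


# Tiny in-memory sample standing in for the real file (made-up rows).
SAMPLE = io.StringIO(
    "city,day,month,year\n"
    "CityA,12,3,2015\n"
    "CityB,12,3,2015\n"
    "CityA,1,7,2016\n"
)

if __name__ == "__main__":
    for row in search_rows(SAMPLE, city="CityA", year=2015):
        print(row)
```

With the real file, the handle would come from open("data.csv", newline="") rather than from the in-memory sample.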
Any big-data task comes with certain limitations and requires some data pre-processing. To make our queries easier, we split the “timestamp” column using the Microsoft Excel Flash Fill command. Each “timestamp” value was a single long string containing the year, month, day and time. We used Flash Fill to extract the “day”, “month” and “year” sub-strings into separate columns, which made searching through the CSV file much easier and more efficient. The CSV file used is named data.csv and is uploaded in the assignment folder.
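
The same split could also be done in code rather than in Excel. The sketch below assumes a timestamp that looks like 2015-03-12 14:30:00; the actual layout of the column is not stated here, so the format string (and the helper name split_timestamp) are assumptions that would need adjusting to the real data.

```python
from datetime import datetime

# Assumed layout of the "timestamp" column; change it to match the real data.
ASSUMED_FORMAT = "%Y-%m-%d %H:%M:%S"


def split_timestamp(timestamp, fmt=ASSUMED_FORMAT):
    """Split one timestamp string into separate year, month and day fields."""
    parsed = datetime.strptime(timestamp.strip(), fmt)
    return {"year": parsed.year, "month": parsed.month, "day": parsed.day}


if __name__ == "__main__":
    print(split_timestamp("2015-03-12 14:30:00"))
```

Parsing with strptime also rejects malformed timestamps with a ValueError, which a pattern-based split such as Flash Fill would not necessarily flag.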
Python is a general-purpose, high-level programming language that lets us work more quickly and integrate our systems more effectively. A shell is a programming interface for accessing operating system services, and shell scripting means writing a sequence of shell commands to complete a certain task. C++ is a general-purpose programming language with imperative, object-oriented and generic programming features, which also provides facilities for low-level memory manipulation.
Python is considered cleaner and more direct, with an emphasis on code readability. Shell script, on the other hand, is quite ambiguous. In terms of readability for the problem given in the assignment, C++ is likely to be more readable than shell script, but not more so than Python.
Python strikes a good balance between fast compilation, readability and writability. C++ is a statically typed, free-form, multi-paradigm, compiled programming language. In terms of writability, shell script is well ahead of the other two: it takes very little code to read and search a file with a Bash script. Python also requires less code than C++.
Reliability depends on how the problem is approached. If written accurately to meet all the requirements, Python, C++ and shell scripts can all be reliable.
Performance: We found that, performance-wise, Python took less time than C++. Shell scripts are executed directly from the command-line interface and were very fast.
Length (including blank lines):
Unix Bash shell script: 29 lines
Python: 43 lines
C++: 144 lines
The Bash shell script has the fewest lines of code, followed by Python; C++ has the most.
Run time:
C++: 318.434 seconds
Python: 26.78 seconds
Unix Bash shell script: 18.02 seconds
The Unix Bash shell script is therefore the fastest in terms of execution time. Python takes second place, followed by C++.
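
How the run times were measured is not stated here. One possible approach, sketched below, is to time a function with time.perf_counter and keep the best of a few repetitions. The helper name time_call and the stand-in workload are hypothetical and are not the measurement code used in this study.

```python
import time


def time_call(func, *args, repeats=3, **kwargs):
    """Run func several times and return (best_seconds, last_result)."""
    best = float("inf")
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        elapsed = time.perf_counter() - start
        best = min(best, elapsed)
    return best, result


if __name__ == "__main__":
    # Stand-in workload; in practice this would be the CSV search itself.
    seconds, total = time_call(sum, range(1_000_000))
    print(f"best of 3 runs: {seconds:.4f} s (result {total})")
```

For comparing whole programs written in different languages, the shell's time command applied to each program is a common alternative, since it measures every implementation in the same way.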
Good error handling, exception management and correct memory management are the design issues of C++. For shell scripts, the design issues are the syntax itself and the cryptic command-line parameters of each Unix application. Python is executed by an interpreter rather than compiled ahead of time to machine code, which generally tends to make its execution slower.
Python's distinguishing features include being easy to code, easy to read, expressive, portable, interpreted, dynamically typed, and free and open source. A key feature of shell scripts is that the invocation of their interpreters is handled as a core operating system feature. In contrast, C++ and Python are object-oriented programming languages that support OOP concepts such as inheritance, polymorphism, encapsulation and abstraction. These are our experiences with the three programming languages: C++, Python and shell script.