🙌 Help needed! See [8.3](#83-additional-language-support) on how to support more languages.
10
+
11
+
**Table of Contents:**
12
+
5
13
-[1. General Information](#1-general-information)
6
14
-[2. Maintainer](#2-maintainer)
7
15
-[3. File Format Support](#3-file-format-support)
@@ -31,15 +39,15 @@
31
39
## 1. General Information
32
40
33
41
Duplicated source code blocks can harm maintainability of software systems.
34
-
Duplo is a tool to find duplicated code blocks in large code bases. Duplo has special support for some
35
-
programming languages, meaning it can filter out (multi-line) comments and compiler directives.
36
-
For example: C, C++, Java, C#, and VB.NET. Any other text format is also supported.
42
+
Duplo is a tool to find duplicated code blocks in large code bases. Duplo has
43
+
special support for some programming languages, meaning it can filter out
44
+
(multi-line) comments and compiler directives. For example: C, C++, Java, C#,
45
+
and VB.NET. Any other text format is also supported.
37
46
38
47
## 2. Maintainer
39
48
40
-
Duplo was originally developed by Christian
41
-
M. Ammann and is now maintained and developed by Daniel
42
-
Lidström.
49
+
Duplo was originally developed by Christian M. Ammann and is now maintained and
50
+
developed by Daniel Lidström.
43
51
44
52
## 3. File Format Support
45
53
@@ -53,14 +61,10 @@ file formats:
53
61
- GCC assembly
54
62
- Ada
55
63
56
-
This means that Duplo will remove
57
-
preprocessor directives, block comments, using
58
-
statements, etc, to only consider duplicates
59
-
in actual code.
60
-
In addition, Duplo can be used as a general
61
-
(without special support) duplicates detector
62
-
in arbitrary text files and will even detect
63
-
duplicates found in the same file.
64
+
This means that Duplo will remove preprocessor directives, block comments, using
65
+
statements, etc, to only consider duplicates in actual code. In addition, Duplo
66
+
can be used as a general (without special support) duplicates detector in
67
+
arbitrary text files and will even detect duplicates found in the same file.
64
68
65
69
Sample output snippet:
66
70
@@ -92,23 +96,29 @@ If you have Docker, the way to run Duplo is to use this command:
92
96
> docker run --rm -i -w /src -v $(pwd):/src dlidstrom/duplo
93
97
```
94
98
95
-
This pulls the latest image and runs duplo. Note that you'll have to pipe the filenames into this command. A complete commandline sample will be shown below.
99
+
This pulls the latest image and runs duplo. Note that you'll have to pipe the
100
+
filenames into this command. A complete commandline sample will be shown below.
96
101
97
102
### 4.2. Pre-built binaries
98
103
99
-
Duplo is also available as a pre-built binary for (alpine) linux and macos. Grab the executable from the [releases](https://github.com/dlidstrom/Duplo/releases) page.
104
+
Duplo is also available as a pre-built binary for (alpine) linux and macos. Grab
105
+
the executable from the [releases](https://github.com/dlidstrom/Duplo/releases)
106
+
page.
100
107
101
-
You can of course build from source as well, and you'll have to do so to get a binary for Windows.
108
+
You can of course build from source as well, and you'll have to do so to get a
109
+
binary for Windows.
102
110
103
111
## 5. Usage
104
112
105
-
Duplo works with a list of files. You can either specify a file that contains the list of files, or you can pass them using `stdin`.
113
+
Duplo works with a list of files. You can either specify a file that contains
114
+
the list of files, or you can pass them using `stdin`.
106
115
107
116
Run `duplo --help` on the command line to see the detailed options.
108
117
109
118
### 5.1. Passing files using `stdin`
110
119
111
-
In each of the following commands, `duplo` will write the duplicated blocks into `out.txt` in addition to the information written to stdout.
120
+
In each of the following commands, `duplo` will write the duplicated blocks into
121
+
`out.txt` in addition to the information written to stdout.
112
122
113
123
#### 5.1.1. Bash
114
124
@@ -117,7 +127,13 @@ In each of the following commands, `duplo` will write the duplicated blocks into
Let's break this down. `find . -type f \( -iname "*.cpp" -o -iname "*.h" \)` is a syntax to look recursively in the current directory (the `.` part) for files (the `-type f` part) matching `*.cpp` or `*.h` (case insensitive). The output from `find` is piped into `duplo` which then reads the filenames from `stdin` (the `-` tells `duplo` to get the filenames from `stdin`, a common unix convention in many commandline applications). The result of the analysis is then written to `out.txt`.
130
+
Let's break this down. `find . -type f \( -iname "*.cpp" -o -iname "*.h" \)` is
131
+
a syntax to look recursively in the current directory (the `.` part) for files
132
+
(the `-type f` part) matching `*.cpp` or `*.h` (case insensitive). The output
133
+
from `find` is piped into `duplo` which then reads the filenames from `stdin`
134
+
(the `-` tells `duplo` to get the filenames from `stdin`, a common unix
135
+
convention in many commandline applications). The result of the analysis is then
136
+
written to `out.txt`.
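
As a minimal sketch of the pipeline described above, the following builds a small throwaway source tree and collects the file list with the same `find` expression. The sample file names and the intermediate `files.lst` are assumptions made for illustration; the call to `duplo` is guarded so the snippet still runs where `duplo` is not installed.

```shell
# Create a tiny sample tree so the example is self-contained (hypothetical files).
workdir=$(mktemp -d)
mkdir -p "$workdir/src"
printf 'int main() { return 0; }\n' > "$workdir/src/main.cpp"
printf '#pragma once\n' > "$workdir/src/Main.H"
printf 'notes\n' > "$workdir/src/readme.txt"
cd "$workdir"

# Collect C++ sources and headers; -iname makes the match case insensitive,
# so Main.H is picked up while readme.txt is not.
find . -type f \( -iname "*.cpp" -o -iname "*.h" \) | sort > files.lst

# Feed the list to duplo on stdin (the "-" argument) when duplo is available.
if command -v duplo >/dev/null 2>&1; then
  duplo - out.txt < files.lst
fi
```

Writing the list to a file first is optional; it only makes it easy to inspect which files would be analysed before `duplo` runs.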
121
137
122
138
#### 5.1.2. Windows
123
139
@@ -126,7 +142,8 @@ Let's break this down. `find . -type f \( -iname "*.cpp" -o -iname "*.h" \)` is
This command also works in a similar fashion to the Bash command, but instead of piping into a local `duplo` executable, it will pipe into `duplo` running inside Docker. This is very convenient as you do not have to install `duplo` separately. You will have to install Docker though, if you haven't already. That is a good thing to do anyway, since it opens up a lot of possibilities apart from running `duplo`.
139
-
140
-
Again, similarly to the Bash command, this uses `find` to find files in the current directory, then passes the file list to Docker which will pass it further into an instance of the latest version of `duplo`. The working directory in the `duplo` container should be `/src` (that's where the `duplo` executable is located) and the current path of your host machine will be mapped to `/src` when the container is running. The `-i` allows `stdin` of your host machine to be passed into Docker to allow `duplo` to read the filenames. Any parameters to `duplo` can be placed at the end of the command as you can see `- out.txt` has been.
155
+
This command also works in a similar fashion to the Bash command, but instead of
156
+
piping into a local `duplo` executable, it will pipe into `duplo` running inside
157
+
Docker. This is very convenient as you do not have to install `duplo`
158
+
separately. You will have to install Docker though, if you haven't already. That
159
+
is a good thing to do anyway, since it opens up a lot of possibilities apart
160
+
from running `duplo`.
161
+
162
+
Again, similarly to the Bash command, this uses `find` to find files in the
163
+
current directory, then passes the file list to Docker which will pass it
164
+
further into an instance of the latest version of `duplo`. The working directory
165
+
in the `duplo` container should be `/src` (that's where the `duplo` executable
166
+
is located) and the current path of your host machine will be mapped to `/src`
167
+
when the container is running. The `-i` allows `stdin` of your host machine to
168
+
be passed into Docker to allow `duplo` to read the filenames. Any parameters to
169
+
`duplo` can be placed at the end of the command as you can see `- out.txt` has
170
+
been.
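
The pieces described above can be put together roughly as follows. This is a Bash-flavoured sketch, not the Windows command from this section: it reuses the `docker run` line from section 4.1 and appends `- out.txt` as `duplo`'s own arguments. The `files.lst` name and the `RUN_DUPLO` switch are hypothetical and exist only so the snippet can be inspected without starting a container.

```shell
# Build the file list once so it can be checked before running the container.
find . -type f \( -iname "*.cpp" -o -iname "*.h" \) > files.lst

# Show the invocation: -i forwards stdin, -w sets the working directory,
# -v maps the current host directory to /src, and "- out.txt" goes to duplo.
docker_cmd="docker run --rm -i -w /src -v $(pwd):/src dlidstrom/duplo - out.txt"
echo "$docker_cmd"

# Run the container only when explicitly requested (RUN_DUPLO is a
# hypothetical switch for this sketch, not a duplo or Docker option).
if [ "${RUN_DUPLO:-0}" = "1" ]; then
  docker run --rm -i -w /src -v "$(pwd)":/src dlidstrom/duplo - out.txt < files.lst
fi
```

Because `find` emits paths relative to the current directory and that directory is mounted at `/src`, the same relative paths resolve inside the container.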
141
171
142
172
### 5.2. Passing files using file
143
173
@@ -161,18 +191,19 @@ Again, the duplicated blocks are written to `out.txt`.
161
191
162
192
### 5.3. Xml output
163
193
164
-
Duplo can also output xml and there is a stylesheet that will format the result for viewing in a browser. This can be used as a report tab in your continuous integration tool (TeamCity, etc).
194
+
Duplo can also output xml and there is a stylesheet that will format the result
195
+
for viewing in a browser. This can be used as a report tab in your continuous
196
+
integration tool (GitHub Actions, TeamCity, etc).
165
197
166
198
## 6. Feedback and Bug Reporting
167
199
168
-
Please open an issue to discuss feedback,
169
-
feature requests and bug reports.
200
+
Please open an issue to discuss feedback, feature requests and bug reports.
170
201
171
202
## 7. Algorithm Background
172
203
173
204
Duplo uses the same techniques as Duploc to detect duplicated code blocks. See
@@ -213,12 +244,26 @@ Use Visual Studio 2019 to open the included solution file (or try `CMake`).
213
244
214
245
### 8.3. Additional Language Support
215
246
216
-
Duplo can analyze all text files regardless of format, but it has special support for some programming languages (C++, C#, Java, for example). This allows Duplo to improve the duplication detection as it can ignore preprocessor directives and/or comments.
247
+
Duplo can analyze all text files regardless of format, but it has special
248
+
support for some programming languages (C++, C#, Java, for example). This allows
249
+
Duplo to improve the duplication detection as it can ignore preprocessor
250
+
directives and/or comments.
251
+
252
+
To implement support for a new language, there are a couple of options:
217
253
218
-
To implement support for a new language, there are a couple of options (in order of complexity):
254
+
1. Implement `FileTypeBase` which has support for handling comments and
255
+
preprocessor directives. You just need to decide what is a comment. With this
256
+
option you need to implement a couple of methods, one which is
257
+
`CreateLineFilter`. This is to remove multiline comments. Look at
258
+
`CstyleCommentsFilter` for an example.
259
+
2. Implement `IFileType` interface directly. This gives you the most freedom but
260
+
also is the hardest option.
219
261
220
-
1. Implement `FileTypeBase` which has support for handling comments and preprocessor directives. You just need to decide what is a comment. With this option you need to implement a couple of methods, one which is `CreateLineFilter`. This is to remove multiline comments. Look at `CstyleCommentsFilter` for an example.
221
-
2. Implement `IFileType` interface directly. This gives you the most freedom but also is the hardest option of course.
262
+
You can see an example of how Java support was added effortlessly. It involves
263
+
copying an existing file type implementation and adjusting the lines that should
264
+
be filtered and how comments should be removed. Finally, add a few lines in
265
+
`FileTypeFactory.cpp` to choose the correct implementation based on the file
266
+
extension. Refer to [this commit](https://github.com/dlidstrom/Duplo/commit/320f9474354d41c3b35c178bb4b7f6c667025976) for all the details.
222
267
223
268
### 8.4. Language Suggestions
224
269
@@ -238,6 +283,8 @@ Send me a pull request!
238
283
239
284
## 9. Changes
240
285
286
+
- 0.8
287
+
- Add support for Java which was lost or never there in the first place
241
288
- 0.7
242
289
- Add support for Ada (thanks [@Knaldgas](https://github.com/Knaldgas)!)
243
290
- 0.6
@@ -264,7 +311,12 @@ For a pretty ui you should check out [duploq](https://github.com/duploq/duploq)
264
311
265
312
From duploq's Readme file:
266
313
267
-
> duploq's approach is a pretty straighforward. First, duploq allows you to choose where to look for the duplicates (files or folders). Then it builds list of input files and passes it to the Duplo engine together with necessary parameters. After the files have been processed, duploq parses Duplo's output and visualises the results in easy and intuitive way. Also it provides additional statistics information which is not a part of Duplo output.
314
+
> duploq's approach is a pretty straighforward. First, duploq allows you to
315
+
> choose where to look for the duplicates (files or folders). Then it builds
316
+
> list of input files and passes it to the Duplo engine together with necessary
317
+
> parameters. After the files have been processed, duploq parses Duplo's output
318
+
> and visualises the results in easy and intuitive way. Also it provides
319
+
> additional statistics information which is not a part of Duplo output.