Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance improvements, up to 20x faster #183

Merged
merged 7 commits into from
Feb 24, 2023
Merged

Performance improvements, up to 20x faster #183

merged 7 commits into from
Feb 24, 2023

Conversation

Bodigrim
Copy link
Contributor

In one of my projects I've been trying to migrate from home-grown low-level helpers to filepath API (e. g., takeExtension fn == ".foo" instead of ".foo" `B.isSuffixOf` fn) but failed miserably: performance penalties were too harsh, especially on Windows.

I fixed some low-hanging fruits, and I think results are quite good: many functions are twice faster now, and System.OsPath on Windows is often 10x-20x faster. Here is a detailed report:

filepath (string)
  splitExtension (posix):
    2.01 μs ±  62 ns, 40% less than baseline
  splitExtension (windows):
    2.02 μs ±  51 ns, 76% less than baseline
  takeExtension (posix):
    643  ns ±  13 ns, 10% less than baseline
  takeExtension (windows):
    836  ns ±  26 ns, 84% less than baseline
  replaceExtension (posix):
    778  ns ±  31 ns, 27% less than baseline
  replaceExtension (windows):
    795  ns ±  19 ns, 25% less than baseline
  dropExtension (posix):
    2.01 μs ±  55 ns, 39% less than baseline
  dropExtension (windows):
    2.01 μs ±  61 ns, 75% less than baseline
  addExtension (posix):
    887  ns ±  26 ns,  5% less than baseline
  addExtension (windows):
    935  ns ±  30 ns,       same as baseline
  hasExtension (posix):
    638  ns ±  13 ns,       same as baseline
  hasExtension (windows):
    1.46 μs ±  27 ns, 76% less than baseline
  splitExtensions (posix):
    2.05 μs ±  58 ns, 38% less than baseline
  splitExtensions (windows):
    5.25 μs ± 113 ns, 46% less than baseline
  dropExtensions (posix):
    2.03 μs ±  57 ns, 38% less than baseline
  dropExtensions (windows):
    3.92 μs ± 105 ns, 53% less than baseline
  takeExtensions (posix):
    683  ns ±  27 ns,       same as baseline
  takeExtensions (windows):
    3.85 μs ± 129 ns, 37% less than baseline
  replaceExtensions (posix):
    661  ns ±  16 ns, 35% less than baseline
  replaceExtensions (windows):
    763  ns ±  20 ns, 26% less than baseline
  isExtensionOf (posix):
    705  ns ±  15 ns,       same as baseline
  isExtensionOf (windows):
    3.89 μs ± 111 ns, 37% less than baseline
  stripExtension (posix):
    593  ns ±  14 ns,       same as baseline
  stripExtension (windows):
    582  ns ±  13 ns,  3% less than baseline
  splitFileName (posix):
    1.26 μs ±  27 ns, 52% less than baseline
  splitFileName (windows):
    1.58 μs ±  53 ns, 74% less than baseline
  takeFileName (posix):
    689  ns ±  18 ns,       same as baseline
  takeFileName (windows):
    1.43 μs ±  55 ns, 76% less than baseline
  replaceFileName (posix):
    513  ns ±  13 ns, 43% less than baseline
  replaceFileName (windows):
    757  ns ±  14 ns, 10% less than baseline
  dropFileName (posix):
    1.29 μs ±  26 ns, 30% less than baseline
  dropFileName (windows):
    885  ns ±  29 ns, 59% less than baseline
  takeBaseName (posix):
    695  ns ±  13 ns, 18% less than baseline
  takeBaseName (windows):
    2.97 μs ± 103 ns, 78% less than baseline
  replaceBaseName (posix):
    437  ns ±  15 ns, 59% less than baseline
  replaceBaseName (windows):
    794  ns ±  27 ns,  9% less than baseline
  takeDirectory (posix):
    2.71 μs ±  52 ns, 22% less than baseline
  takeDirectory (windows):
    1.10 μs ±  39 ns, 49% less than baseline
  replaceDirectory (posix):
    1.91 μs ±  65 ns,  7% less than baseline
  replaceDirectory (windows):
    1.93 μs ±  61 ns,  9% less than baseline
  combine (posix):
    654  ns ±  16 ns, 28% less than baseline
  combine (windows):
    924  ns ±  26 ns,       same as baseline
  splitPath (posix):
    6.99 μs ± 215 ns,       same as baseline
  splitPath (windows):
    6.27 μs ± 205 ns,       same as baseline
  joinPath (posix):
    2.16 μs ±  42 ns,       same as baseline
  joinPath (windows):
    933  ns ±  28 ns,       same as baseline
  splitDirectories (posix):
    9.32 μs ± 235 ns,  5% less than baseline
  splitDirectories (windows):
    7.56 μs ± 206 ns,       same as baseline
  splitDrive (posix):
    785  ns ±  26 ns, 14% less than baseline
  splitDrive (windows):
    576  ns ±  18 ns, 10% less than baseline
  joinDrive (posix):
    722  ns ±  19 ns, 21% less than baseline
  joinDrive (windows):
    926  ns ±  26 ns,       same as baseline
  takeDrive (posix):
    22.9 ns ± 820 ps,       same as baseline
  takeDrive (windows):
    10.7 ns ± 236 ps,  3% less than baseline
  hasDrive (posix):
    9.78 ns ± 202 ps,       same as baseline
  hasDrive (windows):
    8.35 ns ± 298 ps,       same as baseline
  dropDrive (posix):
    418  ns ±  16 ns, 49% less than baseline
  dropDrive (windows):
    520  ns ± 9.0 ns, 36% less than baseline
  isDrive (posix):
    20.3 ns ± 442 ps,       same as baseline
  isDrive (windows):
    8.60 ns ± 248 ps,       same as baseline
  hasTrailingPathSeparator (posix):
    309  ns ± 7.3 ns,       same as baseline
  hasTrailingPathSeparator (windows):
    316  ns ± 6.7 ns,       same as baseline
  addTrailingPathSeparator (posix):
    1.89 μs ±  70 ns,       same as baseline
  addTrailingPathSeparator (windows):
    1.90 μs ±  67 ns,       same as baseline
  dropTrailingPathSeparator (posix):
    1.90 μs ±  69 ns,       same as baseline
  dropTrailingPathSeparator (windows):
    1.91 μs ±  67 ns,       same as baseline
  normalise (posix):
    13.7 μs ± 499 ns,       same as baseline
  normalise (windows):
    8.77 μs ± 229 ns,       same as baseline
  equalFilePath (posix):
    13.6 μs ± 412 ns,       same as baseline
  equalFilePath (windows):
    8.76 μs ± 214 ns,       same as baseline
  makeRelative (posix):
    14.5 μs ± 417 ns,       same as baseline
  makeRelative (windows):
    24.5 μs ± 864 ns,       same as baseline
  isRelative (posix):
    21.1 ns ± 454 ps,       same as baseline
  isRelative (windows):
    8.42 ns ± 242 ps,       same as baseline
  isAbsolute (posix):
    21.2 ns ± 800 ps,       same as baseline
  isAbsolute (windows):
    8.65 ns ± 260 ps,       same as baseline
  isValid (posix):
    876  ns ±  30 ns,       same as baseline
  isValid (windows):
    877  ns ±  27 ns,       same as baseline
  makeValid (posix):
    2.47 μs ±  87 ns,       same as baseline
  makeValid (windows):
    2.50 μs ±  58 ns,       same as baseline
  splitSearchPath (posix):
    5.39 μs ± 210 ns,       same as baseline
  splitSearchPath (windows):
    4.80 μs ± 112 ns,       same as baseline
filepath (AFPP)
  splitExtension (posix):
    108  ns ± 2.9 ns, 49% less than baseline
  splitExtension (windows):
    364  ns ±  14 ns, 89% less than baseline
  takeExtension (posix):
    62.0 ns ± 1.8 ns, 37% less than baseline
  takeExtension (windows):
    292  ns ± 7.7 ns, 90% less than baseline
  replaceExtension (posix):
    65.9 ns ± 2.5 ns, 28% less than baseline
  replaceExtension (windows):
    198  ns ± 7.1 ns, 28% less than baseline
  dropExtension (posix):
    81.7 ns ± 1.6 ns, 55% less than baseline
  dropExtension (windows):
    341  ns ±  13 ns, 89% less than baseline
  addExtension (posix):
    93.1 ns ± 2.6 ns,  3% less than baseline
  addExtension (windows):
    167  ns ± 3.4 ns,       same as baseline
  hasExtension (posix):
    59.2 ns ± 590 ps, 54% less than baseline
  hasExtension (windows):
    289  ns ± 6.7 ns, 91% less than baseline
  splitExtensions (posix):
    122  ns ± 3.3 ns, 40% less than baseline
  splitExtensions (windows):
    403  ns ±  13 ns, 87% less than baseline
  dropExtensions (posix):
    97.2 ns ± 3.7 ns, 44% less than baseline
  dropExtensions (windows):
    381  ns ±  14 ns, 88% less than baseline
  takeExtensions (posix):
    47.4 ns ± 1.7 ns, 43% less than baseline
  takeExtensions (windows):
    321  ns ± 7.0 ns, 89% less than baseline
  replaceExtensions (posix):
    68.4 ns ± 2.5 ns,       same as baseline
  replaceExtensions (windows):
    228  ns ± 7.0 ns, 11% less than baseline
  isExtensionOf (posix):
    53.2 ns ± 1.9 ns, 39% less than baseline
  isExtensionOf (windows):
    338  ns ± 7.9 ns, 89% less than baseline
  stripExtension (posix):
    14.4 ns ± 366 ps,       same as baseline
  stripExtension (windows):
    24.0 ns ± 656 ps,       same as baseline
  splitFileName (posix):
    63.6 ns ± 2.1 ns, 54% less than baseline
  splitFileName (windows):
    302  ns ± 6.7 ns, 90% less than baseline
  takeFileName (posix):
    66.6 ns ± 2.4 ns, 48% less than baseline
  takeFileName (windows):
    288  ns ± 6.5 ns, 91% less than baseline
  replaceFileName (posix):
    7.02 ns ± 246 ps,       same as baseline
  replaceFileName (windows):
    155  ns ± 3.6 ns, 94% less than baseline
  dropFileName (posix):
    51.5 ns ± 784 ps, 55% less than baseline
  dropFileName (windows):
    288  ns ± 7.3 ns, 91% less than baseline
  takeBaseName (posix):
    84.8 ns ± 1.7 ns, 57% less than baseline
  takeBaseName (windows):
    377  ns ± 7.1 ns, 89% less than baseline
  replaceBaseName (posix):
    27.6 ns ± 1.0 ns, 64% less than baseline
  replaceBaseName (windows):
    107  ns ± 4.0 ns, 48% less than baseline
  takeDirectory (posix):
    117  ns ± 3.5 ns, 39% less than baseline
  takeDirectory (windows):
    552  ns ±  13 ns, 91% less than baseline
  replaceDirectory (posix):
    1.11 μs ±  17 ns,       same as baseline
  replaceDirectory (windows):
    1.33 μs ±  29 ns,       same as baseline
  combine (posix):
    7.09 ns ± 204 ps,       same as baseline
  combine (windows):
    155  ns ± 3.6 ns, 94% less than baseline
  splitPath (posix):
    2.20 μs ±  52 ns,       same as baseline
  splitPath (windows):
    5.99 μs ± 219 ns, 31% less than baseline
  joinPath (posix):
    834  ns ±  27 ns,       same as baseline
  joinPath (windows):
    30.1 μs ± 1.0 μs, 27% less than baseline
  splitDirectories (posix):
    3.32 μs ± 105 ns,       same as baseline
  splitDirectories (windows):
    10.0 μs ± 209 ns, 25% less than baseline
  splitDrive (posix):
    67.1 ns ± 1.8 ns,       same as baseline
  splitDrive (windows):
    226  ns ± 7.0 ns, 92% less than baseline
  joinDrive (posix):
    37.0 ns ± 902 ps,       same as baseline
  joinDrive (windows):
    73.4 ns ± 1.8 ns,       same as baseline
  takeDrive (posix):
    21.5 ns ± 454 ps,       same as baseline
  takeDrive (windows):
    143  ns ± 3.9 ns, 95% less than baseline
  hasDrive (posix):
    21.8 ns ± 864 ps,       same as baseline
  hasDrive (windows):
    143  ns ± 3.6 ns, 95% less than baseline
  dropDrive (posix):
    43.0 ns ± 994 ps,       same as baseline
  dropDrive (windows):
    184  ns ± 6.7 ns, 93% less than baseline
  isDrive (posix):
    41.1 ns ± 1.1 ns,       same as baseline
  isDrive (windows):
    186  ns ± 6.9 ns, 93% less than baseline
  hasTrailingPathSeparator (posix):
    5.45 ns ± 212 ps,       same as baseline
  hasTrailingPathSeparator (windows):
    6.34 ns ± 204 ps,       same as baseline
  addTrailingPathSeparator (posix):
    36.7 ns ± 1.0 ns,       same as baseline
  addTrailingPathSeparator (windows):
    101  ns ± 668 ps,       same as baseline
  dropTrailingPathSeparator (posix):
    49.1 ns ± 1.7 ns,       same as baseline
  dropTrailingPathSeparator (windows):
    98.6 ns ± 3.2 ns,       same as baseline
  normalise (posix):
    5.50 μs ± 211 ns,       same as baseline
  normalise (windows):
    39.2 μs ± 538 ns, 27% less than baseline
  equalFilePath (posix):
    5.96 μs ± 208 ns,       same as baseline
  equalFilePath (windows):
    51.1 μs ± 1.7 μs, 21% less than baseline
  makeRelative (posix):
    6.03 μs ± 223 ns,       same as baseline
  makeRelative (windows):
    51.6 μs ± 1.6 μs, 24% less than baseline
  isRelative (posix):
    22.2 ns ± 834 ps, 16% less than baseline
  isRelative (windows):
    239  ns ± 6.5 ns, 92% less than baseline
  isAbsolute (posix):
    22.3 ns ± 800 ps, 15% less than baseline
  isAbsolute (windows):
    241  ns ± 6.5 ns, 92% less than baseline
  isValid (posix):
    15.1 ns ± 496 ps,       same as baseline
  isValid (windows):
    33.0 μs ± 819 ns,       same as baseline
  makeValid (posix):
    2.48 μs ±  69 ns,       same as baseline
  makeValid (windows):
    61.9 μs ± 1.9 μs, 19% less than baseline
  splitSearchPath (posix):
    1.59 μs ±  34 ns,       same as baseline
  splitSearchPath (windows):
    4.59 μs ± 103 ns,       same as baseline

@hasufell
Copy link
Member

Excellent.

@hasufell
Copy link
Member

What I wonder is whether some of this can be backported to the string-only filepath or whether the new API introduced performance regressions (a few functions were slightly rewritten).

@hasufell
Copy link
Member

I get compilation error on 9.2.6:

/x86_64-linux/ghc-9.2.6/filepath-1.4.100.0/t/abstract-filepath/build/abstract-filepath/abstract-filepath-tmp/OsPathSpec.dyn_o )

tests/abstract-filepath/OsPathSpec.hs:56:64: error: [-Wincomplete-uni-patterns, -Werror=incomplete-uni-patterns]
    Pattern match(es) are non-exhaustive
    In a lambda abstraction:
        Patterns of type ‘Either EncodingException String’ not matched:
            Left _
   |
56 |     property $ \(padEven -> bs) -> (Posix.encodeWith ucs2le . (\(Right r) -> r) . Posix.decodeWith ucs2le . OS.PS . toShort) bs === Right (OS.PS . toShort $ bs))
   |                                                                ^^^^^^^^^^^^^^^

tests/abstract-filepath/OsPathSpec.hs:58:66: error: [-Wincomplete-uni-patterns, -Werror=incomplete-uni-patterns]
    Pattern match(es) are non-exhaustive
    In a lambda abstraction:
        Patterns of type ‘Either EncodingException String’ not matched:
            Left _
   |
58 |     property $ \(padEven -> bs) -> (Windows.encodeWith ucs2le . (\(Right r) -> r) . Windows.decodeWith ucs2le . OS.WS . toShort) bs
   |                                                                  ^^^^^^^^^^^^^^^
Error: cabal: Failed to build test:abstract-filepath from filepath-1.4.100.0.

@Bodigrim
Copy link
Contributor Author

What I wonder is whether some of this can be backported to the string-only filepath or whether the new API introduced performance regressions (a few functions were slightly rewritten).

The changes improve performance both for the legacy API and for the new one.

I get compilation error on 9.2.6:

It's a warning for me, and present in the master branch as well.

-- then break filename into basename and extension, then recombine dir and basename.
-- This is way too expensive, see @splitFileName_@ comment for discussion.
--
-- Instead we speculatively split on the extension separator first, then check
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice trick

@hasufell
Copy link
Member

If you rebase against master and then push to this repo instead of your fork, we can benefit from the fixed CI.

-- If bs' is empty, then s2 as the last character of dirSlash must be a path separator,
-- so we are in the middle of shared drive.
-- Otherwise, since s1 is a path separator, we might be in the middle of UNC path.
, null bs' || maybe False (null . snd) (readDriveUNC dirSlash)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to be honest... this code will be harder to maintain in the face of fixing bugs like this:

ghci> W.splitFileName "\\\\.\\C:"
("\\\\.\\","C:")

These will be eventually fixed when I get around finishing #99

So I can't guarantee there won't be performance regressions in the future.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough, I'll be happy to take another look at performance after #99.

FWIW this branch deals with Windows file names starting with \\, so it's a relatively uncommon code path, should not be a bottleneck anyway.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine with merging. If I ever manage to finish #99, then it will be a major bump and almost none of the existing code will remain.

The idea is to parse into an ADT and then deal with that only. My guess is that will increase allocations, so we'll see performance regression.

@Bodigrim
Copy link
Contributor Author

@hasufell any more comments / suggestions?

@hasufell hasufell merged commit 9d4edc2 into haskell:master Feb 24, 2023
@hasufell
Copy link
Member

@hasufell
Copy link
Member

@Bodigrim we have a regression:

With this patch (1.4.100.2):

> W.splitFileName  "\\\\?\\A:\\fred"
("\\\\?\\A:\\fred", "")

With 1.4.100.0:

> W.splitFileName  "\\\\?\\A:\\fred"
("\\\\?\\A:\\", "fred")

This was by chance found by a property test (they are rather unreliable): https://github.com/haskell/filepath/actions/runs/4263893204/jobs/7421218609#step:5:1112

However, W.splitDrive match.

tbagrel1 pushed a commit to tweag/ghc that referenced this pull request Aug 2, 2023
Resolves #23741

Metric Decrease:
    MultiComponentModules
    MultiComponentModulesRecomp
    MultiLayerModules
    MultiLayerModulesRecomp
    T10421
    T12234
    T12425
    T13035
    T13701
    T13719
    T16875
    T18304
    T18698a
    T18698b
    T21839c
    T9198
    TcPlugin_RewritePerf
    hard_hole_fits

Metric decrease on Windows can be probably attributed to haskell/filepath#183
leafcathead pushed a commit to leafcathead/ghc that referenced this pull request Dec 6, 2023
Resolves #23741

Metric Decrease:
    MultiComponentModules
    MultiComponentModulesRecomp
    MultiLayerModules
    MultiLayerModulesRecomp
    T10421
    T12234
    T12425
    T13035
    T13701
    T13719
    T16875
    T18304
    T18698a
    T18698b
    T21839c
    T9198
    TcPlugin_RewritePerf
    hard_hole_fits

Metric decrease on Windows can be probably attributed to haskell/filepath#183

(cherry picked from commit e9a0fa3)
@sternenseemann
Copy link

@hasufell That regression was fixed in 6a1c98d, correct?

I noticed at least one other change in behavior which I discovered by chance in the filepath-bytestring equivalency test suite. It appears e.g. when comparing filepath-1.4.100.4 (GHC 9.8.1) to filepath-1.4.2.2 (GHC 9.4.8):

$ ghci-9.4.8
GHCi, version 9.4.8: https://www.haskell.org/ghc/  :? for help
ghci> import qualified System.FilePath.Windows as W
ghci> W.takeDirectory "//?/a:"
"//?/a:"
$ ghci-9.8.1
GHCi, version 9.8.1: https://www.haskell.org/ghc/  :? for help
ghci> W.takeDirectory "//?/a:"
"//?/"

Can you comment whether this is also to be considered a regression?

@hasufell
Copy link
Member

hasufell commented Jan 8, 2024

//?/ rarely behaves correctly, but yes, it's a regression.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants