
HTTP parse/query signature errors on subsequent S3 requests after one bad request #205

Open
sanketr opened this issue Jun 28, 2016 · 3 comments

Comments


sanketr commented Jun 28, 2016

I just discovered this when testing AWS S3 with versioned buckets. Here is what I am doing:

  • Make a write request to a versioned bucket with a nested key, say bucket1/test1/key1, uploading content of the wrong length: the body I send is 11 bytes, but the content length given to conduit's requestBodySource is set to 1.
  • After the write request returns, immediately make a read request for the same object without a version id (we want the latest version, so we set S3.goVersionId = Nothing). This fails with an HTTP 400 error and s3ErrorMessage "An error occurred when parsing the HTTP request."

I increased the threadDelay between the two requests to 15 seconds, but then occasionally get a different 403 error instead: "The request signature we calculated does not match the signature you provided. Check your key and signing method."

I had to increase the threadDelay to 25 seconds before the errors went away (at the 20-second mark they still occurred). So something seems wrong in the request handling once the first request goes bad.
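
For reference, the timing experiment was structured roughly like this (a minimal sketch; badPut and readBack are hypothetical stand-ins for the two requests in the repro code I post below):

import Control.Concurrent (threadDelay)

-- Sketch of the timing experiment: run the bad write, wait, then read.
-- threadDelay takes microseconds; badPut/readBack are hypothetical
-- placeholders for the putObject and getObject calls.
timingExperiment :: IO () -> IO () -> IO ()
timingExperiment badPut readBack = do
  badPut                          -- putObject with the wrong declared length
  threadDelay (25 * 1000 * 1000)  -- 25 s; at 20 s the errors still occurred
  readBack                        -- getObject for the same key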


sanketr commented Jun 28, 2016

Here is my code that reproduces the issue.

First generate an "out" file using the dd command; it is 11 bytes long:
dd if=/dev/zero of=out bs=11 count=1 iflag=fullblock

Now run the code below after setting an appropriate bucket (search-and-replace "your-bucket-here") and credentials. It uploads the 11-byte "out" file created above but declares the length as 1 instead:

{-# LANGUAGE OverloadedStrings #-}

import qualified Aws
import qualified Aws.Core as Aws
import qualified Aws.S3 as S3
import           Data.Conduit (($$+-))
import           Data.Conduit.Binary (sourceFile)
import qualified Data.Conduit.List as CL (foldM,mapM_)
import           Network.HTTP.Conduit (responseBody,requestBodySource,newManager,tlsManagerSettings)
import qualified Data.ByteString.Lazy as LBS
import Control.Monad.IO.Class
import System.IO
import Control.Monad.Trans.Resource (runResourceT)
import Control.Concurrent.Async (async,waitCatch)
import Control.Exception (displayException)
import Data.Text as T (pack)
import Data.List (lookup)

main :: IO ()
main = do
  {- Set up AWS credentials and the S3 configuration (US classic endpoint). -}
  Just creds <- Aws.loadCredentialsFromFile "/home/ec2-user/.aws/credentials.keys" "dev" 
  let cfg = Aws.Configuration Aws.Timestamp creds (Aws.defaultLog Aws.Error)
  let s3cfg = S3.s3 Aws.HTTP S3.s3EndpointUsClassic False

  {- Set up a ResourceT region with an available HTTP manager. -}
  httpmgr <- newManager tlsManagerSettings
  let file ="out"
  -- streams large file content, without buffering more than 10k in memory
  --cfile <- streamFile file
  let inbytes = sourceFile file
  lenb <- System.IO.withFile file ReadMode hFileSize
  req <- async $ runResourceT $ do
    Aws.pureAws cfg s3cfg httpmgr $
      -- NB: the declared length is deliberately 1, while the body is 11 bytes
      (S3.putObject "your-bucket-here" "folder1/sub1/1" (requestBodySource 1 inbytes))
        { 
          S3.poMetadata = [("content-type","text;charset=UTF-8"),("content-length",T.pack $ show lenb)]
        -- poAutoMakeBucket auto-creates the bucket if it does not exist
        -- (an Internet Archive S3 extension; likely a no-op against AWS).
          ,S3.poAutoMakeBucket = True
        }
  reqRes <- waitCatch req
  case reqRes of
    Left e -> print $ displayException $ e
    Right r -> print $ S3.porVersionId r
  req2 <- async $ runResourceT $ do
    {- Create a request object with S3.getObject and run the request with pureAws. -}
    S3.GetObjectResponse { S3.gorResponse = rsp, S3.gorMetadata = mdata } <- 
      Aws.pureAws cfg s3cfg httpmgr $
        S3.getObject "your-bucket-here" "folder1/sub1/1"
    -- TODO: must be able to decode metadata for length - else return error code
    {- Stream the response to a lazy bytestring -}
    liftIO $ LBS.writeFile "testaws" LBS.empty -- create an empty file
    responseBody rsp $$+- CL.mapM_ (\a -> liftIO $ LBS.appendFile "testaws" (LBS.fromStrict a))
    return mdata
  reqRes2 <- waitCatch req2
  case reqRes2 of
    Left e -> print $ displayException $ e
    Right r -> print $ lookup "content-length" (S3.omUserMetadata r)

The run output is below. We first get the version id back for the object we put with the wrong length, and then the subsequent read request fails with a signature error; on other runs it fails with the "HTTP parse error" I reported above. This seems really bad: a single bad request on a server could cause an outage for all subsequent read requests. So I would really appreciate help with resolving this, and am happy to work with you on it!

$ stack exec ghci -- Test1.hs -XOverloadedStrings
GHCi, version 7.10.3: http://www.haskell.org/ghc/  :? for help
[1 of 1] Compiling Main             ( Test1.hs, interpreted )
Ok, modules loaded: Main.
*Main> main
Just "Tlr_p0M5.Tr3X.eyxo7py0djBTPe0dCI"
"S3Error {s3StatusCode = Status {statusCode = 403, statusMessage = \"Forbidden\"}, s3ErrorCode = \"SignatureDoesNotMatch\", s3ErrorMessage = \"The request signature we calculated does not match the signature you provided. Check your key and signing method.\", s3ErrorResource = Nothing, s3ErrorHostId = Just \"mGfJ5oeBjlDQEXzVxQEH0NjHra3LnGdhr43eKO+kA95MttuwNGyRz4Q4xq72JGIUrBvExoZCSk8=\", s3ErrorAccessKeyId = Just \"AKIAJFRUGYFNBRA4PTGQ\", s3ErrorStringToSign = Just \"\\NUL\\NUL\\NUL\\NUL\\NUL\\NUL\\NUL\\NUL\\NUL\\NULGET\\n\\n\\nTue, 28 Jun 2016 22:50:17 GMT\\n/your-bucket-here/folder1%2Fsub1%2F1\", s3ErrorBucket = Nothing, s3ErrorEndpointRaw = Nothing, s3ErrorEndpoint = Nothing}"
*Main> 

HTTP parse error on a different run:

*Main> main
Just "YSHriLTZGhyhMSilx_5RFfx08IE9nW9r"
"S3Error {s3StatusCode = Status {statusCode = 400, statusMessage = \"Bad Request\"}, s3ErrorCode = \"BadRequest\", s3ErrorMessage = \"An error occurred when parsing the HTTP request.\", s3ErrorResource = Nothing, s3ErrorHostId = Just \"S5mikbDsdSEG3PLzFB5zhpq0ttGmot0KIQO8tGpsb1+FM23ddsCHnK661x0fDlM0zQq99URsMFbOxwfmc3LSsJRa+QzM7tRb\", s3ErrorAccessKeyId = Nothing, s3ErrorStringToSign = Nothing, s3ErrorBucket = Nothing, s3ErrorEndpointRaw = Nothing, s3ErrorEndpoint = Nothing}"

sanketr changed the title from "HTTP parse errors under certain circumstances?" to "HTTP parse/query signature errors on subsequent S3 requests after one bad request" on Jun 28, 2016

sanketr commented Jun 29, 2016

Some more investigation on this error: the shared HTTP Manager seems to be the root cause (which fits the nature of the error: some shared state becomes corrupt across requests). Using a different HTTP manager for the getObject request makes the error disappear.
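
For clarity, the workaround looks roughly like this (a sketch against the repro code above; cfg, s3cfg, bucket, and key are as before, and readmgr is a name I am introducing here):

-- Workaround sketch: give the read request its own Manager instead of
-- reusing httpmgr, which handled the bad putObject.
readmgr <- newManager tlsManagerSettings
req2 <- async $ runResourceT $ do
  S3.GetObjectResponse { S3.gorResponse = rsp, S3.gorMetadata = mdata } <-
    Aws.pureAws cfg s3cfg readmgr $   -- fresh manager, not httpmgr
      S3.getObject "your-bucket-here" "folder1/sub1/1"
  responseBody rsp $$+- CL.mapM_ (\a -> liftIO $ LBS.appendFile "testaws" (LBS.fromStrict a))
  return mdata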

I also simulated a source for requestBodySource where the declared length is correct but the source aborts mid-way with a simulated failure (to mimic network issues), roughly as sketched below. In that case the pureAws call produces no follow-on errors. So the only failing case seems to be sending the wrong length without any failure, which corrupts some kind of shared state that gets released within about 25 seconds. I hope this helps with pinning down the issue.
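
The aborting source was along these lines (a sketch; the exact failure I injected may differ, but the shape is the same: yield part of the body, then throw before reaching the declared length):

{-# LANGUAGE OverloadedStrings #-}

import           Control.Exception (throwIO)
import           Control.Monad.IO.Class (MonadIO, liftIO)
import           Data.ByteString (ByteString)
import           Data.Conduit (Source, yield)

-- A source that fails mid-stream to simulate a network error. The
-- length passed to requestBodySource is the correct 11 bytes, but the
-- source aborts after the first chunk.
abortingSource :: MonadIO m => Source m ByteString
abortingSource = do
  yield "hello "    -- first 6 of the 11 bytes
  liftIO $ throwIO (userError "simulated network failure")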


sanketr commented Jun 29, 2016

This seems to be an http-client bug, so I have opened a ticket there. I will leave it to you to decide whether to close this issue now or wait for the upstream one to be resolved first.
