Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bit testing can use unsafeRead because the length was verified #24

Merged
merged 3 commits into from
Mar 27, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
32 changes: 19 additions & 13 deletions src/Data/Graph/Haggle/Algorithms/DFS.hs
Original file line number Diff line number Diff line change
Expand Up @@ -67,22 +67,25 @@ xdfsWith :: (Graph g)
-> [c]
xdfsWith g nextVerts f roots
| isEmpty g || null roots = []
| otherwise = runST $ do
bs <- newBitSet (maxVertexId g + 1)
res <- foldM (go bs) [] roots
return $ reverse res
| otherwise =
if any (not . (`elem` vertices g)) roots
then []
else runST $ do
bs <- newBitSet (maxVertexId g + 1)
res <- foldM (go bs) [] roots
return $ reverse res
where
go bs acc v = do
isMarked <- testBit bs (vertexId v)
isMarked <- testBitUnsafe bs (vertexId v)
case isMarked of
True -> return acc
False -> do
setBit bs (vertexId v)
setBitUnsafe bs (vertexId v)
nxt <- filterM (notVisited bs) (nextVerts v)
foldM (go bs) (f v : acc) nxt

notVisited :: BitSet s -> Vertex -> ST s Bool
notVisited bs v = liftM not (testBit bs (vertexId v))
notVisited bs v = liftM not (testBitUnsafe bs (vertexId v))

-- | Forward parameterized DFS
dfsWith :: (Graph g)
Expand Down Expand Up @@ -130,17 +133,20 @@ xdffWith :: (Graph g)
-> [Tree c]
xdffWith g nextVerts f roots
| isEmpty g || null roots = []
| otherwise = runST $ do
bs <- newBitSet (maxVertexId g + 1)
res <- foldM (go bs) [] roots
return $ reverse res
| otherwise =
if any (not . (`elem` vertices g)) roots
then []
else runST $ do
bs <- newBitSet (maxVertexId g + 1)
res <- foldM (go bs) [] roots
return $ reverse res
where
go bs acc v = do
isMarked <- testBit bs (vertexId v)
isMarked <- testBitUnsafe bs (vertexId v)
case isMarked of
True -> return acc
False -> do
setBit bs (vertexId v)
setBitUnsafe bs (vertexId v)
nxt <- filterM (notVisited bs) (nextVerts v)
ts <- foldM (go bs) [] nxt
return $ T.Node (f v) (reverse ts) : acc
Expand Down
42 changes: 34 additions & 8 deletions src/Data/Graph/Haggle/Internal/BitSet.hs
Original file line number Diff line number Diff line change
Expand Up @@ -2,7 +2,9 @@ module Data.Graph.Haggle.Internal.BitSet (
BitSet,
newBitSet,
setBit,
testBit
testBit,
setBitUnsafe,
testBitUnsafe
) where

import Control.Monad.ST
Expand All @@ -11,6 +13,21 @@ import Data.Vector.Unboxed.Mutable ( STVector )
import qualified Data.Vector.Unboxed.Mutable as V
import Data.Word ( Word64 )

-- Note that the implementation here assumes thaththe bit numbers are all
-- unsigned. A proper implementation would perhaps use 'Natural' instead of
-- 'Int', but that would require gratuitous fromEnum/toEnum conversions from all
-- the other API's that just use 'Int', which has about a 33% performance impact
-- when measured.
--
-- The 'setBit' and 'testBit' operations use V.unsafeRead instead of V.read
-- (where the latter is roughly 25% slower) because this is an internal module
-- that is generally always used with a positive 'Int' value, and the value is
-- also checked against 'sz' (which is also probably superfluous). In other
-- words, this module prioritizes performance over robustness and should only be
-- used when the caller can guarantee positive Int values and otherwise good
-- behavior.


data BitSet s = BS (STVector s Word64) {-# UNPACK #-} !Int

bitsPerWord :: Int
Expand All @@ -28,22 +45,31 @@ newBitSet n = do

-- | Set a bit in the bitset. Out of range has no effect.
setBit :: BitSet s -> Int -> ST s ()
setBit (BS v sz) bitIx
setBit b@(BS _ sz) bitIx
| bitIx >= sz = return ()
| otherwise = do
| bitIx < 0 = return ()
| otherwise = setBitUnsafe b bitIx

-- | Set a bit in the bitset. The specified bit must be in range.
setBitUnsafe :: BitSet s -> Int -> ST s ()
setBitUnsafe (BS v _) bitIx = do
let wordIx = bitIx `div` bitsPerWord
bitPos = bitIx `mod` bitsPerWord
oldWord <- V.read v wordIx
oldWord <- V.unsafeRead v wordIx
let newWord = B.setBit oldWord bitPos
V.write v wordIx newWord

-- | Return True if the bit is set. Out of range will return False.
testBit :: BitSet s -> Int -> ST s Bool
testBit (BS v sz) bitIx
testBit b@(BS _ sz) bitIx
| bitIx >= sz = return False
| otherwise = do
| bitIx < 0 = return False
| otherwise = testBitUnsafe b bitIx

-- | Return True if the bit is set. The specified bit must be in range.
testBitUnsafe :: BitSet s -> Int -> ST s Bool
testBitUnsafe (BS v _) bitIx = do
let wordIx = bitIx `div` bitsPerWord
bitPos = bitIx `mod` bitsPerWord
w <- V.read v wordIx
w <- V.unsafeRead v wordIx
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we would also need to check that bitIx >= 0 for this to be safe-ish. If you pass in a negative index, this could be very bad. Perhaps the bug is that the API takes an index using a signed type.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, I am saddened that so much of the prelude and containers uses Int instead of Natural. Any index, the length, the initial size, etc. should all be Natural but are all Ints. I've used Natural in other places, but it does add some noise because there's a need to convert between Natural and Int in [lots of places to impedance match with the stuff that is written to use Int.

I think this is actually safe in practice because this is an internal module that is not exported to users, and it's only used from Algorithms.DFS as the latter iterates through graph elements (practically speaking, even the bigIx >= size should be unnecessary based on the usage).

I considered switching this interface to Nat, but it's using the vertexId, so either DFS is calling toEnum on all these so that BitSet can almost immediately call fromEnum to invoke the fromEnum to call the Vector operations, or the Vertex type changes to a Natural (and although it's supposed to be internal, it's visible and a much bigger change, and it still needs the fromEnum to call the Vector operations).

In fact, I switched the other read in this file to unsafe, then I benchmarked, the original safe, the unsafe in this PR, and a unsafe Nat version with the fromEnum/toEnum I discussed above. I'll email the criterion HTML separately. tl;dr Nat+unsafe is 33% slower than unsafe, and safe is 25% slower than unsafe.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had forgotten that this was an internal-only module.

In that case, what if we just renamed this to unsafeSetBit and unsafeTestBit and even removed the bounds check?

I was going to say that the only uses do their own checking, but that is not entirely true. If you mix vertices from another graph, you could hit an error (and we don't have any type-level enforcement of that). It looks like it would be safe if we just filtered the root [Vertex] inputs in the DFS module.

return $ B.testBit w bitPos

Loading