What is a stream
A stream is not a collection, a sequence, or a stream of objects
A stream is an abstraction that holds zero or more values
Not (necessarily) a collection: values might not be stored anywhere
Not (necessarily) a sequence: order might not matter
Values, not objects: avoids mutation and side effects
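As a rough illustration of "values might not be stored anywhere", the sketch below builds a stream from a generator function instead of a collection. The class and method names (LazyValues, firstPowersOfTwo) are made up for this example.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class LazyValues {
    // The source is an unbounded generator: values are computed on demand
    // and never stored in a backing collection.
    static List<Integer> firstPowersOfTwo(int count) {
        return Stream.iterate(1, x -> x * 2)
                .limit(count)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(firstPowersOfTwo(5)); // [1, 2, 4, 8, 16]
    }
}
```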
Pipelines
A stream source
Zero or more intermediate operations
A terminal operation
collection.stream()
.filter(…)
.map(…)
.collect(…);
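A minimal, runnable version of the pipeline shape above might look like this. The data and the names PipelineSketch and shortNamesUpper are assumptions chosen only for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class PipelineSketch {
    static List<String> shortNamesUpper(List<String> names) {
        return names.stream()                     // stream source
                .filter(n -> n.length() <= 4)     // intermediate operation
                .map(String::toUpperCase)         // intermediate operation
                .collect(Collectors.toList());    // terminal operation
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("anna", "bartholomew", "carl", "eve");
        System.out.println(shortNamesUpper(names)); // [ANNA, CARL, EVE]
    }
}
```

Nothing runs until the terminal operation is invoked; the intermediate operations only describe the work.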
Parallel Streams
Sources start with stream(), parallelStream(), or another stream factory
Can be switched using parallel() and sequential()
Parallel vs sequential is a property of the entire pipeline
Can’t switch between parallel and sequential in the middle
Last one wins
Parallel makes it auto-magically go faster?
NO
collection.stream()
.filter(...)
.parallel()
.map(...)
.sequential()
.collect(...)
The entire pipeline runs sequentially
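The "last one wins" rule can be checked with isParallel(), which reports how the pipeline would execute once a terminal operation is called. This is a small sketch; the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Stream;

class LastOneWins {
    // parallel() followed later by sequential(): the last call decides.
    static boolean parallelThenSequential(List<Integer> data) {
        Stream<Integer> s = data.stream()
                .filter(x -> x > 0)
                .parallel()
                .map(x -> x * 2)
                .sequential();
        return s.isParallel();
    }

    // sequential() followed later by parallel(): again the last call decides.
    static boolean sequentialThenParallel(List<Integer> data) {
        Stream<Integer> s = data.stream()
                .filter(x -> x > 0)
                .sequential()
                .map(x -> x * 2)
                .parallel();
        return s.isParallel();
    }

    public static void main(String[] args) {
        List<Integer> data = Arrays.asList(1, 2, 3);
        System.out.println(parallelThenSequential(data)); // false
        System.out.println(sequentialThenParallel(data)); // true
    }
}
```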
Parallel stream considerations
Parallel and sequential streams should give the same result
Parallelism can lead to nondeterminism, which is bad
Encounter order vs processing order
Stateful vs stateless: side effects
Accumulation vs reduction
Reduction: identity and associativity
Explicit nondeterminism can speed things up
Parallel has overhead and might also slow things down
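To make the "same result" and "side effects" points concrete, here is a sketch of a stateless pipeline that collects its output instead of writing to a shared list from inside a lambda. The names SideEffectFree and squares are assumptions for this example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class SideEffectFree {
    // Stateless lambda plus collect(): no shared mutable state is touched,
    // so the parallel and sequential runs produce the same list.
    // (Adding to a shared ArrayList inside forEach would NOT be safe in parallel.)
    static List<Integer> squares(List<Integer> input, boolean parallel) {
        Stream<Integer> s = parallel ? input.parallelStream() : input.stream();
        return s.map(x -> x * x).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> input = Arrays.asList(1, 2, 3, 4, 5);
        System.out.println(squares(input, false)); // [1, 4, 9, 16, 25]
        System.out.println(squares(input, true));  // [1, 4, 9, 16, 25]
    }
}
```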
Parallel vs sequential
[Diagram: a sequential pipeline (Source, op1, op2, terminal op) compared with a parallel pipeline in which the source feeds several op1, op2 branches ahead of the terminal op]
Encounter order vs processing order
Encounter order: the ordering of the source determines the ordering of the result
Processing order is nondeterministic
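The difference between the two orders can be observed with a small experiment. In the sketch below (all names hypothetical), the collected result follows the encounter order of the source, while a synchronized list, used here only to observe, records whatever order the elements happened to be processed in.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class OrderSketch {
    // Encounter order: the result list matches the order of the source,
    // even though the pipeline runs in parallel.
    static List<Integer> collectedInParallel(int n) {
        return IntStream.rangeClosed(1, n)
                .parallel()
                .boxed()
                .collect(Collectors.toList());
    }

    // Processing order: forEach on a parallel stream may visit elements
    // in any order, so this list can differ from run to run.
    static List<Integer> processingOrder(int n) {
        List<Integer> seen = Collections.synchronizedList(new ArrayList<>());
        IntStream.rangeClosed(1, n).parallel().forEach(x -> seen.add(x));
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(collectedInParallel(8)); // [1, 2, 3, 4, 5, 6, 7, 8]
        System.out.println(processingOrder(8));     // some permutation of 1..8
    }
}
```

When a side-effecting terminal operation must respect encounter order, forEachOrdered can be used in place of forEach, at some cost in parallelism.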
Accumulation vs Reduction
// Accumulation: every iteration mutates the shared variable sum
long sum = 0L;
for (long i = 0; i <= 1_000_000L; i++) {
    sum += i;
}
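For comparison, the same sum can be written as a reduction, which has no shared mutable variable and can therefore be split across partitions. This is a sketch; the class and method names are assumptions.

```java
import java.util.stream.LongStream;

class ReductionSketch {
    // Built-in reduction: each partition is summed, then partial sums are combined.
    static long sumUpTo(long n) {
        return LongStream.rangeClosed(0, n).parallel().sum();
    }

    // The same reduction spelled out with an identity (0) and an associative operator.
    static long sumWithReduce(long n) {
        return LongStream.rangeClosed(0, n).parallel().reduce(0L, Long::sum);
    }

    public static void main(String[] args) {
        System.out.println(sumUpTo(1_000_000L));       // 500000500000
        System.out.println(sumWithReduce(1_000_000L)); // 500000500000
    }
}
```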
Identity Value
The starting value of each partition in a parallel stream
Becomes the result if the stream is empty
The value must be correct: combining it with any element must return that element
Must really be a VALUE (immutable)
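A short sketch of these rules, using 0 as the identity for addition and 1 as the identity for multiplication. The helper names are hypothetical.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class IdentitySketch {
    // 0 is the identity for addition: 0 + x == x for every x.
    static int sum(List<Integer> xs, boolean parallel) {
        return (parallel ? xs.parallelStream() : xs.stream())
                .reduce(0, Integer::sum);
    }

    // 1 is the identity for multiplication: 1 * x == x for every x.
    static int product(List<Integer> xs, boolean parallel) {
        return (parallel ? xs.parallelStream() : xs.stream())
                .reduce(1, (a, b) -> a * b);
    }

    public static void main(String[] args) {
        List<Integer> xs = Arrays.asList(1, 2, 3, 4);
        List<Integer> empty = Collections.emptyList();
        System.out.println(sum(xs, true));         // 10
        System.out.println(product(xs, true));     // 24
        System.out.println(sum(empty, true));      // 0, the identity
        System.out.println(product(empty, true));  // 1, the identity
    }
}
```

Using a value that is not a true identity (for example 1 for addition) would be folded in once per partition, so the parallel result could depend on how the source was split.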
Associativity
Reduction operation must be associative in parallel stream
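Associativity means the grouping of operands does not change the result, which is what lets partitions be reduced independently and combined afterwards. The sketch below (hypothetical names) contrasts addition, which is associative, with subtraction, which is not.

```java
import java.util.function.IntBinaryOperator;
import java.util.stream.IntStream;

class AssociativitySketch {
    // Computes (a op b) op c.
    static int leftGrouped(IntBinaryOperator op, int a, int b, int c) {
        return op.applyAsInt(op.applyAsInt(a, b), c);
    }

    // Computes a op (b op c).
    static int rightGrouped(IntBinaryOperator op, int a, int b, int c) {
        return op.applyAsInt(a, op.applyAsInt(b, c));
    }

    // With an associative operator, parallel and sequential reductions agree.
    static boolean sumAgrees(int n) {
        int sequential = IntStream.rangeClosed(1, n).reduce(0, Integer::sum);
        int parallel = IntStream.rangeClosed(1, n).parallel().reduce(0, Integer::sum);
        return sequential == parallel;
    }

    public static void main(String[] args) {
        IntBinaryOperator minus = (a, b) -> a - b;
        System.out.println(leftGrouped(minus, 10, 4, 3));  // 3
        System.out.println(rightGrouped(minus, 10, 4, 3)); // 9
        System.out.println(sumAgrees(1000));               // true
    }
}
```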
Where are the threads
Stream workload is split and dispatched to the common fork-join pool
Control over concurrency is explicitly opaque in the API
Common pool is controlled by system properties
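One way to see where the work runs is to record the thread names used by a parallel stream and to query the common pool's parallelism. This is only a sketch with made-up class and method names; the pool size can be set at JVM startup with the system property java.util.concurrent.ForkJoinPool.common.parallelism.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ForkJoinPool;
import java.util.stream.IntStream;

class CommonPoolSketch {
    // Target parallelism of the common pool used by parallel streams.
    static int commonParallelism() {
        return ForkJoinPool.getCommonPoolParallelism();
    }

    // Names of the threads that actually processed elements.
    // The calling thread usually takes part in the work as well.
    static Set<String> workerThreadNames(int n) {
        Set<String> names = ConcurrentHashMap.newKeySet();
        IntStream.range(0, n)
                .parallel()
                .forEach(i -> names.add(Thread.currentThread().getName()));
        return names;
    }

    public static void main(String[] args) {
        System.out.println("common pool parallelism: " + commonParallelism());
        System.out.println("threads used: " + workerThreadNames(10_000));
    }
}
```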
When to go parallel
Parallel stream has startup overhead
Typically 1000 microseconds
If your computation is shorter, do not even bother
Consider parallel if N * Q >= 10,000
N = number of elements
Q = cost per element
Assumptions
Element processing is independent and the source is splittable
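The N * Q rule of thumb can be written down as a tiny helper. This is only a sketch: the names ParallelHeuristic and worthConsideringParallel are hypothetical, the unit of Q is whatever cost measure the rule assumes, and a real decision should be backed by measurement.

```java
class ParallelHeuristic {
    // Rule-of-thumb threshold taken from the notes: N * Q >= 10,000.
    static final double THRESHOLD = 10_000;

    // n = number of elements, q = estimated cost per element.
    // Assumes element processing is independent and the source is splittable.
    static boolean worthConsideringParallel(long n, double q) {
        return n * q >= THRESHOLD;
    }

    public static void main(String[] args) {
        System.out.println(worthConsideringParallel(100, 10));    // false: 1,000
        System.out.println(worthConsideringParallel(10_000, 1));  // true: 10,000
        System.out.println(worthConsideringParallel(100, 100));   // true: 10,000
    }
}
```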