A little while ago (or at least it was recently when I started planning this post out), the commentator Matthew posted a number of interesting comments, which were certainly worthy of a response in a blog post rather than an inline reply. I responded to the first two comments a couple of posts ago. That post, however, grew too long, both in length and in writing time, and I decided to defer my discussion of the third comment to a subsequent post. Which is this one. I note that Matthew has responded to my original post; I'll probably get back to that in a few posts' time.
The comment in question is this one, and I recommend that readers look at it before coming back here.
Before I begin, I should also warn any readers that my views on the topic are still developing (partly because responses such as Matthew's force me to rethink things, and consider the topic in more depth), and what I write here might not be fully consistent with what I have written before or elsewhere. I also ought to begin by outlining the underlying science, and the noncontroversial aspects of the problem.
The idea behind Bell's theorem is straightforward. I simplify a bit here; there are two more precise explanations of Bell's theorem (from different points of view) described below. You have two particles with an entangled quantum state; that is to say, the states of the two particles are in some way linked together. When one measures a particular observable for one particle, then a measurement for the other particle must give a complementary result. This is true no matter how far apart the particles are. If the result of the measurement is determined at the time of the measurement (as held by, for example, the Copenhagen interpretation), then there must be instantaneous communication between the two experiments, so that the particle at the second detector knows what answer it has to give to be consistent with the result at the first. This instantaneous communication seems to violate the principles behind special relativity. The second requirement needed to derive Bell's inequality, or something like it, is that there should be several observables which are correlated but whose operators don't commute with each other. That means that if you are in an exact state for one observable, you are in an indeterminate state for the others. This is a feature unique to quantum physics; it is not to be found in either classical (Newtonian) physics or in Greek physics. This second requirement seems to imply either that the results of measurement are only determined at the point of measurement, or that there are some underlying physical parameters, not directly observable, which determine the possible outcomes of all future measurements and which are set at the moment when the two particles depart from each other. What Bell's theorem attempts to show is that these hidden variable theories necessarily imply results which contradict experiment.
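The non-commuting observables in question here are the spin components along different axes. In standard notation (mine, not taken from the post),

```latex
[S_x, S_y] = i\hbar S_z \neq 0,
```

so a particle in a definite state of spin along one axis is necessarily in a superposition of the spin states along a perpendicular axis.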
This leaves us with the first option: that the outcomes of at least some measurements are only determined at the point of measurement. This seems to imply that the axioms that either quantum physics or special relativity depend on are wrong (or at least incomplete), which is a problem because these are the two pillars of contemporary physics, and the two ideas which we are more certain than anything else are right.
I usually use the example of a spin 0 Boson decaying into two spin 1/2 Fermions. The phenomenon is also present in polarized light (and many of the experimental tests of Bell's and related inequalities have used light), but the important aspects of the underlying mathematics are the same in each case, so from a theoretical standpoint it doesn't matter too much which physical system is modelled. The case of particle decay is perhaps slightly simpler to grasp.
A spin 1/2 particle's spin can be measured along any particular axis, and in each case it can take two possible values: +1/2 or −1/2. The spin 0 particle can only have its spin measured as 0. Because spin is conserved along each axis in every physical interaction, if one of the two Fermions is measured as spin +1/2 along any axis, then the other one must have spin −1/2. And this is true in each direction. And this holds in both classical and quantum physics; there is no quantum spookiness coming in yet. (OK, the concept of spin itself only arises naturally in relativistic quantum physics, but apart from that there is no quantum spookiness coming in as yet.)
But what if we measure the spins of the two Fermions along slightly different axes? Clearly, in this case, we need not get opposite results. We could measure the spin of both particles as being +1/2. But we might also expect that the spin states are in some way correlated for directions that are close to each other. I don't see that this necessarily has to be the case. One can think of a classical system where the spin in each direction is fully determined at the point of decay independently of what is happening in any other direction. But in practice, there is correlation between the observed values of the spins along neighbouring axes. If the spin of one of the Fermions is −1/2, the spin of the other one, if measured along an axis which is only slightly differently aligned, will be +1/2 almost all the time. And this is fortunate: it would otherwise be pretty much impossible to ensure that the two detectors were aligned in precisely the same way, so one would always get random results. (As noted, the concept of spin arises from relativistic quantum mechanics, and that is what we theoretically expect.) So if we say that the two axes of measurement differ by an angle θ, then the chance that the two results fail to be opposite will be some function of θ. And that works in either classical or quantum physics.
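For reference, the quantum prediction for this system (which the calculation later in the post reproduces) is that, for detector axes misaligned by an angle θ, the probability of obtaining opposite results is

```latex
P(\text{opposite} \mid \theta) = \cos^2(\theta/2),
```

which tends to 1 as θ → 0, giving the near-perfect anticorrelation for almost-aligned detectors described above.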
But the mathematical details of how the spins will differ depend on the model used to calculate them, and in particular the assumptions behind those models. Some of these agree with the quantum calculation, and others don't.
The Copenhagen interpretation of quantum physics states that the state of the particle is indeterminate until it is measured. Since the two particles are linked together, as soon as you perform a measurement on one of them, it collapses into a well defined state (that's standard). But in that case, so does the entangled partner. This dual collapse happens simultaneously. This is a problem in relativistic theories where there is no clear notion of simultaneity for two events at different locations.
The standard sorts of interactions in QFT are all local. They involve particle decay, emission and absorption, and each event occurs at a single point. This is built into the theory; it cannot be removed without violating Lorentz symmetry. Wavefunction collapse of entangled particles, on the other hand, is nonlocal. That means that either the information communicated in wavefunction collapse occurs via some nonphysical interaction, or the model of wavefunction collapse is incorrect.
Bell's theorem in particular targets interpretations of Quantum Physics which rely on hidden variables. These basically say that all the different possible experimental outcomes are encoded in some way in the particles at the time of their decay. Behind the wavefunction are another set of variables which control the experimental outcomes. The theorem states that if there are hidden variables and locality, then one gets different results than expected from the quantum mechanical calculation.
The assumptions used to derive Bell's inequalities, as stated in this paper are:
 The outcome of a measurement is not determined at the point of measurement, but is uniquely determined by the properties of the particle before the measurement is performed. (This assumption is known as local realism or the hidden variables assumption.)
 Reality is single valued, i.e. each measurement has only one outcome. This is violated in Everett multiple world interpretations of quantum physics.
 The microstates (or hidden variable states) are determined by an underlying probability distribution, which is positive and normalizable to one. This could be violated if the detectors are inefficient (i.e. there is bias in which of the particles are detected), or if the detectors were in some way dependent on the hidden variables in the particles.
 There is no backwards causation; i.e. causal signals can't go backwards in time.
 Factorizability (or locality), i.e. what is happening at one detector is independent of what is happening at the other detector. The measurement results just depend on the hidden variables of the particles, and not on what happened at the other detector. This entails both that the settings on the two detectors don't interfere with each other, and that the outcomes of the two experiments don't depend on each other, but only on the hidden variables. This arises from the principle that only events in the past light cone of the measurement can influence the measurement. The probability of getting a particular value at one detector is conditional only on the events in the past light cone, and otherwise independent of what is happening at the other detector.
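The factorizability condition can also be written out symbolically. Using λ for the hidden variables and M for the background setup (the notation introduced below), the standard way of expressing it is that, once λ is given, the joint probability factorizes:

```latex
P(A \wedge B \mid \lambda \wedge M) = P(A \mid \lambda \wedge M)\, P(B \mid \lambda \wedge M).
```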
Some of these axioms are related to each other. For example, the article that Matthew referenced in his comment suggests that there are only three assumptions: factorizability, no backwards causation, and hidden variables.
One other thing we should bear in mind is that the experimental tests of Bell's theorem are not performed using a single particle decay. They require numerous events to build up an ensemble. One calculates a probability, which one wishes to compare against a frequency distribution. But to reliably measure a frequency distribution, one needs a large enough sample of events. Each individual decay (unless the detectors are orientated along the same axis) might either have the spins in alignment or not. There is no way to predict the outcome, and the result by itself doesn't prove anything. It is only when we consider the ensemble after numerous decays that a frequency distribution emerges. The probability distributions we calculate are predictions of the frequency distributions for an ensemble of results. They do not describe what is happening with a single particle. Hidden variables, on the other hand, refer to the mechanics of each individual decay. If there are such hidden variables, then each individual event will depend on its own set of them. The ensemble distribution itself represents some sort of average over all the possible hidden variables.
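This ensemble picture can be illustrated with a toy local hidden-variable model (entirely my own construction for illustration; it is not the quantum rule, and it satisfies Bell's inequality): each decay carries its own hidden direction, each individual outcome is fully determined by it, and a stable frequency only emerges over many events.

```python
import math
import random

def hidden_variable_trial(theta, rng):
    """One decay. A hidden direction lam is fixed at the moment of decay;
    each Fermion's spin along any axis is then predetermined as the sign
    of the cosine of the angle between lam and that axis."""
    lam = rng.uniform(0.0, 2.0 * math.pi)            # hidden variable for this event
    spin1 = 1 if math.cos(lam) >= 0 else -1          # Fermion 1, detector axis at angle 0
    spin2 = -1 if math.cos(lam - theta) >= 0 else 1  # Fermion 2, axis at theta, anti-correlated
    return spin1, spin2

rng = random.Random(1)

# Aligned detectors: every single event gives opposite results.
assert all(s1 == -s2 for s1, s2 in (hidden_variable_trial(0.0, rng) for _ in range(1000)))

# Misaligned detectors: individual events are unpredictable; only the
# ensemble frequency converges (to theta/pi in this toy model).
theta = math.pi / 4
n = 100_000
same = sum(s1 == s2 for s1, s2 in (hidden_variable_trial(theta, rng) for _ in range(n)))
print(same / n)  # close to theta/pi = 0.25
```

No single trial proves anything; the statement about the frequency only makes sense over the ensemble, which is exactly the point made above.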
Before progressing, I need to discuss the mathematics. Warning in advance for any non-mathematicians: this will get pretty intense. It should be understandable for anyone with A-level or IB mathematics or equivalent.
The mathematics: derivation of Bell's inequality
First a bit of notation. We have a physical system which consists of a spin 0 particle which emits two spin-half Fermions in opposite directions (Fermion 1 and Fermion 2). We have detectors x, w, a and b which measure the spin of the two Fermions. w and a are positioned to observe Fermion 1, while x and b are set up to measure the spin of Fermion 2. We can choose to use either detector w or a, and similarly choose whether to use x or b. x and w are set up to measure the spin of the Fermions along the same axis. So w will always give the opposite result to x. a and b measure the spin along their own axes. All three of these axes are in the same plane. The angle between the spin axis measured by a and that measured by w is θ₁. The similar angle between x and b is θ₂.
 A represents detector a giving a spin up result.
 A' represents detector a giving a spin down result.
 B represents detector b giving a spin up result.
 B' represents detector b giving a spin down result.
 X represents detector x giving a spin up result (and consequently w reports that Fermion 1 is spin down).
 X' represents detector x giving a spin down result (and consequently w reports that Fermion 1 is spin up).
We are interested in calculating the probabilities. Every probability is conditional on its premises, and those we calculate here will be conditional on λ, which represents the assumed hidden variables, and M, which contains information on the initial setup, the assumptions of the theoretical model, and so on. I will denote "the probability of A conditional on M" as P(A|M). ∧ represents the logical "and" or set intersection operator, while ∨ represents the logical "or" or set union operator.
We will need a few standard results from probability theory. Firstly, the probability of an outcome intersecting with the universal set U is the probability of the outcome by itself: P(A∧U|M) = P(A|M).
Secondly, the probability of an outcome intersecting with its complement is 0 (the law of non-contradiction): P(A∧A'|M) = 0.
Thirdly, one of the axioms of probability, the rule for the probability of a union of two outcomes: P(A∨B|M) = P(A|M) + P(B|M) − P(A∧B|M).
Finally, that the probability that outcome A or outcome B occurs is greater than or equal to the probability that outcome B alone occurs: P(A∨B|M) ≥ P(B|M).
What we want to calculate is the probability that the detectors at a and b give the same result, conditional on the results we get when we perform the experiments at a and x, and when we compare b and w. Those initial results are found to be:
The factor of ½ comes from the probability that we can either get the result X or X' given the initial experimental setup. For these equations to be useful in modelling reality, we need the assumption that the observational results depend solely on the hidden variables (plus the experimental setup). Recall that we will eventually want to compare this probability with an experimentally observed frequency. That comparison only makes sense if the causality represented in the calculation of the probability exactly matches the real physical causes. We also use the assumption that the results of one measurement don't influence the other measurement throughout this calculation.
So the thing we want to calculate is
So let's start trundling through the algebra.
Since
we can kill the last term in the expansion, and continue
We now make use of equations (1) and (2) to write
Repeating this for each of the terms in equation (8) gives
And substituting the numbers in from equations (3) and (4) gives
And this is the result we are after: Bell's inequality, as it applies to this particular experimental set up. If this inequality is shown to be violated in practice (which it is), then that shows that at least one of the assumptions behind the calculation is false.
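Since the steps are compact, it may help to see the chain written out in one place. The following is my own sketch of this style of argument, writing θ₁ for the angle between a's axis and the shared w/x axis, θ₂ for the angle between that axis and b's, and using the probability rules quoted above; the values in the final line are the observed (singlet-state) correlations:

```latex
\begin{align*}
P(A \wedge B \mid M) &= P(A \wedge B \wedge (X \vee X') \mid M)
  && \text{($X \vee X'$ is the universal set)} \\
&= P(A \wedge B \wedge X \mid M) + P(A \wedge B \wedge X' \mid M)
  && \text{(union rule; $X \wedge X'$ is impossible)} \\
&\le P(A \wedge X \mid M) + P(B \wedge X' \mid M)
  && \text{(monotonicity)} \\
&= \tfrac{1}{2}\sin^2(\theta_1/2) + \tfrac{1}{2}\sin^2(\theta_2/2).
\end{align*}
```

The hidden-variable assumption enters in treating A, B, X and X' as simultaneously well-defined for each decay, even though only two of the detectors are used in any single run.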
The mathematics: the quantum calculation
So now it is time to look at the quantum mechanical calculation (which, of course, agrees with experiment).
So once again, we start with a spin 0 particle decaying into two spin-half Fermions. We want to calculate the same probability as above. I will only do the calculation for detectors a and b; one can use the same methods to calculate the comparisons between a and x, for example, which will lead to the same results as in equations (3) and (4).
I will denote the Fermions emerging from the decay as Y_{A}, which heads towards detector a, and Y_{B}, which heads towards detector b. We need to express the possible observable values in terms of quantum states; so, for example, one state will represent detector a measuring spin up, and so on.
We don't know whether Y_{A} is spin up or spin down, so we need to account for both possibilities in our amplitude. The amplitude that we destroy the spin up particle at detector b will be given by the overlap between the two states. So what we are doing in our calculation is creating the particles (one spin down particle and one spin up particle), and then destroying them at detectors a and b. In terms of creation and annihilation operators, this is
To construct this amplitude, we need to annihilate the creation operator for Y^{+}_{B} with the annihilation operator for B, which means first of all we need to place these operators next to each other. We can reorder the Fermion operators by swapping neighbouring operators, until they are in the order we want. However, Fermion operators anticommute, so whenever we swap two neighbouring operators, we have to multiply the whole expression by minus one. That is the origin of the minus sign on the right hand side of equation (12).
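In symbols (my notation, not necessarily the post's), writing $b^\dagger_\sigma$ for the operator creating a Fermion of spin σ and $b_\sigma$ for the corresponding annihilation operator, the anticommutation relations are

```latex
\{ b_\sigma, b^\dagger_{\sigma'} \} = \delta_{\sigma\sigma'}, \qquad
\{ b_\sigma, b_{\sigma'} \} = \{ b^\dagger_\sigma, b^\dagger_{\sigma'} \} = 0,
```

so each swap of two neighbouring Fermion operators costs a factor of −1.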
I'll use Q this time to denote the initial conditions of the experiment and modelling assumptions (since it is a different model to the classical case, I'll use a different symbol). So what we want to calculate are amplitudes such as
However, in the calculation below, I will use the Y states as an intermediary. But we don't know the spin states of Y, so we have to integrate over all of the possibilities:
Y_{A} is equally likely to be spin up or spin down, so we have one of the two amplitudes needed to solve this integral immediately.
I will spend the rest of this section calculating the amplitude that a and b are both spin up given that Y_{B} is spin up and Y_{A} is spin down, and the other similar quantities I need.
I need to start by defining the spin operator, and calculating its eigenstates. I will use polar coordinates to convert (x,y,z) Cartesian coordinates into a pair of angles (θ, φ).
Then the matrix representation of the spin operator is
The spin up and spin down states for arbitrary angles are defined by the eigenvalue equations
These turn out to be,
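In the conventions I would expect here (a standard reconstruction, with θ the polar angle and φ the azimuthal angle), the spin operator along the direction (θ, φ) and its normalized eigenstates, with eigenvalues +1/2 and −1/2, are

```latex
S(\theta, \varphi) = \frac{1}{2}
\begin{pmatrix}
\cos\theta & e^{-i\varphi}\sin\theta \\
e^{i\varphi}\sin\theta & -\cos\theta
\end{pmatrix},
\qquad
|{+}\rangle =
\begin{pmatrix} \cos(\theta/2) \\ e^{i\varphi}\sin(\theta/2) \end{pmatrix},
\quad
|{-}\rangle =
\begin{pmatrix} -e^{-i\varphi}\sin(\theta/2) \\ \cos(\theta/2) \end{pmatrix}.
```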
We have a degree of freedom in deciding what angles our detectors are orientated at. The system has rotational symmetry, so we can pick the x-axis to be in whatever direction we choose. Only relative differences between angles will affect the final result. As such, I am free to pick detector A as being along the x-axis, and detector B as resting in the x-y plane. The Fermions themselves can emerge in any spin orientation. Thus we have,
This means that the states are,
We can now calculate the amplitudes. Firstly, the amplitude that both detectors measure the particles to be spin up. Don't forget that we need to complex conjugate the states for A and B.
Similarly, we can calculate the amplitude that detector a records spin down and detector b spin up
And the other possible measurement outcomes give
Note that these results don't depend on the orientation of the Fermions as they emerged from the decay. We have no way of knowing what that was. We have all the numbers we need, so we can now calculate the final amplitudes
Then we can modulus square the amplitudes (Born's rule) and combine them together to get the probabilities that the two detectors will record either the same spin state or opposite spin states,
Converting this back to the notation of the previous section gives
And this is the quantum mechanical prediction. It obviously differs from the classical result (equation (11)), and can violate the inequality.
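The violation can be checked numerically. The sketch below is my own illustration: theta1 and theta2 are the angles between a's axis and the shared w/x axis, and between that axis and b's, and I assume a and b sit on opposite sides of the shared axis, so that the angle between them is theta1 + theta2.

```python
import math

def p_quantum_same(theta1, theta2):
    """Quantum (singlet) probability that detectors a and b both record
    spin up, when the angle between their axes is theta1 + theta2."""
    return 0.5 * math.sin((theta1 + theta2) / 2.0) ** 2

def bell_bound(theta1, theta2):
    """Right-hand side of Bell's inequality for this set-up:
    P(A and X) + P(B and X')."""
    return 0.5 * math.sin(theta1 / 2.0) ** 2 + 0.5 * math.sin(theta2 / 2.0) ** 2

t = math.pi / 4
print(p_quantum_same(t, t))  # ≈ 0.25 (quantum prediction)
print(bell_bound(t, t))      # ≈ 0.146 (hidden-variable bound)
assert p_quantum_same(t, t) > bell_bound(t, t)  # the inequality is violated
```

At equal angles of 45 degrees the quantum value exceeds the hidden-variable bound, which is the violation the post describes.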
Models and explanations
I designed the calculation in the previous section to reflect my own philosophy. There are other ways of getting the same result, which would reflect other philosophies of physics. But my approach is to take the calculation above literally.
In summary, my approach is:
 God exists and actively and continuously sustains the universe. Physics is the description of that action (in the absence of any special circumstances, where God is free to act differently, leading to miracles). Ultimately, the various symmetry laws that constrain the laws of physics are a reflection of God's attributes. The indeterminacy of physics is due to God's free will.
 Substances can exist in one of a number of states (or potentia). One of these is actual at any given time, and changes either represent movement from one state to another, or the creation or annihilation of different particles. The states are always expressed in terms of a basis. There might be (for example) two allowed states in a given basis. There is a different basis for every possible observable. However (and this is where it gets harder to visualise and explain without the mathematics), unlike Newtonian or Greek physics, these bases are not orthogonal: if one is in an exact state in one basis, then one is necessarily in a superposition of states in another basis.
 I make a distinction between event and substance causality. Substance causality refers to which substance (or particle) preceded another substance. Or, more generally, which actualised state preceded another actualised state. There are a limited number of possible interactions (in the absence of miracles), which are denoted by the final causes of a particle. The efficient cause traces back the past history of a particle, from one state to another. Final causes point to substances which could possibly arise from that particle. Substance causality is always respected, even in quantum physics. It is also local, in the sense that all interactions occur at a single point in space and moment in time, and only substances in the past light cone of a being can be part of the chain of efficient causes for that being. Event causality, on the other hand, refers to the causes of events, which represent changes in states. The physical particles alone are insufficient to describe the event cause. Even if we had a complete knowledge of the state of the universe and all physical parameters, events would, with a few exceptions, be impossible to predict precisely. Ultimately, which events occur, which potentia are actualised, comes down to God's free decisions. It is not something we can predict; the best we can do is treat it stochastically, using the methods of quantum physics. While God is able to do whatever He pleases, in practice God's rational nature and constancy means that there is a degree of regularity in the actions. (God only acts differently if there is a good reason to do so.) Thus we can usefully assign amplitudes to each possible event, and also expect different events to be correlated to each other as a reflection of God's consistency and constancy.
 Uncertainty for an event in quantum physics is best parameterized by an amplitude rather than a probability. We can only convert to a probability after we have a large number of such events and want to compare against an experimental measurement of a frequency distribution.
To apply these principles to this particular example, the spin zero particle decays into two Fermions (an act of efficient causality). The two Fermions will be in an actual state, in a definite spin state (Y) and a particular basis. We don't know what that spin state is (because of God's freedom when actualising the decay), and we have no way of finding out. When we take a measurement, we project the particle into the basis associated with the detector (A or B). The amplitude for us to measure it either as spin up or spin down is given by the overlap between the detector state (A) and the decayed particle state (Y). Usually we will then need to integrate over all unknown variables (such as the precise orientation of Y) to get a final result which can be compared with experiment.
The assumptions behind this calculation are:
 The particle is emitted in a particular state in a particular basis. Only those observables which commute with that basis are determined prior to measurement. The observables that require noncommuting bases are undetermined until measurement.
 Reality is single valued.
 Uncertainty for single particles is most fundamentally parametrized by amplitudes rather than probabilities. If we later wish to convert to a probability to compare predictions for an ensemble of particles with an observed frequency distribution, we can do so via Born's rule.
 The system is factorizable and local, as far as substance causality is concerned.
 The measurement process is indeterminate. It proceeds by projecting the particle into the basis defined by the detector. The amplitudes for which of the two possible eigenstates it drops into are determined by the overlap between the particle state and the two eigenstates of the detector's basis. The measurement, forcing the particles into one of the two states, is an event, not determined by physical causes alone. The amplitudes correctly predict the likelihood of each possibility.
So I would disagree with the first and third of the assumptions behind the derivation of Bell's inequalities. There are some similarities with the Copenhagen interpretation, because the outcome of a measurement is (usually) not determined until the measurement itself. However, unlike the Copenhagen interpretation, where the particle remains as some ghostly nonphysical wavefunction between measurements, I would maintain that it is in some definite state, we just don't know what that is.
In my interpretation, we don't know, and can't know even after measurement, what state the particle is in between emission and observation, but we do know that it is in some state. The wavefunction (in the respects that are relevant to this discussion) represents a parametrization of our uncertainty. In particular, it is used to predict the results of experiments (after we have enough statistics to determine a frequency distribution). There is a hidden reality behind it. But this does not lead to Bell's inequality; instead it duplicates the standard quantum mechanical calculation.
Of course, this still leaves one question. How does the detector at a know how to get the opposite result from the detector at b when the two detectors are aligned? The mathematics says that it must, but what lies behind the mathematics? The mathematics is used to make statistical predictions of the results of the indeterminate system. It is ultimately based on symmetry requirements, which in turn are drawn from the premise that whatever lies beyond the laws of physics is not bound by time and space: timeless and omnipresent, and therefore does not treat one moment in time or one location in space as any more or less important than any other. The results of the experiment are indeterminate, but using these symmetries we can still make statements about the likelihood of these results. We know that there is no physical communication between the two detectors; if there were, it would be present in the action of the most fundamental theory (which I assume is the standard model adjusted to take into account gravity). But whatever is behind the laws of physics, whatever is the reason why the particles obey them, is not a physical particle. Now I regard the laws of physics as a description of God's actions in sustaining the universe (via secondary causation), under the assumption that God has no special interest in the events concerned. If, for example, God wanted to miraculously heal a blind man, then He would have a special interest in a particular time and location; the symmetries constraining QFT would in that moment be invalid, and therefore the assumptions behind the prediction would no longer hold. We would get an unexpected result. (The mechanism by which God produces miracles is, in this model, exactly the same mechanism by which God creates and destroys particles in the ordinary course of sustaining the universe.) But, in the case of this particular experiment, there is no good reason why God should favour one result ahead of another.
The symmetries hold; the mathematical calculation of the amplitude is a correct description of the likelihood of God's actions, and the amplitude of zero for the detectors recording the same spin means that the two detectors will display the opposite spin.
Does this imperil the freedom of God? No. Firstly, God can still either make the particle at detector a spin-up or spin-down, so the result is not determined, and we certainly aren't led to the situation where God is forced into one particular act. Secondly, God could record the same spin on both detectors (a miraculous intervention), but would only do so if He had a reason to do so. And there is no special reason, so we are left with the default result.
People might, of course, dispute that the laws of physics are a description of God's sustaining of the universe. But, in this context, that affects little. What I need are the attributes of timelessness, omnipresence, and omnipotence (that is, universal applicability): and I think even most atheists would admit that the laws of physics either represent the acts of something or somethings that possess these attributes, or, if they are in some way capable of action themselves rather than being a description or an intermediary, then the laws themselves possess those attributes. There will be those who disagree with this, of course, such as those who hold to a Humean view of causality, or otherwise object to the notion of an objective (i.e. operating beyond humanity) and knowable set of laws of physics. But such people have far harder things to explain than mere quantum entanglement.
Why do I believe this to be the best interpretation?
Firstly, it takes the mathematics literally, and mirrors what the mathematics describes. In particular, it is consistent with QFT's description of matter and the creation and annihilation of particles. Every object that enters into the mathematical calculation has something corresponding to it in reality. Only those objects in the calculation exist in reality (at least, of those things relevant to the experiment). It preserves both reality (which is important from a philosophical perspective) and locality (which is important if physics is to be self consistent). To be more explicit: all physical interactions (or secondary causation) are local. However, God is not localised.
Granted, there is more than one way to formulate the mathematics. For example, I imagine that this is where those who advocate de Broglie/Bohm's interpretation will part from me. In the de Broglie/Bohm interpretation, the Schroedinger equation is split into two parts. Likewise reality is split into two parts: there are the particles (which we observe) and an underlying pilot wave (which is not observed, but which controls the motions of the particles). One part of the division of the Schroedinger equation governs the pilot wave's evolution. The other governs the motion of the particle. The proponent of this interpretation would argue that it too is the natural interpretation of the mathematics, after de Broglie's reformulation.
The problem is that the equation governing the motion of the particle depends not only on the wavefunction at that point, but at every other point. This violates the locality of physical interactions. This nonlocality is what allows the de Broglie/Bohm interpretation to avoid the worst implications of Bell's theorem (a denial of realism). However, it creates problems when reconciling the interpretation with relativity. There have, of course, been attempts to do this; see for example arXiv:1307.1714 and arXiv:quant-ph/0303156.
That last paper, in particular, looks quite similar to my own interpretation: particles travel down their own worldlines, with occasional spontaneous jumps corresponding to a creation/annihilation event. Like me, it prefers a particle rather than a field interpretation of QFT, and for much the same reasons. However, this interpretation also requires the physical but unobservable pilot wave to guide the particles. In my interpretation, this is redundant. Between the jumps in the de Broglie/Bohm interpretation, everything is deterministic, while in my interpretation that is not so. In my interpretation, the Hamiltonian operator allows us to calculate, given an initial state, the amplitude for each possible final state a moment later (we can talk about "moments later" without specifying a preferred reference frame and thus time axis because the Hamiltonian operator is entirely local: it makes no difference how we label space-time points elsewhere in the universe). This final state could have the particle move to a neighbouring point in space, or stay where it is, or interact via a creation/annihilation event with one or more other particles. There is an amplitude for each option, which means that every option is possible. There is no deterministic evolution between creation or annihilation events. So even if the de Broglie/Bohm interpretation can reproduce the standard theoretical results (and I haven't seen an explicit calculation suggesting this), it still requires nonlocal physical interactions, redundant nonobservable physical objects, and unexplained indeterminate jumps in an otherwise deterministic system. My interpretation avoids all of those.
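The local evolution described here can be put in symbols (my shorthand, not the post's): the amplitude for an initial state |i⟩ to be found in a final state |f⟩ a moment δt later is

```latex
\mathcal{A}(i \to f) = \langle f \mid e^{-iH\,\delta t} \mid i \rangle
\approx \langle f \mid (1 - iH\,\delta t) \mid i \rangle,
```

and because the Hamiltonian H is built from local interaction terms, each such step involves only a single point in space and its immediate neighbourhood, whichever final state is actualised.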
What of the alternatives? The Everett interpretation requires a branching of reality at each moment of indeterminacy. There is no obvious reason why this should happen, or clue as to what mechanism causes it to happen. Nor can there be any experimental evidence for it. The Copenhagen interpretation denies reality between measurements, and also has that unpalatable mix of deterministic wavefunction evolution and indeterminate jumps (in this case on measurements). The ensemble interpretation is widely regarded (including by me) as being disproven by Bell's theorem (and related results). Other interpretations require information passing backwards in time, violating causality. All these other interpretations require adding something in addition to the bare mathematics, and often even after that don't really resolve the problem of mixing determinacy and indeterminacy. You might say I too am adding something beyond the mathematics, namely God. But the theist would respond that God is present behind the scenes in any interpretation of quantum physics, giving the abstract equations force in the real world.
Extensions to Bell's theorem
I have been discussing Bell's original theorem. There have been extensions and variations of this brought up since then. I will cite the Leggett-Garg inequality, the Kochen-Specker theorem, and the Pusey-Barrett-Rudolph theorem, which Matthew himself raised.
These rest on slightly different assumptions from those of Bell's theorem. However, as far as I can tell, they all rely on the first and third of Bell's assumptions. As such, they are not applicable to my interpretation.
Matthew's Comment
Now onto the comment which sparked this post. And as with Matthew's other comments, it is well thought out, well informed, and challenging. I am very grateful for such comments, since they force me to think about things more deeply, and they correct me on a number of issues.
I hope it is not too presumptuous of me to assert that an expert on quantum physics is wrong about quantum physics... but, I believe you are wrong about this:
"In the derivation of his inequalities, Bell assumed that uncertainty concerning the hidden real substratum of matter should be parametrized using classical probability; assuming that all the predicates of the particle have actual values in the hidden substratum"
The probabilities that appear in the derivation of Bell's theorem are probabilities for measurement outcomes predicted by a candidate physical theory, and they are conditional on measurement settings and whatever else the theory says is required to predict those probabilities. Bell does not actually assume anything about what is required to specify the state of the system; in fact, it could just be the quantum wavefunction. As long as it is coherent to say things like "there is a certain probability for this system to have this certain wavefunction" and "the probability that we get this measurement outcome is so-and-so, given that the system has this wavefunction," and these probabilities obey classical probability theory, I can't see any objectionable hidden assumption in the derivation of Bell's theorem. (And that is even if I am wrong in my comment on your earlier post about whether classical probability theory can coherently be said to fail for quantum systems, or whether it is more accurate to understand things some other way.)
I would discuss quantum states rather than wavefunctions, but I think what Matthew means is the same thing as my state. A state is a possible configuration of the particle. A wavefunction is a superposition of states, which parametrizes (as an amplitude) our uncertainty about which state the particle is in. So states are physical; wavefunctions are (primarily) epistemic. They are also things we calculate, and as such conditional on whatever premises we put into the calculation. As such, we should not discuss probabilities for a system to have a certain wavefunction, since systems don't have wavefunctions. They exist in particular states.
So is it coherent to say, as Matthew requires, "There is a certain probability for this system to be in this particular state"? I would say not: for the most part, it is not coherent to assign probabilities to the system being in a given state. In my interpretation, our uncertainty concerning quantum states is parametrized in terms of an amplitude rather than a probability. So one can say "There is a certain amplitude for this system to be in this particular state." But one cannot use probabilities here. The reason is that when we convert from amplitudes to probabilities, we lose, or rather combine, information that distinguishes between different states. For example, suppose we picture the possible states of a system as the different points along the circumference of a unit circle. In practice, we would have several such circles. Although the system itself must be on one circle or another, we don't know which one, so our amplitude and probability distributions are split between them. The amplitude would be a point somewhere within a circle; it contains information on both the angle and the radius. The probability, on the other hand, is the square of the radius alone, and can only be represented as such because of the definition of probability as a single number lying between zero and one. Thus the probability can't distinguish between the different states on the same circle: the conversion to probability loses the angular information. The best we can talk about is the probability that the particle is in one of a large number of states (namely all those which contribute to the same circle); the amplitude can distinguish between the individual states on the circle.
What if one did try to use something other than Born's rule to convert from an amplitude to a probability? Then you could perhaps assign a probability to each individual state. But that would lead to incorrect experimental predictions. Probability theory assumes that the states are orthogonal, and this affects how you combine probabilities for different states. But the states of quantum physics are not like this. The problem is that it is precisely the interference effects, or non-orthogonality, between these different possible states (which can be picked up by the amplitude but not the probability) which lead to all the interesting quantum features. (This example is most closely related to interference: each circle represents a different location, and the angles represent a different complex phase; but similar considerations apply to particle spins or photon polarisation.)
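The loss of the angular information can be illustrated numerically. In this sketch (the numbers are arbitrary, chosen purely for illustration) two amplitudes with the same radius but different angles give the same Born-rule probability, yet behave differently under interference:

```python
import cmath

# Two amplitudes on the same "circle": equal radius (0.6), different angles.
a = 0.6 * cmath.exp(1j * 0.0)
b = 0.6 * cmath.exp(1j * 2.0)

# Born's rule keeps only the squared radius; the angle is discarded.
print(abs(a) ** 2, abs(b) ** 2)  # both 0.36

# But the angle matters physically: superposing each with a third
# amplitude (an interference effect) gives different answers.
c = 0.4
print(abs(a + c) ** 2)  # 1.0
print(abs(b + c) ** 2)  # roughly 0.32
```

A probability assignment that records only 0.36 for both amplitudes cannot reproduce the second pair of numbers; the amplitudes can.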
There are, of course, caveats to make. Firstly, we could discuss probabilities if no information was lost when converting from the amplitude to the probability. For example, if each amplitude, instead of being represented by a circle, was represented by a line segment from 0 to 1. Here we would have no angular information; there would be a one-to-one relationship between the amplitude and probability. In this sort of system, it would make sense to associate probabilities with given states. However, I can't think of a quantum particle described by the standard model that behaves like this.
The second caveat is, of course, the measurement process itself. Here the system is forced by the detector into a given basis (or a given angle). There are now no interference effects between the amplitudes for different states, and again there is a one-to-one relationship between the amplitude and probability. But this only occurs at the point of measurement. Before it hits the detector, the particle could be in any basis, and therefore we have to parametrise our uncertainty using the amplitude. It is incoherent to say there is a probability that the particle is in a given state, because probability can't specify which basis the particles are in, and can't deal with non-orthogonal bases.
So as long as Bell assigns a probability to a quantum state (or hidden variable configuration) outside the measurement process, my criticism is valid. One doesn't have to read very far into his original paper to find where he does this.
Travis Norsen on Bell
Back to Matthew.
I take my understanding of the situation here from Travis Norsen's papers on Bell's theorem, such as https://arxiv.org/abs/0707.0401 and others. Norsen's work on Bell and Bell's theorem are really good, I highly recommend them to any physicist. I think he does a very good job of clearly presenting where Bell was coming from and what his theorem entails.
First of all, I would like to thank Matthew for pointing me to this paper. It is well worth a read. I'll firstly quickly summarise the contents, and then give my thoughts.
The main argument of the original EPR paper was to show that the theory behind QM, under the assumption that it satisfied local causality, is incomplete. Einstein and Bell each believed that a local hidden variable theory was the only hope for a locally causal quantum physics. Bell's contribution was to show that no locally causal theory could reproduce the correct empirical predictions.
Key to all this is Bell's notion of local causality, which is frequently misunderstood. In summary (I will be more precise later), Bell's definition means that an event can only be caused by events and substances in the past light cone. Many commentators state that Bell's theorem depends on local causality plus some other assumptions, but in practice everything that Bell needs is the assumption of local causality and those things which can be derived from it.
Central to Bell's formulation is the concept of the beables. These are those elements of a theory which are supposed to correspond to something physically real and independent of any observation. There are two separate questions: what are the beables in a particular interpretation of quantum physics? And what are the beables in reality? The first of these questions should be reasonably easy to answer; the second is harder. Bell is only concerned with the first question. He is interested in the issue of those interpretations which only contain beables which satisfy local causality. His goal is to show that such interpretations are inconsistent with quantum physics.
Bell's other concern is that terms should be defined in a rigorous way. For example, those interpretations which separate the microscopic (quantum) world from the macroscopic (classical) world are problematic because there is no clear dividing line between the microscopic and macroscopic. Beables should therefore be well defined in any interpretation.
Equally, beables should not be things which are only a matter of convention. For example, the mathematical form of the electromagnetic potential is gauge-dependent. Thus it depends on one's choice of convention, and does not qualify as a beable. However, the existence of the photon itself is not gauge-dependent, and this could qualify (in some interpretations) as an objective physical fact, a beable.
Beables certainly exist. Our experimental equipment (and ourselves) have to be assumed to be physically real in any interpretation.
The next aspect of a locally causal theory is that it should be complete, i.e. every relevant factor is expressed in terms of beables in the past light cone of the event. For example, in the case of the boson decaying into two fermions, there is the assumption that the decay and its products are the only relevant factors influencing the later experimental outcomes. There is no spooky stuff which we don't detect and haven't included in the model happening alongside it. In particular, everything that is physically real needs to be accounted for in the theory. What is and isn't specified is again provided by a particular candidate interpretation.
The notions of cause and effect are difficult to define in a way that is sufficiently clean for mathematics. Bell's formulation of local causality, however, does not depend on any particular model of causality. It does not invoke any commitment to what physically exists and how it acts. The burden of explaining causality is shifted to the theoretical models and interpretations. Causality is more readily accounted for in a model than in reality. Thus we are interested in causality as it exists in a given model more than causality as it exists in reality. (The purpose of this whole study is to judge between different models.) The claim of Bell's theorem is that all candidate theories which respect local causality are inconsistent with experiment.
The notion of causality used by Bell is not intended to necessarily imply a determinative cause. It applies to both deterministic and stochastic theories.
So now we are ready for the formal definition of local causality:
A theory will be said to be locally causal if the probabilities attached to values of local beables in a spacetime region 1 are unaltered by specification of values of local beables in a spacelike separated region 2, when what happens in the backward light cone of 1 is already sufficiently specified, for example by a full specification of local beables in a spacetime region 3. In a (local) stochastic theory, however, even a complete specification of relevant beables in the past (e.g., those in region 3 of Figure 2) may not determine the realized value of the beable in question (in region 1). Rather, the theory specifies only probabilities for the various possible values that might be realized for that beable.
This might be expressed more formally as a probability equation:

P(b_{1} | B_{3}, b_{2}) = P(b_{1} | B_{3}),

where b_{1} represents one of the two events of interest, b_{2} the other, and B_{3} all the beables in the past light cone of b_{1}. Implications of this include factorizability; namely that the joint probability for A and B can be factorised into a probability for A and a probability for B.
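Written out in the standard notation (with A and B the two outcomes, a and b the detector settings, and λ the beables of the candidate theory), the factorizability condition reads:

```latex
P(A, B \mid a, b, \lambda) = P(A \mid a, \lambda)\, P(B \mid b, \lambda)
```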
The probabilities are not subjective, representing someone's beliefs, but are seen as the fundamental output of some candidate (stochastic) physical theory. In other words, you take one particular model, put in the initial conditions, and the result of the calculation would be expressed as a probability. This is similar to how I would interpret probability.
Bell's formulation distinguishes between causation and correlation. His definition of local causality forbids faster than light causal influences, but may still entail correlations between spacelike separated events. (For example, event 1, spacelike separated from event 2, might not cause event 2 or be an effect of event 2, but the two may still be correlated because they are both caused by something in each of their past light cones.) Any physical signalling process must involve causation.
The original EPR paper argued for the incompleteness of QM, and that it needs to be buttressed by an underlying locally causal theory. It attempts to show that completeness implies nonlocality, while locality implies incompleteness. For example, to quote directly from the paper:
Thus, noting that for orthodox QM λ in Figure 4 is simply the quantum mechanical wave function ψ, we have for example that P(A = +1 | a, b = a, B = -1, ψ) = 1, but also that P(A = +1 | a, ψ) = 1/2, in violation of the probability equation above. Orthodox QM is not a locally causal theory.
Thus a locally causal explanation for the correlation predicted by QM requires a theory with more beables than just the quantum wavefunction.
Local causality entails the factorisation of the joint probability for the outcomes once λ is specified. Since the outcomes are perfectly anti-correlated and there are only two possible outcomes, each result at one detector entails that the opposite outcome at the other is predetermined. The possible values of λ must therefore fall into two mutually exclusive and exhaustive categories. Since the measurement axis is arbitrarily chosen, the same argument will establish that λ must encode predetermined outcomes for all possible measurement directions. Thus local causality requires deterministic hidden variable theories.
The paper goes on to derive the CHSH inequality to show that local causality conflicts with the predictions of QM. I need not discuss the details of that derivation here. The calculation assumes local causality as defined above, and that the settings of the detectors are independent of the hidden variables λ.
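For reference, the CHSH inequality in its usual form bounds a combination of the correlation functions E for two settings on each side; any theory satisfying the factorizability condition obeys

```latex
\left| E(a, b) - E(a, b') \right| + \left| E(a', b) + E(a', b') \right| \le 2
```

while quantum mechanics predicts values up to 2\sqrt{2} for suitably chosen settings, in agreement with experiment.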
Are there any ways around this? The article closes by suggesting two. Firstly, non-Markovian causal influences (i.e. when an event at one time influences events at a later time directly, rather than indirectly via the events at intervening times). I take this to be just as unacceptable, for someone who accepts locality, as action at a distance. Ultimately, Lorentz invariance, as it manifests itself in the standard model Hamiltonian, requires locality in time just as much as it does locality in space. The second solution offered by the paper is one where the beables themselves are not localised but spread out over a region of space. An example of this is the pilot wave of the de Broglie/Bohm interpretation.
Response
As mentioned this paper is a good read, and informative. There are two points I want to make in response.
The first point is one which I have made before. It assumes that the output of the model is expressed as a probability. A probability (and this is the definition that both this paper and I use) is a model-dependent expression of uncertainty that satisfies Kolmogorov's axioms, and makes certain assumptions about the possible outcomes of the system being studied. An amplitude is a model-dependent expression of uncertainty that satisfies other axioms and makes different assumptions about the nature of the physical system. In particular, the various arguments that one uses to get from the premise of a model that satisfies local causality to various conclusions that contradict experiment make use of the axioms of probability. Now, of course, in both cases we are finally interested in comparing against a frequency distribution: directly in the case of a probability, or indirectly via Born's rule in the case of amplitudes. But the various models inspired by the different interpretations discuss single quantum events. Hidden variables (or beables) apply to single quantum events. To get from the theory for a single event to a frequency distribution to compare against an experiment we need to combine those events into an ensemble. There are two different ways in which this combination happens. On the experimental side, we simply rerun the experiment numerous times. On the theoretical side (to deduce a result which can be compared against experiment), the process of going from a single system to a frequency distribution will involve some sort of averaging process (or integration over various states; those states being parametrized by the beables); you assume that every possible combination of beables is sampled at some point in the ensemble. The final result of that averaging process is thus independent of the precise values of any beables for individual events.
It just depends on the initial conditions, the model derived from the interpretation, and the precise mechanics of how you do the averaging. The probability or amplitude will no longer depend on the beable data for individual events. And the way that process of combination works depends on whether your uncertainty is parametrized by a probability or an amplitude.
So we have two ways of thinking about this:

Route 1:
Model of single event →
Amplitude conditional on initial conditions and beables →
Born's rule to obtain event probability →
Combine to get result for ensemble (average over beables) →
Probability for ensemble →
Comparison against experimentally measured frequency.

Route 2:
Model of single event →
Amplitude conditional on initial conditions and beables →
Combine to get result for ensemble (average over beables) →
Born's rule to obtain ensemble probability →
Probability for ensemble →
Comparison against experimentally measured frequency.
These two approaches yield different results. The second one is used in quantum calculations. But the equation above defining local causality is both expressed as a probability and contains beable data for an individual event. The only one of these two paths where it makes sense to discuss probability and beables in the same equation is the first route, where we convert to a probability before averaging over the beable information to get the result for the ensemble. In the second route, probability only comes into it after we have integrated over all the beable data; so the probability cannot be conditional upon the actual values of the beables. So Bell's formulation does not match what is done in the actual quantum mechanical calculations. This is why I keep making the point that if the interpretation leads to a model where outcomes are expressed as amplitudes then the mathematical proof of Bell's theorem and the related theorems is no longer applicable.
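The difference between the two routes can be seen in a toy two-slit calculation (a sketch with illustrative numbers; the "beable" here is simply which slit the particle took, and phi is the relative phase between the two paths):

```python
import numpy as np

# Amplitudes conditional on the beable value (which slit was taken).
phi = np.pi / 3
a1 = 1 / np.sqrt(2)                  # amplitude for "slit 1"
a2 = np.exp(1j * phi) / np.sqrt(2)   # amplitude for "slit 2"

# Route 1: apply Born's rule per beable value, then combine over the ensemble.
intensity_route1 = abs(a1) ** 2 + abs(a2) ** 2

# Route 2: combine the amplitudes over the beables first, then apply Born's rule.
intensity_route2 = abs(a1 + a2) ** 2

print(intensity_route1)  # 1.0 -- the interference term has been averaged away
print(intensity_route2)  # 1.5 -- i.e. 1 + cos(phi); interference retained
```

Only the second route reproduces the interference pattern; but only the first route ever contains an expression of the form "probability conditional on the beables".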
Of course, that doesn't mean that Bell's conclusion is invalid for those models which parametrise individual events as amplitudes. But a different proof and mathematical definition of local causality is required. I haven't seen such a proof. That's not to say there isn't one (reality doesn't depend on my ignorance, and I would gladly accept correction if I am wrong), but all Bell-type arguments that I recall seeing make this same mistake. They don't apply to hidden variable theories which exclusively use amplitudes to parametrise the uncertainty of single events.
So onto my second observation. This is not one I have made before, or seen anyone else make (but, again, my ignorance of the literature on this topic is vast, so I would gladly accept correction). I am certainly not alone in emphasising that there have been, over the millennia, different definitions and interpretations of causality. In particular, I like to distinguish between substance causality (what substance did this substance emerge from?) and event causality (what was the cause of this event?). I have suggested that modern philosophy almost exclusively focuses on event causality (out of these two options; obviously other interpretations such as Hume's are also available), while when classical philosophers discuss efficient or final causality they generally refer to substance causality. It is often argued that the indeterminism of quantum physics is in conflict with causality. Some events, such as radioactive decay, happen spontaneously and seemingly without cause. I have responded that this objection just refers to event causality. Substance causality is respected by quantum physics. This is guaranteed by the conservation of (the quantum mechanical definition of) momentum, which arises from the locality of the QFT Hamiltonian, which in turn is mandated by Lorentz invariance and special relativity.
But it strikes me that Bell's definition of local causality, as expressed in Norsen's paper, makes use of event causality. He is interested in the cause of events. This is evident in the probability equation; he is discussing the probability of an event given certain factors; those factors parametrise the causes of the events.
So when I discuss causality, I almost exclusively focus on substance causality. This is what the standard model Hamiltonian, expressed in terms of its creation and annihilation operators, implies and focuses on. It describes possible decay channels: what substances a substance can decay into. When we can use perturbation theory (and with the usual caveats about renormalization etc.), each Feynman diagram represents one possible path to get from an initial to a final state, displaying the possible processes of annihilation and emission. We don't know which sequence of events happened, but we can (if we are told which path happened in practice) trace out a sequence of substances. Special relativity, via Lorentz invariance, enters contemporary physics by constraining the possible forms of the Hamiltonian. In other words, the rules of special relativity, including no contact outside the light cone, need only apply to substance causality.
If there is event causality, then it is in part nonmaterial. That is to say that material substances are not in themselves sufficient to explain any events. At least, if we are to accept that the fundamental physical substances are either those described by the standard model, or others of a similar nature. There are three ways I can think of around this. The first is to say that events are just uncaused. The second is to say that there is something physical of a different type to those described by the Hamiltonian which drives the events (such as a pilot wave). The third is to say that the missing data needed to specify the event causes is nonphysical. That is to say, something immaterial, or which can't be expressed by our usual methods of mathematical representation: something which exists outside time and space.
The first of these approaches is troubling, since it would imply that there is, at the heart of the way the universe works, an irrationality. A place where some conclusions don't have premises; where logic breaks down. But irrationality as such isn't the most troubling part. A wholly irrational universe I can imagine. What we would have instead is a universe with a rational substance causality, and a partially rational event causality. Sometimes events are determined by their physical causes, such as in an absorption event. Always the possible events are constrained by what is allowed by the Hamiltonian. It is this mixture of rationality and irrationality which would be hard to grasp.
The third approach is to propose that in addition to the material substances described by physics, there is an immaterial substance, a timeless, spaceless, omnipotent and free agent which is directly involved in every process of determining which potential to actualise, while respecting the final causes of substances (unless He has a good reason not to). I call this God; you can call it the source of physical law if you prefer. This substance is in some sense simultaneously in contact with every point in the universe. We can then have a breakdown in local event causality while respecting local substance causality. This is the route that I take.
The second approach is in some respects similar. It proposes that the missing element in the event causes is something physical. For example, a pilot wave. But here we run into the problem of why this thing isn't represented in our most fundamental theory. If it is physical and interacts with particles (which it must do if it influences events), then we should be able to express it mathematically, and put it into the theory. This would require a rewriting of QFT, and in particular the full standard model, incorporating this new feature. The problem of combining Bohm's approach with QFT is well known. The proposed approaches I know of which might be viable (again, this might just be a reflection of my ignorance) still require stochastic jumps for each creation/annihilation event. In other words, they don't answer the problem of event causality, and ultimately must collapse into either the first or third approaches.
Conclusion
Entanglement is one of the major issues in interpreting quantum physics. I make no claims to have the definitive answer myself, and I doubt that anyone else can do so either. Bell's work (and other work similar to it) is an important contribution. But we need to be careful. I still maintain that Bell's argument makes a mistake in treating the hidden variables in terms of a probability rather than an amplitude. It also fails to make the (in my view important) distinction between event and substance causality.
The question is primarily over the completeness of quantum physics. When we make the distinction between event and substance causality, a picture emerges. If we only focus on substance causality, then quantum physics is complete. Lorentz invariance and special relativity hold. Since Bell's notion of local causality specifically refers to event causality, he does not refute this. With regards to event causality, quantum physics is incomplete. Bell's argument comes into play here, and is an important piece of evidence. But we don't need Bell's argument to make this conclusion. The very stochastic nature of quantum physics alerts us to it. Bell does demonstrate that no classical deterministic theory, mediated by particle or field interactions, could underlie quantum physics (without violating special relativity). This, in turn, leads us to look at metaphysical (we might say supernatural, in the sense of beyond nature) event causes.
But I have discussed that many times before, and you all should know where I think that thought leads.
Reader Comments:
Response part 2
States, Wavefunctions, Superpositions, and Bases
One thing that I have been confused about since I first encountered your blog is the way you talk about states and superpositions thereof. Part of my confusion, especially in your earlier posts, is that you seem to import language used in quantum mechanics (states, basis, orthogonality) back into your discussion of the axioms of probability or the different metaphysical systems we might adopt. I do not find that to be helpful. Another part of my confusion is in interpreting this language in regards to your mixed ontic-epistemic view of the quantum state. You have said "the amplitude formulation is powerful enough to do both" (i.e. to be both a representation of reality and a representation of our uncertainty about reality). Now, not only does it seem to me that these two jobs should be kept separate, but I am rather bewildered about what it means to mix them together. Some of the things you say in this post bring that confusion back to the forefront of my mind.
For instance, you say "I would discuss quantum states rather than wavefunctions, but I think what Matthew means is the same thing as my state." Well, as I understand things, a wavefunction _is_ a quantum state; more precisely, it is one way of representing the quantum state, which may be more abstractly represented as a vector in a certain Hilbert space. Since quantum states are (represented by) vectors in a Hilbert space, it makes sense to talk about orthogonality or nonorthogonality of quantum states; about superpositions of quantum states; and different bases in which to describe quantum states. The concepts of orthogonality, superpositions, and bases all make reference to the Hilbert space structure, so it is confusing to me when you import that language into a different context (e.g. when discussing the axioms of probability theory).
But now I really start to get confused. You go on to say "A state is a possible configuration of the particle. A wavefunction is a superposition of states, which parameterizes (as an amplitude) our uncertainty about which state the particle is in. So states are physical; wavefunctions are (primarily) epistemic." So you distinguish between states and superpositions of states; one is ontological and the other is epistemic. But this makes no sense: a superposition of quantum states is itself another (distinct) quantum state, and as you are well aware, we can change the basis we are using to write any given state as a superposition of other states.
The key point is that your distinction between states and superpositions of states has the completely untenable implication that what is ontologically real depends on our choice of how to describe it  unless, I suppose, your theory adds some kind of preferred basis to the quantum state. But you have not indicated anything like this, and it would go beyond "taking the math literally".
Amplitude as Representing Uncertainty
You also say, "When we convert from amplitudes to probabilities, we lose, or rather combine, information that distinguishes between different states. For example, we picture possible states of a system as being the different points along the circumference of a unit circle." You seem to be saying here that quantum states have something like a complex phase, and that this is why our uncertainty about them must be parameterized by amplitudes rather than probabilities. But it is still entirely unclear to me what this means, given that the amplitude is the coefficient of the state in a superposition, and a superposition of states is itself a distinct state. I don't see how you distinguish between what is ontological and what is epistemic in your view, and without a clear distinction there I simply don't understand what your view is saying about reality or our knowledge of it. What makes the amplitude a ***representation of uncertainty*** about quantum states, rather than ***part of the quantum state itself***? What makes it epistemic rather than ontic?
Also, if there are in fact states ("physical" or "ontological" ones, as opposed to the (primarily?) epistemic superpositions of them), even if these states have something like a complex phase, it is unclear to me why we can't say "there is a certain probability for the system to have this state, and a certain probability for it to have that one..." and so on. Or, for that matter, since wavefunctions are a way of representing the quantum state, why we can't say "there is a certain probability for the system to have this wavefunction and a certain probability for it to have that one". Crucially, if there are these states, then when we try doing probability calculations without accounting for possible differences in them, then of course we might end up with the wrong answers. But this doesn't prove that ***probability*** is wrong; it just means we are incorrectly neglecting relevant information.
This goes back to my original comment on your post "Why is quantum physics so weird?" Your argument there, using the double-slit experiment, was (basically) that we have to represent uncertainty as an amplitude rather than a probability, because we get the wrong answers otherwise. My objection was that this ignores relevant information: the quantum state is different if you close one of the slits or if you add a which-path detector, compared to the usual case where both slits are open and you get an interference pattern. If that quantum state is ontic rather than epistemic, the probability calculation is wrong because it plugs in the wrong numbers, not because probability itself is invalid.
So overall I think your argument that we must use amplitudes instead of probabilities is circular: it only makes sense if you already assume that the wavefunction is epistemic. Your response in your recent post, where you calculate the quantum state when a which-path detector is included, actually shows that my objection is right (as far as I can tell): the wavefunction is indeed different. In that case, instead, you reject the possibility that the wavefunction is ontic because of the implied nonlocality. I see that rejection as misguided, because the nonlocality isn't really avoidable anyway.
By the way, I haven't actually seen the way you write amplitudes as analogous to probabilities (where you have amplitudes for events, or unions or intersections of events, or conditional on other events) used anywhere else. How exactly are these defined? Are there any resources that teach this way that I could look up to understand your position better?
Does Bell's Theorem Go Wrong?
You make a distinction between two different approaches: one that calculates probabilities from amplitudes and then averages to get the probability for an ensemble, and one that averages to get an amplitude for an ensemble, and then calculates a probability from it. Your claim is that the second method is what is actually used in quantum physics, while Bell's theorem assumes the first. Hence, "Bell's formulation does not match what is done in the actual quantum mechanical calculations."
Forgive me if I am rather skeptical on this point, for two reasons. Bell himself was a quantum physicist; I can only assume he was well acquainted with how things were done. So how did he get this wrong? Secondly, I have never seen this response to Bell's theorem anywhere else; how are all the physicists working in quantum foundations missing it? Admittedly this response might be out there and I have never seen it; I haven't surveyed the literature extensively. But the main responses to Bell's theorem seem to be non-realism, many-worlds, retrocausality, and superdeterminism. If things as far out as retrocausality and superdeterminism are being explored, it seems like someone else should have hit upon this as well.
I'm not sure if this is a valid counterexample to your claim, but consider this possible experiment: a double-slit experiment with electrons sent through one at a time, but in any single run of the experiment one or the other of the slits is closed. Wouldn't this destroy the interference pattern? Yet if you just add up the amplitudes for each member of the ensemble, and then calculate the probability, you would predict an interference pattern, which would be wrong.
Or consider another possible experiment: a device prepares a confined electron ("particle-in-a-box" style setup) in one of several specified energy states, but it randomly chooses which one in each run of the experiment, with probability specified by its programming and method of random number generation. Then we measure the electron's position. The device records which energy state it prepared the electron in, and we can choose to postselect the data on that information or not. Isn't this a case where it is perfectly valid to say "there is such-and-such a probability for the electron-in-a-box system to have this wavefunction"? And again, it seems you would not get the correct results by adding up the amplitudes first (I don't think that would even be well-defined, since the amplitudes of the different energy states rotate at different rates, whereas the probability distributions for the individual energy states are independent of time).
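The box setup described above can be sketched numerically. This is my own illustration (not from the original comment), in units where ℏ = 2m = L = 1, so E_n = (nπ)²; the preparation probabilities of 1/2 each are toy values:

```python
import numpy as np

# Infinite square well ("particle in a box") of width L = 1,
# in units where hbar = 2m = 1, so E_n = (n*pi)**2.
x = np.linspace(0.0, 1.0, 501)
dx = x[1] - x[0]

def eigenstate(n, x):
    """Normalised energy eigenfunction of the well."""
    return np.sqrt(2.0) * np.sin(n * np.pi * x)

def energy(n):
    return (n * np.pi) ** 2

t = 0.37  # an arbitrary time

# The device prepares n = 1 or n = 2, each with probability 1/2.
# Mixing the *probabilities* is time-independent: each |psi_n|^2 is static.
p_mixture = 0.5 * eigenstate(1, x) ** 2 + 0.5 * eigenstate(2, x) ** 2

# "Adding the amplitudes first" instead describes a *superposition*, whose
# probability density oscillates in time -- a physically different situation.
psi_super = (eigenstate(1, x) * np.exp(-1j * energy(1) * t)
             + eigenstate(2, x) * np.exp(-1j * energy(2) * t)) / np.sqrt(2.0)
p_super = np.abs(psi_super) ** 2

# Both densities are normalised, but only the mixture is independent of t.
norm_mixture = np.sum(p_mixture) * dx
norm_super = np.sum(p_super) * dx
```

The point of the sketch is just that the mixture and the superposition are different probability densities, even though both are built from the same two energy eigenstates.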
All of this makes sense to me if the quantum state/wavefunction is (or represents) something ontologically real, and we can talk about probabilities for those ontologically real states (and probabilities for future states implied by them). But none of it makes sense to me when you say that we can't use probabilities to represent our uncertainty, and must use amplitudes instead. I just don't know what that means.
Thanks for your time; best regards!
Definite quantum states
"However, unlike the Copenhagen interpretation, where the particle remains as some ghostly non-physical wavefunction between measurements, I would maintain that it is in some definite state, we just don't know what that is."
After reflection, I have to agree with Matthew here: postulating that particles always have some definite state (meaning a pure state in some basis) really adds nothing to your interpretation.
Take the case of particle decay, and suppose the detectors are set to measure spin in the same direction A. Unless the fermions' hidden spins happen to point along A, measuring one particle's spin changes its spin direction to A (either up or down), which means the other particle must change to have the opposite spin (or else angular momentum isn't conserved). That gives no advantage over the Copenhagen interpretation, where measuring one particle puts both into some definite state. Both ways, you still have both particles changing at the moment one is measured, and the same issues with defining "the same moment" under special relativity arise. The sole difference is that Copenhagen allows the zero state (which has no clear direction) to be an actual particle's state, which you would disallow.
The reasoning of your interpretation (the particles always have opposite spins if measured in the same direction because angular momentum is conserved because the action of the true physics is unchanged if the coordinate axes are rotated because God doesn't prefer one direction over another without special reason) still holds up without assigning a definite state to particles between measurements. So what's accomplished by doing so?
The other big problem with Copenhagen interpretations, the need to distinguish macroscopic objects from quantum particles, is I think resolved by hylemorphism: any substance whose matter is quantum particles defines a basis of states for particles it interacts with, by virtue of its form. If resolving that is the reason to assume that particles always have a pure state in some basis, then I don't think it's needed.
Thanks
Thanks to Matthew and Michael for your comments, both here and on other posts. I'm tied up with other things this week, but I'll respond as soon as I can.
Comments on Bell's Theorem
1. "I would maintain that it is in some definite state". Then the future is fixed and you should be a superdeterminist. Copenhagen says that the future state of the universe does not physically exist until it is encountered. That is, the future is being flexibly created (but connected to the past via entanglement) as we move through time.
2. "then there must be instantaneous communication between the two experiments". There is no instantaneous communication. The communication happened at the point of entanglement. Entanglement doesn't fit our intuition.
As I mentioned elsewhere this week, "The philosophy of quantum mechanics is primarily concerned with making Nature fit our intuitions, instead of making our intuitions fit Nature."
On Bell's theorem
Then the future is fixed and you should be a superdeterminist. No. Indeterminism comes in with events. Particle A can decay into B, C, or D. We can't predict, based on our knowledge of particle A (and the state of the rest of the universe at that time or earlier), which of these states it will decay into. But each of B, C, and D is a definite state. We don't know which state that is without experiment, or often even after experiment. But it still remains a definite state. That, to my mind, is the most natural interpretation of the mathematics.
The communication happened at the point of entanglement. I would very much like to agree with you. But the problem is that we have to confront Bell's theorem and the related theorems. What you are advocating is essentially a hidden variables theory. The particle must in some sense carry the results of every possible measurement from the decay to the measurement. But how are those results contained within it? If it is in the form of a quantum amplitude (as I maintain), then you have indeterminate orthogonal states, and some measurements will be undetermined until the event actually happens. If they are conveyed as exact results (i.e. if you make this measurement, you will get this result; if you make that measurement, you will get that result; and so on for every possible measurement), then you are advocating precisely the sort of hidden variables theory that Bell disproved. If you have any other suggestions, then I would love to hear them.
"The philosophy of quantum mechanics is primarily concerned with making Nature fit our intuitions, instead of making our intuitions fit Nature." Not entirely. The philosophy of quantum mechanics is difficult, because at some point everyone has to draw up at least one premise which is unintuitive. But everyone who does it does so with the intention of describing nature. Different people will pick a different unintuitive premise, and that choice will no doubt be partly based on their prior prejudices. But to be judged seriously, they need to find a bridge between the philosophy and the physics, and make predictions, as precise as they can make them, based on their philosophy, about what the physics ought to be. That can then be tested. At the end of the day, if there is more than one philosophy which makes identical correct predictions for the theory, then we will have to have some other means to judge between them. But the work is not based on prejudice; everyone who does it tries to be guided by nature.
Amplitudes v Probabilities
Matthew,
Thanks once again for your comments. Now my schedule is a bit less hectic, I'll try to respond to them gradually as I get the chance.
I'll begin with your discussion of amplitudes as a measure of uncertainty, since that strikes me as being at the heart of our disagreement.
I should say at the outset that my notation is my own, but the calculations I use that notation for are standard. It is just a different way of expressing standard physics. I'll get back to that in a moment. But first of all, I need to discuss what I mean by an amplitude.
You stated in your response
But it is still entirely unclear to me what this means, given that the amplitude is the coefficient of the state in a superposition, and a superposition of states is itself a distinct state.
But that is not entirely what I mean. By an amplitude, I mean the overlap between two distinct states. The coefficient in a superposition could be treated as an overlap between two states: if we have a basis state φ and a wavefunction ψ, then the amplitude would be <φ|ψ>. Your point that we can rotate the basis of ψ is perfectly true from one point of view, but I am discussing things from a different perspective. To me, φ represents a physical state rather than one state in an arbitrary basis. Perhaps φ represents one possible outcome of a measurement. You can rotate the basis so that ψ is no longer in superposition. But, for consistency, you must express everything in the same basis. That means if you rotate one wavefunction, ψ, which represents something physical (namely the state of the particle or our knowledge of the state of the particle, depending on how you interpret it), you must also simultaneously rotate the basis of every other physical state in the system. This includes φ (which is physical, since it represents a possible measurement value). Thus after rotating the basis, the quantity of interest, <φ|ψ>, remains unchanged. Mathematically, the rotation of ψ is given by
|ψ> → U|ψ>,
where U is some unitary operator, so
<φ|ψ> → <φ|U†U|ψ> = <φ|ψ>
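This invariance is easy to check numerically. Here is a minimal sketch (my own illustration, not part of the original post; the two states and the random unitary are arbitrary toy values):

```python
import numpy as np

rng = np.random.default_rng(0)

# Two arbitrary two-component states, e.g. spin-1/2 states.
phi = np.array([1.0, 0.0], dtype=complex)
psi = np.array([0.6, 0.8j], dtype=complex)

# A random 2x2 unitary U, from the QR decomposition of a random complex matrix.
u, _ = np.linalg.qr(rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2)))

amp_before = np.vdot(phi, psi)         # <phi|psi>
amp_after = np.vdot(u @ phi, u @ psi)  # <phi|U† U|psi>

# Rotating *both* states with the same U leaves the overlap unchanged.
assert np.allclose(amp_before, amp_after)
```

The key design point is that U is applied to both states at once; rotating only one of them would of course change the overlap, which is exactly the inconsistency described above.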
When I discuss an amplitude (in the context of this discussion), I mean something along the lines of
<f|S|i>, where i represents an initial state, f the final state, and S some evolution operator representing one possible path from the initial to the final time. (If the initial and final state are at the same time, or the state doesn't change over time, S can be the identity operator.) Suppose that there is only one path from the initial to the final state. |<f|S|i>|^2 is then a probability, and we both agree that a probability can be interpreted as a parametrisation of uncertainty. So why can't we also say that the amplitude <f|S|i> can be interpreted as a parametrisation of uncertainty? How does taking the mod square of a number change its interpretative status? When we have more than one path, we agree that |Σ_{S} <f|S|i>|^2 can be interpreted as a parametrisation of uncertainty. So why not Σ_{S} <f|S|i>? And if adding together a bunch of numbers gives us a parametrisation of uncertainty, then surely each individual number will fulfil the same role, especially since there is the natural interpretation that it represents the uncertainty of reaching the final state from the initial state via that particular path.
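The contrast between combining amplitudes and combining mod-squares can be made concrete with a toy two-path calculation. This is my own illustration (the amplitudes are hypothetical values, chosen so that the two paths interfere destructively):

```python
import numpy as np

# Amplitudes for reaching the same final state via two different paths
# (toy values; only the relative phase between the paths matters).
a1 = 0.6 * np.exp(1j * 0.0)
a2 = 0.6 * np.exp(1j * np.pi)  # second path picks up a relative phase of pi

# Quantum rule: add the amplitudes first, then take the mod square.
p_quantum = abs(a1 + a2) ** 2  # ~0: complete destructive interference

# Mod-square each path first, then add: interference is lost.
p_separate = abs(a1) ** 2 + abs(a2) ** 2  # 0.72: no interference
```

The two rules only agree when there is a single path, which is why the single-path case alone cannot settle the interpretative question.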
So onto my notation. I will discuss equation (13), which involves the intersection operator (∩). ψ(A ∩ B|Q) represents the amplitude that you have both measurement outcomes A and B given initial conditions Q (albeit that I expanded Q to indicate both the initial conditions and the assumptions behind the model). This would normally be expressed as <A|Q><B|Q> if A and B are treated as independent (i.e. different measurements, or different paths etc.). I introduce a set of intermediate states Y, which together form a complete basis. Thus ∫_{Y} |Y><Y| represents the identity operator. So I have written
ψ(A ∩ B|Q) = <A ∩ B|Q> = <A ∩ B|∫_{Y} |Y><Y|Q> = ∫_{Y} <A ∩ B|Y><Y|Q>.
Obviously equation (13) is a bit more complex than this (these are fermions, so one has to watch out for possible minus signs), but it is the standard formula just written in a different notation, one I chose to draw out the analogy with the earlier probabilistic calculation.
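The identity-insertion step can also be checked numerically. A small sketch of my own, with a random three-state system standing in for the states in the equation above:

```python
import numpy as np

rng = np.random.default_rng(1)

dim = 3
# Random initial and final state vectors (stand-ins for |Q> and the outcome state).
Q = rng.normal(size=dim) + 1j * rng.normal(size=dim)
F = rng.normal(size=dim) + 1j * rng.normal(size=dim)

# A complete orthonormal basis {|Y>}: the columns of any unitary matrix.
Y, _ = np.linalg.qr(rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim)))

# Direct overlap <F|Q>.
direct = np.vdot(F, Q)

# Sum over intermediate states: sum_Y <F|Y><Y|Q>.
via_basis = sum(np.vdot(F, Y[:, k]) * np.vdot(Y[:, k], Q) for k in range(dim))

# Because sum_Y |Y><Y| is the identity, the two agree.
assert np.isclose(direct, via_basis)
```

Any complete orthonormal basis works here, which is the content of the resolution of the identity.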
The union operator (∪) represents the or statement, so ψ(A ∪ B|Q) represents the amplitude for either final state A or final state B given initial conditions Q. The basic rules of quantum mechanics (again, with certain assumptions about independence) state that if we want to know the amplitude for either A or B given initial conditions Q, then we add the two amplitudes together. Thus
ψ(A ∪ B|Q) = <A ∪ B|Q> = <A|Q> + <B|Q> = ψ(A|Q) + ψ(B|Q).
True, I made the mistake of inventing my own notation, and not explaining it. Thanks for calling me out on that. Hopefully this explanation will help. But my calculation itself is standard. It is just expressed in a slightly idiosyncratic way. (Nothing wrong with that: it is often worth looking at the same calculation in a different way, because you can gain other insights from it.)
Anyway, let me know if you found the above explanation about what I mean by an amplitude helpful. If not, then we still have some discussion to do.
Background reading.
Hello Dr Cundy,
Thank you for your efforts in synthesizing Aristotelian-Thomistic Philosophy with Quantum Physics.
I wanted to ask if you could name a few summary books for a layman to develop enough of an idea to grasp and communicate:
1) A grasp of the pertinent principles of Quantum Field Theory/Mechanics
2) A modern summary of Aristotelian/Thomistic Philosophy/Metaphysics.
I want to purchase your book but I'm afraid I don't have the depth of knowledge to engage with it meaningfully let alone communicate the concepts to my peers; and would like to develop my understanding first.
Book recommendations
For 2), I would recommend the works of Edward Feser, which are (I think) reasonably accessible. Start with his Aquinas, and then move onto his Scholastic Metaphysics. You can also dive into his excellent blog. Peter Kreeft has also written numerous works at an accessible level: perhaps his Shorter Summa would be a place to start.
For 1), it is a bit harder, because even the layman books are perhaps still a bit hard to follow (and also I learnt the subject mainly from the academic textbooks, so I don't have that many popular works on my bookshelf). I have always had a soft spot for Richard Feynman's works; in particular his QED is aimed at the layman. Frank Close's "The Infinity Puzzle" is also highly recommended.
For academic textbooks (if you can cope with some mathematics) with an approach to quantum physics similar to mine, I would recommend Binney and Skinner's work for an intermediate level (Townsend's book as a less modern alternative), and then Peskin and Schroeder for the advanced stuff. There are several decent more conventional starting textbooks; Griffiths' book is frequently recommended.
Amplitudes and probabilities
Okay, so your clarifying remarks make it seem to me that when you are thinking of an amplitude as a representation of uncertainty, you have in mind not just any basis, but a basis made up of what you call "physical states". The divide between what is ontological and what is epistemic in your view is determined by what these "physical states" are; is that a fair assessment?
If that is so, I think there is a serious problem with choosing a possible measurement outcome to be an example of one of these physical states, namely, that measurement is a highly anthropocentric concept. This is why the usual interpretation of QM comes across as instrumentalist to me: a theory for calculating probabilities of measurement results, not a theory that tells us the fundamental nature of reality (about which it really says nothing). If outcomes of measurements are among your fundamental "physical states," that seems to be tantamount to saying that there is no microscopic reality, or that observation creates reality, or such as that.
The converse is that if you want to say that your physical states are really a fundamental representation of reality, then you need a "privileged basis" which picks out the set of states one of which is physically real all the time (and about which the quantum state is a representation of our incomplete knowledge, in your view). More generally, you would need some kind of rule that picks out the set of physical states: maybe the rule depends on the Hamiltonian or on which state is presently actualized, but it really should not depend on things like measurement or observation. (I mean, maybe you could justify the dependence of the physical basis on things like measurement or observation by reference to the top-down features of Aristotelian metaphysics (e.g. form), but doing so really seems to make the status of microscopic analyses of things somewhat questionable. That's an area of Aristotelian approaches to the philosophy of science that I don't understand well.)
Now if you endorse some kind of privileged basis as the set of states that can be actualized, one of these states as actual, and the quantum state as epistemic when expressed as a superposition of the basis states, then, well, unfortunately I think I still don't know how to conceptualize your view. I was going to say I could (sort of) get it, but then I thought about it some more, and realized I was still conceptualizing the quantum state as more like an objective representation of the potentialities of the system, rather than a representation of our knowledge.
You (basically) asked me why a complex number can't be a parameterization of uncertainty if we both agree that the mod-squared of a complex number can be. I suppose my answer is that mod-squares are real (1D) and complex numbers are complex (2D). I know what it means to be more or less certain about something; I don't know what it means to be certain-in-the-imaginary-direction. I don't know how to interpret the phase information in the context of uncertainty.
(This interpretational problem doesn't arise for me if the quantum state is ontic rather than epistemic. In that case, we can just say that the objective state of the system has this phase information, or something that the phase information represents. I can sketch out, roughly, what that means on the quantum-theory-without-observers type interpretations that I am familiar with: various takes on pilot-wave theory, objective collapse, or many-worlds. But if the quantum state is epistemic, the phase is supposed to be representing something about our knowledge, and it raises the question of precisely what it is about our knowledge that it captures.)
Amplitudes and Probabilities (again)
Thanks for getting back to me, Matthew; and for this conversation. It helps me to think about how best to explain and clarify things; it is a difficult topic, and difficult to express oneself clearly.
Okay, so your clarifying remarks make it seem to me that when you are thinking of an amplitude as a representation of uncertainty, you have in mind not just any basis, but a basis made up of what you call "physical states". The divide between what is ontological and what is epistemic in your view is determined by what these "physical states" are; is that a fair assessment?
Not quite. As a physicist, I am going to discuss things in terms of measurement. Ultimately the goal of theoretical quantum physics is to predict experimental results, so that is how we are trained to think and express ourselves. So my apologies if I discuss measurement and experiment when I should be more general (even when I do so). When I say measurement, you should not read it as necessarily meaning the outcome of an experiment, but any circumstance where decoherence occurs and the system is forced into one state or another of a particular basis, whether or not there is a scientist at the end of it, setting the system up and reading off the results. There is some physical process that happens inside an experimental detector. The same physical process will happen in other circumstances. I discuss measurement and experiment because that's the context in which I am used to thinking about things.
Also, anyone who takes a wholly or partially psiepistemic view of quantum physics is going to value experiment. Measurement is the point where our knowledge and reality coincide.
I don't believe that observation creates reality. There is reality between experiments. But we don't know what that reality is unless we perform an observation. But the process of observation  and it is not the only thing that does this (for example the same effect would occur if a particle went into a detector but nobody was looking and the result wasn't recorded)  involves forcing the particle to change into one or another state in a particular basis, a basis that is determined by the set up of the experimental detector. Observation doesn't create reality, but it does change reality when we interact with the particle we observe, and decoherence does its thing.
But when I talked about a physical basis, I had something else in mind. When we first start to construct a theoretical system, the basis we use is entirely arbitrary. There are many different formalisms which could each equally well model reality. We can choose whatever we like. We can also in principle change it midway through the calculation. For example, we can use any Hermitian traceless two-dimensional matrix to represent the operator whose eigenstates correspond to definite spin states along the x axis.
But our goal is not just to come up with some nice theoretical framework. Our goal is to understand things about reality. Thus if we use one particular operator to express the spin along the x axis at the start of the calculation we need to be consistent in our usage throughout the calculation. As soon as we start applying the calculation to the real world, the formalism becomes fixed. We no longer have the freedom to change it. Now, being consistent doesn't necessarily mean that we stick to the same basis throughout the calculation. There is often value in rotating the basis to make the next part of the calculation easier (for example, to rotate to an energy-momentum basis when we want to consider evolution in time). But when we do so, we have to be consistent, and rotate every operator and state in the abstract formalism in a similar way. My point was that if we had a wavefunction |ψ>, and wanted to take one particular component of it with reference to some basis vector |φ>, so
<φ|ψ> = c, we can't just arbitrarily change the value of c by rotating the basis of |ψ> while keeping |φ> constant. We have started the calculation; we have already established a mapping between all our symbols and something they represent in the real world. Everything in the theoretical framework has something which exists or could exist in the real world corresponding to it. That something might be a physical particle, or a basis representing the possible outcomes of an experiment, or the possible outcomes of a possible experiment (which nobody performs), or the outcomes of when a quantum system happens to interact with something which forces it to decohere and collapse into a particular basis but isn't an experiment. We can write down a symbol to represent each basis, but my point was we have to be consistent in how we do it. Once we have the mapping between abstract states and reality, we have established our formalism; we cannot just arbitrarily change to a different formalism. We are still free to change the basis in the middle of our calculation, but we have to do so in a consistent way. We have to apply the same rotation to all the symbols in the calculation. Thus c will remain constant after such a rotation.
How can we consider something two-dimensional as a parametrisation of uncertainty?
Firstly, as I stated in my original post which sparked our discussion, quantum physics is not intuitive. We can visualise Aristotle's physics. We can visualise Newtonian physics. With Einstein's physics (by which I mean relativity) it gets harder, but it still roughly follows our intuitions. Of course, we do better in each case to represent our images mathematically, but we can still back up that mathematics with a mental picture. With quantum physics, that is impossible. We can understand the mathematics, but we are not capable of formulating an accurate mental picture. It is not possible to understand quantum physics in terms we are used to, and anyone who thinks they have understood it, hasn't (as the saying goes). So in any philosophy of quantum physics there is going to be something unintuitive; something we can't visualise, which defies our common sense. In my own interpretation, this is that point.
But let me try to offer a picture, anyway. I agree the amplitude is two-dimensional, and the probability one-dimensional. You would have to agree that the radius of the amplitude can be used to parametrise uncertainty, since it is in a one-to-one relationship with the probability. The point where I disagree is the assumption that statements of uncertainty are necessarily one-dimensional. If we have a single proposition, then I agree that a line segment would be appropriate to parametrise our uncertainty. The proposition can be either true or false, and our belief about it would lie somewhere between these two values.
But what if, instead of a single proposition, we have a set of related propositions? Maybe they depend on some parameter X, a continuous variable. Our understanding of whether each proposition is true will depend on X. So we might want to represent all of this together. So the parametrisation of our uncertainty would now be represented by a rectangle, with the vertical axis representing our certainty, and the horizontal axis X. Rather than a point on a line segment, our uncertainty of the whole system would be represented by a line on a plane.
Now suppose that X has some circular symmetry, so X = 0 is equivalent to X = 2π. In this case, we would do well to parametrise our uncertainty by a circle. The degree of uncertainty is mapped to the radius of the circle, and the parameter X to the phase. Our uncertainty for the whole system would be represented by a line around that circle.
Thus far we have a picture which is consistent with classical physics, and which we can imagine. Our uncertainty is parametrised on a two dimensional surface, albeit by a line rather than the point we have in quantum physics.
Now suppose that X, rather than just some random parameter, is also something that takes on a physical value. We know that some value of X is the true one. We might not be certain which value of X is correct (it might change in time), but some value is correct. So the original uncertainty becomes conditional: if X = X_{1}, then our degree of confidence in the proposition is C(X_{1}). The circle as a whole represents the entire general picture: it allows us to see our uncertainty for every value of X simultaneously. However, a single point on the circle allows us to represent two statements simultaneously: X = X_{1} and our degree of certainty in the proposition is C(X_{1}). Of course, if we are also uncertain about X, then we might represent our knowledge of the situation as a small line segment, or add an extra dimension to display that uncertainty.
We need to keep the whole circle during the calculation because we don't know what the value of X is until the last moment.
So the radius of the circle represents our uncertainty; the phase of the circle represents an internal parameter of the system (in the case of quantum electrodynamics, the gauge). This is now where quantum weirdness comes in. The X are not independent parameters, as they would be in the classical picture. Instead, we have the superposition principle. Thus when we combine two different amplitudes (as in the or case above), we not only change the radius, but also adjust the phase of the particle.
In this sort of problem, we are chiefly interested in predicting the future. Let me offer a simple example.
Suppose that we denote our initial state as |φ>, and we work in a basis of orthogonal states |φ_{i}> (which correspond to some observable; eigenstates of the operator representing the outcomes of some experimental detector). We evolve the system for a certain amount of time, and then take another identical measurement. If |φ_{1}> is an energy eigenstate, then it evolves according to e^{iE_{1}t/ℏ}|φ_{1}>. If the original amplitude at time 0 is c_{1} = <φ|φ_{1}>, then the amplitude at time t will be
c_{1}e^{iE_{1}t/ℏ}. So the phase (or X) changes in time, with its rate of change depending on the energy state. The probability is always |c_{1}|^{2}.
On the other hand, if the initial state is a superposition of two energy states (cos θ |E_{1}> + sin θ |E_{2}>), then the amplitude would be something like
c_{1}( cos^2 θ e^{iE_{1}t/ℏ} + sin^2 θ e^{iE_{2}t/ℏ}). The probability changes in time. (c_{1}^{2 (cos4θ + sin4θ + 2sin2θcos2θ cos (E1  E2)t/ℏ)}) The system no longer remains in an eigenstate of the original basis. (Again, we have already established the mapping between every symbol and something in reality, whether a particle, or a set of possible experimental outcomes, or some other decohered system if you don't want to privilege experiment. We choose all our basis vectors such that in the initial state there was no phase. We have to be consistent, and either stick to the same basis throughout the calculation, or rotate them all in the same way.) Thus when we repeat the experiment with the same or an identical detector and the same particle but at a later time, we could get a different result.
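This time dependence is easy to check numerically. Here is a minimal sketch in Python; the energies, mixing angle, and natural units (ℏ = 1) are illustrative choices of mine, not values from the discussion:

```python
import cmath
import math

# Illustrative values (not from the post): natural units, two energy levels,
# and a mixing angle for the superposition cos(theta)|E1> + sin(theta)|E2>.
hbar = 1.0
E1, E2 = 1.0, 2.5
theta = math.pi / 6

def survival_amplitude(t):
    # cos^2(theta) e^{i E1 t / hbar} + sin^2(theta) e^{i E2 t / hbar}:
    # each energy component's phase rotates at a rate set by its energy.
    return (math.cos(theta) ** 2 * cmath.exp(1j * E1 * t / hbar)
            + math.sin(theta) ** 2 * cmath.exp(1j * E2 * t / hbar))

def survival_probability(t):
    # Born rule: squared modulus of the amplitude.
    return abs(survival_amplitude(t)) ** 2

# At t = 0 the state certainly matches itself (probability 1, up to rounding);
# afterwards the probability oscillates at the beat frequency (E1 - E2)/hbar.
print(survival_probability(0.0))
```

The output of `survival_probability(t)` matches the closed-form expression above: the cross term between the two rotating phases is exactly the cos((E_{1} - E_{2})t/ℏ) interference term.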
The phase is the crucial thing here. It is an essential part of the description of the quantum state, but it is not determined by the underlying basis. Energy eigenstates change in time by a gradual rotation of the phase. Only relative phases can be measured, not an absolute phase (we can define the zero point to be whatever we choose). But they are nonetheless key in understanding how states change in time. So when we specify a result related to a quantum state, we need to express the basis state, the degree of certainty that the system is in that state, and the phase.
So when we come to express our degree of certainty, we can't just say that "my degree of certainty that the system is in state |φ> is c_{1}", because just listing |φ> is not sufficient to parametrise the physical state. We need to convey three pieces of information: the state, the phase, and the degree of certainty. A probability is a mapping between two pieces of data: a classical state and a degree of certainty. We could only use probabilities to express our uncertainty if we treated each quantum state and quantum phase together as a classical state, and then assigned a degree of certainty to each one of these pairs of data. But this would not give the correct result if we tried to combine the probabilities of two different states. The amplitude, on the other hand, attaches two pieces of data to the state: the phase and the degree of certainty. When we combine amplitudes we get the right result.
When we think about a degree of uncertainty, we need to know three things: 1) the set of possible states (which in some way will reflect the physical system of interest, and will differ from one practical usage to another); 2) our mathematical formulation that expresses the degree of certainty and whatever other information we need to know to process the system (which is an abstraction, expressed as a set of numbers, and will be a generalisation applicable to numerous different usages); and 3) how to combine degrees of certainty across different states. Without 3), our parametrisation is useless if we want to do any theoretical work. In the mathematical formulation of probability theory, 3) depends only on the information stored in 2); that's a built-in assumption of the mathematical framework of probability (Kolmogorov's third axiom). In classical probability theory, everything we need to know about the system in order to perform calculations is contained within the data in 2) which represents our degree of uncertainty. And that is a useful assumption to have. We are trying to create a general framework for discussing uncertainty; we don't want a different set of mathematical rules for each individual physical system, but a method we can adopt generally. But in quantum physics the probability (or its square root) is not sufficient to perform calculations, combine different degrees of certainty, and so on. We need to know the phase to know how to combine degrees of certainty for two different events. 2) must therefore contain information about both the degree of certainty itself and the phase. This allows 3) (and any subsequent calculations we make) to depend just on our base abstraction, and allows it to be generalised. The amplitude conveys all the information we need to do calculations in quantum physics, and does it well. In a classical system, the phase of the state would be contained in 1) alone.
In a quantum system, it needs to be included in both 1) and 2), otherwise we wouldn't get the correct results.
So the amplitude in quantum mechanics plays the same role as the probability in classical probability theory. 1) It provides us with a numerical parametrisation of the degree of uncertainty for irreducible states (which just needs the radius) or results from individual paths; 2) it provides us with the information we need to combine results from irreducible states to get a degree of certainty for a set of states (which needs both the radius and the phase); 3) it allows us to extract a numerical representation of the degree of certainty for these sets (both the radius and the phase). In classical probability, a single real number provides all we need for all of these. In quantum physics, we need both numbers. The probability isn't just a parametrisation of the degree of certainty of irreducible systems; it also provides us with all the information we need to perform calculations and deduce degrees of certainty for more complex systems according to a set of well-defined rules. For the quantum amplitude, the radius alone is sufficient for the first role, but we need both the radius and the phase to fulfil the second role of the probability in providing us with the information we need to express degrees of certainty for more complex systems.
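The contrast between the two combination rules can be made concrete with a small sketch; the amplitudes below are illustrative values of my own choosing:

```python
import cmath
import math

# Two indistinguishable "paths" to the same outcome, with illustrative
# amplitudes of equal radius but opposite phase.
a1 = 0.5 * cmath.exp(1j * 0.0)
a2 = 0.5 * cmath.exp(1j * math.pi)

# Classical rule: compute each path's probability (radius squared) first,
# then add the probabilities. The phases never enter.
p_classical = abs(a1) ** 2 + abs(a2) ** 2   # 0.5

# Quantum rule: add the amplitudes (radius and phase) first, and only then
# take the squared modulus. Here the phases cancel the radii entirely.
p_quantum = abs(a1 + a2) ** 2               # essentially 0: destructive interference
```

Dropping the phase and keeping only the radius gives the classical answer; keeping both gives the interference that experiment actually shows. This is the sense in which the radius alone fulfils the first role of probability but not the second.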
Why should we believe that Kolmogorov's axioms (or Cox's equivalent axioms) are the only way to parametrise uncertainty? I know of no proof that there is something special about those axioms, or that we cannot choose others to do the same job and build up a mathematical framework on top of them. Kolmogorov's system would hold no theoretical advantages over that other system, aside from the tie-in with frequency.
To go back to my example, we have an initial state, and various states it could be in. Our degree of certainty that it is in a particular state can be expressed as c_{1}, which is just a real number which (at this point) we have the freedom to define to be positive. Zero represents falsity; 1 represents truth (albeit expressed differently than in probability theory). c_{1} isn't a probability (it doesn't obey Kolmogorov's third axiom), but it is in a one-to-one relationship with the probability. But when we want to investigate how this degree of certainty evolves in time, we have to invoke the principle of superposition. There is a second question we could ask: which energy state is it in? Let's say that only the states |φ_{1}> and |φ_{2}> contain the energy eigenstate |E_{1}>. The degree of certainty that we will measure it to be in state |E_{1}> would then be c_{1} cos θ - c_{2} sin θ. Again, there is no phase information (this is just a single real number), but again this is not a probability, in this case additionally because it could be negative. c_{1} represents a degree of certainty. c_{2} represents a degree of certainty. But adding them together doesn't?
Then, of course, when we evolve them in time, we introduce additional phase information. But the combined numbers have the same interpretation as at time equal to zero: they tell us how likely we are to find the system in a particular state. What other interpretation can we give to them, in a psi-epistemic view of quantum physics?
So what part of our knowledge does the phase capture? Ultimately, it is related to the physical gauge of the particle. In classical probability, we abstract out all information about the state of the particle from the number we use to represent the probability. We don't need to know anything about what the state represents. We can still do probability calculations. In quantum physics, we need to bring one additional piece of physical information into our generalised representation of uncertainty. It is not redundant. The radius gives us the degree of uncertainty (as you think of it); but the radius alone is not sufficient because it does not tell us how to combine different amplitudes. We need an extra bit of information to do that, the phase. And the phase and radius together are required if we are to perform any calculations in a quantum mechanical system, whether combining two amplitudes to get a total result in a system with interference effects, or looking at different observables.
Physical States and the Quantum State
I think it would help in explaining your viewpoint to go into a bit more detail on the divide between what is ontological and what is epistemic. You write,
"I don't believe that observation creates reality. There is reality between experiments. But we don't know what that reality is unless we perform an observation."
So I get that it is part of your view that we can't know what is happening between observations, and there is the further complication that "observation" is actually an interaction that can change the state of the system being observed. But you do think that the system is in some "physical state" in between observations, and that the wavefunction or quantum state characterizes our knowledge about the different possible states it could be in. So there is some set of possible "physical states," which are ontological, and superpositions of those physical states (different possible quantum states that we could write) are epistemic. Is that right? If not, I don't know what you mean when you say that there is reality between experiments, but it isn't just the quantum state (since that is epistemic and only represents our knowledge).
So what are those physical states? In response to my comment about the humancenteredness of the measurement concept, you write,
"When I say measurement, you should not read it as necessarily meaning the outcome of an experiment; but any circumstance where decoherence occurs and the system is forced into one state or another of a particular basis."
And,
"The process of observation involves forcing the particle to change into one or another state in a particular basis, a basis that is determined by the set up of the experimental detector."
So you seem to be using the process of decoherence in some way to distinguish the possible (ontological) "physical states" from the (generally epistemic) quantum state. Now, decoherence is something that happens to the quantum state, which is epistemic. So it can't be the process of decoherence itself that picks out the physical states; rather, it would have to be something like the structure of the Hamiltonian of the system that induces the possibility of decoherence.
Now, I presume you want to say that experimental detectors and other such things are themselves made up of quantum objects. (It's possible to go a different route, and use the top-down features of Aristotelian metaphysics to say that it is macroscopic objects that exist fundamentally, while the particles that make them up exist only virtually. But I don't know how to make such a concept scientifically precise.) Since these detectors are made up of quantum objects, in principle our analysis is ultimately going to arrive at the quantum fields or particles that make up both our system of interest and the detectors we are using to interact with it. And that means that the Hamiltonian that determines the possibilities of decoherence (and thus picks out the "physical states" as opposed to the epistemic quantum state) is just the Hamiltonian of the Standard Model (or whatever theory ends up superseding it).
Correct me if I'm wrong, but because the Hamiltonian of the Standard Model is local (I would prefer to phrase it as "it is the quantization of the Hamiltonian of a local classical field theory," but that is another discussion; please see my latest comment on the other post), roughly speaking, the basis that decoherence occurs in is the position basis. Bohmian mechanics, for example, relies on this fact and notes that all measurements ultimately come down to observations of the position of something (e.g. the position of a pointer, or a spot on a photosensitive screen) to show how its actual particle positions, guided by the wavefunction, solve the measurement problem.
So it seems to me that your view should follow this line of reasoning:
There is some actual physical state. (I take this to be implied by what you've written.)
The set of possible physical states is determined by the possibilities resulting from decoherence. (Ditto, along with the fact that it can't be the actual occurrence of decoherence, because that is something that happens to the epistemic quantum state.)
The possibilities resulting from decoherence are determined by the Hamiltonian.
The relevant Hamiltonian (coming from the Standard Model or whatever supersedes it) produces decoherence in the position basis.
Therefore, the set of possible physical states are those states forming the position basis.
So when you say that there is some objective physical state and the wavefunction characterizes our knowledge about what it is, to be more precise, it seems you should say that (more or less) there is some actual number of particles with definite locations, and the wavefunction/quantum state characterizes our knowledge about the number and locations of those particles. (Incidentally, this is a decent argument that we should take the ontology underlying QFT to be one of particles rather than fields, though I think it may be possible, again roughly speaking, to replace "particles" with "localized field states".) Maybe you would posit that the actual state includes some kind of quantum phase as well.
What are your thoughts on that? I'll come back to continue our conversation about amplitude and uncertainty at a later time...
On Physical states
The first thing to remember is that we don't have access to the physical states. All we have access to is the abstract representation we use to represent them. To be successful, this representation will need to be in a one-to-one relationship with the physical states. [Admittedly, I have not been consistent in using this language here.]
So what is the representation of the physical states? The space on which they live is the same as we use in quantum physics. So, for example, for an electron, that space would be represented by a two-dimensional complex spinor at each location in spacetime. For a quark, you would additionally need three indices to represent the colour. [Plus a similar number of degrees of freedom for the antiparticles.] Any self-consistent set of orthogonal eigenvectors spanning this space will suffice as a basis. Any one of these eigenstates could in principle represent a physical state. The two most useful bases are, of course, that formed by the eigenstates of the Hamiltonian operator and the location basis. [Of course, these only constrain some of the degrees of freedom: spin operators generally commute with the Hamiltonian operator or the location operator, so saying that we have chosen the location basis isn't sufficient to constrain the system; we also need to specify the spin basis.] And, of course, we immediately have superpositions: if the particle is represented by an eigenstate of the location basis, then it will be in a superposition of basis states in the Hamiltonian basis. All we know is that the physical state is represented by some state in some basis. We just don't know which one, outside of measurements. When we want to make predictions, as in my calculation in the main post, we compensate for our lack of knowledge by integrating over all the possibilities.
This is what I mean when I say that the wavefunction is part epistemic and part ontic: the same mathematical framework is used to describe both of them.
You touch on the next issue, which is how more complex objects fit into this system. The way I look at this is through the lens of effective field theory. To use the example with which I am most familiar, a proton is a combination of quarks and gluons. If we want to study the proton, we need to understand its energy band structure, which means we need a Hamiltonian operator that represents the interactions between bound quark states. This Hamiltonian operator will again be constructed from creation and annihilation operators, but this time they will represent the proton, the neutron, and so on. How do we get from quark and gluon creation and annihilation operators to proton and neutron operators? Through another rotation of the basis, only this time the rotation will merge together the original quark, gluon, photon, electron and so on operators. We will get an entirely new basis of operators. It is not just the sum of the parts; while we say the proton is composed of three quarks, this is a little misleading. The transformation is not a straightforward linear combination. (In practice, what people do is use symmetry principles to constrain the various terms which could contribute to the effective Hamiltonian, and then compare with experiment or first-principles calculations to determine the coefficients of the expansion. The effective Hamiltonian is considerably more complex than that of the elementary particles.) One cannot look at the operators for the proton, and point out the individual quarks.
Moving from quark to proton is one level. But the same process can be used to move up to the nucleus as a whole: we have a Hamiltonian that is a further rotation of the basis from the effective Hamiltonian representing protons and neutrons. We can calculate the discrete energy levels of this Hamiltonian. Then we go on to the atom; then to molecules or the entire crystal. In each case (although it gets harder with each step; and the effective Hamiltonian for protons and neutrons is difficult enough) we get a new Hamiltonian, with creation and annihilation operators mapping between different states, and new "fictitious" particles such as phonons which carry energy and momentum around in a similar way to which photons do in the standard model. Again, you can't pick out specific nucleons in the Hamiltonian for a metallic crystal, even though they are in some sense present. This is how top-down causality works in physics; and I think there is a clear analogue with the Aristotelian notion of virtual existence.
So back to your question about the detector. The detector is obviously some large and immensely complex thing, made up of macroscopic parts. Each part will have its own effective Hamiltonian, which presumably sit alongside each other in some way (apologies for the vague language: the details are way beyond my ability to calculate). To simplify things, let's say that the detector is governed by a single effective Hamiltonian. This is not the same as the standard model Hamiltonian. Now into this comes our poor little free electron. The electron is not described by the same basis as the effective Hamiltonian describing the detector. The electron is described by a single particle Hamiltonian; the eigenstates of the effective Hamiltonian mix together all the fundamental particles. To describe interactions between the electron and the detector, when we do the calculation, we would have to split the electron up into a superposition of states in the same basis as described by the detector's effective Hamiltonian. [In practice, what we get is a new effective Hamiltonian for the detector + electron system.] For example, if the detector is used to measure spin along a particular axis, then some of these eigenstates will be spin up along that axis, and others spin down along that axis. Unlike the free Hamiltonian, which commutes with every spin operator, the effective Hamiltonian of the detector only commutes with the spin operator along one axis (this is the property that allows it to be a detector of spin). Spin states in the other directions are forbidden. The electron is forced into either a spin up or spin down eigenstate. Which one is chosen in practice is indeterminate, and the likelihood for each one will depend on the coefficients of the superposition. But it is always in a possible physical state, so interactions between the electron and the macroscopic system will force it into an eigenstate of the appropriate spin operator.
This interaction between a single particle and macroscopic system is, of course, just what people refer to as decoherence; only I'm looking at it from a slightly unusual perspective (I would say a more fundamental perspective).
I think this is broadly consistent with your five steps, but maybe differing in some of the details. The main quibble I have with your steps is that "the position basis" is not sufficient to describe the particle. One needs to also specify the spin (and perhaps colour) state as well (plus whether it is a particle or antiparticle). Secondly, I also disagree that the Hamiltonian is just the standard model Hamiltonian. The standard model Hamiltonian describes the energy states of free particles (energy states being those which are metastable in time). When we come to compound objects, we need to use an effective Hamiltonian. This is a different basis (mixing together the quarks and gluons), with a different set of (meta)stable eigenstates. But although it can be derived from the standard model Hamiltonian (in principle at least), it is not the same as the standard model Hamiltonian, and has different allowed energy levels and different stable states.
Amplitude vs. probability and entanglement
It does seem that representing both uncertainty and indeterminism with amplitudes (complex values of norm <= 1) instead of probabilities (real values of norm <= 1) resolves some of QM's puzzles, like interference, where we're looking at one event at a time. But I don't think it solves the puzzle of entangled particles, where we deal with two or more events at once.
To see this, let's define Amp(A|B), by analogy with P(A|B), as the amplitude for event A given that event B occurs, and assume Born's rule that P(A|B) = |Amp(A|B)|^{2}. Now a causal theory which requires that Amp(b_{1}|B_{3}, b_{2}) = Amp(b_{1}|B_{3}) also satisfies Bell's definition of local stochastic causality: just take the squared norm of both sides. Working in amplitudes instead of probabilities doesn't make the theory nonlocal in Bell's sense.
The issue, as I see it, is that the assumptions you are making seem to require that the amplitudes of entangled particles are factorizable, which we know is not the case. In the specific case of two spin-1/2 particles generated by the decay of a spin-0 boson, for instance, if we assume that the amplitude of measuring a particle's spin as "up" is fixed just by the relative angles of the detector and the hidden spin state of the particle, the equation Amp(b_{1}|B_{3}, b_{2}) = Amp(b_{1}|B_{3}) holds (the hidden spin is part of B_{3}) and Bell's inequality must hold too. To match QM's predictions (and experimental data), that equation must be violated.
In other words, even if we express hidden state in terms of amplitudes, Bell's theorem still applies.
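The gap between the quantum prediction and any factorizable (local) model can be seen numerically in the CHSH combination of correlations. The sketch below compares the singlet-state prediction E(a, b) = -cos(a - b) with one toy local hidden-variable model of my own choosing (the particular model is illustrative; any local model is bounded by 2):

```python
import math
import random

random.seed(0)

def qm_corr(a, b):
    # Quantum prediction for the spin-singlet correlation: E(a, b) = -cos(a - b).
    return -math.cos(a - b)

def local_corr(a, b, n=100000):
    # A toy local hidden-variable model: each pair carries a hidden angle lam,
    # and each side's outcome depends only on its own setting and lam.
    total = 0
    for _ in range(n):
        lam = random.uniform(0, 2 * math.pi)
        A = 1 if math.cos(a - lam) >= 0 else -1
        B = -1 if math.cos(b - lam) >= 0 else 1  # built-in anti-correlation
        total += A * B
    return total / n

def chsh(corr):
    # CHSH combination at the standard angle choices.
    a, ap, b, bp = 0.0, math.pi / 2, math.pi / 4, 3 * math.pi / 4
    return abs(corr(a, b) - corr(a, bp) + corr(ap, b) + corr(ap, bp))

print(chsh(qm_corr))     # 2*sqrt(2) ~ 2.83: violates the classical bound of 2
print(chsh(local_corr))  # ~ 2 (Monte Carlo): a local model cannot exceed 2
```

Whatever the hidden-variable distribution, any model in which each outcome depends only on its local setting and the shared hidden state stays within the bound, which is the content of Bell's theorem that the factorizability assumption runs into.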
Amplitude vs. probability and entanglement
Thanks Michael for your comment; it is certainly food for thought. Also, I agree that my interpretation was born out of considering interference effects (and particularly the path integral interpretation), and needs to be applied to entanglement.
A couple of comments on your comment, though:
Born's rule is applied when we have an ensemble of "paths"; the amplitude refers to a single path. Thus the expression should be:
P(A|B) = |∫ dX Amp(A|B,X)|^{2}
where X represents the various internal variables which we don't know. If there were a single amplitude contributing to the probability, then your expression would be correct. As I stated, the proofs of Bell's theorem that I have seen rely on an integration over the hidden variables after you have calculated the probability, in effect.
P(A|B) = ∫ dX |Amp(A|B,X)|^{2}
So I think you are misusing Born's rule in your comment (although I am not sure that this niggle overly affects your argument). But the distinction is important in the analysis of the effects of hidden variables, and in the precise derivation of Bell's inequalities.
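The difference between the two orderings (integrate over X then square, versus square then integrate) is easy to see with a toy example; the amplitudes below are illustrative values:

```python
import cmath
import math

# Illustrative amplitudes for two hidden configurations X that both lead to
# the same observed outcome (values chosen purely for demonstration).
amps = [0.5 * cmath.exp(1j * 0.0), 0.5 * cmath.exp(1j * math.pi / 3)]

# Sum over the hidden variable first, then take the squared modulus
# (the ordering of Born's rule used in the post):
p_sum_first = abs(sum(amps)) ** 2          # 0.75 for these values

# Square each amplitude first, then sum (the Bell-style expression):
p_square_first = sum(abs(a) ** 2 for a in amps)   # 0.5 for these values

# The two differ by the interference (cross) term, which survives only when
# the integration over X is done before taking the modulus.
print(p_sum_first, p_square_first)
```

So the ordering is not a pedantic point: the cross term between different values of X is exactly what the second expression discards.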
Secondly, my interpretation falls into neither the neat psi-ontic nor psi-epistemic categories. I recognise that the fundamental states of the particle are best represented by the states used to build up the amplitude (which would be psi-ontic), but also that our representation of our knowledge of the situation is represented by the same notation. So if we somehow had a perfect knowledge of the internal configuration of the particle, we would represent that by a wavefunction. But if we have an imperfect knowledge, we can also use the wavefunction notation to express that (and, once we have an ensemble of results, can convert to a probability distribution to compare against an experimentally observed frequency distribution). In particular, when we predict the future, our use of the wavefunction is more epistemic. When we make predictions about entangled particles, we ought to make predictions for those particles together, rather than individually. We know that if we measure the spin of the first particle along some axis as a, and the spin of the second particle along some other axis as b, then a and b are not independent. Amp(a|B,b) is used to indicate the amplitude (or our uncertainty) for result a given our knowledge of the background conditions B and that the other particle recorded a result b. In other words, Amp(a|B,b) is not equal to Amp(a|B), but this doesn't imply that anything physical passes between the two particles, because in this case the amplitude is to be interpreted epistemically rather than ontically. It is not a breakdown of locality in the sense used in special relativity, where some physical particle or signal would travel faster than the speed of light.
Thirdly, I do believe that there is a hint of nonlocality present in this experiment. As stated, I distinguish between substance causality and event causality. A substance always emerges from another substance (such as in a particle decay), and this can be mapped out entirely in terms of physical causes. However, events are indeterminate in physics (i.e. unpredictable), which means that the cause of events is not something physical, or representable mathematically. (The only alternative to this is to say that there is no cause, which I view as saying that the universe is ultimately irrational.) I would attribute this to God (perhaps via some intermediaries). Being outside of time and space, God obviously relates to everything in the universe simultaneously in time, and whatever the analogue word for that is in space. Events include those involved in the process where a particle interacts with a detector and we get a result for its spin along a particular axis. Since God controls both events, they are not independent of each other, even though there is no physical interaction between the two. In the absence of miracles, this also leads to the symmetry laws which ultimately lead to the prediction that the two particles will have opposite spin when measured along different axes. The amplitude is used to parametrise our uncertainty about God's action for each event. We know from the symmetry laws that if God causes b, then that has implications for the possible result at a.
Hope that's clear (I'm trying to think more deeply about what my interpretation implies in response to all these comments).
Back to Probabilities and Amplitudes
Hello again Dr. Cundy! Your comment regarding the "physical states" and effective field theory was very interesting, though I must admit at the end I couldn't tell if you were more or less agreeing with me, or adopting a kind of top-down pluralistic ontology (i.e. some objects really exist at the composite macro-level and their micro-level properties and constituents exist virtually and only appear when probed by an appropriate interaction), or something in between. I think an exposition of your view of QM would still benefit from a more detailed specification of what physically exists (fundamental particles? composite particles or objects? fundamental particles + forms of composite particles or objects? an ontological quantum state, different from the epistemic quantum state we assign based on our limited knowledge?), but let's leave that aside.
I'm going to pick up by responding to what you wrote in this paragraph:
"Firstly, as I stated in my original post which sparked our discussion, quantum physics is not intuitive. We can visualise Aristotle's physics. We can visualise Newtonian physics. With Einstein's physics (by which I mean relativity) it gets harder, but it still roughly follows our intuitions. Of course, we do better in each case to represent our images mathematically, but we can still back up that mathematics with a mental picture. With quantum physics, that is impossible."
My problem with this statement is that it is _not_ impossible: pilot-wave theory is a counterexample to that claim. Or maybe it would be better to say that it is impossible in orthodox QM, but that's only because orthodox QM is incomplete and fails to clearly specify a physical ontology, with the measurement problem being a symptom of this failure. If QM is completed by a theory specifying a primitive ontology (as in pilot-wave or objective collapse theories), or even by some more radical proposal (as in many-worlds), it is not impossible to clearly understand or even visualize what is going on. (Although in many-worlds this can only be done for systems with a very low number of degrees of freedom. It is a bit easier in something like pilot-wave theory. There we have the primitive ontology, which is directly visualizable, and even the quantum state is more accessible, because its role via its influence on the primitive ontology is clear, and because we can write conditional wavefunctions for subsystems in a way that is impossible in other theories.)
So I find it hard to see the statement "If you think you understand quantum mechanics, then you don't", despite its pedigree, as anything other than a cover for a refusal to acknowledge that standard QM has serious philosophical problems. (Though I'm not accusing you of using it that way personally.) "In any philosophy of quantum mechanics there is going to be something unintuitive" is much better, because that is actually true. But then the question becomes: why accept unintuitive consequence A (e.g. failure of standard probability axioms) over unintuitive consequence B (e.g. nonlocality)? Particularly when I can at least make sense of B, and even your view already has some of B (via God's causation to make sure the nonlocal correlations between entangled particles hold), while I can't make sense of A?
I appreciate your attempt to provide a picture of what an amplitude as a representation of uncertainty might look like, though it is (forgive the pun) circular reasoning. If we have a set of related propositions, parameterized by X (mod 2π), and _only one of those propositions can be true_ (which you indicate is the case), then the proper representation of uncertainty about these propositions seems to be a probability distribution over X, unless it is already given that something like amplitudes can represent uncertainty.
Your argument that we can't do this (treat the state+phase as a single classical state) is that it would give incorrect results if we tried to combine the probabilities of two different states. But... does it? Unless I'm grossly misunderstanding things, a mixed quantum state represented by a density matrix looks exactly like a probability distribution over different quantum states (treated as possible ontological states of the system, and which themselves produce objective tendencies for the system to arrive at different observable states upon measurement), combining in exactly the way one would expect. Nothing that I've read anywhere else indicates "beware, standard axioms of probability theory do not apply here"! You also say, "what other way can we interpret the amplitude in a psi-epistemic view of QM?" But why should I accept a psi-epistemic view of QM, which I can't understand, over a psi-ontic view?
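The contrast at issue here, between a classical mixture of quantum states (which does combine like a probability distribution) and a superposition (which does not), can be made concrete with a small two-state sketch; the states and the measurement basis |±> = (|0> ± |1>)/√2 are illustrative choices:

```python
import math

# Overlap of the basis states with the measurement outcome |+>:
# <+|0> = <+|1> = 1/sqrt(2).
s = 1 / math.sqrt(2)

# Equal classical mixture of |0> and |1> (a diagonal density matrix):
# the outcome probabilities combine by ordinary weighted averaging,
# exactly as in classical probability theory.
p_mixture = 0.5 * abs(s) ** 2 + 0.5 * abs(s) ** 2     # ~ 0.5

# Equal superposition (|0> + |1>)/sqrt(2): the amplitudes combine first,
# and the cross term makes the |+> outcome certain.
amp_plus = s * s + s * s        # <+|psi>, ~ 1
p_superposition = abs(amp_plus) ** 2                  # ~ 1
```

So the mixture behaves classically, as the density-matrix picture suggests, while the superposition's cross term is exactly what a probability distribution over states cannot reproduce; the disagreement is over what that cross term signifies.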
Amplitude vs. probability and entanglement, continued
Dr. Cundy: Granted, the order of operations is important when calculating. But I don't think it affects the argument. At least, I don't see how summing over the hidden parameters could introduce a nonlocal correlation, if such correlations are ruled out for each possible value the hidden parameters could take. After all, if each summand obeys Amp(a|b,X) = Amp(a|X), and each summand has equal weight in the sum, doesn't the sum have to obey the constraint, too? I know there are some counterintuitive results in probability theory, but that would be really peculiar, if true.
Matthew: In the case of interference, the data expressed in the amplitudes for possible paths just is what the "pilot wave" is supposed to convey to the particles. That is, you discover what the pilot wave for your experiment is by summing or integrating the amplitudes for each alternative. And the math for that doesn't respect all the rules of classical probability theory, because you're working with complex numbers, which aren't totally ordered. "P(A or B) >= P(A)" can be easily deduced from the axioms of probability theory; "Amp(A or B) >= Amp(A)" isn't even well-defined.
Density matrices are the same way. However much a density matrix may look like a probability distribution, it doesn't act like one in all respects, just because its components lack the ordering properties of the real numbers. You can take sums and products of linear operators as much as you like, but it makes no sense to say that one linear operator is greater than another. And the way you get an actual probability out of a density matrix is an operation outside the scope of probability theory.
Response to Michael
Hi Michael, in the case of pilot-wave theory, the wavefunction is ontic rather than epistemic, so no problem of how to interpret the amplitudes as representations of uncertainty arises, and there's no surprise that they don't obey probability theory (because they aren't representing uncertainties, but properties that give rise to objective tendencies).
As for density matrices, the way they result in probabilities involves both (a) the probability distribution over the ontic quantum state, and (b) the Born rule for producing the probabilities resulting from that quantum state (the objective tendencies that arise from the quantum state). As far as I can tell, operation (b) is the only one outside of the scope of standard probability theory.
The fact that the quantum states can't be linearly ordered doesn't mean that the density matrix does not represent a probability distribution: you can have a probability distribution over a 2D plane, after all!
Comment on using Born's rule before vs. after summation
Hi Dr. Cundy, you write the following in response to Michael:
***
Born's rule is applied when we have an ensemble of "paths"; the amplitude refers to a single path. Thus the expression should be:
P(AB) = |int dX Amp(AB|X)|^2
where X represents the various internal variables which we don't know. If there were a single amplitude contributing to the probability, then your expression would be correct. As I stated, the proofs of Bell's theorem that I have seen rely on an integration over the hidden variables after you have calculated the probability, in effect.
P(AB) = int dX |Amp(AB|X)|^2
***
In my view the difference between these two cases is the following:
In the first case, the quantum state is a superposition of different terms (parameterized by X), and the whole superposition enters into Born's rule to determine the objective tendencies of the system. The whole state int dX |Amp(AB|X)> is a pure state.
In the second case, we don't know what the quantum state is; it could be one of many different states (parameterized by X). To find the probability we calculate the objective tendencies produced by each state, and average them. Each state |Amp(AB|X)> is a pure state and we're dealing with a probability distribution over them. If I'm understanding things correctly, we could represent this as a mixed state, something like int dX |Amp(AB|X)><Amp(AB|X)|.
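For what it's worth, the difference between the two cases can be exhibited numerically in a toy model. The sketch below (Python with NumPy; the two amplitudes are made-up illustrative values standing in for Amp(AB|X) at two values of X) shows that summing amplitudes before applying Born's rule and summing probabilities after applying it give genuinely different answers:

```python
import numpy as np

# Hypothetical amplitudes Amp(AB|X) for two values of the hidden parameter X.
amps = np.array([1 / np.sqrt(2), -1 / np.sqrt(2)], dtype=complex)

# First case: sum the amplitudes, then apply Born's rule.
# P(AB) = |sum_X Amp(AB|X)|^2
p_superposition = abs(amps.sum()) ** 2   # the two terms interfere destructively

# Second case: apply Born's rule to each term, then sum.
# P(AB) = sum_X |Amp(AB|X)|^2
p_mixture = (abs(amps) ** 2).sum()       # no interference between the terms

print(p_superposition, p_mixture)        # 0.0 versus 1.0 (up to rounding)
```

With these values the first expression vanishes entirely while the second gives unity, so the order in which Born's rule is applied is not a mere technicality.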
Probabilities vs. density matrices
The point here is that a probability distribution is, necessarily, a real-valued function. Whatever wild and strange things you pick the inputs from, the output has to be a real number between 0 and 1. Now, even if you write a density matrix as a weighted sum of pure states, it is not a real-valued function; it's a Hermitian matrix. You have to do something else to it to extract a probability (specifically, you take its trace). And taking the trace of a density matrix is the equivalent of applying Born's rule to an amplitude.
In other words, calling a density matrix a probability distribution is a confusion of ideas. The two things aren't of the same type.
Similarly, "P(AB) = |int dX Amp(AB|X)|^2" is in fact the correct way to handle mixed states, as well as superpositions; and if you use density matrices, you must integrate over the unknown parameters, then take the trace. The point in the calculation where you apply Born's rule, or take the trace of a density matrix, represents the point in the experiment where you measure the system, or, if you'd rather, where the system interacts with its environment and decoheres. The expression "P(AB) = int dX |Amp(AB|X)|^2" represents a decoherent system, from which all quantum effects like interference and entanglement have been banished.
Probability distributions vs. density matrices
Michael, that's correct; I've been trying to say that density matrices can be used to _represent_ probability distributions over pure quantum states, not that they _are_ probability distributions over pure quantum states. Maybe I missed making that distinction clearly, but my above comments should be interpreted with it in mind.
Specifically, if you have some quantum states |psi_X> parameterized by X, each associated with a probability P(X), then that is represented by the density matrix which is the sum over P(X) |psi_X><psi_X|, is it not? Then the expectation value of some observable A is the sum over P(X) <psi_X|A|psi_X>, which corresponds to the second calculation (summing after squaring) rather than the first (summing before squaring). That's according to the Wikipedia page on density matrices, at least.
My point is that taking the trace with the density matrix corresponds to _both_ of the operations (a) averaging over the probability distribution and (b) applying Born's rule on the pure quantum states. I don't believe I've said anything incorrect here.
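A quick numerical check of that claim, for anyone who wants to verify it (a Python/NumPy sketch; the two-state ensemble and the choice of Pauli-Z as the observable are made up purely for illustration):

```python
import numpy as np

# Hypothetical ensemble: |0> with probability 0.25, |+> with probability 0.75.
psi0 = np.array([1, 0], dtype=complex)
psiplus = np.array([1, 1], dtype=complex) / np.sqrt(2)
probs, states = [0.25, 0.75], [psi0, psiplus]

# An observable: the Pauli-Z matrix.
A = np.array([[1, 0], [0, -1]], dtype=complex)

# (a) Average the per-state Born-rule expectations: sum_X P(X) <psi_X|A|psi_X>.
avg = sum(p * (s.conj() @ A @ s).real for p, s in zip(probs, states))

# (b) Build the density matrix rho = sum_X P(X)|psi_X><psi_X|, take tr(rho A).
rho = sum(p * np.outer(s, s.conj()) for p, s in zip(probs, states))
trace_formula = np.trace(rho @ A).real

assert np.isclose(avg, trace_formula)  # both give 0.25 for this ensemble
```

So taking the trace with the density matrix does indeed reproduce the probability-weighted average of the per-state Born-rule values in one step, which is exactly the point at issue.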
Probability distributions vs. density matrices
I think you've misunderstood the Wikipedia page. You could think of a mixed state as a function from pure states to probabilities, I suppose, sending each pure state to its coefficient. But I don't see how that would be useful, either for calculations or conceptually. The mathematical object you work with is the density matrix itself, not that "function". (For instance, while the density matrix for a mixed state specifies that state uniquely, its representation as a convex combination of pure states very much doesn't. There are lots of combinations of pure states that end up as the same matrix.)
So that derivation of the expectation value of an observable is a purely formal algebraic calculation. The components p_X |psi_X><psi_X| A have no physical meaning; they are used only to prove that tr(A rho) is the expectation value of A in the state rho. Redefining the decomposition of the mixed state's matrix (a purely mental operation) as the average of a "probability distribution", just because it looks like one, is chasing after a mirage. It doesn't help the calculation, and it doesn't correspond to anything in the physics.
In short, taking the trace of a density matrix is just applying Born's rule. It doesn't involve averaging over a probability distribution, because there is no actual distribution to average over.
Response to Michael on density matrices
I know that the density matrix itself is what is used to do calculations, and that it is more useful for that purpose than a probability distribution over pure states. I also know that different combinations of pure states can result in the same density matrix. For a psi-ontic theory, that simply represents an epistemic limitation: from experimentally observed probabilities, we can't always separate the part arising from quantum indeterminism from the part arising from uncertainty about the actual quantum state.
You seem to be arguing, from the fact that we can't uniquely extract a probability distribution over the pure quantum states, that there isn't one there at all. But that seems to me to be simply untrue: you could set up a device that randomly selected a (pure) quantum state and emitted an electron in that state. An experimenter not privy to the programming of your apparatus might only be able to determine the density matrix by doing measurements on the electrons, but that doesn't prevent there from being (or you from knowing) the actual probability distribution.
So I think it is making some assumptions about how QM should be interpreted to say that "taking the trace of a density matrix is just applying Born's rule" and nothing else.
Probability distributions vs. density matrices, continued
"you could set up a device that randomly selected a (pure) quantum state and emitted an electron in that state."
Well, no, actually, I don't think you could. Consider, instead, the case of polarized light. The maximally mixed state is light that isn't polarized at all. And you can't create unpolarized light with a machine that emits either horizontally or vertically polarized light based on the flip of a coin (or any other random process.) You could split a beam of light into oppositely polarized halves, but each beam remains polarized whatever you do, and the only way to get unpolarized light out of them is to put them back together. That is, a beam of light that's known to be polarized either horizontally or vertically, with a 1/2 probability of either, is not the same thing as an unpolarized beam.
And yes, I do say that because the decomposition of a mixed state's density matrix into a convex combination of pure states' density matrices isn't unique, it's incoherent to recast that combination as "the average of a probability distribution" and suppose that the distribution means anything. It's just like deciding where to put the origin of your coordinate axes: you're deciding how to represent the physical system you're looking at, and you can pick whatever is most convenient, because that part of the representation doesn't correspond to anything real. How you decompose a mixed state's density matrix depends on what basis in the Hilbert space is most convenient for your calculations; it's an artifact of your representation, not a reflection of what's really happening.
The contrary view would be like saying that, because we can do physics under the assumption that the Earth is the nonrotating center of the universe and everything revolves around it, therefore the Earth really is just that. Which would be absurd.
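To make the non-uniqueness concrete, here is a small sketch (Python/NumPy, using the polarized-light example above): a 50/50 mixture of horizontal and vertical polarization and a 50/50 mixture of the two diagonal polarizations are different "probability distributions over pure states", yet they yield the very same density matrix, and hence identical predictions for every possible measurement:

```python
import numpy as np

proj = lambda v: np.outer(v, v.conj())   # the projector |v><v| for a state v

H = np.array([1, 0], dtype=complex)                 # horizontal polarization
V = np.array([0, 1], dtype=complex)                 # vertical polarization
D = np.array([1, 1], dtype=complex) / np.sqrt(2)    # diagonal polarization
A = np.array([1, -1], dtype=complex) / np.sqrt(2)   # anti-diagonal polarization

rho_HV = 0.5 * proj(H) + 0.5 * proj(V)   # coin-flip between H and V
rho_DA = 0.5 * proj(D) + 0.5 * proj(A)   # coin-flip between D and A

# Different ensembles, identical density matrix (the unpolarized state I/2):
assert np.allclose(rho_HV, rho_DA)
assert np.allclose(rho_HV, np.eye(2) / 2)
```

No measurement whatsoever can distinguish which of the two preparations was used, which is the sense in which the decomposition is an artifact of the representation.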
Response to Michael on probabilistic mixtures
On probabilistic mixtures: here is a setup which, roughly speaking, randomly selects between spin up and spin down and then emits an electron in that state. Start with an unpolarized beam of electrons with low enough intensity that roughly one electron is emitted every second. Pass it through a Stern-Gerlach (SG) device that sends spin-up electrons down one path and spin-down electrons down the other path before recombining them. Each second, randomly (with equal probability) choose one of the paths to be blocked for that second. Electrons which exit the setup will be either spin up or spin down with roughly equal probability. You can do something similar with photons, if you like.
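A Monte Carlo sketch of this setup (Python/NumPy; the axis angle and sample size are arbitrary illustrative choices) shows that the resulting coin-flip ensemble of definite spin states reproduces the 50/50 statistics of an unpolarized beam along any measurement axis:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000                      # number of emitted electrons

up = np.array([1, 0], dtype=complex)
down = np.array([0, 1], dtype=complex)

# Measurement axis at an arbitrary angle theta from z; its spin-up eigenstate
# is cos(theta/2)|up> + sin(theta/2)|down>.
theta = 1.1
axis_up = np.cos(theta / 2) * up + np.sin(theta / 2) * down

# Each second the blocker randomly lets through the up path or the down path.
went_up = rng.random(n) < 0.5

# Born-rule probability of measuring "up along theta" for each emitted state.
p_up = np.where(went_up,
                abs(axis_up.conj() @ up) ** 2,
                abs(axis_up.conj() @ down) ** 2)
outcomes = rng.random(n) < p_up
frac = outcomes.mean()

print(frac)   # close to 0.5 whatever theta is, like an unpolarized beam
```

Of course, this only shows the statistics agree; whether the ensemble of definite states and the unpolarized beam are the same *physical* situation is exactly the point under dispute between the two of you.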
In any case, since you think I misunderstood the Wiki article, I looked up some online material from introductory QM courses, and the first two I found (one from Berkeley and one from MIT) were even clearer than Wikipedia: they defined mixed states as probability distributions over pure states, said that density matrices represent mixed states, and one of them directly stated that the same density matrix can represent different mixed states. The third one that I found took your position, equating mixed states with (non-pure) density matrices, and noting that because we can decompose the density matrix in many ways, we can't call the coefficients probabilities. But it went on to say that we can certainly go the other way: probabilistically prepare a statistical mixture of pure states and write a density matrix with those probabilities as coefficients.
So that certainly supports part of what I've been saying: that you can represent a statistical mixture of pure states as a density matrix. The only other thing I'm claiming here is that it is possible that every density matrix represents such a statistical mixture, even if we cannot in principle determine which statistical mixture it is. (In fact, I've confirmed that under a Bohmian interpretation this is precisely the case, hence my assertion that your claim to the contrary brings in matters of the interpretation of QM.)
I believe your comparison with choosing a coordinate system is disanalogous, because I'm not talking about going from a density matrix to a probability distribution, but the other way around. There's a loss of information when you write the density matrix from the probability distribution. The fact that you can't uniquely recreate the probability distribution doesn't imply that there wasn't one in the first place. In your supposed analogy, we have very good reason to think that there was no originally correct coordinate system (namely, coordinate systems are human constructs). There isn't an analogous reason for us to expect that a mixed state doesn't come from some statistical mixture of pure states.
Symmetry considerations
I would say that the main reason we have for thinking that coordinate systems are "human constructs" is that the best theories of physics we have are symmetric under a change of coordinates. If, instead, our physics claimed that the results of experiments depended on their distance from a specific point, and the experiments confirmed that claim, that would be strong evidence that the universe has a God-given coordinate system.
By analogy, then, since the Lagrangian of quantum field theory is symmetric under a change of basis in the Hilbert space from which the wavefunction/density matrix that represents a state is chosen, you would need some other reason (such as prior knowledge of how the state was made) to suppose that the choice of basis is physically significant. That is, the symmetry of the "probability distribution" representation of a mixed state under a change of basis is, all by itself, good reason not to believe that a mixed state is a statistical mixture of pure states. The presumption cuts the opposite way from how you want it to.
After all, the whole reason the density matrix representation works is that the information discarded by calculating a density matrix from a probability distribution makes no difference to the outcome of any observation we can make. A distinction that makes no difference is no difference.
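The invariance being appealed to here is easy to exhibit numerically. In this sketch (Python/NumPy; the state, observable, and unitary are all arbitrary made-up examples), every prediction of the form tr(rho A) is unchanged when both the density matrix and the observable are rewritten in a different basis:

```python
import numpy as np

rng = np.random.default_rng(1)

# An arbitrary valid density matrix: positive semidefinite, trace one.
M = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
rho = M @ M.conj().T
rho /= np.trace(rho).real

# An arbitrary Hermitian observable.
B = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
A = B + B.conj().T

# An arbitrary unitary change of basis (QR decomposition of a random matrix).
U, _ = np.linalg.qr(rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2)))

before = np.trace(rho @ A)
after = np.trace((U @ rho @ U.conj().T) @ (U @ A @ U.conj().T))
assert np.isclose(before, after)   # identical predictions in any basis
```

The equality follows from the cyclic property of the trace, so no measurement can single out a preferred basis, which is the premise of the symmetry argument above.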
As an aside, this argument is quite similar to Dr. Cundy's argument that the symmetry of our physical theories under Lorentz transformations is a good reason to disbelieve presentism and accept a B-theory of time. In that case, though, I do have evidence the other way, such as our own experience of time, which is limited to a single moment rather than extending over all our lifetimes.
Rather than dispute with you about what the symmetry considerations do or do not demonstrate, I will simply claim we are in the same boat with the status of density matrices as the one you and I seem to agree we are in with respect to Lorentz symmetry and the A vs. B theory of time. Namely, orthodox quantum theory is incomplete because it fails to specify a clear physical ontology (which is most evident via the measurement problem), and what seem to me to be the most reasonable ways to resolve that problem give us good grounds for thinking that density matrices do in fact represent statistical mixtures of pure states. (E.g. pilot-wave-type theories, because we don't know the precise state of the primitive ontology alongside the quantum state, and objective collapse theories, because we don't know exactly when the collapse occurs or what the quantum state collapses to, both lead to a natural interpretation of the density matrices arising from partial traces of larger systems in terms of a probability distribution over pure states of the subsystem.)
Response to Pilot Wave Article
I have posted an initial draft of a response to the first part of Matthew's article on pilot wave interpretations of QFT at http://www.quantumthomist.co.uk/Documents/PilotWaveModelsResponse.pdf. I have only had a chance to respond to the first part of the article, and this is very much a first draft  I haven't read it through myself. I will try to update and improve the document as time allows. Comments and criticisms very welcome.
Summing conditional amplitudes
Dr. Cundy, my question to you probably got lost in the argument I had with Matthew, so...
If the amplitudes for every state a pair of entangled particles can be in don't have nonlocal correlations, is it possible for the process of summing the alternatives, that is, of taking the path integral, to introduce a nonlocal correlation in the final result? My intuition says "no", but intuition can be wrong.
Summing conditional amplitudes
Michael,
Sorry for not responding to your question. It is a good one.
Firstly, my reply (Comment 15) was in three parts, and my comment about summing amplitudes was only the first part of it. The third paragraph is probably the most important part of my reply. Note also that I included the qualification "I am not sure that this niggle overly affects your argument", which implies that even as I wrote the comment I largely agreed with you. I emphasised that we should be summing amplitudes rather than probabilities because I do think that this is crucial to assessing the mathematical proof behind Bell's inequalities. You might well be right that some form of Bell's theorem survives this objection; but the standard proofs have a problem.
Can summing local amplitudes lead to a nonlocal probability? I can only think that they would if they are conditional on different things. For example, we can consider summing psi(X|Y,a) and psi(X|Y,b). Perhaps this is a two-slit interference pattern. The first amplitude represents the likelihood that the particle reaches X given initial conditions Y and that it went through slit a, while the second amplitude represents the likelihood that the particle reaches X given initial conditions Y and passing through slit b. Each of the amplitudes is itself local, in that it contains no nonlocal interactions, but the sum of them depends on both what happens at a and what happens at b. (Of course, these are both in the past light cone of X, so perhaps this isn't the best of examples. But if we can combine amplitudes of two different events, then each one individually might be local while the probability based on the combination is not.)
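A toy version of this two-slit example (Python/NumPy; the wavenumber, slit positions, and screen geometry are all made-up illustrative values) makes the point explicit: each path amplitude is computed from purely local data, yet the probability contains a cross term that depends on both slits at once:

```python
import numpy as np

k = 20.0                        # wavenumber (arbitrary units)
slit_a, slit_b = -0.5, 0.5      # hypothetical slit positions
L = 10.0                        # distance from the slits to the screen

def amp(x, slit):
    """Local amplitude psi(X|Y, slit): free propagation from one slit to x."""
    r = np.hypot(L, x - slit)            # path length from the slit to x
    return np.exp(1j * k * r) / r        # spherical-wave amplitude

x = np.linspace(-3, 3, 601)     # points on the screen
psi_a, psi_b = amp(x, slit_a), amp(x, slit_b)

# Summing the two local amplitudes produces an interference (cross) term
# that depends on both slits, even though each summand depends on only one:
p_combined = abs(psi_a + psi_b) ** 2
cross_term = 2 * (psi_a * psi_b.conj()).real
assert np.allclose(p_combined, abs(psi_a)**2 + abs(psi_b)**2 + cross_term)
```

The identity |a+b|^2 = |a|^2 + |b|^2 + 2 Re(a b*) is all that is going on: the cross term is exactly the part of the probability that is not a function of either amplitude alone.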
Equally, we shouldn't forget that probabilities are compared against frequency distributions of ensembles, rather than individual events (which is what the experimental tests of Bell's theorem rely on). When considering individual events, we ask questions such as "What is the amplitude that detector a records result A given that detector b records result B?" (Or, if we don't know the result at B, what is the amplitude that we get A and B?) In this sense, Amp(A|Y,B) is not equal to Amp(A|Y). Is this a breakdown of locality? Not if the amplitude refers to our knowledge. And with regards to substance causality, no particle or particle state has been created at a distance from its efficient cause, or travelled faster than light. How does the particle at a know what happened at b? It doesn't (after a fashion: in my philosophy God is the one who actualises the potential for it to record a particular spin state, and God is aware of what happens at b). But the result at a is not in general determined by what happens at b. It is only when we consider the ensemble that the pattern emerges.
Michael, I wouldn't call it an argument. More of a discussion!
Dr. Cundy, thanks so much for your comments! I've responded via email to avoid crowding this comment thread.
Matthew
Thanks for your email, Matthew: I have read it, but haven't had time to respond properly, or complete the second part of my review of your work. I'll get on with that when I have the opportunity, but I can't say when that would be.
Response part 1
Thanks very much for engaging with my comments; I greatly appreciate it! I consider what follows here, as with my comment on the previous post, as somewhat preliminary, despite its length. I'd like to be able to understand your position more fully and respond in a more detailed fashion, both on the subject of this post and on everything else we have discussed, though I don't know when or if I'll get around to it.
I admit I am finding myself a little frustrated at this juncture, both because I am not quite sure what you mean by some of what you write and because I am not quite sure how to articulate what I mean to say in response! But I will try my best and perhaps we will eventually understand each other better.
- Non-Locality of QM -
First, you are absolutely right that Bell postulates a probability distribution over the states of the candidate theory to prove his theorem. I was well aware of this, and I realized upon reading this post that I conflated two things in my original comment: Bell's theorem, and the application of Bell's condition of local causality to diagnose quantum mechanics as nonlocal. Your view (which still doesn't make sense to me) that we cannot use probabilities to characterize uncertainty applies to the proof of Bell's theorem, but it is not relevant to demonstrating that quantum theory fails to satisfy Bell's locality criterion. That only requires reference to probabilities of macroscopic setups and outcomes, which Norsen's paper makes clear, I believe.
So quantum theory is nonlocal in Bell's sense. You just dispute that Bell's sense of locality is the appropriate sense, by making the substance vs event causation distinction. I would make that distinction differently than you. I think that modern philosophy made a wrong turn by focusing exclusively on events as causes (in that much, I believe I am in agreement with you). In fact, I don't think events ever serve as causes; rather, substances are causes. Events are the effects. (Sometimes, substances act as causes as a result of or in response to certain events; that's how causal chains get built up.) In certain types of events (namely, events of creation) a substance can also be viewed as the effect; substances would also be appropriately classed as the effects of God's sustaining activity.
My point here is that, from my perspective, the distinction you are trying to draw is selecting a certain subclass of events (creation events, and maybe annihilation events) and saying "only causation of these events needs to be local". That seems ad hoc to me. I have a bit more to say on your take on quantum nonlocality below.
- Your View of the Bell-Inequality Experiment -
Here is my impression of your view of the usual Bell-inequality-type experiment (please correct me if I misrepresent it): you hold that the particles are emitted each with some definite, though unknown, spin state, and those states are anticorrelated with each other. This goes beyond orthodox QM, which only assigns an entangled state to the pair of particles, but no definite state to either individual particle (so it isn't exactly "taking the math literally" as you claim). It is indeterministic which definite state is emitted. If I read you right, the definite state determines probabilities for the particle to be measured spin up or spin down along any given axis (by using Born's rule on the appropriate product, e.g. <A|Y>). (I'll get back to that.) The perfect anticorrelation of measurements along the same axis is not explained by any local causes of the physical substances involved, but instead is guaranteed directly by the sustaining activity of God.
Now, I entirely agree that God upholds the universe in existence and that physics describes his regular way of doing so. But those pertain to our theology and philosophy, not, it seems to me, to physics itself. Translating this last point of your explanation into the language of physics, your view is basically that it is a law of physics that the measurements of the particles' spins will always be anticorrelated, even though there is no local cause of this anticorrelation. Okay... this reads to me like a tacit admission that quantum physics is, in fact, nonlocal, even though you want to avoid saying so. (And, theologically speaking, the way you avoid saying so makes your view lean towards occasionalism.)
It also makes one of your criticisms of the de Broglie/Bohm theory sound a bit like special pleading. You seem to be saying that there is no problem with God's causation being nonlocal, because he is immaterial and outside of space and time. Well, the pilot-wave theory can say something similar: the pilot wave (which really is just the same thing as the wavefunction/quantum state of orthodox QM by another name) exists in configuration space, certainly outside of space and maybe even outside of time. And though it may be regarded as ontologically real and an object of physics, it isn't material. Particles and the things made from them are what are material; the pilot wave is not.
(I would prefer to regard the pilot wave/quantum state as an abstraction from the forms and causal powers of material things, rather than something with an independent existence, in which case that move doesn't work. But I also don't have a problem with saying that there is genuinely nonlocal physical causation. I have more to say about that; maybe while I'm at this I'll go comment on your A-theory vs B-theory post.)
- Definite Spin States -
Back to the role that the definite spin state of the particle has in your view: as far as I can tell, it doesn't work. Say particle 1 has spin up in the z-direction and particle 2 has spin down. You seem to be saying that the probability for us to measure particle 1 as spin up along some axis at an angle theta to the z-axis is a function of theta (i.e. that it is determined by the definite state of particle 1). But it isn't: if there has been (or even if there will be?) a measurement on particle 2, the probability instead depends entirely on the measurement result obtained for particle 2. And vice versa. It seems that we actually have to say that the probabilities for the measurement results are not at all determined by the actual states of the particles, but are only jointly and timelessly determined by God's sustaining activity.
Maybe I am misreading you and you don't intend that the definite spin state actually plays any explanatory role in the theory. But this makes one of your other criticisms of the pilot-wave theory sound like special pleading. You object that the pilot-wave theory has "redundant nonobservable physical objects". I'm not exactly sure by what lights you consider the pilot wave redundant: it is the same object as the wavefunction/quantum state in orthodox QM. Nor am I sure what is objectionable about it being nonobservable: we can't observe fundamental particles either, only their effects on the macroscopic reality that we can observe. The pilot wave is at most one level further removed, and postulating it is no greater a leap than the leap from (indirectly) observing particles to postulating specific quantum states for those particles. But in any case, the pilot wave is not at all redundant: it plays an important explanatory role. Contrast that with the definite spin states in your view, which seem to play no explanatory role whatsoever.
Maybe the particles must have some spin state or another, due to the requirement of realism (though why not just let them have a definite position, and an indefinite spin state?). In that case, though, I hope you don't fault the pilot-wave theory for having a preferred notion of simultaneity, which it arguably brings in for similar considerations as you bring in definite spin states.
This is getting very long, so I will break it up and put my remaining thoughts in another comment.