Logistic Regression has been used to determine whether a piece of text reflects positive or negative sentiment. Could a modification of this technique be used to predict the probability that a patient will respond well to a drug?
As a thought experiment, could Logistic Regression be adapted to determine whether a drug regimen (e.g. chemotherapy) would work for a patient? With “sentiment analysis”, you pair the words in each review with the number of stars in the associated rating (1 or 2 stars is a negative rating; 4 or 5 stars is a positive rating). Next, you convert the reviews into a matrix (in practice a sparse matrix) of the number of times each word appears in each review. Using these counts and the associated result (i.e. the rating), you can train the model to learn which combinations of words indicate positive or negative sentiment. With large amounts of data, stochastic gradient descent, which updates the weights from one example or a small batch at a time instead of the full dataset, typically trains faster and might be applied to such a problem.
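The sentiment pipeline described above can be sketched in a few lines of plain Python. This is only a minimal illustration under stated assumptions: the six reviews are made-up toy data, and the helper names (`label_from_stars`, `word_counts`, `train_sgd`) are hypothetical. A real project would more likely use a library implementation, but the steps are the same: turn star ratings into labels, turn each review into sparse word counts, and fit the weights with stochastic gradient descent.

```python
import math
import random
from collections import Counter

# Hypothetical toy reviews with star ratings (illustrative only, not real data).
reviews = [
    ("great product works great", 5),
    ("love it excellent quality", 4),
    ("terrible broke after one day", 1),
    ("awful waste of money", 2),
    ("excellent value love the quality", 5),
    ("terrible quality awful support", 1),
]

def label_from_stars(stars):
    """Map 4-5 stars to 1 (positive) and 1-2 stars to 0 (negative)."""
    return 1 if stars >= 4 else 0

def word_counts(text):
    """One sparse row of the count matrix: only words that occur are stored."""
    return Counter(text.lower().split())

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def predict_proba(weights, bias, counts):
    """Probability that a review with these word counts is positive."""
    z = bias + sum(weights.get(word, 0.0) * n for word, n in counts.items())
    return sigmoid(z)

def train_sgd(data, epochs=200, lr=0.1, seed=0):
    """Stochastic gradient descent on the log loss, one review per update."""
    rng = random.Random(seed)
    rows = [(word_counts(text), label_from_stars(stars)) for text, stars in data]
    weights, bias = {}, 0.0
    for _ in range(epochs):
        rng.shuffle(rows)
        for counts, y in rows:
            error = predict_proba(weights, bias, counts) - y
            bias -= lr * error
            for word, n in counts.items():
                weights[word] = weights.get(word, 0.0) - lr * error * n
    return weights, bias

weights, bias = train_sgd(reviews)
p_pos = predict_proba(weights, bias, word_counts("excellent product love it"))
p_neg = predict_proba(weights, bias, word_counts("awful terrible waste"))
print(f"P(positive | 'excellent product love it') = {p_pos:.3f}")
print(f"P(positive | 'awful terrible waste') = {p_neg:.3f}")
```

Storing each review as a `Counter` keeps the representation sparse: a word that never appears in a review costs nothing, which matters once the vocabulary grows to tens of thousands of words.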
Now imagine that, for each patient given a treatment, you had RNA-Seq data showing which genes are turned on and in what relative amounts, along with the outcome of that treatment. Using this data, could you create a Logistic Regression model to predict whether the treatment would be effective for a new patient? The nice thing about Logistic Regression is that you get not only a predicted class but also a probability for that prediction.
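To make the thought experiment concrete, here is a minimal sketch of the same model applied to expression data. Everything in it is an assumption made for illustration: the gene names (`GENE_A`, `GENE_B`, `GENE_C`) are hypothetical, the patients are simulated, and the premise that responders express one gene more and another less is invented for the example. It shows nothing about any real treatment, only how a per-patient probability would come out of such a model.

```python
import math
import random

rng = random.Random(42)
GENES = ["GENE_A", "GENE_B", "GENE_C"]  # hypothetical gene names

def simulate_patient(responder):
    """Synthetic log-scale expression values. In this made-up setup,
    responders express GENE_A higher and GENE_B lower; GENE_C is noise."""
    means = [8.0, 3.0, 5.0] if responder else [4.0, 6.0, 5.0]
    return [rng.gauss(m, 1.0) for m in means]

# 100 simulated patients: 1 = responded to treatment, 0 = did not.
outcomes = [1, 0] * 50
expression = [simulate_patient(y) for y in outcomes]

# Standardize each gene so that gradient descent behaves well.
n = len(expression)
mu = [sum(row[j] for row in expression) / n for j in range(len(GENES))]
sd = [math.sqrt(sum((row[j] - mu[j]) ** 2 for row in expression) / n)
      for j in range(len(GENES))]

def standardize(row):
    return [(v - m) / s for v, m, s in zip(row, mu, sd)]

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def fit(rows, targets, epochs=500, lr=0.5):
    """Batch gradient descent on the log loss over standardized rows."""
    w, b = [0.0] * len(GENES), 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for x, y in zip(rows, targets):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, x))) - y
            grad_b += err
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
        b -= lr * grad_b / len(rows)
        w = [wj - lr * gj / len(rows) for wj, gj in zip(w, grad_w)]
    return w, b

w, b = fit([standardize(row) for row in expression], outcomes)

def response_probability(row):
    """Estimated probability that a patient with this profile responds."""
    x = standardize(row)
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))

p_responder = response_probability([8.0, 3.0, 5.0])
p_nonresponder = response_probability([4.0, 6.0, 5.0])
print(f"Responder-like profile:     P(response) = {p_responder:.3f}")
print(f"Non-responder-like profile: P(response) = {p_nonresponder:.3f}")
```

The learned weights are also readable: in this simulation the weight on `GENE_A` comes out positive and the weight on `GENE_B` negative, which is the kind of per-gene interpretation that makes Logistic Regression attractive. Real RNA-Seq data has far more genes than patients, so some form of regularization or feature selection would likely be needed.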
Logistic regression assumes a linear relationship between the independent variables and the log-odds of the response variable, not the response itself. If the relationship is more complex or nonlinear, more flexible machine learning algorithms like decision trees or neural networks may be more appropriate.
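Written out, the linearity in question looks like this. The notation is chosen here for illustration: p is the probability of the positive outcome (a positive review, or a patient responding), the x_j are the inputs (word counts or gene expression levels), and the β_j are the learned weights.

```latex
\[
\log\frac{p}{1-p} = \beta_0 + \sum_{j=1}^{n} \beta_j x_j
\qquad\Longleftrightarrow\qquad
p = \frac{1}{1 + e^{-\left(\beta_0 + \sum_{j=1}^{n} \beta_j x_j\right)}}
\]
```

Each input can only push the log-odds up or down in proportion to its weight. An effect that depends on two genes being active together, for example, cannot be captured unless an interaction term is added by hand.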