How do I translate a simple boolean statement to SQL?
How do I translate a simple boolean statement to SQL?
I have the following database table with information about people, diseases, and drugs:
PERSON_T DISEASE_T DRUG_T ========= ========== ======== PERSON_ID DISEASE_ID DRUG_ID GENDER PERSON_ID PERSON_ID NAME DISEASE_START_DATE DRUG_START_DATE DISEASE_END_DATE DRUG_END_DATE
From these tables, I run some statistics about which individuals have taken which drugs and had which diseases. From this I can figure out which patterns are interesting for me to delve further into. For instance, below is a simplified example of the boolean pattern I might find for disease 52:
( (Drug 234 = false AND Drug 474 = true AND Drug 26 = false) OR (Drug 395 = false AND Drug 791 = false AND Drug 371 = true) )
Edit: Here is another example:
( (Drug 234 = true AND Drug 474 = true AND Drug 26 = false) OR (Drug 395 = false AND Drug 791 = false AND Drug 371 = true) )
Now I want to convert this pattern into a sql query and find all the people who match this pattern.
For example, I want to find all of the people in PERSON_T who had the disease and ((who did not take drug 234 and 26 before exhibiting symptoms, but did take drug 474 before exhibiting symptoms) or (who took drug 371 before exhibiting symptoms, but not drug 791 and 395 before exhibiting symptoms))
How would I go about translating this pattern back into the original query?
Here's my first attempt, but I get stuck on the first term:
SELECT * FROM PERSON_T, DRUG_T, DISEASE_T WHERE DISEASE_ID = 52 AND PERSON_T.PERSON_ID = DISEASE_T.PERSON_ID AND PERSON_T.PERSON_ID = DRUG_T.PERSON_ID AND (DRUG_T.DRUG_ID=234 AND (DRUG_T.DRUG_START_DATE>DISEASE_T.END_DATE || ???)
I need this to work in PostgreSql, but I assume that any given answer can be translated from a given database to PostgreSql.
Response to comments
- I fixed formatting of database tables. Thank you.
- I need to be able to take an arbitrary boolean statement and translate it to SQL. The boolean statements we are actually creating are much longer than the example I gave. Any new tables I create will be in a new database and need to have the same schema as the original tables. This way to the end user, they can run their same code on the new tables and it work the same as if it ran on the original tables. This is a requirement from the customer. I'm hoping I can create a view which is just a query to the original tables. If we can't get that to work, I may create a copy of the tables and filter the data as I'm copying it over to the new table. We are not using neural networks to do the analysis. We are using our own custom algorithms which scale much better than neural networks.
- The Disease_Start_Date is the date when the person gets the diseaes which is likely when the symptons start appearing. Disease_End_Date is when the person is recovered which is likely when symptoms disappear.
- Drug_start_date is when the person starts taking the drugs. Drug_end_date is when the person stops taking the drugs.
Edit I added my own answer. Can anyone come up with a simpler answer?
Answer by ajdams for How do I translate a simple boolean statement to SQL?
Pardon any mistakes but I think something like this would work (in T-SQL):
SELECT col1, col2, col3...
FROM PERSON_T AS P, DRUG_T AS DR, DISEASE_T AS DI
WHERE disease_id = 52
AND P.person_id = DI.person_id
AND P.person_id = DR.person_id
AND drug_id NOT IN(234, 26)
AND drug_id = 474
AND disease_start_date < drug_start_date
UNION
SELECT col1, col2, col3...
FROM PERSON_T AS P, DRUG_T AS DR, DISEASE_T AS DI
WHERE disease_id = 52
AND P.person_id = DI.person_id
AND P.person_id = DR.person_id
AND drug_id NOT IN(791, 395)
AND drug_id = 371
AND disease_start_date < drug_start_date
Now it doesn't have to be done with a UNION but for readibility I thought this was the easiest given your conditions. Maybe this will lead you in the right direction.
Answer by Peter Hanneman for How do I translate a simple boolean statement to SQL?
SELECT per.person_id, per.name, per.gender FROM person_t per INNER JOIN disease_t dis USING (person_id) INNER JOIN drug_t drug USING (person_id) WHERE dis.disease_id = 52 AND drug.drug_start_date < dis.disease_start_date AND ((drug.drug_id IN (234, 474) AND drug.drug_id NOT IN (26)) OR (drug.drug_id IN (371) AND drug.drug_id NOT IN (395, 791)));
This will do what you are asking for. The IN statements at the end are pretty self explanatory.
Answer by Justin K for How do I translate a simple boolean statement to SQL?
To me, the straightforward (if ugly) solution is to use EXISTS and NOT EXISTS clauses:
SELECT * FROM PERSON_T INNER JOIN DISEASE_T USING (PERSON_ID) WHERE DISEASE_ID = 52 AND EXISTS (SELECT 1 FROM DRUG_T WHERE DRUG_T.PERSON_ID = PERSON_T.PERSON_ID AND DRUG_ID = 474 AND [time condition]) AND NOT EXISTS (SELECT 1 FROM DRUG_T WHERE DRUG_T.PERSON_ID = PERSON_T.PERSON_ID AND DRUG_ID = 234 AND [time condition])
...and so on. In the example, we're asking for people who have taken drug 474 but not 234. Obviously, you can group the clauses with ANDs and ORs according to what you need.
Aside: I find all caps difficult to read. I usually use uppercase for SQL keywords and lowercase for table and column names.
Answer by Jay Askren for How do I translate a simple boolean statement to SQL?
None of the answers given seem to work. Again here is the pattern I want to implement: ( (Drug 234 = true AND Drug 474 = true AND Drug 26 = false) OR (Drug 395 = false AND Drug 791 = false AND Drug 371 = true) )
I believe the following query will work for (Drug 234 = true AND Drug 474 = true AND Drug 26 = false). Given that, it is pretty easy to add the second half of the query.
SELECT p.person_id, p.gender FROM person_t as p join drug_t as dr on dr.person_id = p.person_id join disease_t as ds on ds.person_id=p.person_id WHERE dr.drug_start_date < ds.disease_start_date AND disease_id = 52 AND dr.drug_id=234 INTERSECT SELECT p.person_id, p.gender FROM person_t as p join drug_t as dr on dr.person_id = p.person_id join disease_t as ds on ds.person_id=p.person_id WHERE dr.drug_start_date < ds.disease_start_date AND disease_id = 52 AND dr.drug_id=474 INTERSECT ( SELECT p.person_id, p.gender FROM person_t as p JOIN disease_t as ds on ds.person_id = p.person_id LEFT JOIN drug_t as dr ON dr.person_id = p.person_id AND dr.drug_id = 26 WHERE disease_id = 52 AND dr.person_id is null UNION SELECT p.person_id, p.gender FROM person_t as p JOIN disease_t as ds on ds.person_id = p.person_id JOIN drug_t as dr ON dr.person_id = p.person_id AND dr.drug_id = 26 WHERE disease_id = 52 AND dr.drug_start_date > ds.disease_start_date)
This query works, but is quite ugly. I also suspect it will be extremely slow once I have a production database with 100 million people. Is there anything I can do to simplify/optimize this query?
Answer by sgriffinusa for How do I translate a simple boolean statement to SQL?
I have no idea how this will perform with large tables (I imagine it will be pretty lousy as date comparisons are typically pretty expensive), but here is a method that should work. It is relatively verbose, but is very easy to modify for different boolean cases.
Example 1:
SELECT dis.* FROM disease_t dis LEFT JOIN drug d1 ON d1.person_id = dis.person_id AND d1.drug_id = 234 LEFT JOIN drug d2 ON d2.person_id = dis.person_id AND d2.drug_id = 474 LEFT JOIN drug d3 ON d3.person_id = dis.person_id AND d3.drug_id = 26 LEFT JOIN drug d4 ON d4.person_id = dis.person_id AND d4.drug_id = 395 LEFT JOIN drug d5 ON d5.person_id = dis.person_id AND d5.drug_id = 791 LEFT JOIN drug d6 ON d6.person_id = dis.person_id AND d6.drug_id = 371 WHERE dis.disease_id = 52 AND (((d1.person_id IS NULL OR dis.startdate < d1.startdate) AND (d2.person_id IS NOT NULL AND d2.startdate < dis.startdate) AND (d3.person_id IS NULL OR dis.startdate < d3.startdate)) OR ((d4.person_id IS NULL OR dis.startdate < d4.startdate) AND (d5.person_id IS NULL OR dis.startdate < d5.startdate) AND (d6.person_id IS NOT NULL AND d6.startdate < dis.startdate)))
Example 2:
SELECT dis.* FROM disease_t dis LEFT JOIN drug d1 ON d1.person_id = dis.person_id AND d1.drug_id = 234 LEFT JOIN drug d2 ON d2.person_id = dis.person_id AND d2.drug_id = 474 LEFT JOIN drug d3 ON d3.person_id = dis.person_id AND d3.drug_id = 26 LEFT JOIN drug d4 ON d4.person_id = dis.person_id AND d4.drug_id = 395 LEFT JOIN drug d5 ON d5.person_id = dis.person_id AND d5.drug_id = 791 LEFT JOIN drug d6 ON d6.person_id = dis.person_id AND d6.drug_id = 371 WHERE dis.disease_id = 52 AND (((d1.person_id IS NOT NULL AND d1.startdate < dis.startdate) AND (d2.person_id IS NOT NULL AND d2.startdate < dis.startdate) AND (d3.person_id IS NULL OR dis.startdate < d3.startdate)) or ((d4.person_id IS NULL OR dis.startdate < d4.startdate) AND (d5.person_id IS NULL OR dis.startdate < d5.startdate) AND (d6.person_id IS NOT NULL AND d6.startdate < dis.startdate)))
Answer by pferate for How do I translate a simple boolean statement to SQL?
Here is a query that handles ( (Drug 234 = true AND Drug 474 = true AND Drug 26 = false) OR (Drug 395 = false AND Drug 791 = false AND Drug 371 = true) )
, as you posted.
/* -- AS DEFINED BY JOINS -- All "person_id"'s match -- Drug 1 is not Drug 2 -- Drug 1 is not Drug 3 -- Drug 2 is not Drug 3 -- All Drugs are
Answer by joelt for How do I translate a simple boolean statement to SQL?
I don't have test data real handy to try this out on, but I think you could do something like:
SELECT * FROM DISEASE_T D INNER JOIN DRUG_T DR ON D.PERSON_ID = DR.PERSON_ID AND D.DRUG_ID=52 INNER JOIN PERSON_T P ON P.PERSON_ID = D.PERSON_ID GROUP BY PERSON_ID HAVING SUM( CASE WHEN DRUG_ID=234 AND DRUG_START_DATE
The case I know will fail there is if you have multiple records for the same person and the same drug/disease in the drug/disease tables. If that's the case, you could also change the HAVING clause to look more like:
(SUM(CASE WHEN DRUG_ID=234 AND DRUG_START_DATE 0 AND SUM(CASE WHEN DRUG_ID=26 AND DRUG_START_DATE 0)
Answer by Jeffrey L Whitledge for How do I translate a simple boolean statement to SQL?
I would probably approach this problem from some direction similar to this. It's pretty flexible.
DRUG_DISEASE_CORRELATION_QUERY =============================== DRUG_DISEASE_CORRELATION_QUERY_ID DISEASE_ID DESCRIPTION (1, 52, 'What this query does.') (2, 52, 'Add some more results.') DRUG_DISEASE_CORRELATION_QUERY_INCLUDE_DRUG =========================================== DRUG_DISEASE_CORRELATION_QUERY_ID DRUG_ID (1, 234) (1, 474) (2, 371) DRUG_DISEASE_CORRELATION_QUERY_EXCLUDE_DRUG =========================================== DRUG_DISEASE_CORRELATION_QUERY_ID DRUG_ID (1, 26) (2, 395) (2, 791) CREATE VIEW DRUG_DISEASE_CORRELATION AS SELECT p.*, q.DRUG_DISEASE_CORRELATION_QUERY_ID FROM DRUG_DISEASE_CORRELATION_QUERY q INNER JOIN DISEASE_T ds on ds.DISEASE_ID = q.DISEASE_ID INNER JOIN PERSON_T p ON p.PERSON_ID = ds.PERSON_ID WHERE AND EXISTS (SELECT * FROM DRUG_T dr WHERE dr.PERSON_ID = p.PERSON_ID AND dr.DRUG_ID IN (SELECT qid.DRUG_ID FROM DRUG_DISEASE_CORRELATION_QUERY_INCLUDE_DRUG qid WHERE qid.DRUG_DISEASE_CORRELATION_QUERY_ID = q.DRUG_DISEASE_CORRELATION_QUERY_ID) AND DRUG_START_DATE < ds.DISEASE_START_DATE) AND NOT EXISTS (SELECT * FROM DRUG_T dr WHERE dr.PERSON_ID = p.PERSON_ID AND dr.DRUG_ID IN (SELECT qed.DRUG_ID FROM DRUG_DISEASE_CORRELATION_QUERY_EXCLUDE_DRUG qed WHERE qed.DRUG_DISEASE_CORRELATION_QUERY_ID = q.DRUG_DISEASE_CORRELATION_QUERY_ID) AND DRUG_START_DATE < ds.DISEASE_START_DATE) GO SELECT * FROM DRUG_DISEASE_CORRELATION WHERE DRUG_DISEASE_CORRELATION_QUERY_ID = 1 UNION SELECT * FROM DRUG_DISEASE_CORRELATION WHERE DRUG_DISEASE_CORRELATION_QUERY_ID = 2
Answer by Philip Kelley for How do I translate a simple boolean statement to SQL?
If I have it right, you want to:
- Select out those Persons
- Who have been infected with one (1) specific disease
- Who have been treated with one or more specified drugs
- And who have NOT been been treated with one or more specified other drugs
This could be simplified by converting your "drug requirements" into a temporary table of some form. This would allow for the use of any number of "good" and "bad" drugs to query against. What I have below could be implemented as a stored procedure, but if that's not an option a number of convoluted options are available.
Breaking down the steps:
First, here's how the desired patients are selected. We'll use this as a subquery later:
SELECT [PersonData] from DISEASE_T di inner join PERSON_T pe on pe.Person_Id = di.Person_Id where di.Disease_Id = [TargetDisease] and [TimeConstraints]
Second, for each set of "target" drugs that you have ANDed together, set up a temporary table like so (this is SQL Server syntax, Postgres should have something similar):
CREATE TABLE #DrugSet ( Drug_Id [KeyDataType] ,Include int not null )
populate it with one row for every drug you are considering:
- Drug_Id = the drug you're checking
- Include = 1 if the person is to have taken the drug, and 0 if they are not to have taken it
and calculate two values:
@GoodDrugs, the number of drugs you want the patient to have taken
@BadDrugs, the number of drugs you want the patient to have not taken
Now, stitch all the above together in the following query:
SELECT pe.[PersonData] -- All the desired columns from PERSON_T and elsewhere from DRUG_T dr -- Filter to only include "persons of interest" inner join (select [PersonData] from DISEASE_T di inner join PERSON_T pe on pe.Person_Id = di.Person_Id where di.Disease_Id = [TargetDisease] and [TimeConstraints]) pe on pe.Person_Id = dr.Person_ID -- Join with any of the drugs we are intersted in left outer join #DrugSet ta on ta.Drug_Id = dr.Drug_Id group by pe.[PersonData] -- Same as in the SELECT clause having sum(case ta.Include when 1 then 1 -- This patient has been given a drug that we're looking to match else 0 -- This patient has not been given this drug (catches NULLs, too) end) = @GoodDrugs and sum(case ta.Include when 0 then 1 -- This patient has been given this drug that we're NOT looking to match else 0 -- This patient has not been given this drug (catches NULLs, too) end) = @BadDrugs
I've intentionally ignored the time criteria as you didn't go into details on them, but they should be fairly simple to add (though I hope that's not famous last words). Further optimizations may be possible, but a lot depends on the data and other possible criteria.
You would need to run this once for each "drug set" (that is, sets of TRUE or FALSE drugs ANDed together), concatenating the list with each pass. You could probably expand #DrugSet to factor in each drug set you are checking, but I'm reluctant to try and code that without some serious data to test it against.
*/
Answer by Darknight for How do I translate a simple boolean statement to SQL?
I've tried to break down the problem, and follow through as logically as I could.
Firstly the three tables (Person_T, Drugs_T, Disease_T) can be thought of as shown in Fig 1.0:
A Person can have multiple drugs and multiple diseases. Each drug and disease has a start date & end date.
Therfore I would first de-normalize the three tables into one table (Table_dn) thus:
dnId | PersonId | DrugId | DiseaseId | DgSt | DgEn | DiSt | DiEn ---- -------- ------ --------- ---- ---- ---- ----
This de-normalized table can be a temp table if needed, regardless Table_dn now contains all the entire Global dataset as shown in Fig 2.0 (denoted as G).
From my understanding of your description, I can see essentially a two layer filter.
Filter 1
This filter is simply a boolean set of drugs Combinations, as you have already stated in your question description. eg:
(drug a = 1 & drug b = 0 & etc) OR (.....
Filter 2
This filter is a little more complicated than the first, it is date range criteria. Fig 3.0 shows this date range in RED. The yellow represents record dates that span in multiple ways:
- before RED period
- After RED period
- Between RED period
- Ending before end of RED period
- Starting after start of RED period
Now the YELLOW date periods could be the drugs period or disease period OR combination of both.
This filter should be applied to the set of results obtained from the first results.
Of course depending on your exact question those two Filters may need to go the other way round (e.g f2 first then f1).
SQL pseudo code:
Select sub.* From (select * from Table_dn where [Filter 1] ) as sub where [Filter 2]

Fatal error: Call to a member function getElementsByTagName() on a non-object in D:\XAMPP INSTALLASTION\xampp\htdocs\endunpratama9i\www-stackoverflow-info-proses.php on line 72
0 comments:
Post a Comment