Blog coding and discussion of coding about JavaScript, PHP, CGI, general web building etc.

Sunday, May 29, 2016

How do I translate a simple boolean statement to SQL?

How do I translate a simple boolean statement to SQL?


I have the following database table with information about people, diseases, and drugs:

PERSON_T              DISEASE_T               DRUG_T  =========             ==========              ========  PERSON_ID             DISEASE_ID              DRUG_ID  GENDER                PERSON_ID               PERSON_ID  NAME                  DISEASE_START_DATE      DRUG_START_DATE                        DISEASE_END_DATE        DRUG_END_DATE  

From these tables, I run some statistics about which individuals have taken which drugs and had which diseases. From this I can figure out which patterns are interesting for me to delve further into. For instance, below is a simplified example of the boolean pattern I might find for disease 52:

( (Drug 234 = false AND Drug 474 = true AND Drug 26 = false) OR     (Drug 395 = false AND Drug 791 = false AND Drug 371 = true) )  

Edit: Here is another example:

( (Drug 234 = true AND Drug 474 = true AND Drug 26 = false) OR         (Drug 395 = false AND Drug 791 = false AND Drug 371 = true) )  

Now I want to convert this pattern into a sql query and find all the people who match this pattern.
For example, I want to find all of the people in PERSON_T who had the disease and ((who did not take drug 234 and 26 before exhibiting symptoms, but did take drug 474 before exhibiting symptoms) or (who took drug 371 before exhibiting symptoms, but not drug 791 and 395 before exhibiting symptoms))

How would I go about translating this pattern back into the original query?

Here's my first attempt, but I get stuck on the first term:

SELECT * FROM PERSON_T, DRUG_T, DISEASE_T     WHERE DISEASE_ID = 52 AND       PERSON_T.PERSON_ID = DISEASE_T.PERSON_ID AND       PERSON_T.PERSON_ID = DRUG_T.PERSON_ID  AND       (DRUG_T.DRUG_ID=234 AND (DRUG_T.DRUG_START_DATE>DISEASE_T.END_DATE || ???)  

I need this to work in PostgreSql, but I assume that any given answer can be translated from a given database to PostgreSql.

Response to comments

  1. I fixed formatting of database tables. Thank you.
  2. I need to be able to take an arbitrary boolean statement and translate it to SQL. The boolean statements we are actually creating are much longer than the example I gave. Any new tables I create will be in a new database and need to have the same schema as the original tables. This way to the end user, they can run their same code on the new tables and it work the same as if it ran on the original tables. This is a requirement from the customer. I'm hoping I can create a view which is just a query to the original tables. If we can't get that to work, I may create a copy of the tables and filter the data as I'm copying it over to the new table. We are not using neural networks to do the analysis. We are using our own custom algorithms which scale much better than neural networks.
  3. The Disease_Start_Date is the date when the person gets the diseaes which is likely when the symptons start appearing. Disease_End_Date is when the person is recovered which is likely when symptoms disappear.
  4. Drug_start_date is when the person starts taking the drugs. Drug_end_date is when the person stops taking the drugs.

Edit I added my own answer. Can anyone come up with a simpler answer?

Answer by ajdams for How do I translate a simple boolean statement to SQL?


Pardon any mistakes but I think something like this would work (in T-SQL):

SELECT col1, col2, col3...
FROM PERSON_T AS P, DRUG_T AS DR, DISEASE_T AS DI
WHERE disease_id = 52
AND P.person_id = DI.person_id
AND P.person_id = DR.person_id
AND drug_id NOT IN(234, 26)
AND drug_id = 474
AND disease_start_date < drug_start_date
UNION
SELECT col1, col2, col3...
FROM PERSON_T AS P, DRUG_T AS DR, DISEASE_T AS DI
WHERE disease_id = 52
AND P.person_id = DI.person_id
AND P.person_id = DR.person_id
AND drug_id NOT IN(791, 395)
AND drug_id = 371
AND disease_start_date < drug_start_date

Now it doesn't have to be done with a UNION but for readibility I thought this was the easiest given your conditions. Maybe this will lead you in the right direction.

Answer by Peter Hanneman for How do I translate a simple boolean statement to SQL?


SELECT per.person_id, per.name, per.gender  FROM person_t per  INNER JOIN disease_t dis  USING (person_id)  INNER JOIN drug_t drug  USING (person_id)  WHERE dis.disease_id = 52 AND drug.drug_start_date < dis.disease_start_date AND ((drug.drug_id IN (234, 474) AND drug.drug_id NOT IN (26)) OR (drug.drug_id IN (371) AND drug.drug_id NOT IN (395, 791)));  

This will do what you are asking for. The IN statements at the end are pretty self explanatory.

Answer by Justin K for How do I translate a simple boolean statement to SQL?


To me, the straightforward (if ugly) solution is to use EXISTS and NOT EXISTS clauses:

SELECT *  FROM PERSON_T INNER JOIN DISEASE_T       USING (PERSON_ID)  WHERE DISEASE_ID = 52    AND EXISTS (SELECT 1 FROM DRUG_T                WHERE DRUG_T.PERSON_ID = PERSON_T.PERSON_ID                  AND DRUG_ID = 474                  AND [time condition])    AND NOT EXISTS (SELECT 1 FROM DRUG_T                WHERE DRUG_T.PERSON_ID = PERSON_T.PERSON_ID                  AND DRUG_ID = 234                  AND [time condition])  

...and so on. In the example, we're asking for people who have taken drug 474 but not 234. Obviously, you can group the clauses with ANDs and ORs according to what you need.

Aside: I find all caps difficult to read. I usually use uppercase for SQL keywords and lowercase for table and column names.

Answer by Jay Askren for How do I translate a simple boolean statement to SQL?


None of the answers given seem to work. Again here is the pattern I want to implement: ( (Drug 234 = true AND Drug 474 = true AND Drug 26 = false) OR (Drug 395 = false AND Drug 791 = false AND Drug 371 = true) )

I believe the following query will work for (Drug 234 = true AND Drug 474 = true AND Drug 26 = false). Given that, it is pretty easy to add the second half of the query.

SELECT  p.person_id, p.gender FROM person_t as p       join drug_t as dr on dr.person_id = p.person_id       join disease_t as ds on ds.person_id=p.person_id       WHERE dr.drug_start_date < ds.disease_start_date AND disease_id = 52 AND dr.drug_id=234  INTERSECT  SELECT  p.person_id, p.gender FROM person_t as p       join drug_t as dr on dr.person_id = p.person_id       join disease_t as ds on ds.person_id=p.person_id       WHERE dr.drug_start_date < ds.disease_start_date AND disease_id = 52 AND dr.drug_id=474  INTERSECT (  SELECT p.person_id, p.gender      FROM person_t as p       JOIN disease_t as ds on ds.person_id = p.person_id       LEFT JOIN drug_t as dr ON dr.person_id = p.person_id  AND dr.drug_id = 26      WHERE disease_id = 52 AND dr.person_id is null   UNION   SELECT p.person_id, p.gender      FROM person_t as p       JOIN disease_t as ds on ds.person_id = p.person_id       JOIN drug_t as dr ON dr.person_id = p.person_id  AND dr.drug_id = 26      WHERE disease_id = 52 AND dr.drug_start_date > ds.disease_start_date)  

This query works, but is quite ugly. I also suspect it will be extremely slow once I have a production database with 100 million people. Is there anything I can do to simplify/optimize this query?

Answer by sgriffinusa for How do I translate a simple boolean statement to SQL?


I have no idea how this will perform with large tables (I imagine it will be pretty lousy as date comparisons are typically pretty expensive), but here is a method that should work. It is relatively verbose, but is very easy to modify for different boolean cases.

Example 1:

SELECT dis.*  FROM disease_t dis  LEFT JOIN drug d1 ON d1.person_id = dis.person_id AND d1.drug_id = 234  LEFT JOIN drug d2 ON d2.person_id = dis.person_id AND d2.drug_id = 474  LEFT JOIN drug d3 ON d3.person_id = dis.person_id AND d3.drug_id = 26  LEFT JOIN drug d4 ON d4.person_id = dis.person_id AND d4.drug_id = 395  LEFT JOIN drug d5 ON d5.person_id = dis.person_id AND d5.drug_id = 791  LEFT JOIN drug d6 ON d6.person_id = dis.person_id AND d6.drug_id = 371  WHERE dis.disease_id = 52  AND (((d1.person_id IS NULL OR dis.startdate < d1.startdate) AND        (d2.person_id IS NOT NULL AND d2.startdate < dis.startdate) AND        (d3.person_id IS NULL OR dis.startdate < d3.startdate))        OR       ((d4.person_id IS NULL OR dis.startdate < d4.startdate) AND        (d5.person_id IS NULL OR dis.startdate < d5.startdate) AND        (d6.person_id IS NOT NULL AND d6.startdate < dis.startdate)))  

Example 2:

SELECT dis.*  FROM disease_t dis  LEFT JOIN drug d1 ON d1.person_id = dis.person_id AND d1.drug_id = 234  LEFT JOIN drug d2 ON d2.person_id = dis.person_id AND d2.drug_id = 474  LEFT JOIN drug d3 ON d3.person_id = dis.person_id AND d3.drug_id = 26  LEFT JOIN drug d4 ON d4.person_id = dis.person_id AND d4.drug_id = 395  LEFT JOIN drug d5 ON d5.person_id = dis.person_id AND d5.drug_id = 791  LEFT JOIN drug d6 ON d6.person_id = dis.person_id AND d6.drug_id = 371  WHERE dis.disease_id = 52  AND (((d1.person_id IS NOT NULL AND d1.startdate < dis.startdate) AND        (d2.person_id IS NOT NULL AND d2.startdate < dis.startdate) AND        (d3.person_id IS NULL OR dis.startdate < d3.startdate))        or       ((d4.person_id IS NULL OR dis.startdate < d4.startdate) AND        (d5.person_id IS NULL OR dis.startdate < d5.startdate) AND        (d6.person_id IS NOT NULL AND d6.startdate < dis.startdate)))  

Answer by pferate for How do I translate a simple boolean statement to SQL?


Here is a query that handles ( (Drug 234 = true AND Drug 474 = true AND Drug 26 = false) OR (Drug 395 = false AND Drug 791 = false AND Drug 371 = true) ), as you posted.

/*  -- AS DEFINED BY JOINS  -- All "person_id"'s match  -- Drug 1 is not Drug 2  -- Drug 1 is not Drug 3  -- Drug 2 is not Drug 3  -- All Drugs are
optional as far as the SELECT statement is concerned (left join) -- Drug IDs will be defined in the WHERE clause -- All Diseases for "person_id" -- AS DEFINED IN WHERE STATEMENT -- Disease IS 52 -- AND ONE OF THE FOLLOWING: -- 1) Disease started AFTER Drug 1 -- Disease started AFTER Drug 2 -- Drug 1 IS 234 -- Drug 2 IS 474 -- Drug 3 IS NOT 26 (AND NOT 234 or 474, as defined in JOINs) -- 2) Disease started AFTER Drug 3 -- Drug 1 IS NOT 395 -- Drug 2 IS NOT 791 -- Drug 3 IS 371 */ SELECT p.person_id, p.gender FROM person_t as p LEFT JOIN drug_t AS dr1 ON (p.person_id = dr1.person_id) LEFT JOIN drug_t AS dr2 ON (p.person_id = dr2.person_id AND dr1.drug_id != dr2.drug_id) LEFT JOIN drug_t AS dr3 ON (p.person_id = dr3.person_id AND dr1.drug_id != dr3.drug_id AND dr2.drug_id != dr3.drug_id) JOIN disease_t AS ds ON (p.person_id = ds.person_id) WHERE ds.disease_id = 52 AND ( ( (dr1.drug_start_date < ds.disease_start_date AND dr2.drug_start_date < ds.disease_start_date) AND (dr1.drug_id = 234 AND dr2.drug_id = 474 AND dr3.drug_id != 26) ) OR ( (dr3.drug_start_date < ds.disease_start_date) AND (dr1.drug_id != 395 AND dr2.drug_id != 791 AND dr3.drug_id = 371) ) )

Answer by joelt for How do I translate a simple boolean statement to SQL?


I don't have test data real handy to try this out on, but I think you could do something like:

SELECT *  FROM DISEASE_T D  INNER JOIN DRUG_T DR ON D.PERSON_ID = DR.PERSON_ID AND D.DRUG_ID=52  INNER JOIN PERSON_T P ON P.PERSON_ID = D.PERSON_ID  GROUP BY PERSON_ID  HAVING SUM(      CASE WHEN DRUG_ID=234 AND DRUG_START_DATE

The case I know will fail there is if you have multiple records for the same person and the same drug/disease in the drug/disease tables. If that's the case, you could also change the HAVING clause to look more like:

(SUM(CASE WHEN DRUG_ID=234 AND DRUG_START_DATE 0  AND SUM(CASE WHEN DRUG_ID=26 AND DRUG_START_DATE 0)  

Answer by Jeffrey L Whitledge for How do I translate a simple boolean statement to SQL?


I would probably approach this problem from some direction similar to this. It's pretty flexible.

DRUG_DISEASE_CORRELATION_QUERY  ===============================  DRUG_DISEASE_CORRELATION_QUERY_ID  DISEASE_ID  DESCRIPTION    (1, 52, 'What this query does.')  (2, 52, 'Add some more results.')    DRUG_DISEASE_CORRELATION_QUERY_INCLUDE_DRUG  ===========================================  DRUG_DISEASE_CORRELATION_QUERY_ID  DRUG_ID    (1, 234)  (1, 474)  (2, 371)    DRUG_DISEASE_CORRELATION_QUERY_EXCLUDE_DRUG  ===========================================  DRUG_DISEASE_CORRELATION_QUERY_ID  DRUG_ID    (1, 26)  (2, 395)  (2, 791)        CREATE VIEW DRUG_DISEASE_CORRELATION  AS  SELECT       p.*,      q.DRUG_DISEASE_CORRELATION_QUERY_ID  FROM       DRUG_DISEASE_CORRELATION_QUERY q      INNER JOIN DISEASE_T ds on ds.DISEASE_ID = q.DISEASE_ID      INNER JOIN PERSON_T p ON p.PERSON_ID = ds.PERSON_ID    WHERE       AND EXISTS (SELECT * FROM DRUG_T dr WHERE dr.PERSON_ID = p.PERSON_ID AND dr.DRUG_ID IN          (SELECT qid.DRUG_ID FROM DRUG_DISEASE_CORRELATION_QUERY_INCLUDE_DRUG qid WHERE           qid.DRUG_DISEASE_CORRELATION_QUERY_ID = q.DRUG_DISEASE_CORRELATION_QUERY_ID)          AND DRUG_START_DATE < ds.DISEASE_START_DATE)     AND NOT EXISTS (SELECT * FROM DRUG_T dr WHERE dr.PERSON_ID = p.PERSON_ID AND dr.DRUG_ID IN          (SELECT qed.DRUG_ID FROM DRUG_DISEASE_CORRELATION_QUERY_EXCLUDE_DRUG qed WHERE           qed.DRUG_DISEASE_CORRELATION_QUERY_ID = q.DRUG_DISEASE_CORRELATION_QUERY_ID)          AND DRUG_START_DATE < ds.DISEASE_START_DATE)  GO      SELECT * FROM DRUG_DISEASE_CORRELATION WHERE DRUG_DISEASE_CORRELATION_QUERY_ID = 1  UNION  SELECT * FROM DRUG_DISEASE_CORRELATION WHERE DRUG_DISEASE_CORRELATION_QUERY_ID = 2  

Answer by Philip Kelley for How do I translate a simple boolean statement to SQL?


If I have it right, you want to:

  • Select out those Persons
  • Who have been infected with one (1) specific disease
  • Who have been treated with one or more specified drugs
  • And who have NOT been been treated with one or more specified other drugs

This could be simplified by converting your "drug requirements" into a temporary table of some form. This would allow for the use of any number of "good" and "bad" drugs to query against. What I have below could be implemented as a stored procedure, but if that's not an option a number of convoluted options are available.

Breaking down the steps:

First, here's how the desired patients are selected. We'll use this as a subquery later:

SELECT [PersonData]   from DISEASE_T di    inner join PERSON_T pe     on pe.Person_Id = di.Person_Id   where di.Disease_Id = [TargetDisease]    and [TimeConstraints]  

Second, for each set of "target" drugs that you have ANDed together, set up a temporary table like so (this is SQL Server syntax, Postgres should have something similar):

CREATE TABLE #DrugSet   (     Drug_Id  [KeyDataType]    ,Include  int   not null   )  

populate it with one row for every drug you are considering:

  • Drug_Id = the drug you're checking
  • Include = 1 if the person is to have taken the drug, and 0 if they are not to have taken it

and calculate two values:

@GoodDrugs, the number of drugs you want the patient to have taken
@BadDrugs, the number of drugs you want the patient to have not taken

Now, stitch all the above together in the following query:

SELECT pe.[PersonData]  --  All the desired columns from PERSON_T and elsewhere   from DRUG_T dr    --  Filter to only include "persons of interest"    inner join (select [PersonData]                 from DISEASE_T di                  inner join PERSON_T pe                   on pe.Person_Id = di.Person_Id                 where di.Disease_Id = [TargetDisease]                  and [TimeConstraints]) pe     on pe.Person_Id = dr.Person_ID   --  Join with any of the drugs we are intersted in   left outer join #DrugSet ta      on ta.Drug_Id = dr.Drug_Id   group by pe.[PersonData]  --  Same as in the SELECT clause   having sum(case ta.Include                when 1 then 1  --  This patient has been given a drug that we're looking to match                else 0         --  This patient has not been given this drug (catches NULLs, too)              end) = @GoodDrugs    and  sum(case ta.Include                when 0 then 1  --  This patient has been given this drug that we're NOT looking to match                else 0         --  This patient has not been given this drug (catches NULLs, too)              end) = @BadDrugs  

I've intentionally ignored the time criteria as you didn't go into details on them, but they should be fairly simple to add (though I hope that's not famous last words). Further optimizations may be possible, but a lot depends on the data and other possible criteria.

You would need to run this once for each "drug set" (that is, sets of TRUE or FALSE drugs ANDed together), concatenating the list with each pass. You could probably expand #DrugSet to factor in each drug set you are checking, but I'm reluctant to try and code that without some serious data to test it against.

*/

Answer by Darknight for How do I translate a simple boolean statement to SQL?


I've tried to break down the problem, and follow through as logically as I could.

Firstly the three tables (Person_T, Drugs_T, Disease_T) can be thought of as shown in Fig 1.0:

A Person can have multiple drugs and multiple diseases. Each drug and disease has a start date & end date.

Therfore I would first de-normalize the three tables into one table (Table_dn) thus:

dnId | PersonId | DrugId | DiseaseId | DgSt | DgEn | DiSt | DiEn  ----   --------   ------   ---------   ----   ----   ----   ----  

This de-normalized table can be a temp table if needed, regardless Table_dn now contains all the entire Global dataset as shown in Fig 2.0 (denoted as G).

From my understanding of your description, I can see essentially a two layer filter.

Filter 1

This filter is simply a boolean set of drugs Combinations, as you have already stated in your question description. eg:

(drug a = 1 & drug b = 0 & etc) OR (.....  

Filter 2

This filter is a little more complicated than the first, it is date range criteria. Fig 3.0 shows this date range in RED. The yellow represents record dates that span in multiple ways:

  • before RED period
  • After RED period
  • Between RED period
  • Ending before end of RED period
  • Starting after start of RED period

Now the YELLOW date periods could be the drugs period or disease period OR combination of both.

This filter should be applied to the set of results obtained from the first results.

Of course depending on your exact question those two Filters may need to go the other way round (e.g f2 first then f1).

SQL pseudo code:

Select sub.*  From            (select    *          from      Table_dn          where     [Filter 1]        ) as sub    where [Filter 2]  

alt text


Fatal error: Call to a member function getElementsByTagName() on a non-object in D:\XAMPP INSTALLASTION\xampp\htdocs\endunpratama9i\www-stackoverflow-info-proses.php on line 72

Related Posts:

0 comments:

Post a Comment

Popular Posts

Fun Page

Powered by Blogger.