
Filtering Sources

You can use filters to include or exclude sources supplied to an iKnow query. Often you do not wish to perform a query on all of the loaded sources in the domain. A filter allows you to limit the scope of the query to only those sources that meet the criteria of the filter. A filter selects sources based either on which entities are found in the source, or on some information associated with the source itself (metadata). A filter always includes or excludes an entire source and is specified in each query that uses that filter.

Supported Filters

iKnow supplies a number of predefined filters, and provides facilities to allow users to easily define their own filters.

  • Source Id: the SourceIdFilter and ExternalIdFilter allow you to select sources based on the Id of the source. iKnow assigns these values as part of the source indexing process.

  • Random Sources: the RandomFilter allows you to select a random sample of the sources in a domain. You can specify the sample either as an integer number of sources or as a percentage of the total sources.

  • Sentence Count: the SentenceCountFilter allows you to select sources based on the minimum and/or maximum number of sentences in the source. iKnow counts the sentences in a source as part of the indexing process.

  • Entity Match: iKnow provides several filters that allow you to select sources based on the entities (concepts and relations) found in each source. The DictionaryMatchFilter allows you to filter sources based on the minimum and/or maximum number of dictionary matches to the contents of the source. (This filter replaces the deprecated SimpleMatchFilter.) The DictionaryTermMatchFilter and DictionaryItemMatchFilter allow you to filter sources using the same kind of dictionary matching, but limiting the match set to components of a dictionary, rather than the whole dictionary. These dictionary filters can optionally perform standardized-form matching if the $$$IKPMATSTANDARDIZEDFORM domain parameter is specified.

    The ContainsEntityFilter allows you to filter sources by supplying a list of entities directly (rather than by defining them in a dictionary). Optionally, ContainsEntityFilter can also filter sources by entities similar to the listed entities. The ContainsRelatedEntitiesFilter allows you to filter sources by supplying a list of entities, two or more of which must be related, meaning that they must appear in either the same path (the default) or the same CRC. Optionally, ContainsRelatedEntitiesFilter can also filter sources by related entities similar to the listed entities.

  • Indexed Date Metadata: iKnow automatically provides one metadata field for every source, the DateIndexed field. iKnow assigns this field value as part of the source indexing process. Using the SimpleMetadataFilter with this field, you can select sources based on the date and time when they were indexed by iKnow.

  • User-Defined Metadata: iKnow allows you to define filters using the SimpleMetadataFilter that select sources based on data values that you have associated with a source.

  • SQL Query: the SqlFilter allows you to select sources based on the results of an SQL query.

You can use GroupFilter to logically combine the results of the filters you define.
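As a minimal sketch of combining filters, the following fragment builds two of the filters described in this chapter and joins them with AND logic. It assumes that domId already holds the Id of a domain containing indexed sources. The GroupFilter constructor arguments, the AddSubFilter() method, and the $$$GROUPFILTERAND macro are assumptions about this version of iKnow; check the class reference before relying on them.

```objectscript
#Include %IKPublic
GroupFilterSketch
  // Assumes domId identifies a domain that already contains indexed sources
  SET idfilt=##class(%iKnow.Filters.SourceIdRangeFilter).%New(domId,1,50)
  SET sfilt=##class(%iKnow.Filters.SentenceCountFilter).%New(domId)
  DO sfilt.MinSentenceCountSet(10)
  // Assumption: GroupFilter takes the domain Id and a logic constant,
  // and sub-filters are attached with AddSubFilter()
  SET gfilt=##class(%iKnow.Filters.GroupFilter).%New(domId,$$$GROUPFILTERAND)
  DO gfilt.AddSubFilter(idfilt)
  DO gfilt.AddSubFilter(sfilt)
  SET numSrcG=##class(%iKnow.Queries.SourceAPI).GetCountByDomain(domId,gfilt)
  WRITE numSrcG," sources satisfy both filters",!
```

With AND logic, a source should be counted only if it has a source Id from 1 through 50 and also contains at least 10 sentences.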

Filtering by the ID of the Source

The most basic source filter is used to limit the sources supplied to a query by providing the Source Id or the External Id of each source that you wish to include in the filtered result set.

By External Id

The ExternalIdFilter includes those sources whose external Ids are listed in a %List structure. Any element in this list that is not a valid external Id, or that duplicates another element, is silently ignored.

The following example filters sources by external Id. The external Id of each Aviation.Event source includes either the word “Accident” or “Incident” (in this example, as the third colon-delimited piece of the external Id); this filter includes only the sources whose external Id includes the word “Incident”. The example then lists the details of the filtered sources:

DomainCreateOrOpen
  SET dname="mydomain"
  IF (##class(%iKnow.Domain).NameIndexExists(dname))
      { SET domoref=##class(%iKnow.Domain).NameIndexOpen(dname)
        GOTO DeleteOldData }
  ELSE { SET domoref=##class(%iKnow.Domain).%New(dname)
         DO domoref.%Save()
         GOTO SetEnvironment }
DeleteOldData
  SET stat=domoref.DropData()
  IF stat { GOTO SetEnvironment }
  ELSE    { WRITE "DropData error ",$System.Status.DisplayError(stat)
            QUIT}
SetEnvironment
  SET domId=domoref.Id
  IF ##class(%iKnow.Configuration).Exists("myconfig") {
         SET cfg=##class(%iKnow.Configuration).Open("myconfig") }
  ELSE { SET cfg=##class(%iKnow.Configuration).%New("myconfig",0,$LISTBUILD("en"),"",1)
         DO cfg.%Save() }
ListerAndLoader
  SET domId=domoref.Id
  SET flister=##class(%iKnow.Source.SQL.Lister).%New(domId)
  SET myloader=##class(%iKnow.Source.Loader).%New(domId)
QueryBuild
   SET myquery="SELECT TOP 100 ID AS UniqueVal,Type,NarrativeFull FROM Aviation.Event"
   SET idfld="UniqueVal"
   SET grpfld="Type"
   SET dataflds=$LB("NarrativeFull")
UseListerAndLoader
  SET stat=flister.AddListToBatch(myquery,idfld,grpfld,dataflds)
      IF stat '= 1 {WRITE "The lister failed: ",$System.Status.DisplayError(stat) QUIT }
  SET stat=myloader.ProcessBatch()
      IF stat '= 1 {WRITE "The loader failed: ",$System.Status.DisplayError(stat) QUIT }
DefineExtIdFilter
  DO ##class(%iKnow.Queries.SourceAPI).GetByDomain(.result,domId,1,100)
  SET i=1
  SET extlist=""
  WHILE $DATA(result(i)) {
        SET extId = $LISTGET(result(i),2)
        IF $PIECE(extId,":",3)="Incident" {
        SET extlist=extlist_$LB(extId) }
        SET i=i+1
  }
  SET filt=##class(%iKnow.Filters.ExternalIdFilter).%New(domId,extlist) 
SourceCountQuery
  SET numSrcD=##class(%iKnow.Queries.SourceAPI).GetCountByDomain(domId)
  WRITE "The ",dname," domain contains ",numSrcD," sources",!
ApplyExtIdFilter
  SET numSrcFD=##class(%iKnow.Queries.SourceAPI).GetCountByDomain(domId,filt)
  WRITE "The Id filter includes ",numSrcFD," sources:",!
  DO ##class(%iKnow.Queries.SourceAPI).GetByDomain(.result,domId,1,100,filt)
       SET j=1
       WHILE $DATA(result(j)) {
         SET intId = $LISTGET(result(j),1)
         SET extId = $LISTGET(result(j),2)
         WRITE intId," ",extId,!
         SET j=j+1
      }
      WRITE "End of list"

By Source Id

The SourceIdFilter includes those sources whose source Ids are listed in a %List structure. Source Ids are integers. They can be listed in any order. Any element in this list that is not a valid source Id, or that duplicates another element, is silently ignored.

The following example takes SQL records as sources and includes several sources by source Id. Source Ids in this data set are numbered 1 through 100. The SourceIdFilter specifies five source Ids for inclusion, but only three of these source Ids correspond to records in the table. Therefore, the total filtered source count is 3:

DefineAFilter
  SET srclist=$LB(10,14,74,110,2799)
  SET filt=##class(%iKnow.Filters.SourceIdFilter).%New(domId,srclist)
SourceCounts
  SET numsrc = ##class(%iKnow.Queries.SourceAPI).GetCountByDomain(domId)
     WRITE "The ",dname," domain contains ",numsrc," sources",!
  SET numfsrc = ##class(%iKnow.Queries.SourceAPI).GetCountByDomain(domId,filt)
     WRITE "Source count after source Id filtering: ",numfsrc

The %iKnow.Filters.Filter class has a %iKnow.Filters.SourceIdRangeFilter subclass, which selects a range of sources by source Id. The following example returns sources with source Ids 5 through 10, inclusive:

  SET filt=##class(%iKnow.Filters.SourceIdRangeFilter).%New(domId,5,10)

Filtering a Random Selection of Sources

You can use the %iKnow.Filters.RandomFilter to select a random sample of your sources. A random sample allows you to perform tests on a manageable subset of your sources. It also allows you to divide your sources (or a subset of them) into “training” and “test” sets. You would use the “training” set to define iKnow analytics (dictionary matches, source categories, etc.), then would use the “test” set to determine how well these analytics apply to another set of data. In this way you can avoid “overfitting” the analytics to a particular set of data.

You can specify the size of the random subset in two ways:

  • As a percentage: You specify a percentage (as a fractional number between 0 and 1), and this filter returns the corresponding percentage of the indexed sources in the specified domain (or filtered subset of the domain). For example, a value of “.5” means that 50% of the sources in the domain will be included in the filtered result. Halves are rounded up, so 50% of 5 sources is 3 sources. Because an integer value is treated as a number of sources, you specify 100% as “.999” with the appropriate number of fractional digits, rather than as 1. This filter selects the requisite number of sources randomly.

  • As an integer: You specify an integer, and this filter returns that number of indexed sources in the specified domain (or filtered subset of the domain). For example, a value of “7” means that 7 of the sources in the domain will be included in the filtered result. This filter selects the specified number of sources randomly.
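The two ways of sizing the sample can be sketched side by side as follows. This fragment assumes that domId already holds the Id of a domain containing indexed sources; the variable names are illustrative only.

```objectscript
RandomSizeSketch
  // Assumes domId identifies a domain that already contains indexed sources
  // Fractional value: sample 50% of the sources in the domain
  SET pctfilt=##class(%iKnow.Filters.RandomFilter).%New(domId,.5)
  // Integer value: sample 7 of the sources in the domain
  SET intfilt=##class(%iKnow.Filters.RandomFilter).%New(domId,7)
  SET numPct=##class(%iKnow.Queries.SourceAPI).GetCountByDomain(domId,pctfilt)
  SET numInt=##class(%iKnow.Queries.SourceAPI).GetCountByDomain(domId,intfilt)
  WRITE "Percentage sample: ",numPct," sources",!
  WRITE "Integer sample: ",numInt," sources",!
```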

The following example randomly selects 33% of 50 sources, returning 17 sources. You can run this example repeatedly to demonstrate that different sources are randomly sampled:

DomainCreateOrOpen
  SET dname="mydomain"
  IF (##class(%iKnow.Domain).NameIndexExists(dname))
      { SET domoref=##class(%iKnow.Domain).NameIndexOpen(dname)
        GOTO DeleteOldData }
  ELSE { SET domoref=##class(%iKnow.Domain).%New(dname)
         DO domoref.%Save()
         GOTO SetEnvironment }
DeleteOldData
  SET stat=domoref.DropData()
  IF stat { GOTO SetEnvironment }
  ELSE    { WRITE "DropData error ",$System.Status.DisplayError(stat)
            QUIT}
SetEnvironment
  SET domId=domoref.Id
  IF ##class(%iKnow.Configuration).Exists("myconfig") {
         SET cfg=##class(%iKnow.Configuration).Open("myconfig") }
  ELSE { SET cfg=##class(%iKnow.Configuration).%New("myconfig",0,$LISTBUILD("en"),"",1)
         DO cfg.%Save() }
ListerAndLoader
  SET domId=domoref.Id
  SET flister=##class(%iKnow.Source.SQL.Lister).%New(domId)
  SET myloader=##class(%iKnow.Source.Loader).%New(domId)
QueryBuild
   SET myquery="SELECT TOP 50 ID AS UniqueVal,Type,NarrativeFull FROM Aviation.Event"
   SET idfld="UniqueVal"
   SET grpfld="Type"
   SET dataflds=$LB("NarrativeFull")
UseListerAndLoader
  SET stat=flister.AddListToBatch(myquery,idfld,grpfld,dataflds)
      IF stat '= 1 {WRITE "The lister failed: ",$System.Status.DisplayError(stat) QUIT }
  SET stat=myloader.ProcessBatch()
      IF stat '= 1 {WRITE "The loader failed: ",$System.Status.DisplayError(stat) QUIT }
DefineAFilter
  SET filt=##class(%iKnow.Filters.RandomFilter).%New(domId,.33)
SampledSourceQueries
  SET numSrcD=##class(%iKnow.Queries.SourceAPI).GetCountByDomain(domId)
  WRITE "The ",dname," domain contains ",numSrcD," sources",!
  SET numSrcFD=##class(%iKnow.Queries.SourceAPI).GetCountByDomain(domId,filt)
  WRITE "Of these ",numSrcD," sources ",numSrcFD," were sampled:",!
  DO ##class(%iKnow.Queries.SourceAPI).GetByDomain(.result,domId,1,20,filt)
  SET i=1
  WHILE $DATA(result(i)) {
     SET intId = $LISTGET(result(i),1)
     SET extId = $LISTGET(result(i),2)
     WRITE "sample #",i," is source ",intId," ",extId,!
     SET i=i+1 } 
     WRITE "End of list"

The following example filters the sources in the domain by source Id, returning 11 sources. It then supplies this source Id filter when defining a random filter. Thus, the random filter returns 3 of these source-Id-filtered sources. You can run this example repeatedly to demonstrate that different sources are randomly sampled:

DomainCreateOrOpen
  SET dname="mydomain"
  IF (##class(%iKnow.Domain).NameIndexExists(dname))
      { SET domoref=##class(%iKnow.Domain).NameIndexOpen(dname)
        GOTO DeleteOldData }
  ELSE { SET domoref=##class(%iKnow.Domain).%New(dname)
         DO domoref.%Save()
         GOTO SetEnvironment }
DeleteOldData
  SET stat=domoref.DropData()
  IF stat { GOTO SetEnvironment }
  ELSE    { WRITE "DropData error ",$System.Status.DisplayError(stat)
            QUIT}
SetEnvironment
  SET domId=domoref.Id
  IF ##class(%iKnow.Configuration).Exists("myconfig") {
         SET cfg=##class(%iKnow.Configuration).Open("myconfig") }
  ELSE { SET cfg=##class(%iKnow.Configuration).%New("myconfig",0,$LISTBUILD("en"),"",1)
         DO cfg.%Save() }
ListerAndLoader
  SET domId=domoref.Id
  SET flister=##class(%iKnow.Source.SQL.Lister).%New(domId)
  SET myloader=##class(%iKnow.Source.Loader).%New(domId)
QueryBuild
   SET myquery="SELECT TOP 50 ID AS UniqueVal,Type,NarrativeFull FROM Aviation.Event"
   SET idfld="UniqueVal"
   SET grpfld="Type"
   SET dataflds=$LB("NarrativeFull")
UseListerAndLoader
  SET stat=flister.AddListToBatch(myquery,idfld,grpfld,dataflds)
      IF stat '= 1 {WRITE "The lister failed: ",$System.Status.DisplayError(stat) QUIT }
  SET stat=myloader.ProcessBatch()
      IF stat '= 1 {WRITE "The loader failed: ",$System.Status.DisplayError(stat) QUIT }
DefineSourceIdFilter
  SET srclist=$LB(1,3,5,7,9,11,13,15,17,21,23)
  SET idfilt=##class(%iKnow.Filters.SourceIdFilter).%New(domId,srclist)
SourceCounts
  SET numsrc = ##class(%iKnow.Queries.SourceAPI).GetCountByDomain(domId)
     WRITE "The ",dname," domain contains ",numsrc," sources",!
  SET numfsrc = ##class(%iKnow.Queries.SourceAPI).GetCountByDomain(domId,idfilt)
     WRITE "Source count after source Id filtering: ",numfsrc,!
  DO ##class(%iKnow.Queries.SourceAPI).GetByDomain(.result,domId,1,20,idfilt)
  SET i=1
  WHILE $DATA(result(i)) {
     SET intId = $LISTGET(result(i),1)
     WRITE intId," "
     SET i=i+1 } 
     WRITE !,"End of list",! 
DefineRandomFilter
  SET rfilt=##class(%iKnow.Filters.RandomFilter).%New(domId,3,idfilt)
RandomSample
  SET numrsrc=##class(%iKnow.Queries.SourceAPI).GetCountByDomain(domId,rfilt)
  WRITE "From ",numfsrc," sources ",numrsrc," are randomly sampled:",!
  DO ##class(%iKnow.Queries.SourceAPI).GetByDomain(.result,domId,1,20,rfilt)
  SET j=1
  WHILE $DATA(result(j)) {
     SET intId = $LISTGET(result(j),1)
     WRITE intId," "
     SET j=j+1 } 
     WRITE !,"End of list",! 

Filtering by Number of Sentences

iKnow divides a source text into sentences. The following example filters out sources that contain fewer than 75 sentences. It returns the total number of sources, then uses the filter to return the total number of sources containing 75 or more sentences, then uses the filter again to return the number of sentences in each of these sources:

DomainCreateOrOpen
  SET dname="mydomain"
  IF (##class(%iKnow.Domain).NameIndexExists(dname))
      { SET domoref=##class(%iKnow.Domain).NameIndexOpen(dname)
        GOTO DeleteOldData }
  ELSE { SET domoref=##class(%iKnow.Domain).%New(dname)
         DO domoref.%Save()
         GOTO SetEnvironment }
DeleteOldData
  SET stat=domoref.DropData()
  IF stat { GOTO SetEnvironment }
  ELSE    { WRITE "DropData error ",$System.Status.DisplayError(stat)
            QUIT}
SetEnvironment
  SET domId=domoref.Id
  IF ##class(%iKnow.Configuration).Exists("myconfig") {
         SET cfg=##class(%iKnow.Configuration).Open("myconfig") }
  ELSE { SET cfg=##class(%iKnow.Configuration).%New("myconfig",0,$LISTBUILD("en"),"",1)
         DO cfg.%Save() }
ListerAndLoader
  SET domId=domoref.Id
  SET flister=##class(%iKnow.Source.SQL.Lister).%New(domId)
  SET myloader=##class(%iKnow.Source.Loader).%New(domId)
QueryBuild
   SET myquery="SELECT TOP 50 ID AS UniqueVal,Type,NarrativeFull FROM Aviation.Event"
   SET idfld="UniqueVal"
   SET grpfld="Type"
   SET dataflds=$LB("NarrativeFull")
UseListerAndLoader
  SET stat=flister.AddListToBatch(myquery,idfld,grpfld,dataflds)
      IF stat '= 1 {WRITE "The lister failed: ",$System.Status.DisplayError(stat) QUIT }
  SET stat=myloader.ProcessBatch()
      IF stat '= 1 {WRITE "The loader failed: ",$System.Status.DisplayError(stat) QUIT }
DefineAFilter
  SET filt=##class(%iKnow.Filters.SentenceCountFilter).%New(domId)
  SET nsent=75
  DO filt.MinSentenceCountSet(nsent)
SourceSentenceQueries
  SET numSrcD=##class(%iKnow.Queries.SourceAPI).GetCountByDomain(domId)
  WRITE "The domain contains ",numSrcD," sources",!
  SET numSrcFD=##class(%iKnow.Queries.SourceAPI).GetCountByDomain(domId,filt)
  WRITE "Of these ",numSrcD," sources ",numSrcFD," contain ",nsent," or more sentences:",!
  DO ##class(%iKnow.Queries.SourceAPI).GetByDomain(.result,domId,1,50,filt)
  SET i=1
  WHILE $DATA(result(i)) {
     SET intId = $LISTGET(result(i),1)
     SET numSentS = ##class(%iKnow.Queries.SentenceAPI).GetCountBySource(domId,$LB(intId))
     SET extId = $LISTGET(result(i),2)
     WRITE "source ",intId," ",extId
     WRITE " has ",numSentS," sentences",!
     SET i=i+1 } 
     WRITE i-1," sources listed"

To filter using both minimum and maximum number of sentences, invoke both instance methods. The following filter selects those sources containing between 10 and 25 sentences (inclusive):

MinMaxFilter
  SET min=10
  SET filt=##class(%iKnow.Filters.SentenceCountFilter).%New(domId)
  DO filt.MinSentenceCountSet(min)
  DO filt.MaxSentenceCountSet(min+15)

Filtering by Entity Match

The following entity filters are provided:

  • ContainsEntityFilter: filters sources using a specified list of entities, at least one of which must appear in the source. Optionally, ContainsEntityFilter can also filter sources by entities similar to the listed entities.

  • ContainsRelatedEntitiesFilter: filters sources using a specified list of entities, two or more of which must be related entities, meaning that they must appear in either the same path (the filter default) or the same CRC. Optionally, ContainsRelatedEntitiesFilter can also filter sources by related entities similar to the listed entities.

  • DictionaryMatchFilter: filters sources using one or more dictionaries containing lists of entities; by default, at least one of these entities must appear in the source. Optionally, a specified minimum number of these entity matches must occur for a source to be selected. Dictionary matching also supports standardized-form matching, if the $$$IKPMATSTANDARDIZEDFORM domain parameter is specified for the current domain.
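As a minimal sketch of the simplest of these filters, the following fragment selects sources that contain at least one entity from a directly supplied list. It assumes that domId already holds the Id of a domain containing indexed sources, and that the ContainsEntityFilter constructor takes the domain Id followed by a %List of entity values; the entity values themselves are hypothetical. Check the class reference for the exact signature and the optional arguments.

```objectscript
EntityFilterSketch
  // Assumes domId identifies a domain that already contains indexed sources
  // Hypothetical entity values chosen for illustration
  SET entlist=$LB("engine","propeller")
  // Assumption: %New() takes the domain Id, then a %List of entities
  SET efilt=##class(%iKnow.Filters.ContainsEntityFilter).%New(domId,entlist)
  SET numSrcE=##class(%iKnow.Queries.SourceAPI).GetCountByDomain(domId,efilt)
  WRITE numSrcE," sources contain at least one of the listed entities",!
```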

Filtering by Dictionary Match

The %iKnow.Filters.DictionaryMatchFilter class allows you to select sources based on the contents of one or more user-defined dictionaries. iKnow also supports filtering by matching to a list of dictionary terms (%iKnow.Filters.DictionaryTermMatchFilter) or to a list of dictionary items (%iKnow.Filters.DictionaryItemMatchFilter).

In the following simple example, the second filter parameter specifies that only one dictionary is applied; multiple dictionaries can be specified as elements of a %List. The third parameter is set to the default of 1, which means that a single match of any dictionary item selects the source for inclusion. You might wish to set this value higher to avoid selecting sources where a single dictionary match is coincidental rather than significant, which becomes more likely when the dictionary contains many items or the sources contain a large amount of text. The fourth parameter takes the default (-1), which places no upper limit on the number of matches. If the max parameter is smaller than the min parameter, all sources are selected by the filter, regardless of dictionary matches. The fifth parameter also takes its default: matching is based on the count of matches, not the match score. The sixth parameter is the ensureMatched flag; here ensureMatched=2, so instantiating the filter generates static match results that are then used each time the filter is invoked. This is the preferred usage. You need to set ensureMatched to 1 if a dictionary is modified after the filter is instantiated; ensureMatched=1 takes changing dictionary contents into account by matching before every invocation of the filter. However, ensureMatched=1 can result in significantly slower performance.

DomainCreateOrOpen
  SET dname="mydomain"
  IF (##class(%iKnow.Domain).NameIndexExists(dname))
      { SET domoref=##class(%iKnow.Domain).NameIndexOpen(dname)
        GOTO DeleteOldData }
  ELSE { SET domoref=##class(%iKnow.Domain).%New(dname)
         DO domoref.%Save()
         GOTO SetEnvironment }
DeleteOldData
  SET stat=domoref.DropData()
  IF stat { GOTO SetEnvironment }
  ELSE    { WRITE "DropData error ",$System.Status.DisplayError(stat)
            QUIT}
SetEnvironment
  SET domId=domoref.Id
ListerAndLoader
  SET domId=domoref.Id
  SET flister=##class(%iKnow.Source.SQL.Lister).%New(domId)
  SET myloader=##class(%iKnow.Source.Loader).%New(domId)
QueryBuild
   SET myquery="SELECT TOP 50 ID AS UniqueVal,Type,NarrativeFull FROM Aviation.Event"
   SET idfld="UniqueVal"
   SET grpfld="Type"
   SET dataflds=$LB("NarrativeFull")
UseListerAndLoader
  SET stat=flister.AddListToBatch(myquery,idfld,grpfld,dataflds)
      IF stat '= 1 {WRITE "The lister failed: ",$System.Status.DisplayError(stat) QUIT }
  SET stat=myloader.ProcessBatch()
      IF stat '= 1 {WRITE "The loader failed: ",$System.Status.DisplayError(stat) QUIT }
CreateDictionary
  SET dictname="EngineTerms"
  SET dictdesc="A dictionary of aviation engine terms"
  SET dictId=##class(%iKnow.Matching.DictionaryAPI).CreateDictionary(domId,dictname,dictdesc)
  IF dictId=-1 {WRITE "Dictionary ",dictname," already exists",!
                GOTO ResetForNextTime }
  ELSE {WRITE "created dictionary ",dictId,!}
PopulateDictionaryItem1
   SET itemId=##class(%iKnow.Matching.DictionaryAPI).CreateDictionaryItem(domId,dictId,
       "engine parts",domId_dictId_1)
     SET term1Id=##class(%iKnow.Matching.DictionaryAPI).CreateDictionaryTerm(domId,itemId,
         "piston")
     SET term2Id=##class(%iKnow.Matching.DictionaryAPI).CreateDictionaryTerm(domId,itemId,
         "cylinder")
     SET term3Id=##class(%iKnow.Matching.DictionaryAPI).CreateDictionaryTerm(domId,itemId,
         "crankshaft")
     SET term4Id=##class(%iKnow.Matching.DictionaryAPI).CreateDictionaryTerm(domId,itemId,
         "camshaft")
DefineAFilter
  SET filt=##class(%iKnow.Filters.DictionaryMatchFilter).%New(domId,$LB(dictId),1,,,2)
SourceCountQuery
  SET numSrcD=##class(%iKnow.Queries.SourceAPI).GetCountByDomain(domId)
  WRITE "The ",dname," domain contains ",numSrcD," sources",!
SourcesFilteredByDictionaryMatch
  SET numSrcFD=##class(%iKnow.Queries.SourceAPI).GetCountByDomain(domId,filt)
  WRITE "Of these ",numSrcD,", ",numSrcFD," match the ",dictname," dictionary:",!!
  DO ##class(%iKnow.Queries.SourceAPI).GetByDomain(.result,domId,1,50,filt)
  SET i=1
  WHILE $DATA(result(i)) {
     SET intId = $LISTGET(result(i),1)
     SET extId = $LISTGET(result(i),2)
     WRITE "dictionary matches ",intId," ",extId,!
     SET i=i+1 }
  WRITE !,i-1," sources included by dictionary match",!
ResetForNextTime
  IF dictId = -1 {
     SET dictId=##class(%iKnow.Matching.DictionaryAPI).GetDictionaryId(domId,dictname)}
  SET stat=##class(%iKnow.Matching.DictionaryAPI).DropDictionary(domId,dictId)
  IF stat {WRITE "deleted dictionary ",dictId,! }
  ELSE    { WRITE "DropDictionary error ",$System.Status.DisplayError(stat) }  

Filtering by Indexing Date Metadata

Every iKnow source is assigned the DateIndexed metadata field. The value of this field is the date and time that a source was indexed by iKnow, in Coordinated Universal Time (UTC), represented in $HOROLOG format. This is the same as the $ZTIMESTAMP time, except that DateIndexed does not include fractional seconds.

You can create a filter using DateIndexed to include or exclude sources based on when iKnow loaded the source. You can filter for a specific date and time, or filter for a specific date, which encompasses all time values within that date. You can use BETWEEN logic to filter for a range of dates.

The following example filters for sources loaded today:

DomainCreateOrOpen
  SET dname="mydomain"
  IF (##class(%iKnow.Domain).NameIndexExists(dname))
      { SET domoref=##class(%iKnow.Domain).NameIndexOpen(dname)
        GOTO DeleteOldData }
  ELSE { SET domoref=##class(%iKnow.Domain).%New(dname)
         DO domoref.%Save()
         GOTO SetEnvironment }
DeleteOldData
  SET stat=domoref.DropData()
  IF stat { GOTO SetEnvironment }
  ELSE    { WRITE "DropData error ",$System.Status.DisplayError(stat)
            QUIT}
SetEnvironment
  SET domId=domoref.Id
ListerAndLoader
  SET domId=domoref.Id
  SET flister=##class(%iKnow.Source.SQL.Lister).%New(domId)
  SET myloader=##class(%iKnow.Source.Loader).%New(domId)
QueryBuild
   SET myquery="SELECT TOP 50 ID AS UniqueVal,Type,NarrativeFull FROM Aviation.Event"
   SET idfld="UniqueVal"
   SET grpfld="Type"
   SET dataflds=$LB("NarrativeFull")
UseListerAndLoader
  SET stat=flister.AddListToBatch(myquery,idfld,grpfld,dataflds)
      IF stat '= 1 {WRITE "The lister failed: ",$System.Status.DisplayError(stat) QUIT }
  SET stat=myloader.ProcessBatch()
      IF stat '= 1 {WRITE "The loader failed: ",$System.Status.DisplayError(stat) QUIT }
DefineAFilter
  SET tday = $PIECE($ZTIMESTAMP,",",1)
  SET filt=##class(%iKnow.Filters.SimpleMetadataFilter).%New(domId,"DateIndexed","=",tday)
DateIndexedValue
  WRITE "Today is ",$PIECE($ZTIMESTAMP,",",1),!
  DO ##class(%iKnow.Queries.SourceAPI).GetByDomain(.result,domId,1,50)
  SET i=1
  WHILE $DATA(result(i)) {
     SET srcId = $LISTGET(result(i),1)
     SET extId = $LISTGET(result(i),2)
     SET idate = ##class(%iKnow.Queries.MetadataAPI).GetValue(domId,"DateIndexed",extId)
     WRITE "Source ",srcId," was indexed ",idate,!
     SET i=i+1 }
SourceSentenceQueries
  SET numSrcD=##class(%iKnow.Queries.SourceAPI).GetCountByDomain(domId)
  WRITE "The ",dname," domain contains ",numSrcD," sources",!
  SET numSrcFD=##class(%iKnow.Queries.SourceAPI).GetCountByDomain(domId,filt)
  WRITE "Of these sources ",numSrcFD," were indexed today"
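A date range can be sketched in the same way by using the BETWEEN operator with two date values separated by a semicolon. This fragment assumes that domId identifies a domain loaded as in the preceding example; the seven-day window and the variable names are illustrative only.

```objectscript
DefineRangeFilter
  // Assumes domId identifies a domain that already contains indexed sources
  SET tday=$PIECE($ZTIMESTAMP,",",1)
  // Lower and upper date values ($HOROLOG day counts) separated by a semicolon
  SET range=(tday-7)_";"_tday
  SET rfilt=##class(%iKnow.Filters.SimpleMetadataFilter).%New(domId,"DateIndexed","BETWEEN",range)
  SET numSrcR=##class(%iKnow.Queries.SourceAPI).GetCountByDomain(domId,rfilt)
  WRITE numSrcR," sources were indexed between day ",tday-7," and day ",tday,!
```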

Filtering by User-defined Metadata

In iKnow, data is the contents of a source that iKnow processes and indexes. In iKnow, metadata can be any data associated with a source that is not iKnow indexed data. You use iKnow metadata to identify iKnow data. A metadata filter uses the value of a metadata field to determine which sources to supply to a query.

Note:

The iKnow definition of “metadata” describes how data is used, not the intrinsic nature of the data. This concept differs somewhat from the way this word is used elsewhere in Caché software.

iKnow provides a default metadata management system that is independent of the query APIs. The %iKnow.Queries.MetadataAPI class and accompanying %iKnow.Filters.SimpleMetadataFilter provide implementations for basic metadata filtering. If you wish to implement a custom Metadata API, you should implement (at least) the %iKnow.Queries.MetadataI interface and register your class as the "MetadataAPI" domain parameter: DO domain.SetParameter("MetadataAPI","Your.Metadata.Class"). The example that follows uses the %iKnow.Filters.SimpleMetadataFilter class.

When you load sources from Caché SQL, each record of an SQL table constitutes an iKnow source. Through the ProcessList() method (for small numbers of records) or the AddListToBatch() method (for large numbers of records), you define the Lister parameters:

  • You define the RowID field as a component of the iKnow external Id. iKnow also generates a source Id for each row as a unique integer; this iKnow source Id is completely independent of the RowId or other SQL identifier values.

  • You define a field (or fields) that contain a string of text as a data field to be indexed as iKnow data.

  • You define a field (or fields) as an iKnow metadata field. iKnow can use the values of this metadata field to select sources for an iKnow query.

Note that it is possible to specify the same field as both one of the data fields and as a metadata field. You can optionally also define metakey fields that correspond to the metadata fields.

This is shown in the following example. The Aviation.Event table contains various fields in addition to the NarrativeFull text field. In this example, InjuriesTotal is used as a metadata field. This metadata field is used in three filters: two comparison filters, which filter for InjuriesTotal>2 and InjuriesTotal=3, and a BETWEEN filter that filters for InjuriesTotal between 3 and 5 (inclusive). This example uses the DropData(1) method, because DropData() with no argument does not delete metadata. Also note that the AddField() method must be invoked before listing and loading the data.

#Include %IKPublic
DomainCreateOrOpen
  SET dname="mydomain"
  IF (##class(%iKnow.Domain).NameIndexExists(dname))
      { SET domoref=##class(%iKnow.Domain).NameIndexOpen(dname)
        GOTO DeleteOldData }
  ELSE { SET domoref=##class(%iKnow.Domain).%New(dname)
         DO domoref.%Save()
         GOTO SetEnvironment }
DeleteOldData
  SET stat=domoref.DropData(1)
  IF stat { GOTO SetEnvironment }
  ELSE    { WRITE "DropData error ",$System.Status.DisplayError(stat)
            QUIT}
SetEnvironment
  SET domId=domoref.Id
ListerAndLoader
  SET domId=domoref.Id
  SET flister=##class(%iKnow.Source.SQL.Lister).%New(domId)
  SET myloader=##class(%iKnow.Source.Loader).%New(domId)
QueryBuild
   SET myquery="SELECT TOP 100 ID AS UniqueVal,Type,NarrativeFull,InjuriesTotal,InjuriesTotalFatal FROM Aviation.Event"
   SET idfld="UniqueVal"
   SET grpfld="Type"
   SET dataflds=$LB("NarrativeFull")
   SET metaflds=$LB("InjuriesTotal","InjuriesTotalFatal")
AddMetaFields
  SET val=##class(%iKnow.Queries.MetadataAPI).AddField(domId,"InjuriesTotal",
                   $LB("=","<",">","BETWEEN"),$$$MDDTNUMBER)
  SET val=##class(%iKnow.Queries.MetadataAPI).AddField(domId,"InjuriesTotalFatal",
                   $LB("=","<",">","BETWEEN"),$$$MDDTNUMBER)
UseListerAndLoader
  SET stat=flister.AddListToBatch(myquery,idfld,grpfld,dataflds,metaflds)
      IF stat '= 1 {WRITE "The lister failed: ",$System.Status.DisplayError(stat) QUIT }
  SET stat=myloader.ProcessBatch()
      IF stat '= 1 {WRITE "The loader failed: ",$System.Status.DisplayError(stat) QUIT }
CountSources
   SET numsrc=##class(%iKnow.Queries.SourceAPI).GetCountByDomain(domId)
ApplyFilter
  SET filt2=##class(%iKnow.Filters.SimpleMetadataFilter).%New(domId,"InjuriesTotal",
                    ">",2)
  SET numSrcF2=##class(%iKnow.Queries.SourceAPI).GetCountByDomain(domId,filt2)
  WRITE "Of these ",numsrc," sources ",numSrcF2," had three or more injuries",!
  SET filt3=##class(%iKnow.Filters.SimpleMetadataFilter).%New(domId,"InjuriesTotal",
                    "=",3)
  SET numSrcF3=##class(%iKnow.Queries.SourceAPI).GetCountByDomain(domId,filt3)
  WRITE "Of these ",numsrc," sources ",numSrcF3," had three injuries",!
  SET filtb=##class(%iKnow.Filters.SimpleMetadataFilter).%New(domId,"InjuriesTotal",
                    "BETWEEN","3;5")
  SET numSrcFb=##class(%iKnow.Queries.SourceAPI).GetCountByDomain(domId,filtb)
  WRITE "Of these ",numsrc," sources ",numSrcFb," had between 3 and 5 injuries",!

Metadata Filter Operators

Each metadata field supports one or more comparison operators. To match against a string value, use the “=” operator. To match against a numeric value, you can use any of the following operators: “=”, “<”, “<=”, “>”, “>=”. When you define a metadata field with AddField(), you specify its supported operators as quoted string elements in a %List structure; when you define a filter, you specify a single operator as a quoted string. Each of these operators is matched against a single value. This is shown in the following example:

  SET filt=##class(%iKnow.Filters.SimpleMetadataFilter).%New(domId,metafldname,"=",today)

The BETWEEN operator is matched against a parameter string containing a pair of values that are separated by $$$MDVALSEPARATOR (the semicolon character). This is shown in the following example:

  SET filt=##class(%iKnow.Filters.SimpleMetadataFilter).%New(domId,metafldname,
                   "BETWEEN","yesterday;tomorrow")
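When the two bounds are held in variables, the parameter string can be assembled with the separator macro rather than a hard-coded semicolon. The following is a minimal sketch, assuming that domId and metafldname are already set as in the preceding examples, that the metadata field is numeric, and that the routine includes %IKPublic (assumed here to be where the $$$ macros are defined). The names low, high, and range are illustrative:

```objectscript
#Include %IKPublic
  // Illustrative bounds; replace with values appropriate to your metadata field
  SET low=3,high=5
  // Join the two bounds with the separator macro (";" per the text above)
  SET range=low_$$$MDVALSEPARATOR_high
  SET filt=##class(%iKnow.Filters.SimpleMetadataFilter).%New(domId,metafldname,"BETWEEN",range)
```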

Filtering by SQL Query

The %iKnow.Filters.SqlFilter class allows you to select SQL sources based on the results of an SQL query. This query can select on any of the following fields:

  • SourceId: the (internal) Source ID of the sources to be selected.

  • ExternalId: the full External ID of the sources to be selected.

  • IdField and GroupField: the two columns used together as identifiers when adding the sources to the domain: Local Reference (IdField) and Group Name (GroupField). See also %iKnow.Source.SQL.Lister.

Note that these result column names are case-sensitive.

For example, the following filter selects the source whose SourceId is that of the 6th source retrieved (in this case SourceId 45):

DomainCreateOrOpen
  SET dname="mydomain"
  IF (##class(%iKnow.Domain).NameIndexExists(dname))
      { SET domoref=##class(%iKnow.Domain).NameIndexOpen(dname)
        GOTO DeleteOldData }
  ELSE { SET domoref=##class(%iKnow.Domain).%New(dname)
         DO domoref.%Save()
         GOTO SetEnvironment }
DeleteOldData
  SET stat=domoref.DropData()
  IF stat { GOTO SetEnvironment }
  ELSE    { WRITE "DropData error ",$System.Status.DisplayError(stat)
            QUIT}
SetEnvironment
  SET domId=domoref.Id
  IF ##class(%iKnow.Configuration).Exists("myconfig") {
         SET cfg=##class(%iKnow.Configuration).Open("myconfig") }
  ELSE { SET cfg=##class(%iKnow.Configuration).%New("myconfig",0,$LISTBUILD("en"),"",1)
         DO cfg.%Save() }
ListerAndLoader
  SET domId=domoref.Id
  SET flister=##class(%iKnow.Source.SQL.Lister).%New(domId)
  SET myloader=##class(%iKnow.Source.Loader).%New(domId)
QueryBuild
   SET myquery="SELECT TOP 50 ID AS UniqueVal,Type,NarrativeFull FROM Aviation.Event"
   SET idfld="UniqueVal"
   SET grpfld="Type"
   SET dataflds=$LB("NarrativeFull")
UseListerAndLoader
  SET stat=flister.AddListToBatch(myquery,idfld,grpfld,dataflds)
      IF stat '= 1 {WRITE "The lister failed: ",$System.Status.DisplayError(stat) QUIT }
  SET stat=myloader.ProcessBatch()
      IF stat '= 1 {WRITE "The loader failed: ",$System.Status.DisplayError(stat) QUIT }
   DO ##class(%iKnow.Queries.SourceAPI).GetByDomain(.result,domId,1,50)
FilterSources
  SET filter=##class(%iKnow.Filters.SqlFilter).%New(domId,
       "SELECT '"_$LIST(result(6),1)_"' AS SourceId")
  DO ##class(%iKnow.Queries.SourceAPI).GetByDomain(.fresult,domId,0,0,filter)
  WRITE !,"Filtered results:",!
  SET j=1
  WHILE $DATA(fresult(j)) {
    WRITE $LISTTOSTRING(fresult(j)),!
    SET j=j+1 }

The following filter selects sources by ExternalId:

  SET filter=##class(%iKnow.Filters.SqlFilter).%New(domId,
          "SELECT '"_$LIST(result(1),2)_"' AS ExternalId")
  DO ##class(%iKnow.Queries.SourceAPI).GetByDomain(.fresult,domId,0,0,filter)
  ZWRITE fresult
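The IdField and GroupField result columns can be used in the same way. The following is a minimal sketch, assuming the sources were loaded as in the example above (with ID as the ID field and Type as the group field of Aviation.Event). The WHERE clause and the InjuriesTotal column are illustrative assumptions about the table, and rows that do not correspond to a loaded source are assumed to be ignored:

```objectscript
  // Column aliases must be spelled exactly IdField and GroupField (case-sensitive)
  SET sql="SELECT ID AS IdField, Type AS GroupField FROM Aviation.Event WHERE InjuriesTotal > 2"
  SET sqlfilt=##class(%iKnow.Filters.SqlFilter).%New(domId,sql)
  // Count the loaded sources that the SQL query selects
  SET numSrcSql=##class(%iKnow.Queries.SourceAPI).GetCountByDomain(domId,sqlfilt)
  WRITE "The SQL filter selected ",numSrcSql," sources",!
```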

Filter Modes

The FilterMode argument specifies what statistical reprocessing should be performed after applying a filter. If no reprocessing is performed, the filter is applied but the frequency and spread statistics are the values calculated for the item before filtering and the sort sequence remains unchanged. The following are the available filter modes:

FilterMode              Integer code  Filter?  Recalculate Frequency?  Recalculate Spread?  Re-sort Results?
$$$FILTERONLY           1             YES      NO                      NO                   NO
$$$FILTERFREQ           3             YES      YES                     NO                   NO
$$$FILTERSPREAD         5             YES      NO                      YES                  NO
$$$FILTERALL            7             YES      YES                     YES                  NO
$$$FILTERFREQANDSORT    11            YES      YES                     NO                   YES
$$$FILTERSPREADANDSORT  13            YES      NO                      YES                  YES
$$$FILTERALLANDSORT     15            YES      YES                     YES                  YES
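The integer codes in this table are consistent with a sum of four components: 1 (filter), 2 (recalculate frequency), 4 (recalculate spread), and 8 (re-sort). This is an observation about the arithmetic of the table, not a documented guarantee. The following sketch uses only integer division (\) and modulo (#) to recover the four columns from each code:

```objectscript
  // Decode each filter mode code from the table into its four yes/no columns.
  // Assumption: the codes are read as sums of 1 (filter), 2 (frequency),
  // 4 (spread), and 8 (sort). ObjectScript evaluates strictly left to right,
  // so mode\2#2 means (mode\2)#2.
  FOR mode=1,3,5,7,11,13,15 {
    WRITE mode,": filter=",mode#2
    WRITE " freq=",mode\2#2
    WRITE " spread=",mode\4#2
    WRITE " sort=",mode\8#2,!
  }
```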

The default is $$$FILTERONLY.

Refer to Constants in the “iKnow Implementation” chapter for use of $$$ macros.
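A filter mode is passed to a query together with the filter. The following is a minimal sketch, assuming that domId and a filter object filt already exist, that the routine includes %IKPublic, and that the %iKnow.Queries.EntityAPI.GetTop() query accepts the filter and filter mode as its fifth and sixth arguments. The argument positions are an assumption here; check the class reference for the query you are using:

```objectscript
#Include %IKPublic
  // Top 20 entities among the filtered sources, with frequency and spread
  // recalculated for the filtered set and the results re-sorted accordingly
  DO ##class(%iKnow.Queries.EntityAPI).GetTop(.tresult,domId,1,20,filt,$$$FILTERALLANDSORT)
  SET i=1
  WHILE $DATA(tresult(i)) {
    WRITE $LISTTOSTRING(tresult(i)),!
    SET i=i+1 }
```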

Using GroupFilter to Combine Multiple Filters

iKnow supplies a GroupFilter class that allows you to supply logic to combine the results of other filters. You must first create a GroupFilter instance that provides a defined logic, and then use the AddSubFilter() method to assign it one or more subfilter objects that are combined according to the GroupFilter logic. In this way you can combine multiple existing filters to select which sources are supplied to an iKnow query.

The simplest GroupFilter logic returns the inverse of a single filter. In the following example, randfilt selects 33% of the sources. The GroupFilter defines AND logic with the Negated boolean set to 1. When applied to a single filter, this logic returns all of the sources not selected by that filter:

DomainCreateOrOpen
  SET dname="mydomain"
  IF (##class(%iKnow.Domain).NameIndexExists(dname))
      { SET domoref=##class(%iKnow.Domain).NameIndexOpen(dname)
        GOTO DeleteOldData }
  ELSE { SET domoref=##class(%iKnow.Domain).%New(dname)
         DO domoref.%Save()
         GOTO SetEnvironment }
DeleteOldData
  SET stat=domoref.DropData()
  IF stat { GOTO SetEnvironment }
  ELSE    { WRITE "DropData error ",$System.Status.DisplayError(stat)
            QUIT}
SetEnvironment
  SET domId=domoref.Id
QueryBuild
   SET myquery="SELECT TOP 50 ID AS UniqueVal,Type,NarrativeFull FROM Aviation.Event"
   SET idfld="UniqueVal"
   SET grpfld="Type"
   SET dataflds=$LB("NarrativeFull")
ListerAndLoader
  SET domId=domoref.Id
  SET flister=##class(%iKnow.Source.SQL.Lister).%New(domId)
  SET myloader=##class(%iKnow.Source.Loader).%New(domId)
  SET stat=flister.AddListToBatch(myquery,idfld,grpfld,dataflds)
      IF stat '= 1 {WRITE "The lister failed: ",$System.Status.DisplayError(stat) QUIT }
  SET stat=myloader.ProcessBatch()
      IF stat '= 1 {WRITE "The loader failed: ",$System.Status.DisplayError(stat) QUIT }
DefineARandomFilter
  SET randfilt=##class(%iKnow.Filters.RandomFilter).%New(domId,.33)
SampledSourceQueries
  SET numSrcD=##class(%iKnow.Queries.SourceAPI).GetCountByDomain(domId)
  WRITE "The ",dname," domain contains ",numSrcD," sources",!
  SET numSrcFD=##class(%iKnow.Queries.SourceAPI).GetCountByDomain(domId,randfilt)
  WRITE "From ",numSrcD," sources randfilt sampled ",numSrcFD,!
GroupFilter
    SET grpfilt=##class(%iKnow.Filters.GroupFilter).%New(domId,"AND",1)
    DO grpfilt.AddSubFilter(randfilt)
  SET numSrcGrp=##class(%iKnow.Queries.SourceAPI).GetCountByDomain(domId,grpfilt)
  WRITE "From ",numSrcD," sources grpfilt sampled ",numSrcGrp

The following example uses a GroupFilter to combine the results of two other filters (in this case, both are random filters). Because the GroupFilter logic is AND and Negated=0, the GroupFilter returns those sources that are found in both RandomFilter sets. Because the results of these filters are random, the number of sources returned will likely differ each time this example is executed:

DomainCreateOrOpen
  SET dname="mydomain"
  IF (##class(%iKnow.Domain).NameIndexExists(dname))
      { SET domoref=##class(%iKnow.Domain).NameIndexOpen(dname)
        GOTO DeleteOldData }
  ELSE { SET domoref=##class(%iKnow.Domain).%New(dname)
         DO domoref.%Save()
         GOTO SetEnvironment }
DeleteOldData
  SET stat=domoref.DropData()
  IF stat { GOTO SetEnvironment }
  ELSE    { WRITE "DropData error ",$System.Status.DisplayError(stat)
            QUIT}
SetEnvironment
  SET domId=domoref.Id
QueryBuild
   SET myquery="SELECT TOP 50 ID AS UniqueVal,Type,NarrativeFull FROM Aviation.Event"
   SET idfld="UniqueVal"
   SET grpfld="Type"
   SET dataflds=$LB("NarrativeFull")
ListerAndLoader
  SET domId=domoref.Id
  SET flister=##class(%iKnow.Source.SQL.Lister).%New(domId)
  SET myloader=##class(%iKnow.Source.Loader).%New(domId)
  SET stat=flister.AddListToBatch(myquery,idfld,grpfld,dataflds)
      IF stat '= 1 {WRITE "The lister failed: ",$System.Status.DisplayError(stat) QUIT }
  SET stat=myloader.ProcessBatch()
      IF stat '= 1 {WRITE "The loader failed: ",$System.Status.DisplayError(stat) QUIT }
DefineTwoRandomFilters
  SET randfilt1=##class(%iKnow.Filters.RandomFilter).%New(domId,.33)
  SET randfilt2=##class(%iKnow.Filters.RandomFilter).%New(domId,.25)
  SET numSrcD=##class(%iKnow.Queries.SourceAPI).GetCountByDomain(domId)
  WRITE "The ",dname," domain contains ",numSrcD," sources",!
  SET numSrcFD=##class(%iKnow.Queries.SourceAPI).GetCountByDomain(domId,randfilt1)
  WRITE "From ",numSrcD," sources randfilt1 sampled ",numSrcFD,!
  SET numSrcFD=##class(%iKnow.Queries.SourceAPI).GetCountByDomain(domId,randfilt2)
  WRITE "From ",numSrcD," sources randfilt2 sampled ",numSrcFD,!
GroupFilter
    SET grpfilt=##class(%iKnow.Filters.GroupFilter).%New(domId,"AND",0)
    DO grpfilt.AddSubFilter(randfilt1)
    DO grpfilt.AddSubFilter(randfilt2)
  SET numSrcGrp=##class(%iKnow.Queries.SourceAPI).GetCountByDomain(domId,grpfilt)
  WRITE "From ",numSrcD," sources grpfilt sampled ",numSrcGrp

You assign each GroupFilter either the AND ($$$GROUPFILTERAND) or the OR ($$$GROUPFILTEROR) logical operator. Therefore, to create a compound filter involving both AND and OR logic, you must create a GroupFilter with AND logic and a GroupFilter with OR logic.
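A single-operator case with OR logic can be sketched as follows. It assumes the domain, randfilt1, randfilt2, and numSrcD from the preceding example, and a routine that includes %IKPublic for the $$$GROUPFILTEROR macro. The names orfilt and numSrcOr are illustrative:

```objectscript
#Include %IKPublic
  // OR logic, not negated: a source qualifies if either random filter selected it
  SET orfilt=##class(%iKnow.Filters.GroupFilter).%New(domId,$$$GROUPFILTEROR,0)
  DO orfilt.AddSubFilter(randfilt1)
  DO orfilt.AddSubFilter(randfilt2)
  SET numSrcOr=##class(%iKnow.Queries.SourceAPI).GetCountByDomain(domId,orfilt)
  WRITE "From ",numSrcD," sources orfilt sampled ",numSrcOr,!
```

Because the two samples may overlap, this count should be at least as large as the larger sample and no larger than the sum of the two.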

The following example corresponds to the Boolean expression "(filter1 AND !(filter2 OR filter3))":

#Include %IKPublic
   SET domoref=##class(%iKnow.Domain).%New("MyDomain")
   DO domoref.%Save()
   SET domId=domoref.Id
   /* . . . */
Create3Filters
    /* . . . */
GroupFilters
  SET group1=##class(%iKnow.Filters.GroupFilter).%New(domId,$$$GROUPFILTERAND,0)
  SET group2=##class(%iKnow.Filters.GroupFilter).%New(domId,$$$GROUPFILTEROR,1)
  DO group1.AddSubFilter(filter1)
  DO group2.AddSubFilter(filter2)
  DO group2.AddSubFilter(filter3)
  DO group1.AddSubFilter(group2)
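To make the placeholders above concrete, the following sketch fills in three filters and counts the sources that the compound filter selects. The choice of filters is purely illustrative: it assumes that domId identifies a domain whose sources were loaded with an InjuriesTotal metadata field, as in the metadata examples earlier in this chapter.

```objectscript
#Include %IKPublic
Create3Filters
  // filter1: a random half of the sources (illustrative choice)
  SET filter1=##class(%iKnow.Filters.RandomFilter).%New(domId,.5)
  // filter2 and filter3: sources with no injuries, or with more than 5
  SET filter2=##class(%iKnow.Filters.SimpleMetadataFilter).%New(domId,"InjuriesTotal","=",0)
  SET filter3=##class(%iKnow.Filters.SimpleMetadataFilter).%New(domId,"InjuriesTotal",">",5)
GroupFilters
  // group2 = NOT (filter2 OR filter3); group1 = filter1 AND group2
  SET group1=##class(%iKnow.Filters.GroupFilter).%New(domId,$$$GROUPFILTERAND,0)
  SET group2=##class(%iKnow.Filters.GroupFilter).%New(domId,$$$GROUPFILTEROR,1)
  DO group1.AddSubFilter(filter1)
  DO group2.AddSubFilter(filter2)
  DO group2.AddSubFilter(filter3)
  DO group1.AddSubFilter(group2)
CountResult
  SET numSrcCmp=##class(%iKnow.Queries.SourceAPI).GetCountByDomain(domId,group1)
  WRITE "The compound filter selected ",numSrcCmp," sources",!
```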