                     Changes in version 0.5.1.9000                      

New features

  - unpack_sub_criteria()
  - flatten_list()

Changes

  - group_stats now controls which slots are updated for the pid and
    epid S4 objects.
  - attr_eval() has been removed. Instead, use unpack_sub_criteria(x,
    part = 'attribute') to extract the attributes of a sub_criteria
    before manipulating them as needed.
  - New argument (stepwise_method) in links(). It replaces shrink and
    expand, which will be removed later. Please use stepwise_method
    moving forward.
      - "expand_with_priority" maps to expand == TRUE & shrink == FALSE.
      - "ordered_only" maps to expand == FALSE & shrink == FALSE.
      - "shrink_to_last_match" maps to shrink == TRUE.
  - bys_func upgrades. Less memory.
  - reverse_number_line() upgrades. Less memory.
  - make_pairs() upgrades. Less memory.
  - overlap() upgrades. Less memory.
  - eval_sub_criteria() upgrades. Less memory.
  - combi() is now a wrapper function for data.table::frankv().
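
The stepwise_method mapping listed above can be sketched as a small
translation helper. This is only an illustration in base R:
legacy_to_stepwise_method() is a hypothetical name and is not part of
diyar. It assumes both legacy flags are supplied as single logical
values.

```r
# Hypothetical helper (not part of diyar): translate the legacy
# shrink/expand flags into the matching stepwise_method keyword.
legacy_to_stepwise_method <- function(shrink, expand) {
  stopifnot(is.logical(shrink), is.logical(expand),
            length(shrink) == 1L, length(expand) == 1L,
            !is.na(shrink), !is.na(expand))
  if (shrink) {
    "shrink_to_last_match"    # shrink == TRUE, whatever expand is
  } else if (expand) {
    "expand_with_priority"    # expand == TRUE & shrink == FALSE
  } else {
    "ordered_only"            # expand == FALSE & shrink == FALSE
  }
}

legacy_to_stepwise_method(shrink = FALSE, expand = TRUE)
legacy_to_stepwise_method(shrink = TRUE, expand = FALSE)
```

The returned keyword would then be passed as stepwise_method in place of
the old shrink and expand arguments.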

Bug fixes

  - The result of expand_number_line(point = 'start', ...) was incorrect
    for descending number lines. Corrected.

                 Changes in version 0.5.1 (2023-11-12)                  

New features

Changes

Bug fixes

  - links() - Incorrect results in some situations. Resolved.
  - links_af_probabilistic() - Failed in some situations. Resolved.

                 Changes in version 0.5.0 (2023-11-05)                  

New features

  - New option ("semi") for the batched argument in links(). All matches
    are compared against the record-set in the next iteration.
    Therefore, the number of record-pairs increases exponentially as
    new matches are found. This means fewer record-pairs (and less
    memory usage) but a longer run time compared to the "no" option.
    Conversely, it leads to more record-pairs (and more memory usage)
    but a shorter run time compared to the "yes" option.
  - New argument (batched) in episodes()
  - New argument (split) in episodes(). Split the analysis in N-splits
    of strata. This leads to fewer record-pairs (and memory usage) but a
    longer run time.
  - New argument (decode) in as.data.frame.pid(), as.data.frame.epid()
    and as.data.frame.pane()
  - New function - episodes_af_shift(). A more vectorised approach to
    episodes() based on epidm::group_time().
  - New function - links_wf_episodes(). Implementation of episodes()
    using links().

Changes

  - Optimised episodes() and links(). Each iteration now uses less time
    and memory.
  - link_id slot in pid objects is now a list.
  - links() - records with missing values in a sub_criteria are now
    skipped at the corresponding iteration.
  - Updated argument in links() - recursive. This now takes any of
    three options: "linked", "unlinked" or "none". c("linked",
    "unlinked") together is equivalent to the previous TRUE, while
    "none" is equivalent to the previous FALSE.
  - as.epids() now calls make_episodes().
  - The default value for the window argument in partitions() is now
    NULL
  - as.data.frame() and as.data.list() now only create elements/fields
    from non-empty fields.
  - id and gid slots in number_line objects are now integer(0) by
    default.
  - episode_group(), record_group() and range_match_legacy() have been
    removed.
  - "recursive" episodes from episodes() are now presented as
    "rolling" episodes with reference_event = "all_records", i.e.
      - Old syntax ~ episodes(..., episode_type = "recursive")
      - New syntax ~ episodes(..., episode_type = "rolling",
        reference_event = "all_records")
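
The change to the recursive argument of links() listed above can be
sketched as a translation from the old logical values to the new
character options. upgrade_recursive() is a hypothetical helper written
in base R for illustration; it is not part of diyar.

```r
# Hypothetical helper (not part of diyar): map a legacy logical
# `recursive` value to the newer character options.
upgrade_recursive <- function(recursive) {
  # Character input is assumed to be new-style already.
  if (is.character(recursive)) {
    return(recursive)
  }
  stopifnot(is.logical(recursive), length(recursive) == 1L,
            !is.na(recursive))
  if (recursive) {
    c("linked", "unlinked")   # previously TRUE
  } else {
    "none"                    # previously FALSE
  }
}

upgrade_recursive(TRUE)
upgrade_recursive(FALSE)
```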

Bug fixes

  - When recursive was TRUE, links() ended prematurely and therefore
    missed some matches. Resolved.
  - recurrence_sub_criteria in episodes() was not implemented correctly
    and led to incorrect linkage results in some instances. Resolved.
  - overlap_method() - logical tests recycled incorrectly. Resolved.
  - check_links argument - Option "g" was implemented as option "l".
    Resolved.
  - make_pairs_wf_source(). Created incorrect pairs. Resolved.
  - case_sub_criteria and recurrence_sub_criteria in episodes() led to
    incorrect results. Resolved.

                 Changes in version 0.4.2 (2022-12-20)                  

New features

  - New argument in merge_ids() - shrink and expand.
  - New S3 method for class ‘d_report’ - plot.
  - New S3 method for class ‘sub_criteria’ - format.
  - New function - true(). Predefined logical test for use with
    sub_criteria().
  - New function - false(). Predefined logical test for use with
    sub_criteria().
  - New argument in links()- batched. Specify if all record pairs are
    created or compared at once ("no") or in batches ("yes").
  - New argument in links()- repeats_allowed. Specify if record-pairs
    with duplicate elements should be created.
  - New argument in links()- permutations_allowed. Specify if
    permutations of the same record-pair should be created.
  - New argument in links()- ignore_same_source. Specify if record-pairs
    from different datasets should be created.
  - New argument in eval_sub_criteria()- depth. First order of
    recursion.
  - New function - sets() and make_sets(). Create permutations of
    record-sets.
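
The ideas behind repeats_allowed and permutations_allowed above can be
illustrated with plain record indexes. This is a conceptual sketch in
base R only; it does not call diyar, and the exact behaviour of these
arguments in links() may differ.

```r
# Conceptual illustration in base R; not diyar's internal code.
record_index <- 1:3

# Every ordered pair of records, including self-pairs such as (1, 1).
all_pairs <- expand.grid(x = record_index, y = record_index)

# Without repeats: drop pairs whose two elements are the same record.
no_repeats <- all_pairs[all_pairs$x != all_pairs$y, ]

# Without permutations: keep (1, 2) but not its mirror (2, 1).
no_permutations <- all_pairs[all_pairs$x <= all_pairs$y, ]

# Neither repeats nor permutations.
strict_pairs <- all_pairs[all_pairs$x < all_pairs$y, ]

c(all = nrow(all_pairs), no_repeats = nrow(no_repeats),
  no_permutations = nrow(no_permutations), strict = nrow(strict_pairs))
```

With three records this gives 9, 6, 6 and 3 pairs respectively, which is
why restricting repeats and permutations reduces memory usage.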

Changes

  - links() - When shrink is TRUE, records in a record-group must meet
    every listed match criteria and sub_criteria. For example, if
    pid_cri is 3, then the record must have matched another on the
    first three match criteria.
  - links() - pid@iteration now tracks when a record was dealt with
    instead of when it was assigned to a record-group. For example, a
    record can be closed (matched or not matched) at iteration 1 but
    assigned to a record-group at iteration 5.
  - make_pairs() - x.* and y.* values in the output are now swapped.
  - sub_criteria can now export any data created by match_func. To do
    this, match_func must export a list, where the first element is a
    logical object. See an example below.

library(diyar)
val <- rep(month.abb[1:5], 2); val
#>  [1] "Jan" "Feb" "Mar" "Apr" "May" "Jan" "Feb" "Mar" "Apr" "May"
match_and_export <- function(x, y){
  output <- list(x == y, 
                 data.frame(x_val = x, y_val = y, is_match = x == y))
  return(output)
}
sub.cri.1 <- sub_criteria(
  val, match_funcs = list(match.export = match_and_export)
)

format(sub.cri.1, show_levels = TRUE)
#> logical_test-{
#> Lv.0.1-match.export(Jan,Feb,Mar ...)
#> }
eval_sub_criteria(sub.cri.1)
#> $logical_test
#>  [1]  TRUE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE
#> 
#> $mf.0.1
#> $mf.0.1[[1]]
#>    x_val y_val is_match
#> 1    Jan   Jan     TRUE
#> 2    Feb   Jan    FALSE
#> 3    Mar   Jan    FALSE
#> 4    Apr   Jan    FALSE
#> 5    May   Jan    FALSE
#> 6    Jan   Jan     TRUE
#> 7    Feb   Jan    FALSE
#> 8    Mar   Jan    FALSE
#> 9    Apr   Jan    FALSE
#> 10   May   Jan    FALSE

  - links() can now export any data created within a sub_criteria. To
    do this, the sub_criteria must be created as described above. See
    an example below.

val <- 1:5
diff_one_and_export <- function(x, y){
  diff <- x - y
  is_match <- diff <= 1
  output <- list(is_match, 
                 data.frame(x_val = x, y_val = y, diff = diff,  is_match = is_match))
  return(output)
}
sub.cri.2 <- sub_criteria(
  val, match_funcs = list(diff.export = diff_one_and_export)
)
links(
  criteria = "place_holder", 
  sub_criteria = list("cr1" = sub.cri.2))
#> $pid
#> [1] "P.1 (CRI 001)" "P.1 (CRI 001)" "P.3 (CRI 001)" "P.3 (CRI 001)"
#> [5] "P.5 (Skipped)"
#> 
#> $export
#> $export$cri.1
#> $export$cri.1$iteration.1
#> $export$cri.1$iteration.1$mf.0.1
#> $export$cri.1$iteration.1$mf.0.1[[1]]
#>   x_val y_val diff is_match
#> 1     1     1    0     TRUE
#> 2     2     1    1     TRUE
#> 3     3     1    2    FALSE
#> 4     4     1    3    FALSE
#> 5     5     1    4    FALSE
#> 
#> 
#> 
#> $export$cri.1$iteration.2
#> $export$cri.1$iteration.2$mf.0.1
#> $export$cri.1$iteration.2$mf.0.1[[1]]
#>   x_val y_val diff is_match
#> 1     3     3    0     TRUE
#> 2     4     3    1     TRUE
#> 3     5     3    2    FALSE

Bug fixes

  - summary.epid() - Incorrect count for ‘by episode type’. Resolved.
  - episodes() - Incorrect results in some instances with skip_order.
    Resolved.
  - make_ids() - Did not capture all records that should be in a
    record-group when matches are recursive. Resolved.
  - make_pairs() - Incorrect record-pairs in some instances. Resolved.
  - eval_sub_criteria() - When the output of match_func was length one,
    it was not recycled. Resolved.
  - reverse_number_line() - Incorrect results in some instances.
    Resolved.
  - links()- Incorrect iteration (pids slot) for non-matches. Resolved.
  - links() and episodes() - Timing for each iteration was incorrect.
    Resolved.

                 Changes in version 0.4.1 (2021-12-05)                  

New features

  - New function - overlap_method_names(). Overlap methods for the
    corresponding overlap method codes.
  - Memory usage added to *with_report options for display.

Changes

  - "chain" overlap method split into "x_chain_y" and "y_chain_x".
    "chain" will continue to be supported as a keyword for "x_chain_y"
    OR "y_chain_x" methods.
  - "across" overlap method split into "x_across_y" and "y_across_x".
    "across" will continue to be supported as a keyword for "x_across_y"
    OR "y_across_x" methods.
  - "inbetween" overlap method split into "x_inbetween_y" and
    "y_inbetween_x". "inbetween" will continue to be supported as a
    keyword for "x_inbetween_y" OR "y_inbetween_x" methods.
  - Optimised overlaps().
  - Some overlap method codes have changed. Please review any previously
    specified codes with overlap_method_names().

Bug fixes

  - make_batch_pairs() (internal) created invalid record pairs.
    Resolved.

                 Changes in version 0.4.0 (2021-11-30)                  

New features

  - New function - reframe(). Modify the attributes of a sub_criteria
    object.
  - New function - link_records(). Record linkage by creating all record
    pairs as opposed to batches as with links().
  - New function - make_pairs(). Create every combination of
    record-pairs for a given dataset.
  - New function - make_pairs_wf_source(). Create record-pairs from
    different sources only.
  - New function - make_ids(). Convert an edge list to a group
    identifier.
  - New function - merge_ids(). Merge two group identifiers.
  - New function - attrs(). Pass a set of attributes to one instance of
    match_funcs or equal_funcs.
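
Converting an edge list to a group identifier, as make_ids() is
described as doing above, is essentially a connected-components
problem. The sketch below uses a simple union-find in base R.
edge_list_to_group() and its arguments are hypothetical and are not
diyar's implementation or signature; records are assumed to be indexed
1 to n.

```r
# Hypothetical sketch (not diyar's make_ids()): label each record with
# the smallest record index in its connected component.
edge_list_to_group <- function(x, y, n) {
  x <- as.integer(x)
  y <- as.integer(y)
  parent <- seq_len(n)  # each record starts as its own group

  # Follow parent links until the root of the group is reached.
  find_root <- function(i) {
    while (parent[i] != i) {
      i <- parent[i]
    }
    i
  }

  # Merge the two groups joined by each edge, keeping the smaller root.
  for (k in seq_along(x)) {
    root_x <- find_root(x[k])
    root_y <- find_root(y[k])
    if (root_x != root_y) {
      parent[max(root_x, root_y)] <- min(root_x, root_y)
    }
  }

  vapply(seq_len(n), find_root, integer(1))
}

# Edges 1-2, 2-3 and 4-5; record 6 has no matches.
edge_list_to_group(x = c(1, 2, 4), y = c(2, 3, 5), n = 6)
#> [1] 1 1 1 4 4 6
```

Records linked only indirectly (1 and 3 here) still end up in the same
group, which is the behaviour expected of recursive matches.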

Changes

  - Optimised episodes_wf_splits()
  - Optimised episodes() and links(). Reduced processing times.
  - Three new options for the display argument. "progress_with_report",
    "stats_with_report" and "none_with_report". Creates a d_report; a
    status of the analysis over its run time.
  - eval_sub_criteria(). Record-pairs are no longer created in the
    function. Therefore, index_record and sn arguments have been
    replaced with x_pos and y_pos.
  - link_records() and links_wf_probabilistic(). The cmp_threshold
    argument has been renamed to attr_threshold.
  - show_labels argument in schema(). Two new options - "wind_nm" and
    "length" to replace "length_label".

Bug fixes

  - Incorrect wind_id list in episodes(..., data_link = "XX").
    Resolved.
  - Incorrect link_id in links(..., recursive = TRUE). Resolved.
  - iteration not recorded in some situations with episodes(). Resolved.
  - skip_order ends an open episode. Resolved.
  - NA in dist_wind_index and dist_epid_index when sn is supplied.
    Resolved.
  - overlap_method_codes() - overlap method codes not recycled properly.
    Resolved.

                 Changes in version 0.3.1 (2021-08-09)                  

New features

  - New function - delink(). Unlink identifiers.
  - New function - episodes_wf_splits(). Wrapper function of episodes().
    Better optimised for handling datasets with many duplicate records.
  - New function - combi(). Numeric codes for unique combination of
    vectors.
  - New function - attr_eval(). Recursive evaluation of a function on
    each attribute of a sub_criteria.
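
The idea of numeric codes for unique combinations of vectors, as
described for combi() above, can be sketched in base R.
combination_code() is a hypothetical stand-in, not diyar's function; it
numbers combinations in order of first appearance, so the actual codes
returned by combi() may differ.

```r
# Hypothetical sketch (not diyar's combi()): give each unique
# combination of values across the supplied vectors an integer code.
combination_code <- function(...) {
  # Join the vectors element-wise with a separator that is unlikely to
  # occur in the data, then number the distinct keys.
  key <- do.call(paste, c(list(...), sep = "\r"))
  match(key, unique(key))
}

combination_code(c("a", "a", "b", "a"), c(1, 1, 1, 2))
#> [1] 1 1 2 3
```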

Changes

  - Two new case_nm values - Case_CR and Recurrence_CR which are Case
    and Recurrence without a sub-criteria match.

Bug fixes

  - Corrected length arrows in schema.epid.
  - Corrected outcome of eval_sub_criteria with 1 result.

                 Changes in version 0.3.0 (2021-04-25)                  

New features

  - New function - links_wf_probabilistic(). Probabilistic record
    linkage.
  - New function - partitions(). Split events into sections in time.
  - New function - schema(). Plot schema diagrams for pid, epid, pane
    and number_line objects.
  - New functions - encode() and decode(). Encode and decode slot
    values to minimise memory usage.
  - New argument in episodes() - case_sub_criteria and
    recurrence_sub_criteria. Additional matching conditions for temporal
    links.
  - New argument in episodes()- case_length_total and
    recurrence_length_total. Number of temporal links required for a
    window/episode.
  - New argument in links() - recursive. Control if matches can spawn
    new matches.
  - New argument in links() - check_duplicates. Control the checking of
    logical tests on duplicate values. If FALSE, results are recycled
    for the duplicates.
  - as.data.frame and as.list S3 methods for the pid, number_line, epid,
    pane objects.
  - New option for episode_type in episodes() - “recursive”. For
    recursive episodes where every linked event can be used as a
    subsequent index event.
  - recurrence_from_last renamed to reference_event and given two new
    options.

Changes

  - episodes() and links(). Speed improvements.
  - Default time zone for an epid_interval or pane_interval with POSIXct
    objects is now “GMT”.
  - number_line_sequence() - splits number_line objects. Also available
    as a seq method.
  - epid_total, pid_total and pane_total slots are populated by default.
    No need to use group_stats to get these.
  - to_df() - Removed. Use as.data.frame() instead.
  - to_s4() - Now an internal function. It’s no longer exported.
  - compress_number_line() - Now an internal function. It’s no longer
    exported. Use episodes() instead.
  - sub_criteria() - produces a sub_criteria object. Nested “AND” and
    “OR” conditions are now possible.
  - case_overlap_methods, recurrence_overlap_methods and overlap_methods
    now take integer codes for different combinations of overlap
    methods. See overlap_methods$options for the full list. character
    inputs are still supported.

Bug fixes

  - "Single-record" was wrong in links summary output. Resolved.

                 Changes in version 0.2.0 (2020-09-17)                  

New features

  - Better support for Inf in number_line objects.
  - Can now use multiple case_length or recurrence_length for the same
    event.
      - Can now use multiple overlap_methods for the corresponding
        case_length and recurrence_length.
  - New function links() to replace record_group().
  - New function sub_criteria(). The new way of supplying a sub_criteria
    in links().
  - New functions exact_match(), range_match() and range_match_legacy().
    Predefined logical tests for use with sub_criteria(). User-defined
    tests can also be used. See ?sub_criteria.
  - New function custom_sort() for nested sorting.
  - New function epid_lengths() to show the required case_length or
    recurrence_length for an analysis. Useful in confirming the required
    case_length or recurrence_length for episode tracking.
  - New function epid_windows(). Shows the period a date will overlap
    with given a particular case_length or recurrence_length. Useful in
    confirming the required case_length or recurrence_length for episode
    tracking.
  - New argument - strata in links(). Useful for stratified data
    linkage. As in stratified episode tracking, a record with a missing
    strata (NA_character_) is skipped from data linkage.
  - New argument - data_links in links(). Unlink record groups that do
    not include records from certain data sources
  - New convenience functions
      - listr(). Format atomic vectors as a written list.
      - combns(). An extension of combn to generate permutations not
        ordinarily captured by combn.
  - New iteration slot for pid and epid objects
  - New overlap_method - reverse()

Changes

  - number_line() - l and r must have the same length or be 1.
  - episodes() - case_nm differentiates between duplicates of "Case"
    ("Duplicate_C") and "Recurrent" events ("Duplicate_R").
  - Strata and episode-level options for most arguments. This gives
    greater flexibility within the same instance of episodes().
      - Episode-level - The behaviour for each episode is determined by
        the corresponding option for its index event ("Case").
          - episode_type - simultaneously track both "fixed" and
            "rolling" episodes.
          - skip_if_b4_lengths - simultaneously track episodes where
            events before a cut-off range are both skipped and not
            skipped.
          - episode_unit - simultaneously track episodes by different
            units of time.
          - case_for_recurrence - simultaneously track "rolling"
            episodes with and without an additional case window for
            recurrent events.
          - recurrence_from_last - simultaneously track "rolling"
            episodes with reference windows calculated from the first
            and last event of the previous window.
      - Strata-level - The behaviour for each episode is determined by
        the corresponding option for its strata. Options must be the
        same in each strata.
          - from_last - simultaneously track episodes in both directions
            of time - past to present and present to past.
          - episodes_max - simultaneously track different number of
            episodes within the dataset.
  - include_overlap_method - "overlap" and "none" will not be combined
    with other methods.
      - "overlap" - mutually inclusive with the other methods, so their
        inclusion is not necessary.
      - "none" - mutually exclusive and prioritised over the other
        methods (including "overlap"), so their inclusion is not
        necessary.
  - Events can now have missing cut-off points (NA_real_) or periods
    (number_line(NA_real_, NA_real_)) for case_length and
    recurrence_length. This ensures that the event does not become an
    index case; however, it can still be part of a different episode.
    For reference, a missing strata (NA_character_) ensures that the
    event becomes neither an index case nor part of any episode.

Bug fixes

  - fixed_episodes, rolling_episodes and episode_group -
    include_index_period didn’t work in certain situations. Corrected.
  - fixed_episodes, rolling_episodes and episode_group - dist_from_wind
    was wrong in certain situations. Corrected.

                 Changes in version 0.1.0 (2020-06-13)                  

New features

  - New argument in record_group() - strata. Perform record linkage
    separately within subsets of a dataset.
  - New argument in overlap(), compress_number_line(),
    fixed_episodes(), rolling_episodes() and episode_group() -
    overlap_methods and methods. Replaces overlap_method and method
    respectively. Use different sets of methods within the same dataset
    when grouping episodes or collapsing number_line objects.
    overlap_method and method only permit 1 method per dataset.
  - New slot in epid objects - win_nm. Shows the type of window each
    event belongs to i.e. case or recurrence window
  - New slot in epid objects - win_id. Unique ID for each window. The ID
    is the sn of the reference event for each window
      - Format of epid objects updated to reflect this
  - New slot in epid objects - dist_from_wind. Shows the duration of
    each event from its window’s reference event
  - New slot in epid objects - dist_from_epid. Shows the duration of
    each event from its episode’s reference event
  - New argument in episode_group() and rolling_episodes() -
    recurrence_from_last. Determine if reference events should be the
    first or last event from the previous window.
  - New argument in episode_group() and rolling_episodes() -
    case_for_recurrence. Determine if recurrent events should have their
    own case windows or not.
  - New argument in episode_group(), fixed_episodes() and
    rolling_episodes() - data_links. Unlink episodes that do not include
    records from certain data_source(s).
  - episode_group(), fixed_episodes() and rolling_episodes() -
    case_length and recurrence_length arguments. You can now use a range
    (number_line object).
  - New argument in episode_group(), fixed_episodes() and
    rolling_episodes() - include_index_period. If TRUE, overlaps with
    the index event or period are grouped together even if they are
    outside the cut-off range (case_length or recurrence_length).
  - New slot in pid objects - link_id. Shows the record (sn slot) to
    which every record in the dataset has matched.
  - New function - invert_number_line(). Invert the left and/or right
    points to the opposite end of the number line
  - New accessor functions - left_point(x)<-, right_point(x)<-,
    start_point(x)<- and end_point(x)<-

Changes

  - overlap() renamed to overlaps(). overlap() is now a convenience
    overlap_method to capture ANY kind of overlap.
  - "none" is another convenience overlap_method for NO kind of overlap
  - expand_number_line() - new options for point; "left" and "right"
  - compress_number_line() - compressed number_line object inherits the
    direction of the widest number_line among overlapping group of
    number_line objects
  - overlap_methods - have been changed such that each pair of
    number_line objects can only overlap in one way. E.g.
      - "chain" and "aligns_end" used to be possible but this is now
        considered a "chain" overlap only
      - "aligns_start" and "aligns_end" used to be possible but this is
        now considered an "exact" overlap
  - number_line_sequence() - Output is now a list.
  - number_line_sequence() - now works across multiple number_line
    objects.
  - to_df() - can now change number_line objects to data.frames.
      - to_s4() can do the reverse.
  - epid objects are the default outputs for fixed_episodes(),
    rolling_episodes() and episode_group()
  - pid objects are the default outputs for record_group()
  - In episode grouping, the case_nm for events that were skipped due to
    rolls_max or episodes_max is now "Skipped".
  - In episode_group() and record_group(), sn can be negative numbers
    but must still be unique
  - Optimised episode_group() and record_group(). Runs just a little bit
    faster …
  - Relaxed the requirement for x and y to have the same lengths in
    overlap functions.
      - The behaviour of overlap functions will now be the same as that
        of standard R logical tests
  - episode_group - case_length and recurrence_length arguments. Now
    accepts negative numbers.
      - negative “lengths” will collapse two periods into one, if the
        second one is within some days before the end_point() of the
        first period.
          - if the “lengths” are larger than the number_line_width(),
            both will be collapsed if the second one is within some days
            (or any other episode_unit) before the start_point() of the
            first period.
  - cheat sheet updated
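
The relaxed length requirement for overlap functions noted above is
described as matching standard R logical tests. The base R snippet below
shows that recycling rule on plain vectors; it does not call diyar, and
the variable names are illustrative only.

```r
# Base R recycling in a logical test: the shorter vector is repeated
# until it matches the length of the longer one.
x <- 1:6
y <- c(1, 2)

recycled <- x == y        # y is used as 1, 2, 1, 2, 1, 2
recycled
#> [1]  TRUE  TRUE FALSE FALSE FALSE FALSE

# A length-one value is compared against every element.
x == 3
#> [1] FALSE FALSE  TRUE FALSE FALSE FALSE
```

In base R, a warning is raised when the longer length is not a multiple
of the shorter one.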

Bug fixes

  - Recurrence was not checked if the initial case event had no
    duplicates. Resolved
  - case_nm wasn’t right for rolling episodes. Resolved

                 Changes in version 0.0.3 (2019-12-08)                  

Changes

  - #7 episode_group(), fixed_episodes() and rolling_episodes() -
    optimized to take less time when working with large datasets
  - episode_group(), fixed_episodes() and rolling_episodes() - date
    argument now supports numeric values
  - compress_number_line() - the output (gid slot) is now a group
    identifier just like in epid objects (epid_interval)

                 Changes in version 0.0.2 (2019-11-11)                  

New features

  - pid S4 object class for results of record_group(). This will replace
    the current default (data.frame) in the next major release
  - epid S4 object class for results of episode_group(),
    fixed_episodes() and rolling_episodes(). This will replace the
    current default (data.frame) in the next release
  - to_s4() and to_s4 argument in record_group(), episode_group(),
    fixed_episodes() and rolling_episodes(). Changes their output from a
    data.frame (current default) to epid or pid objects
  - to_df() changes epid or pid objects to a data.frame
  - deduplicate argument from fixed_episodes() and rolling_episodes()
    added to episode_group()

Changes

  - fixed_episodes() and rolling_episodes() are now wrapper functions of
    episode_group(). Functionality remains the same but now includes all
    arguments available to episode_group()
  - Changed the output of fixed_episodes() and rolling_episodes() from
    number_line to data.frame, pending the change to epid objects
  - pid_cri column returned in record_group is now numeric. 0 indicates
    no match.
  - columns can now be used as criteria multiple times in
    record_group()
  - #6 number_line objects can now be used as criteria in
    record_group()

Bug fixes

  - #3 - Resolved a bug with episode_unit in episode_group()
  - #4 - Resolved a bug with bi_direction in episode_group()

                 Changes in version 0.0.1 (2019-10-06)                  

Features

  - fixed_episodes() and rolling_episodes() - Group records into fixed
    or rolling episodes of events or period of events.
  - episode_group() - A more comprehensive implementation of
    fixed_episodes() and rolling_episodes(), with additional features
    such as user defined case assignment.
  - record_group() - Multistage deterministic linkage that addresses
    missing data.
  - number_line S4 object.
      - Used to represent a range of numeric values to match using
        record_group()
      - Used to represent a period in time to be grouped using
        fixed_episodes(), rolling_episodes() and episode_group()
      - Used as the returned output of fixed_episodes() and
        rolling_episodes()