Somewhat related to my previous post, I'm trying to figure out how to choose a minimal collection from a large dataset based upon arbitrary requirements and hard-coded relationships. It's a complex situation, so I'm going to analogise.
The analogy: A bunch of companies are going in together to rent an amusement park. Within each company, employees are given tickets, and groups are given maximum-attendee quotas . An employee can use up to one ticket for hirself, zero or more for family members, and zero or more for friends. Groups may indicate non-support of the event, in which case all tickets given to the employees in those groups are worthless. Typically more tickets are issued to employees in a group than the group is allowed to have attend, so tickets may be given to people who can't -- or don't -- attend the event.
So far, so good. (Now is when the analogy starts to break down.) In the complete dataset, there are thousands of companies. each with from zero to hundreds of employees and from zero to hundreds of tickets. In addition, some companies may not issue any tickets, or issue some to a few employees and none to others.
Now to start on the problem. When building the subset collection, all entities directly related to a selected item are eligible for inclusion, except for the company. That is, if the requirement is to select an employee, hir company is included in the collection; but if the requirement is to select a company, that doesn't drag in all of the company's groups, employees, and what-not. But because of the relationships, satisfying one requirement might satisfy others as well. A requirement stated as 'select n things' is interpreted as 'select at least n things.' All requirements are simple and refer to only one type of entity; there are no compound requirements such as 'select 3 employees belonging to the same company.'
- Case 1: Select 5 companies.
- As easily done as described. Five companies can be chosen at random and we're done. Any employees. groups, tickets, or attendees associated with those companies are included in the collection for free.
- Case 2: Select 5 employees
- A little more complicated, since we may need to hunt through several companies to find that many, but still not a challenge.
- Case 3: Select 5 companies and 5 employees
- As stated, we can find five companies at random. If among them they have 5 employees, we're done. Otherwise we need to throw out one or more of our selected companies and choose again until we get enough employees amongst our company selection.
- Case 4: Select 5 attendees
- This is basically a change rung on case #2; we're just selecting from a different pile.
- Case 5: Select 5 tickets given to friends
- To satisfy this will mean the free inclusion of from 1 to 5 employees and 1 to 5 companies. In the best case, one employee (hence one company) can be found that gave out that many; in the degenerate case, it may require five employees from as many different companies.
- Case 6: Select 1 company, 5 groups, 1 employee, and 10 tickets used for employees
- Because of the implicit relationships, the primary limiting factor here is the tickets. Since each employee can use at most one ticket for hirself -- and has to have been issued some tickets in the first place -- it means we have to find at least ten employees that used a ticket personally. Once we do that, we have intrinsically met the '1 employee' and '1 company' requirements, and just need to repeat a modified case #3 -- find (and throw out) ticket-using employees until we've found ten from at least five different groups.
That is not an exhaustive list of cases, but it does illustrate the issue. A given collection instruction may have from zero to eight distinct 'bring me this many rocks' requirement clauses (including tickets-as-a-class ['select 5 tickets'] and individual usages ['select 4 tickets given to friends and 1 given to family']). I'm trying to figure out how to reduce (see case #6) and order the arbitrary input requirements to optimise selecting things from the dataset. "What should I look for in what order, and how can I tell when I've gotten all I need?"
And, of course, I want to do this in Ruby.
Horribly presented and described, no doubt, but I just wrote this in one sitting.