How do I get a Wyam pipeline of documents based on a comma-delimited meta value from a previous pipeline?
I have a Wyam pipeline called "posts" filled with documents. Some of these documents have a tags meta value, which is a comma-delimited list of tags. For example, let's say it has 3 documents, with tags meta of:
    gumby,pokey
    gumby,oscar
    oscar,kermit
I want a new pipeline filled with one document for each unique tag found in the documents in the "posts" pipeline. These documents should have the tag in a meta value called tagname.
So, the above values should result in a new pipeline consisting of four documents, with tagname meta values of:
    gumby
    pokey
    oscar
    kermit
Here is my solution. This technically works, but I feel like it's inefficient, and I'm pretty sure there has to be a better way.
    Documents(c => c.Documents["posts"]
        .Select(d => d.String("tags", string.Empty))
        .SelectMany(s => s.Split(",".ToCharArray()))
        .Select(s => s.Trim().ToLower())
        .Distinct()
        .Select(s => c.GetNewDocument(
            string.Empty,
            new List<KeyValuePair<string, object>>()
            {
                new KeyValuePair<string, object>("tagname", s)
            }
        ))
    )
So, I'm calling Documents and passing in a ContextConfig, which:
- gets the documents from "posts" (I now have a collection of documents)
- selects the tags meta value (now I have a collection of strings)
- splits each one on the comma (a bigger collection of strings)
- then trims and lower-cases them (still a collection of strings)
- de-dupes it (a smaller collection of strings)
- then creates a new document for each value in the list, with an empty body and the string as the tagname value (I should end up with a collection of new documents)
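The string-handling steps in that list can be tried outside Wyam with plain LINQ. The following is only a minimal sketch: `TagSteps` and `UniqueTags` are hypothetical names, the input array stands in for the tags meta values of the three sample documents, and the filter for empty entries is an addition that the original expression does not have.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class TagSteps
{
    // Hypothetical helper (not part of Wyam): reduces raw comma-delimited
    // tag strings to a list of unique, normalized tag names.
    public static List<string> UniqueTags(IEnumerable<string> rawTagValues)
    {
        return rawTagValues
            .SelectMany(s => s.Split(','))             // split on the comma
            .Select(s => s.Trim().ToLowerInvariant())  // trim and lower-case
            .Where(s => s.Length > 0)                  // assumption: skip empty entries
            .Distinct()                                // de-dupe
            .ToList();
    }

    public static void Main()
    {
        // Stand-ins for the tags meta values of the three sample documents.
        var tagsMeta = new[] { "gumby,pokey", "gumby,oscar", "oscar,kermit" };

        foreach (var tag in UniqueTags(tagsMeta))
        {
            Console.WriteLine(tag);
        }
    }
}
```

For the sample values this yields the four tags gumby, pokey, oscar and kermit. Each one would then become the tagname value of a new document.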
Again, this works. But is there a better way?
That's actually not bad at all. Part of the challenge here is getting the comma-separated list of tags into something that can be processed by a LINQ expression or similar. That part is probably unavoidable, and it accounts for 3 of the lines in your expression.
That said, Wyam does provide a little help here with the ToLookup() extension (see the bottom of the page:
Here's how that might look (this code is a self-contained LINQPad script and would need to be adjusted for use in a Wyam config file):
    public void Main()
    {
        Engine engine = new Engine();

        // "posts": emit the fake documents, then add a "tagarray" meta value
        // holding the lower-cased, trimmed tags as an array.
        engine.Pipelines.Add("posts",
            new PostsDocuments(),
            new Meta("tagarray", (doc, ctx) => doc.String("tags")
                .ToLowerInvariant().Split(',').Select(x => x.Trim()).ToArray())
        );

        // "tags": one new document per unique tag, with the tag in "tagname".
        engine.Pipelines.Add("tags",
            new Documents(ctx => ctx.Documents["posts"]
                .ToLookup<string>("tagarray")
                .Select(x => ctx.GetNewDocument(new MetadataItems { { "tagname", x.Key } }))),
            new Execute((doc, ctx) =>
            {
                Console.WriteLine(doc["tagname"]);
                return null;
            })
        );

        engine.Execute();
    }

    // Test module that stands in for the real "posts" documents.
    public class PostsDocuments : IModule
    {
        public IEnumerable<IDocument> Execute(IReadOnlyList<IDocument> inputs, IExecutionContext context)
        {
            yield return context.GetNewDocument(new MetadataItems { { "tags", "gumby,pokey" } });
            yield return context.GetNewDocument(new MetadataItems { { "tags", "gumby,oscar" } });
            yield return context.GetNewDocument(new MetadataItems { { "tags", "oscar,kermit" } });
        }
    }
Output:
    gumby
    pokey
    oscar
    kermit
A lot of that is housekeeping to set up a fake environment for testing. The important part that you're looking for is this:
    engine.Pipelines.Add("tags",
        new Documents(ctx => ctx.Documents["posts"]
            .ToLookup<string>("tagarray")
            .Select(x => ctx.GetNewDocument(new MetadataItems { { "tagname", x.Key } }))),
        // ...
    );
Note that you still have to do the work of getting the comma-delimited tags list into an array; it's just happening earlier, in the "posts" pipeline.
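To illustrate why a lookup gives one entry per unique tag, here is a rough analogy that uses the standard LINQ `ToLookup` from `System.Linq` rather than Wyam's extension of the same name. `TagLookupSketch`, `Post`, and `TitlesByTag` are hypothetical names made up for this sketch, and the titles "a", "b", "c" are placeholder data. How Wyam's own extension treats an array-valued meta value is not shown here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class TagLookupSketch
{
    // Hypothetical stand-in for a post: a title plus its normalized tag array.
    public sealed class Post
    {
        public string Title { get; }
        public string[] Tags { get; }

        public Post(string title, string rawTags)
        {
            Title = title;
            // Same normalization as the "tagarray" meta value in the answer.
            Tags = rawTags.ToLowerInvariant().Split(',').Select(x => x.Trim()).ToArray();
        }
    }

    // Flattens each post into (title, tag) pairs, then groups titles by tag.
    // This is System.Linq's ToLookup, not Wyam's extension method.
    public static ILookup<string, string> TitlesByTag(IEnumerable<Post> posts)
    {
        return posts
            .SelectMany(p => p.Tags, (p, tag) => new { p.Title, Tag = tag })
            .ToLookup(x => x.Tag, x => x.Title);
    }

    public static void Main()
    {
        var posts = new[]
        {
            new Post("a", "gumby,pokey"),
            new Post("b", "gumby,oscar"),
            new Post("c", "oscar,kermit"),
        };

        // Each group key is one unique tag; the group holds the matching titles.
        foreach (var group in TitlesByTag(posts))
        {
            Console.WriteLine(group.Key + ": " + string.Join(", ", group));
        }
    }
}
```

The lookup's keys are the de-duplicated tags, so no separate Distinct() call is needed. A lookup also keeps the posts for each tag, which is useful if the tag documents later need to list them.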